Hi,
Here is the 3rd version of the kselftest fixes for some tests on 32bit arch
(e.g. arm).
In this version, I updated [1/5] to make va_max 1MB unconditionally
according to Alexey's comment.
When I built the kselftest on arm, I hit some 32bit related warnings.
Here are the patches to fix those issues.
- [1/5] va_max was set 2^32 even on 32bit arch. This can make
va_max == 0 and always fail. Make it 1MB unconditionally.
- [2/5] Some VM tests require 64bit user space, so they should
not run on 32bit arch.
- [3/5] For counting the size of large file, we should use
size_t instead of unsigned long.
- [4/5] Gcc warns about the printf format for size_t and int64_t on
32bit arch. Use %llu and cast the value.
- [5/5] Gcc warns about casts between __u64 and pointer types. The value
should first be cast to unsigned long.
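To illustrate items [4/5] and [5/5], here is a minimal sketch of the two patterns. The helper names (format_size, ptr_to_u64, u64_to_ptr) are made up for this example and are not taken from the patches; it also assumes, as on Linux ILP32/LP64 targets, that unsigned long is as wide as a pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * size_t is 32 bits wide on arm, so "%lu" draws a format warning there.
 * Casting to unsigned long long and printing with "%llu" is correct on
 * both 32bit and 64bit builds.
 */
int format_size(char *buf, size_t buflen, size_t n)
{
	return snprintf(buf, buflen, "%llu", (unsigned long long)n);
}

/*
 * A direct cast between a 64bit integer and a 32bit pointer makes gcc
 * warn about differing sizes. Going through unsigned long first keeps
 * the conversion explicit (assumption: unsigned long is pointer-sized).
 */
uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(unsigned long)p;
}

void *u64_to_ptr(uint64_t v)
{
	return (void *)(unsigned long)v;
}
```

The same double cast works unchanged on 64bit builds, where the intermediate step is a no-op.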
Thank you,
---
Masami Hiramatsu (5):
selftests: proc: Make va_max 1MB
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: net: Use size_t and ssize_t for counting file size
selftests: net: Fix printf format warnings on arm
selftests: sync: Fix cast warnings on arm
tools/testing/selftests/net/so_txtime.c | 4 ++--
tools/testing/selftests/net/tcp_mmap.c | 8 ++++----
tools/testing/selftests/net/udpgso.c | 3 ++-
tools/testing/selftests/net/udpgso_bench_tx.c | 3 ++-
.../selftests/proc/proc-self-map-files-002.c | 6 +++++-
tools/testing/selftests/sync/sync.c | 6 +++---
tools/testing/selftests/vm/Makefile | 5 +++++
tools/testing/selftests/vm/run_vmtests | 10 ++++++++++
8 files changed, 33 insertions(+), 12 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
Hi Shuah,
Here is the set with cleanup as suggested by Kees on v3.
Configured, built, and tested all modules loaded by
tools/testing/selftests/lib/*.sh
From previous cover letters ...
While doing the testing for strscpy_pad() it was noticed that there is
duplication in how test modules are being fed to kselftest and also in
the test modules themselves.
This set makes an attempt at adding a framework to kselftest for writing
kernel test modules. It also adds a script for use in creating script
test runners for kselftest. My macro-foo is not great, all criticism
and suggestions very much appreciated. The design is based on test
modules lib/test_printf.c, lib/test_bitmap.c, lib/test_xarray.c.
Changes since last version:
- Remove dependency on Bash (thanks Kees)
- Use oneliner to implement kselftest test runners (thanks Kees)
- Squash patch that adds kselftest script creator script with patch
that uses it.
- Fix typos (thanks Randy)
- Add Kees' Acked-by tags to all patches
thanks,
Tobin.
Tobin C. Harding (6):
lib/test_printf: Add empty module_exit function
kselftest: Add test runner creation script
kselftest: Add test module framework header
lib: Use new kselftest header
lib/string: Add strscpy_pad() function
lib: Add test module for strscpy_pad
Documentation/dev-tools/kselftest.rst | 94 +++++++++++-
include/linux/string.h | 4 +
lib/Kconfig.debug | 3 +
lib/Makefile | 1 +
lib/string.c | 47 +++++-
lib/test_bitmap.c | 20 +--
lib/test_printf.c | 17 +--
lib/test_strscpy.c | 150 +++++++++++++++++++
tools/testing/selftests/kselftest_module.h | 48 ++++++
tools/testing/selftests/kselftest_module.sh | 84 +++++++++++
tools/testing/selftests/lib/Makefile | 2 +-
tools/testing/selftests/lib/bitmap.sh | 18 +--
tools/testing/selftests/lib/config | 1 +
tools/testing/selftests/lib/prime_numbers.sh | 17 +--
tools/testing/selftests/lib/printf.sh | 19 +--
tools/testing/selftests/lib/strscpy.sh | 3 +
16 files changed, 440 insertions(+), 88 deletions(-)
create mode 100644 lib/test_strscpy.c
create mode 100644 tools/testing/selftests/kselftest_module.h
create mode 100755 tools/testing/selftests/kselftest_module.sh
create mode 100755 tools/testing/selftests/lib/strscpy.sh
--
2.21.0
This patchset is being developed here:
<https://github.com/cyphar/linux/tree/openat2/master>
Patch changelog:
v14: [<https://lore.kernel.org/lkml/20191010054140.8483-1-cyphar@cyphar.com/>]
* The magic-link changes (and O_EMPTYPATH) have been dropped from this series
-- they will be developed and sent separately. The main reason is that we
need to restrict things other than open(2) (examples include truncate(2) as
well as mount(MS_BIND)). This will require a fair amount of extra work, and
there's no point stalling openat2(2) for that work to be completed.
* Minor rework of 'struct open_how':
* To avoid future headaches, make it a non-const argument.
* Expand ->flags and ->resolve to 64-bit fields to allow for more flag
extensions without needing to add separate fields too early. This
requires adding a bit of explicit padding (32 bits) to avoid userspace
putting garbage in the alignment padding -- this can be repurposed for
future extensions.
* upgrade_mask is dropped (and will be a separate field when we add it
again in the future) to avoid userspace foot-guns.
* Expand -EINVAL checks in build_open_flags(). Rather than silently
ignoring silly flag combinations (such as O_TMPFILE|O_PATH or
O_PATH|<most flags>), give an -EINVAL. All of the silent ignore semantics
were added to open(2) because we couldn't return -EINVAL -- but we can
now!
* open(2) and openat(2) clean up their flags before passing them to
build_open_flags(), so all mixed flags will continue to work. There is
one exception which is (O_PATH|O_TMPFILE) -- this is no longer
permitted (as far as I can tell this appears to be a bug, and there are
no userspace users that I've hit after running this code for a few
days). If it turns out that userspace does depend on (O_PATH|O_TMPFILE)
working, we can only disallow it for openat2(2).
* Don't zero out nd->root in complete_walk() for RCU-walk if we're doing a
scoped-lookup (this prevents a needless REF-walk retry).
* Attempt all tests on kernels that don't have openat2(2), rather than just
skipping everything.
v13: <https://lore.kernel.org/lkml/20190930183316.10190-1-cyphar@cyphar.com/>
v12: <https://lore.kernel.org/lkml/20190904201933.10736-1-cyphar@cyphar.com/>
v11: <https://lore.kernel.org/lkml/20190820033406.29796-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20190728010207.9781-1-cyphar@cyphar.com/>
v10: <https://lore.kernel.org/lkml/20190719164225.27083-1-cyphar@cyphar.com/>
v09: <https://lore.kernel.org/lkml/20190706145737.5299-1-cyphar@cyphar.com/>
v08: <https://lore.kernel.org/lkml/20190520133305.11925-1-cyphar@cyphar.com/>
v07: <https://lore.kernel.org/lkml/20190507164317.13562-1-cyphar@cyphar.com/>
v06: <https://lore.kernel.org/lkml/20190506165439.9155-1-cyphar@cyphar.com/>
v05: <https://lore.kernel.org/lkml/20190320143717.2523-1-cyphar@cyphar.com/>
v04: <https://lore.kernel.org/lkml/20181112142654.341-1-cyphar@cyphar.com/>
v03: <https://lore.kernel.org/lkml/20181009070230.12884-1-cyphar@cyphar.com/>
v02: <https://lore.kernel.org/lkml/20181009065300.11053-1-cyphar@cyphar.com/>
v01: <https://lore.kernel.org/lkml/20180929103453.12025-1-cyphar@cyphar.com/>
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Furthermore, the need for some sort of control over VFS's path resolution (to
avoid malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a revival
of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David
Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum
project[5]) with a few additions and changes made based on the previous
discussion within [6] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS, the
flag has been split up into separate flags. However, instead of being an
openat(2) flag it is provided through a new syscall openat2(2) which provides
several other improvements to the openat(2) interface (see the patch
description for more details). The following new LOOKUP_* flags are added:
* LOOKUP_NO_XDEV blocks all mountpoint crossings (upwards, downwards,
or through absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is also
blocked (though magic-link jumps on the same vfsmount are permitted).
* LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
links. This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic-links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
* LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed. Conceptually this flag is to
ensure you "stay below" a certain point in the filesystem tree --
but this requires some additional work to protect against various races
that would allow escape using "..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because magic-links
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done as
in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
* LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
resolution is allowed at all, including magic-links. Just as with
LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
fd for the symlink as long as no parent path had a symlink
component.
* LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
blocking attempts to move past the root, forces all such movements
to be scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that chroot(2)
is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to cross
magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have been mitigated by having RESOLVE_IN_ROOT
(such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
CVE-2019-5736, just to name a few).
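As a rough illustration of the difference between the BENEATH and IN_ROOT behaviours described above, here is a purely lexical sketch. It is not the kernel implementation: it ignores symlinks, mounts and races entirely, and the function name scoped_depth and the choice of -EXDEV as the error are assumptions made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Walk the components of `path` and return how deep below the starting
 * directory the walk ends up.
 *
 * in_root == 0 (BENEATH-like): an absolute path, or a ".." that would
 * climb above the starting directory, is rejected with -EXDEV.
 * in_root != 0 (IN_ROOT-like): a leading "/" means the starting
 * directory, and ".." at the top simply stays at the top.
 */
int scoped_depth(const char *path, int in_root)
{
	const char *p = path;
	int depth = 0;

	if (*p == '/' && !in_root)
		return -EXDEV;

	while (*p) {
		size_t len;

		while (*p == '/')	/* skip separators */
			p++;
		len = strcspn(p, "/");
		if (len == 0)
			break;

		if (len == 2 && p[0] == '.' && p[1] == '.') {
			if (depth > 0)
				depth--;
			else if (!in_root)
				return -EXDEV;	/* escape attempt */
			/* in_root: ".." at the root is a no-op */
		} else if (!(len == 1 && p[0] == '.')) {
			depth++;
		}
		p += len;
	}
	return depth;
}
```

With this sketch, "../etc" is an error in the BENEATH-like mode but resolves to depth 1 in the IN_ROOT-like mode, which is the chroot(2)-like scoping described above.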
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution. It
features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVj…
[3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk
[4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@goo…
[5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@goo…
[6]: https://lwn.net/Articles/723057/
[7]: https://github.com/cyphar/filepath-securejoin
[8]: https://github.com/openSUSE/libpathrs
The current draft of the openat2(2) man-page is included below.
--8<---------------------------------------------------------------------------
OPENAT2(2) Linux Programmer's Manual OPENAT2(2)
NAME
openat2 - open and possibly create a file (extended)
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int openat2(int dirfd, const char *pathname, struct open_how *how, size_t size);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The openat2() system call opens the file specified by pathname. If the specified file
does not exist, it may optionally (if O_CREAT is specified in how.flags) be created by
openat2().
As with openat(2), if pathname is relative, then it is interpreted relative to the direc-
tory referred to by the file descriptor dirfd (or the current working directory of the
calling process, if dirfd is the special value AT_FDCWD.) If pathname is absolute, then
dirfd is ignored (unless how.resolve contains RESOLVE_IN_ROOT, in which case pathname is
resolved relative to dirfd.)
The openat2() system call is an extension of openat(2) and provides a superset of its
functionality. Rather than taking a single flag argument, an extensible structure (how)
is passed instead to allow for future extensions. size must be set to sizeof(struct
open_how), to facilitate future extensions (see the "Extensibility" section of the NOTES
for more detail on how extensions are handled.)
The open_how structure
The following structure indicates how pathname should be opened, and acts as a superset of
the flag and mode arguments to openat(2).
struct open_how {
__aligned_u64 flags; /* O_* flags. */
__u16 mode; /* Mode for O_{CREAT,TMPFILE}. */
__u16 __padding[3]; /* Must be zeroed. */
__aligned_u64 resolve; /* RESOLVE_* flags. */
};
Any future extensions to openat2() will be implemented as new fields appended to the above
structure (or through reuse of pre-existing padding space), with the zero value of the new
fields acting as though the extension were not present.
The meaning of each field is as follows:
flags
The file creation and status flags to use for this operation. All of the
O_* flags defined for openat(2) are valid openat2() flag values.
Unlike openat(2), it is an error to provide openat2() unknown or conflicting
flags in flags.
mode
File mode for the new file, with identical semantics to the mode argument to
openat(2). However, unlike openat(2), it is an error to provide openat2()
with a mode which contains bits other than 0777.
It is an error to provide openat2() a non-zero mode if flags does not con-
tain O_CREAT or O_TMPFILE.
resolve
Change how the components of pathname will be resolved (see path_resolu-
tion(7) for background information.) The primary use case for these flags
is to allow trusted programs to restrict how untrusted paths (or paths in-
side untrusted directories) are resolved. The full list of resolve flags is
given below.
RESOLVE_NO_XDEV
Disallow traversal of mount points during path resolution (including
all bind mounts).
Users of this flag are encouraged to make its use configurable (un-
less it is used for a specific security purpose), as bind mounts are
very widely used by end-users. Setting this flag indiscriminately for
all uses of openat2() may result in spurious errors on previously-
functional systems.
RESOLVE_NO_SYMLINKS
Disallow resolution of symbolic links during path resolution. This
option implies RESOLVE_NO_MAGICLINKS.
If the trailing component is a symbolic link, and flags contains both
O_PATH and O_NOFOLLOW, then an O_PATH file descriptor referencing the
symbolic link will be returned.
Users of this flag are encouraged to make its use configurable (un-
less it is used for a specific security purpose), as symbolic links
are very widely used by end-users. Setting this flag indiscriminately
for all uses of openat2() may result in spurious errors on previ-
ously-functional systems.
RESOLVE_NO_MAGICLINKS
Disallow all magic link resolution during path resolution.
If the trailing component is a magic link, and flags contains both
O_PATH and O_NOFOLLOW, then an O_PATH file descriptor referencing the
magic link will be returned.
Magic-links are symbolic link-like objects that are most notably
found in proc(5) (examples include /proc/[pid]/exe and
/proc/[pid]/fd/*.) Due to the potential danger of unknowingly open-
ing these magic links, it may be preferable for users to disable
their resolution entirely (see symlink(7) for more details.)
RESOLVE_BENEATH
Do not permit the path resolution to succeed if any component of the
resolution is not a descendant of the directory indicated by dirfd.
This results in absolute symbolic links (and absolute values of
pathname) being rejected.
Currently, this flag also disables magic link resolution. However,
this may change in the future. The caller should explicitly specify
RESOLVE_NO_MAGICLINKS to ensure that magic links are not resolved.
RESOLVE_IN_ROOT
Treat dirfd as the root directory while resolving pathname (as though
the user called chroot(2) with dirfd as the argument.) Absolute sym-
bolic links and ".." path components will be scoped to dirfd. If
pathname is an absolute path, it is also treated relative to dirfd.
However, unlike chroot(2) (which changes the filesystem root perma-
nently for a process), RESOLVE_IN_ROOT allows a program to effi-
ciently restrict path resolution for only certain operations. It
also has several hardening features (such as detecting escape attempts
during .. resolution) which chroot(2) does not.
Currently, this flag also disables magic link resolution. However,
this may change in the future. The caller should explicitly specify
RESOLVE_NO_MAGICLINKS to ensure that magic links are not resolved.
It is an error to provide openat2() unknown flags in resolve.
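The field rules above can be summarised in a small userspace sketch. This is only an illustration of the documented checks, not the kernel code: struct demo_open_how mirrors the draft layout shown earlier but is a local stand-in, validate_how is a hypothetical name, and the known/creating flag masks are passed in as parameters rather than taken from real O_* or RESOLVE_* values. The "conflicting flags" rule is not modelled.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-in mirroring the draft struct open_how layout. */
struct demo_open_how {
	uint64_t flags;		/* O_* flags */
	uint16_t mode;		/* mode for file creation */
	uint16_t padding[3];	/* must be zeroed */
	uint64_t resolve;	/* RESOLVE_* flags */
};

/*
 * Returns 0 if `how` passes the documented checks, -EINVAL otherwise.
 * known_flags/known_resolve: bits the (hypothetical) kernel understands.
 * create_flags: bits that make a non-zero mode meaningful.
 */
int validate_how(const struct demo_open_how *how, uint64_t known_flags,
		 uint64_t create_flags, uint64_t known_resolve)
{
	if (how->flags & ~known_flags)		/* unknown flag bit */
		return -EINVAL;
	if (how->resolve & ~known_resolve)	/* unknown resolve bit */
		return -EINVAL;
	for (int i = 0; i < 3; i++)		/* garbage in padding */
		if (how->padding[i] != 0)
			return -EINVAL;
	if (how->mode & ~0777)			/* bits other than 0777 */
		return -EINVAL;
	if (how->mode != 0 && !(how->flags & create_flags))
		return -EINVAL;			/* mode without creation */
	return 0;
}
```

Rejecting everything unknown up front is what makes it possible to add new bits later without the silent-ignore problem openat(2) has.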
RETURN VALUE
On success, a new file descriptor is returned. On error, -1 is returned, and errno is set
appropriately.
ERRORS
The set of errors returned by openat2() includes all of the errors returned by openat(2),
as well as the following additional errors:
EINVAL An unknown flag or invalid value was specified in how.
EINVAL mode is non-zero, but flags does not contain O_CREAT or O_TMPFILE.
EINVAL size was smaller than any known version of struct open_how.
E2BIG An extension was specified in how, which the current kernel does not support (see
the "Extensibility" section of the NOTES for more detail on how extensions are han-
dled.)
EAGAIN resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and the kernel could
not ensure that a ".." component didn't escape (due to a race condition or poten-
tial attack.) Callers may choose to retry the openat2() call.
EXDEV resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and an escape from the
root during path resolution was detected.
EXDEV resolve contains RESOLVE_NO_XDEV, and a path component attempted to cross a mount
point.
ELOOP resolve contains RESOLVE_NO_SYMLINKS, and one of the path components was a symbolic
link (or magic link).
ELOOP resolve contains RESOLVE_NO_MAGICLINKS, and one of the path components was a magic
link.
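Since EAGAIN is described above as retryable, a caller might wrap the call in a bounded retry loop. The following is a sketch under assumptions: the callback type open_attempt_fn, the helper retry_on_eagain and the stand-in flaky_attempt are all invented for this example, and the real openat2() call would go inside the callback.

```c
#include <errno.h>

/* Hypothetical callback: returns an fd (>= 0) or -1 with errno set. */
typedef int (*open_attempt_fn)(void *ctx);

/*
 * Call `attempt` up to max_tries times, retrying only while it fails
 * with EAGAIN. Any other outcome (success or a different errno) is
 * returned to the caller immediately.
 */
int retry_on_eagain(open_attempt_fn attempt, void *ctx, int max_tries)
{
	int fd = -1;

	for (int i = 0; i < max_tries; i++) {
		fd = attempt(ctx);
		if (fd >= 0 || errno != EAGAIN)
			break;
	}
	return fd;
}

/* Stand-in for the real call: fails with EAGAIN until *ctx reaches 0. */
int flaky_attempt(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		errno = EAGAIN;
		return -1;
	}
	return 3;	/* pretend file descriptor */
}
```

Bounding the number of tries matters here because a persistent EAGAIN may indicate an ongoing attack rather than a transient race.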
VERSIONS
openat2() was added to Linux in kernel 5.FOO.
CONFORMING TO
This system call is Linux-specific.
The semantics of RESOLVE_BENEATH were modelled after FreeBSD's O_BENEATH.
NOTES
Glibc does not provide a wrapper for this system call; call it using syscall(2).
Extensibility
In order to allow for struct open_how to be extended in future kernel revisions, openat2()
requires userspace to specify the size of the struct open_how structure it is passing. By
providing this information, it is possible for openat2() to provide both forwards- and
backwards-compatibility -- with size acting as an implicit version number (because new
extension fields will always be appended, the size will always increase.) This
extensibility design is very similar to other system calls such as sched_setattr(2),
perf_event_open(2), and clone3(2).
If we let usize be the size of the structure according to userspace and ksize be the size
of the structure which the kernel supports, then there are only three cases to consider:
* If ksize equals usize, then there is no version mismatch and how can be used
verbatim.
* If ksize is larger than usize, then there are some extensions the kernel sup-
ports which the userspace program is unaware of. Because all extensions must
have their zero values be a no-op, the kernel treats all of the extension fields
not set by userspace to have zero values. This provides backwards-compatibil-
ity.
* If ksize is smaller than usize, then there are some extensions which the
userspace program is aware of but the kernel does not support. Because all ex-
tensions must have their zero values be a no-op, the kernel can safely ignore
the unsupported extension fields if they are all-zero. If any unsupported ex-
tension fields are non-zero, then -1 is returned and errno is set to E2BIG.
This provides forwards-compatibility.
Therefore, most userspace programs will not need to have any special handling of
extensions. However, if a userspace program wishes to determine what extensions the running
kernel supports, it may conduct a binary search on size (to find the largest value which
doesn't produce an error of E2BIG.)
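The three usize/ksize cases can be sketched in plain C as follows. This is a userspace illustration of the rule described above, not the kernel's copy routine; the name copy_struct_checked is made up, and it returns a negative errno value directly instead of setting errno as the system call would.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a caller-provided structure of `usize` bytes into a buffer of
 * `ksize` bytes (the size this "kernel" knows about).
 *
 *  usize == ksize: copied verbatim.
 *  usize <  ksize: the missing tail is zero-filled, so newer fields
 *                  behave as "extension not present".
 *  usize >  ksize: accepted only if every byte past ksize is zero,
 *                  otherwise -E2BIG.
 */
int copy_struct_checked(void *dst, size_t ksize, const void *src, size_t usize)
{
	const unsigned char *s = src;
	size_t common = usize < ksize ? usize : ksize;

	/* Only iterates when userspace passed a larger structure. */
	for (size_t i = ksize; i < usize; i++)
		if (s[i] != 0)
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, common);
	return 0;
}
```

Because an all-zero tail is always accepted, a program built against a newer header keeps working on an older kernel as long as it does not actually use the newer fields.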
SEE ALSO
openat(2), path_resolution(7), symlink(7)
Linux 2019-10-27 OPENAT2(2)
--8<---------------------------------------------------------------------------
Aleksa Sarai (6):
namei: O_BENEATH-style resolution restriction flags
namei: LOOKUP_IN_ROOT: chroot-like path resolution
namei: permit ".." resolution with LOOKUP_{IN_ROOT,BENEATH}
open: introduce openat2(2) syscall
selftests: add openat2(2) selftests
Documentation: path-lookup: mention LOOKUP_MAGICLINK_JUMPED
CREDITS | 4 +-
Documentation/filesystems/path-lookup.rst | 18 +-
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/namei.c | 167 +++++-
fs/open.c | 154 ++++--
include/linux/fcntl.h | 12 +-
include/linux/namei.h | 12 +
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 41 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 8 +
tools/testing/selftests/openat2/helpers.c | 109 ++++
tools/testing/selftests/openat2/helpers.h | 107 ++++
.../testing/selftests/openat2/openat2_test.c | 297 ++++++++++
.../selftests/openat2/rename_attack_test.c | 160 ++++++
.../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++
35 files changed, 1571 insertions(+), 71 deletions(-)
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/openat2_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
--
2.23.0
From: John Hubbard <jhubbard(a)nvidia.com>
[ Upstream commit 6f24c8d30d08f270b54f4c2cb9b08dfccbe59c57 ]
Even though gup_benchmark.c has code to handle the -w command-line option,
the "w" is not part of the getopt string. It looks as if it has been
missing the whole time.
On my machine, this leads naturally to the following predictable result:
$ sudo ./gup_benchmark -w
./gup_benchmark: invalid option -- 'w'
...which is fixed with this commit.
Link: http://lkml.kernel.org/r/20191014184639.1512873-2-jhubbard@nvidia.com
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Keith Busch <keith.busch(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: "Aneesh Kumar K . V" <aneesh.kumar(a)linux.ibm.com>
Cc: Ira Weiny <ira.weiny(a)intel.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: kbuild test robot <lkp(a)intel.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/vm/gup_benchmark.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index c0534e298b512..cb3fc09645c48 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -37,7 +37,7 @@ int main(int argc, char **argv)
char *file = "/dev/zero";
char *p;
- while ((opt = getopt(argc, argv, "m:r:n:f:tTLUSH")) != -1) {
+ while ((opt = getopt(argc, argv, "m:r:n:f:tTLUwSH")) != -1) {
switch (opt) {
case 'm':
size = atoi(optarg) * MB;
--
2.20.1
From: Jiri Benc <jbenc(a)redhat.com>
[ Upstream commit fd418b01fe26c2430b1091675cceb3ab2b52e1e0 ]
Many distributions enable rp_filter. However, the flow dissector test
generates packets that have 1.1.1.1 set as (inner) source address without
this address being reachable. This causes the selftest to fail.
The selftests should not assume a particular initial configuration. Switch
off rp_filter.
Fixes: 50b3ed57dee9 ("selftests/bpf: test bpf flow dissection")
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Petar Penkov <ppenkov(a)google.com>
Link: https://lore.kernel.org/bpf/513a298f53e99561d2f70b2e60e2858ea6cda754.157053…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_flow_dissector.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_flow_dissector.sh b/tools/testing/selftests/bpf/test_flow_dissector.sh
index d23d4da66b834..e2d06191bd35c 100755
--- a/tools/testing/selftests/bpf/test_flow_dissector.sh
+++ b/tools/testing/selftests/bpf/test_flow_dissector.sh
@@ -63,6 +63,9 @@ fi
# Setup
tc qdisc add dev lo ingress
+echo 0 > /proc/sys/net/ipv4/conf/default/rp_filter
+echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
+echo 0 > /proc/sys/net/ipv4/conf/lo/rp_filter
echo "Testing IPv4..."
# Drops all IP/UDP packets coming from port 9
--
2.20.1
Commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second
timeout per test") introduced a timeout per test. Livepatch tests could
run longer than 45 seconds, especially on slower machines. They do not
hang and they detect if something goes awry with internal accounting.
Rather than looking for an arbitrary value, just disable the timeout for
livepatch selftests.
Signed-off-by: Miroslav Benes <mbenes(a)suse.cz>
---
tools/testing/selftests/livepatch/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/livepatch/settings
diff --git a/tools/testing/selftests/livepatch/settings b/tools/testing/selftests/livepatch/settings
new file mode 100644
index 000000000000..e7b9417537fb
--- /dev/null
+++ b/tools/testing/selftests/livepatch/settings
@@ -0,0 +1 @@
+timeout=0
--
2.23.0
Hi
this patchset aims to add the initial arch-specific arm64 support to
kselftest starting with signals-related test-cases.
This series is based on v5.4-rc2.
A common internal test-case layout is proposed for signal tests and it is
wired-up to the toplevel kselftest Makefile, so that it should be possible
at the end to run it on an arm64 target in the usual way with KSFT.
~/linux# make TARGETS=arm64 kselftest
New KSFT arm64 testcases live inside tools/testing/selftests/arm64 grouped
by family inside subdirectories: arm64/signal is the first family proposed
with this series.
This series also converts the pre-existing KSFT arm64 tags tests (already
merged in v5.3) to this subdirectory scheme, moving them into their own
arm64/tags subdirectory.
Thanks
Cristian
Notes:
-----
- further details in the included READMEs
- more tests still to be written (current strategy is going through the
related Kernel signal-handling code and writing a test for each possible
and sensible code-path)
A few ideas for more TODO testcases:
- mangle_pstate_invalid_ssbs_regs: mess with SSBS bits on every
possible configured behavior
- fake_sigreturn_unmapped_sp: SP into unmapped addrs
- fake_sigreturn_kernelspace_sp: SP into kernel addrs
- fake_sigreturn_sve_bad_extra_context: SVE extra context badly formed
- fake_sigreturn_misaligned_sp_4: misaligned SP by 4
(i.e., __alignof__(struct _aarch64_ctx))
- fake_sigreturn_misaligned_sp_8: misaligned SP by 8
(i.e., sizeof(struct _aarch64_ctx))
- fake_sigreturn_bad_size_non_aligned: a size that doesn't overflow
__reserved[], but is not a multiple of 16
- fake_sigreturn_bad_size_tiny: a size that is less than 16
- fake_sigreturn_bad_size_overflow_tiny: a size that does overflow
__reserved[], but by less than 16 bytes?
- mangle_sve_invalid_extra_context: SVE extra_context invalid
- SVE signal testcases and special handling will be part of an additional patch
still to be released
- KSFT arm64 tags test patch
https://lore.kernel.org/linux-arm-kernel/c1e6aad230658bc175b42d92daeff2e300…
is relocated into its own directory under tools/testing/selftests/arm64/tags
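To make the bad_size TODO ideas above a bit more concrete, here is a sketch of how the three size conditions could be told apart. It is not the kernel's validation code and not part of this series: the enum, the function name classify_record_size and the order of the checks are assumptions, and the length of __reserved[] is passed in rather than hard-coded.

```c
#include <stddef.h>

enum size_verdict {
	SIZE_OK,
	SIZE_TINY,	/* less than 16 bytes */
	SIZE_UNALIGNED,	/* not a multiple of 16 */
	SIZE_OVERFLOW,	/* runs past the end of __reserved[] */
};

/*
 * Classify a context record of `size` bytes placed at `offset` inside
 * a reserved area of `reserved_len` bytes.
 */
enum size_verdict classify_record_size(size_t size, size_t offset,
					size_t reserved_len)
{
	if (size < 16)
		return SIZE_TINY;
	if (size % 16 != 0)
		return SIZE_UNALIGNED;
	/* Written this way to avoid overflow in offset + size. */
	if (offset > reserved_len || size > reserved_len - offset)
		return SIZE_OVERFLOW;
	return SIZE_OK;
}
```

Each proposed testcase would then craft a frame that lands in exactly one of the non-OK verdicts.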
Changes:
--------
v8-->v9:
- fixed a couple of misplaced .gitignore
v7-->v8:
- removed SSBS test case
- split remnants of SSBS patch (v7 05/11), containing some helpers,
into two distinct patches
v6-->v7:
- rebased on v5.4-rc2
- renamed SUBTARGETS arm64/ toplevel Makefile ENV to ARM64_SUBTARGETS
- fixed fake_sigreturn alignment routines (off by one)
- fixed SSBS test: avoid using MRS/MSR as whole and SKIP when SSBS not
supported
- reporting KSFT_SKIP when needed (usually if test_init() fails)
- using ID_AA64PFR1_EL1.SSBS to check SSBS support instead of HWCAP_SSBS
v5-->v6:
- added arm64 toplevel Makefile SUBTARGETS env var to be able to selectively
build only some arm64/ tests subdirectories
- removed unneeded toplevel Makefile exports and fixed Copyright
- better checks for supported features and features names helpers
- converted some run-time critical assert() to abort() to avoid
issues when -DNDEBUG is set
- default_handler() signal handler refactored and split
- using SIGTRAP for get_current_context()
- use volatile where proper
- refactor and relocate test_init() invocation
- review usage of MRS SSBS instructions depending on HW_SSBS
- cleanup fake_sigreturn trampoline
- cleanup get_starting_header helper
- avoiding timeout test failures wherever possible (fail immediately
if possible)
v4-->v5:
- rebased on arm64/for-next-core merging 01/11 with KSFT tags tests:
commit 9ce1263033cd ("selftests, arm64: add a selftest for passing tagged pointers to kernel")
- moved .gitignore up one level
- moved kernel header search mechanism into KSFT arm64 toplevel Makefile
so that it can be used easily also by each arm64 KSFT subsystem inside
subdirs of arm64
v3-->v4:
- rebased on v5.3-rc6
- added test descriptions
- fixed commit messages (imperative mood)
- added missing includes and removed unneeded ones
- added/used new get_starting_head() helper
- fixed/simplified signal.S::fake_sigreturn()
- added set_regval() macro and .init initialization func
- better synchronization in get_current_context()
- macroization of mangle_pstate_invalid_mode_el
- split mangle_pstate_invalid_mode_el h/t
- removed standalone mode
- simplified CPU features checks
- fixed/refactored get_header() and validation routines
- simplified docs
v2-->v3:
- rebased on v5.3-rc2
- better test result characterization looking for
SEGV_ACCERR in si_code on SIGSEGV
- using KSFT Framework macros for retvalues
- removed SAFE_WRITE()/dump_uc: buggy, un-needed and unused
- reviewed generation process of test_arm64_signals.sh runner script
- re-added a fixed fake_sigreturn_misaligned_sp testcase and a properly
extended fake_sigreturn() helper
- added tests' TODO notes
v1-->v2:
- rebased on 5.2-rc7
- various makefile's cleanups
- mixed READMEs fixes
- fixed test_arm64_signals.sh runner script
- cleaned up assembly code in signal.S
- improved get_current_context() logic
- fixed SAFE_WRITE()
- common support code split into more chunks, each one introduced when
needed by some new testcases
- fixed some headers validation routines in testcases.c
- removed some still broken/immature tests:
+ fake_sigreturn_misaligned
+ fake_sigreturn_overflow_reserved
+ mangle_pc_invalid
+ mangle_sp_misaligned
- fixed some other testcases:
+ mangle_pstate_ssbs_regs: better checks of SSBS bit when feature unsupported
+ mangle_pstate_invalid_compat_toggle: name fix
+ mangle_pstate_invalid_mode_el[1-3]: precautionary zeroing PSTATE.MODE
+ fake_sigreturn_bad_magic, fake_sigreturn_bad_size,
fake_sigreturn_bad_size_for_magic0:
- accounting for available space...dropping extra when needed
- keeping alignment
- new testcases on FPSIMD context:
+ fake_sigreturn_missing_fpsimd
+ fake_sigreturn_duplicated_fpsimd
Cristian Marussi (12):
kselftest: arm64: extend toplevel skeleton Makefile
kselftest: arm64: mangle_pstate_invalid_compat_toggle and common utils
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_mode_el[123][ht]
kselftest: arm64: extend test_init functionalities
kselftest: arm64: add helper get_current_context
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_bad_size
kselftest: arm64: fake_sigreturn_misaligned_sp
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/arm64/Makefile | 64 +++-
tools/testing/selftests/arm64/README | 25 ++
.../testing/selftests/arm64/signal/.gitignore | 3 +
tools/testing/selftests/arm64/signal/Makefile | 32 ++
tools/testing/selftests/arm64/signal/README | 59 +++
.../testing/selftests/arm64/signal/signals.S | 64 ++++
.../selftests/arm64/signal/test_signals.c | 29 ++
.../selftests/arm64/signal/test_signals.h | 116 ++++++
.../arm64/signal/test_signals_utils.c | 340 ++++++++++++++++++
.../arm64/signal/test_signals_utils.h | 120 +++++++
.../testcases/fake_sigreturn_bad_magic.c | 52 +++
.../testcases/fake_sigreturn_bad_size.c | 77 ++++
.../fake_sigreturn_bad_size_for_magic0.c | 46 +++
.../fake_sigreturn_duplicated_fpsimd.c | 50 +++
.../testcases/fake_sigreturn_misaligned_sp.c | 37 ++
.../testcases/fake_sigreturn_missing_fpsimd.c | 50 +++
.../mangle_pstate_invalid_compat_toggle.c | 31 ++
.../mangle_pstate_invalid_daif_bits.c | 35 ++
.../mangle_pstate_invalid_mode_el1h.c | 15 +
.../mangle_pstate_invalid_mode_el1t.c | 15 +
.../mangle_pstate_invalid_mode_el2h.c | 15 +
.../mangle_pstate_invalid_mode_el2t.c | 15 +
.../mangle_pstate_invalid_mode_el3h.c | 15 +
.../mangle_pstate_invalid_mode_el3t.c | 15 +
.../mangle_pstate_invalid_mode_template.h | 28 ++
.../arm64/signal/testcases/testcases.c | 196 ++++++++++
.../arm64/signal/testcases/testcases.h | 104 ++++++
.../selftests/arm64/{ => tags}/.gitignore | 0
tools/testing/selftests/arm64/tags/Makefile | 7 +
.../arm64/{ => tags}/run_tags_test.sh | 0
.../selftests/arm64/{ => tags}/tags_test.c | 0
32 files changed, 1651 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/arm64/README
create mode 100644 tools/testing/selftests/arm64/signal/.gitignore
create mode 100644 tools/testing/selftests/arm64/signal/Makefile
create mode 100644 tools/testing/selftests/arm64/signal/README
create mode 100644 tools/testing/selftests/arm64/signal/signals.S
create mode 100644 tools/testing/selftests/arm64/signal/test_signals.c
create mode 100644 tools/testing/selftests/arm64/signal/test_signals.h
create mode 100644 tools/testing/selftests/arm64/signal/test_signals_utils.c
create mode 100644 tools/testing/selftests/arm64/signal/test_signals_utils.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_magic.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_size.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_size_for_magic0.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_duplicated_fpsimd.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_misaligned_sp.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_missing_fpsimd.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_compat_toggle.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_daif_bits.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el1h.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el1t.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el2h.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el2t.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el3h.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el3t.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_template.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/testcases.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/testcases.h
rename tools/testing/selftests/arm64/{ => tags}/.gitignore (100%)
create mode 100644 tools/testing/selftests/arm64/tags/Makefile
rename tools/testing/selftests/arm64/{ => tags}/run_tags_test.sh (100%)
rename tools/testing/selftests/arm64/{ => tags}/tags_test.c (100%)
--
2.17.1
This patchset is being developed here:
<https://github.com/cyphar/linux/tree/openat2/master>
Patch changelog:
v14:
* The magic-link changes (and O_EMPTYPATH) have been dropped from this series
-- they will be developed and sent separately. The main reason is that we
need to restrict things other than open(2) (examples include truncate(2) as
well as mount(MS_BIND)). This will require a fair amount of extra work, and
there's no point stalling openat2(2) for that work to be completed.
* Minor rework of 'struct open_how':
* To avoid future headaches, make it a non-const argument.
* Expand ->flags and ->resolve to 64-bit fields to allow for more flag
extensions without needing to add separate fields too early. This
requires adding a bit of explicit padding (32 bits) to avoid userspace
putting garbage in the alignment padding -- this can be repurposed for
future extensions.
* upgrade_mask is dropped (and will be a separate field when we add it
again in the future) to avoid userspace foot-guns.
* Expand -EINVAL checks in build_open_flags(). Rather than silently
ignoring silly flag combinations (such as O_TMPFILE|O_PATH or
O_PATH|<most flags>), give an -EINVAL. All of the silent ignore semantics
were added to open(2) because we couldn't return -EINVAL -- but we can
now!
* open(2) and openat(2) clean up their flags before passing them to
build_open_flags(), so all mixed flags will continue to work. There is
one exception which is (O_PATH|O_TMPFILE) -- this is no longer
permitted (as far as I can tell this appears to be a bug, and there are
no userspace users that I've hit after running this code for a few
days). If it turns out that userspace does depend on (O_PATH|O_TMPFILE)
working, we can only disallow it for openat2(2).
* Don't zero out nd->root in complete_walk() for RCU-walk if we're doing a
scoped-lookup (this prevents a needless REF-walk retry).
* Attempt all tests on kernels that don't have openat2(2), rather than just
skipping everything.
v13: <https://lore.kernel.org/lkml/20190930183316.10190-1-cyphar@cyphar.com/>
v12: <https://lore.kernel.org/lkml/20190904201933.10736-1-cyphar@cyphar.com/>
v11: <https://lore.kernel.org/lkml/20190820033406.29796-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20190728010207.9781-1-cyphar@cyphar.com/>
v10: <https://lore.kernel.org/lkml/20190719164225.27083-1-cyphar@cyphar.com/>
v09: <https://lore.kernel.org/lkml/20190706145737.5299-1-cyphar@cyphar.com/>
v08: <https://lore.kernel.org/lkml/20190520133305.11925-1-cyphar@cyphar.com/>
v07: <https://lore.kernel.org/lkml/20190507164317.13562-1-cyphar@cyphar.com/>
v06: <https://lore.kernel.org/lkml/20190506165439.9155-1-cyphar@cyphar.com/>
v05: <https://lore.kernel.org/lkml/20190320143717.2523-1-cyphar@cyphar.com/>
v04: <https://lore.kernel.org/lkml/20181112142654.341-1-cyphar@cyphar.com/>
v03: <https://lore.kernel.org/lkml/20181009070230.12884-1-cyphar@cyphar.com/>
v02: <https://lore.kernel.org/lkml/20181009065300.11053-1-cyphar@cyphar.com/>
v01: <https://lore.kernel.org/lkml/20180929103453.12025-1-cyphar@cyphar.com/>
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Furthermore, the need for some sort of control over VFS's path resolution (to
avoid malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a revival
of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David
Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum
project[5]) with a few additions and changes made based on the previous
discussion within [6] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS, the
flag has been split up into separate flags. However, instead of being an
openat(2) flag it is provided through a new syscall openat2(2) which provides
several other improvements to the openat(2) interface (see the patch
description for more details). The following new LOOKUP_* flags are added:
* LOOKUP_NO_XDEV blocks all mountpoint crossings (upwards, downwards,
or through absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is also
blocked (though magic-link jumps on the same vfsmount are permitted).
* LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
links. This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic-links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
* LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed. Conceptually this flag is to
ensure you "stay below" a certain point in the filesystem tree --
but this requires some additional checks to protect against various races
that would allow escape using "..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done as
in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
* LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
resolution is allowed at all, including magic-links. Just as with
LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
fd for the symlink as long as no parent path had a symlink
component.
* LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
blocking attempts to move past the root, forces all such movements
to be scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that chroot(2)
is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to cross
magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have been mitigated by having RESOLVE_IN_ROOT
(such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
CVE-2019-5736, just to name a few).
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution. It
features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVj…
[3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk
[4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@goo…
[5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@goo…
[6]: https://lwn.net/Articles/723057/
[7]: https://github.com/cyphar/filepath-securejoin
[8]: https://github.com/openSUSE/libpathrs
The current draft of the openat2(2) man-page is included below.
--8<---------------------------------------------------------------------------
OPENAT2(2) Linux Programmer's Manual OPENAT2(2)
NAME
openat2 - open and possibly create a file (extended)
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int openat2(int dirfd, const char *pathname, struct open_how *how, size_t size);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The openat2() system call opens the file specified by pathname. If the specified file
does not exist, it may optionally (if O_CREAT is specified in how.flags) be created by
openat2().
As with openat(2), if pathname is relative, then it is interpreted relative to the
directory referred to by the file descriptor dirfd (or the current working directory of
the calling process, if dirfd is the special value AT_FDCWD.) If pathname is absolute,
then dirfd is ignored (unless how.resolve contains RESOLVE_IN_ROOT, in which case pathname
is resolved relative to dirfd.)
The openat2() system call is an extension of openat(2) and provides a superset of its
functionality. Rather than taking a single flag argument, an extensible structure (how)
is passed instead to allow for future extensions. size must be set to sizeof(struct
open_how), to facilitate future extensions (see the "Extensibility" section of the NOTES
for more detail on how extensions are handled.)
The open_how structure
The following structure indicates how pathname should be opened, and acts as a superset of
the flag and mode arguments to openat(2).
struct open_how {
__aligned_u64 flags; /* O_* flags. */
__u16 mode; /* Mode for O_{CREAT,TMPFILE}. */
__u16 __padding[3]; /* Must be zeroed. */
__aligned_u64 resolve; /* RESOLVE_* flags. */
};
Any future extensions to openat2() will be implemented as new fields appended to the above
structure (or through reuse of pre-existing padding space), with the zero value of the new
fields acting as though the extension were not present.
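As a rough illustration of the layout above, the following sketch mirrors the draft
structure with fixed-width userspace types and checks that the explicit padding leaves no
implicit holes. The name open_how_draft, the helper open_how_init(), and the use of
uint64_t/uint16_t in place of __aligned_u64/__u16 are assumptions made for this example;
this is not the uapi header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace mirror of the draft struct open_how. */
struct open_how_draft {
	_Alignas(8) uint64_t flags;   /* O_* flags. */
	uint16_t mode;                /* Mode for O_{CREAT,TMPFILE}. */
	uint16_t padding[3];          /* Must be zeroed. */
	_Alignas(8) uint64_t resolve; /* RESOLVE_* flags. */
};

/* The padding words fill the gap up to the next 8-byte boundary. */
static_assert(sizeof(struct open_how_draft) == 24, "unexpected size");
static_assert(offsetof(struct open_how_draft, mode) == 8, "mode offset");
static_assert(offsetof(struct open_how_draft, padding) == 10, "padding offset");
static_assert(offsetof(struct open_how_draft, resolve) == 16, "resolve offset");

/* Zero the whole structure first so padding and unset fields are zero. */
static struct open_how_draft open_how_init(uint64_t flags, uint16_t mode)
{
	struct open_how_draft how;

	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.mode = mode;
	return how;
}
```

Zeroing the structure before filling it in is the design choice worth copying: it keeps
the padding zeroed, as the structure comment requires.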
The meaning of each field is as follows:
flags
The file creation and status flags to use for this operation. All of the
O_* flags defined for openat(2) are valid openat2() flag values.
Unlike openat(2), it is an error to provide openat2() unknown or conflicting
flags in flags.
mode
File mode for the new file, with identical semantics to the mode argument to
openat(2). However, unlike openat(2), it is an error to provide openat2()
with a mode which contains bits other than 0777.
It is an error to provide openat2() a non-zero mode if flags does not
contain O_CREAT or O_TMPFILE.
resolve
Change how the components of pathname will be resolved (see
path_resolution(7) for background information.) The primary use case for
these flags is to allow trusted programs to restrict how untrusted paths (or
paths inside untrusted directories) are resolved. The full list of resolve
flags is given below.
RESOLVE_NO_XDEV
Disallow traversal of mount points during path resolution (including
all bind mounts).
Users of this flag are encouraged to make its use configurable
(unless it is used for a specific security purpose), as bind mounts
are very widely used by end-users. Setting this flag indiscriminately
for all uses of openat2() may result in spurious errors on
previously-functional systems.
RESOLVE_NO_SYMLINKS
Disallow resolution of symbolic links during path resolution. This
option implies RESOLVE_NO_MAGICLINKS.
If the trailing component is a symbolic link, and flags contains both
O_PATH and O_NOFOLLOW, then an O_PATH file descriptor referencing the
symbolic link will be returned.
Users of this flag are encouraged to make its use configurable
(unless it is used for a specific security purpose), as symbolic
links are very widely used by end-users. Setting this flag
indiscriminately for all uses of openat2() may result in spurious
errors on previously-functional systems.
RESOLVE_NO_MAGICLINKS
Disallow all magic link resolution during path resolution.
If the trailing component is a magic link, and flags contains both
O_PATH and O_NOFOLLOW, then an O_PATH file descriptor referencing the
magic link will be returned.
Magic-links are symbolic link-like objects that are most notably
found in proc(5) (examples include /proc/[pid]/exe and
/proc/[pid]/fd/*.) Due to the potential danger of unknowingly
opening these magic links, it may be preferable for users to disable
their resolution entirely (see symlink(7) for more details.)
RESOLVE_BENEATH
Do not permit the path resolution to succeed if any component of the
resolution is not a descendant of the directory indicated by dirfd.
This results in absolute symbolic links (and absolute values of
pathname) being rejected.
Currently, this flag also disables magic link resolution. However,
this may change in the future. The caller should explicitly specify
RESOLVE_NO_MAGICLINKS to ensure that magic links are not resolved.
RESOLVE_IN_ROOT
Treat dirfd as the root directory while resolving pathname (as though
the user called chroot(2) with dirfd as the argument.) Absolute
symbolic links and ".." path components will be scoped to dirfd. If
pathname is an absolute path, it is also treated relative to dirfd.
However, unlike chroot(2) (which changes the filesystem root
permanently for a process), RESOLVE_IN_ROOT allows a program to
efficiently restrict path resolution for only certain operations. It
also has several hardening features (such as detecting escape attempts
during .. resolution) which chroot(2) does not.
Currently, this flag also disables magic link resolution. However,
this may change in the future. The caller should explicitly specify
RESOLVE_NO_MAGICLINKS to ensure that magic links are not resolved.
It is an error to provide openat2() unknown flags in resolve.
RETURN VALUE
On success, a new file descriptor is returned. On error, -1 is returned, and errno is set
appropriately.
ERRORS
The set of errors returned by openat2() includes all of the errors returned by openat(2),
as well as the following additional errors:
EINVAL An unknown flag or invalid value was specified in how.
EINVAL mode is non-zero, but flags does not contain O_CREAT or O_TMPFILE.
EINVAL size was smaller than any known version of struct open_how.
E2BIG An extension was specified in how, which the current kernel does not support (see
the "Extensibility" section of the NOTES for more detail on how extensions are
handled.)
EAGAIN resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and the kernel could
not ensure that a ".." component didn't escape (due to a race condition or
potential attack.) Callers may choose to retry the openat2() call.
EXDEV resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and an escape from the
root during path resolution was detected.
EXDEV resolve contains RESOLVE_NO_XDEV, and a path component attempted to cross a mount
point.
ELOOP resolve contains RESOLVE_NO_SYMLINKS, and one of the path components was a symbolic
link (or magic link).
ELOOP resolve contains RESOLVE_NO_MAGICLINKS, and one of the path components was a magic
link.
VERSIONS
openat2() was added to Linux in kernel 5.FOO.
CONFORMING TO
This system call is Linux-specific.
The semantics of RESOLVE_BENEATH were modelled after FreeBSD's O_BENEATH.
NOTES
Glibc does not provide a wrapper for this system call; call it using syscall(2).
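A minimal sketch of such a call, under several assumptions: struct open_how_draft is a
hypothetical userspace mirror of the draft structure, DRAFT_RESOLVE_BENEATH uses the value
from later mainline headers (this draft does not give one), and the syscall number is
taken from SYS_openat2 only if the installed headers define it. The names openat2_draft()
and open_beneath() are invented for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for syscall() */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical mirror of the draft struct open_how (not the uapi header). */
struct open_how_draft {
	_Alignas(8) uint64_t flags;
	uint16_t mode;
	uint16_t padding[3];
	_Alignas(8) uint64_t resolve;
};

/* Assumed value, taken from later mainline headers; the draft gives none. */
#define DRAFT_RESOLVE_BENEATH 0x08

/* Thin wrapper around syscall(2); fails with ENOSYS if no number is known. */
static int openat2_draft(int dirfd, const char *pathname,
			 struct open_how_draft *how, size_t size)
{
	assert(pathname != NULL && how != NULL);
#ifdef SYS_openat2
	return (int)syscall(SYS_openat2, dirfd, pathname, how, size);
#else
	(void)dirfd;
	(void)size;
	errno = ENOSYS;
	return -1;
#endif
}

/* Open pathname read-only, refusing any resolution that leaves dirfd. */
static int open_beneath(int dirfd, const char *pathname)
{
	struct open_how_draft how;

	memset(&how, 0, sizeof(how)); /* padding must be zero */
	how.flags = O_RDONLY;
	how.resolve = DRAFT_RESOLVE_BENEATH;
	return openat2_draft(dirfd, pathname, &how, sizeof(how));
}
```

On a kernel that implements openat2(), open_beneath(dirfd, "..") is expected to fail
(EXDEV per the ERRORS section); on older kernels the call fails with ENOSYS instead.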
Extensibility
In order to allow for struct open_how to be extended in future kernel revisions, openat2()
requires userspace to specify the size of the struct open_how structure it is passing. By
providing this information, it is possible for openat2() to provide both forwards- and
backwards-compatibility, with size acting as an implicit version number (because new
extension fields will always be appended, the size will always increase.) This
extensibility design is very similar to other system calls such as sched_setattr(2),
perf_event_open(2), and clone3(2).
If we let usize be the size of the structure according to userspace and ksize be the size
of the structure which the kernel supports, then there are only three cases to consider:
* If ksize equals usize, then there is no version mismatch and how can be used
verbatim.
* If ksize is larger than usize, then there are some extensions the kernel
supports which the userspace program is unaware of. Because all extensions must
have their zero values be a no-op, the kernel treats all of the extension fields
not set by userspace to have zero values. This provides backwards-
compatibility.
* If ksize is smaller than usize, then there are some extensions which the
userspace program is aware of but the kernel does not support. Because all
extensions must have their zero values be a no-op, the kernel can safely ignore
the unsupported extension fields if they are all-zero. If any unsupported
extension fields are non-zero, then -1 is returned and errno is set to E2BIG.
This provides forwards-compatibility.
Therefore, most userspace programs will not need to have any special handling of
extensions. However, if a userspace program wishes to determine what extensions the
running kernel supports, they may conduct a binary search on size (to find the largest
value which doesn't produce an error of E2BIG.)
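The three size cases described in this section can be modelled in userspace. The sketch
below is only an illustration of the rule, not the kernel's implementation: the function
name copy_versioned_struct() is invented here, and it returns -E2BIG directly rather than
setting errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the size negotiation: copy a usize-byte structure
 * supplied by the caller into a ksize-byte structure known to the kernel.
 * Returns 0 on success or -E2BIG if an unsupported field is non-zero.
 */
static int copy_versioned_struct(void *dst, size_t ksize,
				 const void *src, size_t usize)
{
	const unsigned char *in = src;
	size_t common = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst != NULL && src != NULL);

	/* ksize < usize: every byte past ksize must be zero. */
	for (i = ksize; i < usize; i++)
		if (in[i] != 0)
			return -E2BIG;

	/* ksize > usize: fields the caller did not supply read as zero. */
	memset(dst, 0, ksize);
	memcpy(dst, in, common);
	return 0;
}
```

When ksize equals usize the loop does not run and the structure is copied verbatim, which
matches the first case in the list.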
SEE ALSO
openat(2), path_resolution(7), symlink(7)
Linux 2019-10-10 OPENAT2(2)
--8<---------------------------------------------------------------------------
Aleksa Sarai (6):
namei: O_BENEATH-style resolution restriction flags
namei: LOOKUP_IN_ROOT: chroot-like path resolution
namei: permit ".." resolution with LOOKUP_{IN_ROOT,BENEATH}
open: introduce openat2(2) syscall
selftests: add openat2(2) selftests
Documentation: path-lookup: mention LOOKUP_MAGICLINK_JUMPED
CREDITS | 4 +-
Documentation/filesystems/path-lookup.rst | 18 +-
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/namei.c | 167 +++++-
fs/open.c | 154 ++++--
include/linux/fcntl.h | 12 +-
include/linux/namei.h | 12 +
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 41 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 8 +
tools/testing/selftests/openat2/helpers.c | 109 ++++
tools/testing/selftests/openat2/helpers.h | 107 ++++
.../testing/selftests/openat2/openat2_test.c | 297 ++++++++++
.../selftests/openat2/rename_attack_test.c | 160 ++++++
.../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++
35 files changed, 1571 insertions(+), 71 deletions(-)
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/openat2_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
--
2.23.0
The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 1.
Changes since v1:
- sent correct patch set; apologies, previous patch set was built
prior to kunit move to lib/ and should be ignored.
Patch 1 consists of changes needed to support loading tests as modules.
Patch 2 allows kunit itself to be loaded as a module.
Patch 3 documents module support.
Alan Maguire (3):
kunit: allow kunit tests to be loaded as a module
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 ++-
Documentation/dev-tools/kunit/index.rst | 3 +++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++++++++
include/kunit/test.h | 30 +++++++++++++++++++++++-------
kernel/sysctl-test.c | 6 +++++-
lib/Kconfig.debug | 4 ++--
lib/kunit/Kconfig | 6 +++---
lib/kunit/Makefile | 4 +++-
lib/kunit/assert.c | 8 ++++++++
lib/kunit/example-test.c | 6 +++++-
lib/kunit/string-stream-test.c | 9 +++++++--
lib/kunit/string-stream.c | 7 +++++++
lib/kunit/test-test.c | 8 ++++++--
lib/kunit/test.c | 12 ++++++++++++
lib/kunit/try-catch.c | 8 ++++++--
15 files changed, 108 insertions(+), 22 deletions(-)
--
1.8.3.1
Hi,
Here is the 2nd version of some kselftest fixes for 32bit arch
(e.g. arm). In this version, I updated [1/5] to make va_max 1GB
instead of 3GB, according to Alexey's comment.
When I built the kselftest on arm, I hit some 32bit related warnings.
Here are the patches to fix those issues.
- [1/5] va_max was set to 2^32 even on 32bit arch. This can make
va_max == 0 and always fail. Make it 1GB on 32bit.
- [2/5] Some VM tests require 64bit user space, which should
not run on 32bit arch.
- [3/5] For counting the size of a large file, we should use
size_t instead of unsigned long.
- [4/5] Gcc warns about the printf format for size_t and int64_t on
32bit arch. Use %llu and cast it.
- [5/5] Gcc warns about casts between __u64 and pointer types. The
value should first be cast to unsigned long.
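The kinds of problem listed above can be sketched in a few lines of portable C. This is
an illustration only, not the code from the patches: the helper names are invented here,
and the 32bit wrap-around is simulated with uint32_t so that it also shows up on a 64bit
build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption of the cast helpers below; holds on Linux ILP32 and LP64. */
static_assert(sizeof(unsigned long) == sizeof(void *),
	      "unsigned long must be pointer-sized");

/* [1/5] 2^32 does not fit in a 32bit unsigned long: it wraps to 0. */
static uint32_t two_pow_32_as_u32(void)
{
	return (uint32_t)(UINT64_C(1) << 32);
}

/* [4/5] Cast to a type whose printf format is the same on every arch. */
static int format_sizes(char *buf, size_t len, size_t bytes, int64_t delta)
{
	return snprintf(buf, len, "%llu bytes, %lld ns",
			(unsigned long long)bytes, (long long)delta);
}

/* [5/5] Go through unsigned long when converting a 64bit value. */
static void *u64_to_ptr(uint64_t val)
{
	return (void *)(unsigned long)val;
}

static uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(unsigned long)ptr;
}
```

Casting to (unsigned long long) before printing avoids having one format string per
architecture, at the cost of an explicit cast at each call site.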
Thank you,
---
Masami Hiramatsu (5):
selftests: proc: Make va_max 1GB on 32bit arch
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: net: Use size_t and ssize_t for counting file size
selftests: net: Fix printf format warnings on arm
selftests: sync: Fix cast warnings on arm
tools/testing/selftests/net/so_txtime.c | 4 ++--
tools/testing/selftests/net/tcp_mmap.c | 8 ++++----
tools/testing/selftests/net/udpgso.c | 3 ++-
tools/testing/selftests/net/udpgso_bench_tx.c | 3 ++-
.../selftests/proc/proc-self-map-files-002.c | 11 ++++++++++-
tools/testing/selftests/sync/sync.c | 6 +++---
tools/testing/selftests/vm/Makefile | 5 +++++
tools/testing/selftests/vm/run_vmtests | 10 ++++++++++
8 files changed, 38 insertions(+), 12 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
Hi,
Here's another gup_benchmark.c fix, which I ran into while adding
support for the upcoming FOLL_PIN work. Anyway, the problem is
clearly described in the patch commit description, and the fix seems
like the best way to me, but the fix is not *completely* black and
white.
This fix forces MAP_ANONYMOUS for the MAP_HUGETLB case. However,
another way to do it might be to mmap() against a valid hugetlb
page file, instead of /dev/zero. But that seems like a lot of
trouble and if I'm reading the intent correctly, MAP_ANONYMOUS
is what's desired anyway.
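As a rough sketch of the approach described above (not the benchmark's actual code), a
huge page mapping can be requested anonymously, with no file descriptor at all. The
helper name map_test_buffer() and its parameters are assumptions made for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for MAP_ANONYMOUS and MAP_HUGETLB */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `size` bytes of private anonymous memory, optionally backed by
 * huge pages. MAP_HUGETLB is paired with MAP_ANONYMOUS and fd == -1
 * rather than with a /dev/zero file descriptor.
 * Returns MAP_FAILED on error, like mmap() itself.
 */
static void *map_test_buffer(size_t size, int use_hugetlb)
{
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;

	assert(size > 0);
	if (use_hugetlb)
		flags |= MAP_HUGETLB;

	return mmap(NULL, size, PROT_READ | PROT_WRITE, flags, -1, 0);
}
```

The huge page variant can still fail at run time, for example when no huge pages are
reserved, so callers should check for MAP_FAILED.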
John Hubbard (1):
mm/gup_benchmark: fix MAP_HUGETLB case
tools/testing/selftests/vm/gup_benchmark.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.23.0
BugLink: https://bugs.launchpad.net/bugs/1849281
The ifndef for SECCOMP_USER_NOTIF_FLAG_CONTINUE was placed under the
ifndef for the SECCOMP_FILTER_FLAG_NEW_LISTENER feature. This will not
work on systems that do support SECCOMP_FILTER_FLAG_NEW_LISTENER but do not
support SECCOMP_USER_NOTIF_FLAG_CONTINUE. So move the latter ifndef out of
the former ifndef's scope.
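The guard problem can be reproduced in isolation. In the sketch below the first #define
simulates system headers that already know the older feature; the value (1UL << 3) for
SECCOMP_FILTER_FLAG_NEW_LISTENER is an assumption taken from the kernel uapi headers and
is not part of this patch.

```c
#include <assert.h>

/*
 * Before the fix, the fallback for the newer flag was nested inside the
 * guard for the older feature, so headers that already provide
 * SECCOMP_FILTER_FLAG_NEW_LISTENER never reached it:
 *
 *   #ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
 *   #define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
 *   ...
 *   #ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
 *   #define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
 *   #endif
 *   #endif
 */

/* Simulate system headers that know the older feature only. */
#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)

/* After the fix: each fallback has its own, independent guard. */
#ifndef SECCOMP_FILTER_FLAG_NEW_LISTENER
#define SECCOMP_FILTER_FLAG_NEW_LISTENER (1UL << 3)
#endif

#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
#endif

static_assert(SECCOMP_USER_NOTIF_FLAG_CONTINUE == 0x00000001,
	      "fallback definition must be visible");
```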
2019-10-20 11:14:01 make run_tests -C seccomp
make: Entering directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
seccomp_bpf.c: In function ‘user_notification_continue’:
seccomp_bpf.c:3562:15: error: ‘SECCOMP_USER_NOTIF_FLAG_CONTINUE’ undeclared (first use in this function)
resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
seccomp_bpf.c:3562:15: note: each undeclared identifier is reported only once for each function it appears in
Makefile:12: recipe for target 'seccomp_bpf' failed
make: *** [seccomp_bpf] Error 1
make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
Reported-by: kernel test robot <rong.a.chen(a)intel.com>
Fixes: 0eebfed2954f ("seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE")
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Reviewed-by: Tycho Andersen <tycho(a)tycho.ws>
Link: https://lore.kernel.org/r/20191021091055.4644-1-christian.brauner@ubuntu.com
Signed-off-by: Kees Cook <keescook(a)chromium.org>
(cherry picked from commit 2aa8d8d04ca29c3269154e1d48855e498be8882f
https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git)
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 6021baecb386..bf834ee02b69 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -167,10 +167,6 @@ struct seccomp_metadata {
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
-#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
-#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
-#endif
-
#define SECCOMP_IOC_MAGIC '!'
#define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr)
#define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -209,6 +205,10 @@ struct seccomp_notif_sizes {
#define PTRACE_EVENTMSG_SYSCALL_EXIT 2
#endif
+#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
+#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
+#endif
+
#ifndef seccomp
int seccomp(unsigned int op, unsigned int flags, void *args)
{
--
2.23.0
Reset all signal handlers of the child not set to SIG_IGN to SIG_DFL.
Mutually exclusive with CLONE_SIGHAND so as not to disturb other
threads' signal handlers.
In the spirit of closer cooperation between glibc developers and kernel
developers (cf. [2]) this patchset came out of a discussion on the glibc
mailing list for improving posix_spawn() (cf. [1], [3], [4]). Kernel
support for this feature has been explicitly requested by glibc and I
see no reason not to help them with this.
On Linux, the posix_spawn child helper process must ensure that no signal
handlers are enabled, so the signal disposition must be either SIG_DFL
or SIG_IGN. However, this requires a sigprocmask call to obtain the current
signal mask and at least _NSIG sigaction calls to reset the signal
handlers for each posix_spawn call, or complex state tracking that might
lead to data corruption in glibc. Adding this flag lets glibc avoid
these problems.
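To make the cost concrete, here is a rough sketch of the userspace loop that the flag would make unnecessary. It is not glibc's implementation: the function names are hypothetical, signal mask handling is left out, and example_handler() exists only so there is something to reset.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Example handler, present only so the sketch has something to reset. */
static void example_handler(int sig)
{
	(void)sig;
}

/* Return 1 if sig currently has a handler other than SIG_DFL/SIG_IGN. */
static int signal_is_caught(int sig)
{
	struct sigaction sa;

	assert(sig > 0 && sig < _NSIG);
	/* Some numbers are rejected, e.g. ones the C library reserves. */
	if (sigaction(sig, NULL, &sa) != 0)
		return 0;
	return sa.sa_handler != SIG_DFL && sa.sa_handler != SIG_IGN;
}

/*
 * Reset every caught signal to SIG_DFL and leave SIG_IGN untouched.
 * Returns the number of dispositions that were changed.
 */
static int reset_caught_signals(void)
{
	struct sigaction dfl = { 0 };
	int sig, nreset = 0;

	dfl.sa_handler = SIG_DFL;
	sigemptyset(&dfl.sa_mask);

	for (sig = 1; sig < _NSIG; sig++) {
		if (!signal_is_caught(sig))
			continue;
		if (sigaction(sig, &dfl, NULL) == 0)
			nreset++;
	}
	return nreset;
}
```

Every spawn pays for one sigaction query per signal number plus one more call per caught signal; with the flag, the kernel does the equivalent reset once in copy_sighand().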
[1]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00149.html
[3]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00158.html
[4]: https://www.sourceware.org/ml/libc-alpha/2019-10/msg00160.html
[2]: https://lwn.net/Articles/799331/
'[...] by asking for better cooperation with the C-library projects
in general. They should be copied on patches containing ABI
changes, for example. I noted that there are often times where
C-library developers wish the kernel community had done things
differently; how could those be avoided in the future? Members of
the audience suggested that more glibc developers should perhaps
join the linux-api list. The other suggestion was to "copy Florian
on everything".'
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: libc-alpha(a)sourceware.org
Cc: linux-api(a)vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
/* v1 */
Link: https://lore.kernel.org/r/20191010133518.5420-1-christian.brauner@ubuntu.com
/* v2 */
Link: https://lore.kernel.org/r/20191011102537.27502-1-christian.brauner@ubuntu.c…
- Florian Weimer <fweimer(a)redhat.com>:
- update comment in clone3_args_valid()
/* v3 */
- "Michael Kerrisk (man-pages)" <mtk.manpages(a)gmail.com>:
- s/CLONE3_CLEAR_SIGHAND/CLONE_CLEAR_SIGHAND/g
---
include/uapi/linux/sched.h | 3 +++
kernel/fork.c | 16 +++++++++++-----
2 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..1d500ed03c63 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -33,6 +33,9 @@
#define CLONE_NEWNET 0x40000000 /* New network namespace */
#define CLONE_IO 0x80000000 /* Clone io context */
+/* Flags for the clone3() syscall. */
+#define CLONE_CLEAR_SIGHAND 0x100000000ULL /* Clear any signal handler and reset to SIG_DFL. */
+
#ifndef __ASSEMBLY__
/**
* struct clone_args - arguments for the clone3 syscall
diff --git a/kernel/fork.c b/kernel/fork.c
index 1f6c45f6a734..aa5b5137f071 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1517,6 +1517,11 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)
spin_lock_irq(&current->sighand->siglock);
memcpy(sig->action, current->sighand->action, sizeof(sig->action));
spin_unlock_irq(&current->sighand->siglock);
+
+ /* Reset all signal handler not set to SIG_IGN to SIG_DFL. */
+ if (clone_flags & CLONE_CLEAR_SIGHAND)
+ flush_signal_handlers(tsk, 0);
+
return 0;
}
@@ -2563,11 +2568,8 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
static bool clone3_args_valid(const struct kernel_clone_args *kargs)
{
- /*
- * All lower bits of the flag word are taken.
- * Verify that no other unknown flags are passed along.
- */
- if (kargs->flags & ~CLONE_LEGACY_FLAGS)
+ /* Verify that no unknown flags are passed along. */
+ if (kargs->flags & ~(CLONE_LEGACY_FLAGS | CLONE_CLEAR_SIGHAND))
return false;
/*
@@ -2577,6 +2579,10 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
if (kargs->flags & (CLONE_DETACHED | CSIGNAL))
return false;
+ if ((kargs->flags & (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND)) ==
+ (CLONE_SIGHAND | CLONE_CLEAR_SIGHAND))
+ return false;
+
if ((kargs->flags & (CLONE_THREAD | CLONE_PARENT)) &&
kargs->exit_signal)
return false;
--
2.23.0
These counters will track hugetlb reservations rather than hugetlb
memory faulted in. This patch only adds the counter; the following patches
add the charging and uncharging of the counter.
Problem:
Currently, tasks attempting to allocate more hugetlb memory than is available get
a failure at mmap/shmget time. This is thanks to Hugetlbfs Reservations [1].
However, if a task attempts to allocate more hugetlb memory than its
hugetlb_cgroup limit allows (but no more than is available), the kernel will
allow the mmap/shmget call, but will SIGBUS the task when it attempts to fault
the memory in.
We have developers interested in using hugetlb_cgroups, and they have expressed
dissatisfaction regarding this behavior. We'd like to improve this
behavior such that tasks violating the hugetlb_cgroup limits get an error at
mmap/shmget time, rather than getting SIGBUS'd when they try to fault
the excess memory in.
The underlying problem is that today's hugetlb_cgroup accounting happens
at hugetlb memory *fault* time, rather than at *reservation* time.
Thus, enforcing the hugetlb_cgroup limit only happens at fault time, and
the offending task gets SIGBUS'd.
Proposed Solution:
A new page counter named hugetlb.xMB.reservation_[limit|usage]_in_bytes. This
counter has slightly different semantics than
hugetlb.xMB.[limit|usage]_in_bytes:
- While usage_in_bytes tracks all *faulted* hugetlb memory,
reservation_usage_in_bytes tracks all *reserved* hugetlb memory and
hugetlb memory faulted in without a prior reservation.
- If a task attempts to reserve more memory than limit_in_bytes allows,
the kernel will allow it to do so. But if a task attempts to reserve
more memory than reservation_limit_in_bytes, the kernel will fail this
reservation.
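The difference between the two counters can be sketched with a toy model. This is not the kernel's page_counter code: every name below is hypothetical, units are pages, and the model ignores the case where memory is faulted in without a prior reservation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct page_counter. */
struct toy_counter {
	unsigned long usage;
	unsigned long limit;
	unsigned long failcnt;
};

/* Models one cgroup: one counter per accounting point. */
struct toy_hugetlb_cgroup {
	struct toy_counter fault;	/* limit/usage_in_bytes */
	struct toy_counter reservation;	/* reservation_limit/usage_in_bytes */
};

/* Charge nr pages; on failure bump failcnt and leave usage unchanged. */
static bool toy_try_charge(struct toy_counter *c, unsigned long nr)
{
	assert(c != NULL && c->usage <= c->limit);

	if (nr > c->limit - c->usage) {
		c->failcnt++;
		return false;
	}
	c->usage += nr;
	return true;
}

/* mmap()/shmget() time: only the reservation counter can refuse. */
static int toy_reserve(struct toy_hugetlb_cgroup *cg, unsigned long nr)
{
	return toy_try_charge(&cg->reservation, nr) ? 0 : -1;
}

/* Fault time: only the fault counter can refuse (SIGBUS in the kernel). */
static int toy_fault(struct toy_hugetlb_cgroup *cg, unsigned long nr)
{
	return toy_try_charge(&cg->fault, nr) ? 0 : -1;
}
```

In this model a cgroup limited only through the fault counter still gets a successful toy_reserve() and fails later in toy_fault(), which corresponds to the SIGBUS case described above; limiting the reservation counter moves the failure to reservation time.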
This proposal is implemented in this patch series, with tests to verify
functionality and show the usage. We also added cgroup-v2 support to
hugetlb_cgroup so that the new use cases can be extended to v2.
Alternatives considered:
1. A new cgroup, instead of only a new page_counter attached to
the existing hugetlb_cgroup. Adding a new cgroup seemed like a lot of code
duplication with hugetlb_cgroup. Keeping hugetlb related page counters under
hugetlb_cgroup seemed cleaner as well.
2. Instead of adding a new counter, we considered adding a sysctl that modifies
the behavior of hugetlb.xMB.[limit|usage]_in_bytes, to do accounting at
reservation time rather than fault time. Adding a new page_counter seems
better as userspace could, if it wants, choose to enforce different cgroups
differently: one via limit_in_bytes, and another via
reservation_limit_in_bytes. This could be very useful if you're
transitioning how hugetlb memory is partitioned on your system one
cgroup at a time, for example. Also, someone may find usage for both
limit_in_bytes and reservation_limit_in_bytes concurrently, and this
approach gives them the option to do so.
Testing:
- Added tests passing.
- libhugetlbfs tests mostly passing, but some tests have trouble both with and
without this patch series. This seems to be an environment issue rather than
a code issue:
- Overall results:
********** TEST SUMMARY
*                      2M
*                      32-bit 64-bit
*     Total testcases:    84      0
*             Skipped:     0      0
*                PASS:    66      0
*                FAIL:    14      0
*    Killed by signal:     0      0
*   Bad configuration:     4      0
*       Expected FAIL:     0      0
*     Unexpected PASS:     0      0
*    Test not present:     0      0
* Strange test result:     0      0
**********
- Failing tests:
- elflink_rw_and_share_test("linkhuge_rw") segfaults with and without this
patch series.
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_RESTRICT_EXE=unknown:malloc
HUGETLB_MORECORE=yes malloc (2M: 32):
FAIL Address is not hugepage
- LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes malloc_manysmall (2M: 32):
FAIL Address is not hugepage
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- GLIBC_TUNABLES=glibc.malloc.tcache_count=0 LD_PRELOAD=libhugetlbfs.so
libheapshrink.so HUGETLB_MORECORE=yes heapshrink (2M: 32):
FAIL Heap not on hugepages
- HUGETLB_ELFMAP=RW linkhuge_rw (2M: 32): FAIL small_data is not hugepage
- HUGETLB_ELFMAP=RW HUGETLB_MINIMAL_COPY=no linkhuge_rw (2M: 32):
FAIL small_data is not hugepage
- alloc-instantiate-race shared (2M: 32):
Bad configuration: sched_setaffinity(cpu1): Invalid argument -
FAIL Child 1 killed by signal Killed
- shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- HUGETLB_SHM=yes shmoverride_linked_static (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
- LD_PRELOAD=libhugetlbfs.so HUGETLB_SHM=yes shmoverride_unlinked (2M: 32):
FAIL shmget failed size 2097152 from line 176: Invalid argument
[1]: https://www.kernel.org/doc/html/latest/vm/hugetlbfs_reserv.html
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Acked-by: Hillf Danton <hdanton(a)sina.com>
---
include/linux/hugetlb.h | 23 ++++++++-
mm/hugetlb_cgroup.c | 111 ++++++++++++++++++++++++++++++----------
2 files changed, 107 insertions(+), 27 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 53fc34f930d08..9c49a0ba894d3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -320,6 +320,27 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
#ifdef CONFIG_HUGETLB_PAGE
+enum {
+ /* Tracks hugetlb memory faulted in. */
+ HUGETLB_RES_USAGE,
+ /* Tracks hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_USAGE,
+ /* Limit for hugetlb memory faulted in. */
+ HUGETLB_RES_LIMIT,
+ /* Limit for hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_LIMIT,
+ /* Max usage for hugetlb memory faulted in. */
+ HUGETLB_RES_MAX_USAGE,
+ /* Max usage for hugetlb memory reserved. */
+ HUGETLB_RES_RESERVATION_MAX_USAGE,
+ /* Faulted memory accounting fail count. */
+ HUGETLB_RES_FAILCNT,
+ /* Reserved memory accounting fail count. */
+ HUGETLB_RES_RESERVATION_FAILCNT,
+ HUGETLB_RES_NULL,
+ HUGETLB_RES_MAX,
+};
+
#define HSTATE_NAME_LEN 32
/* Defines one hugetlb page size */
struct hstate {
@@ -340,7 +361,7 @@ struct hstate {
unsigned int surplus_huge_pages_node[MAX_NUMNODES];
#ifdef CONFIG_CGROUP_HUGETLB
/* cgroup control files */
- struct cftype cgroup_files[5];
+ struct cftype cgroup_files[HUGETLB_RES_MAX];
#endif
char name[HSTATE_NAME_LEN];
};
diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
index f1930fa0b445d..1ed4448ca41d3 100644
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -25,6 +25,10 @@ struct hugetlb_cgroup {
* the counter to account for hugepages from hugetlb.
*/
struct page_counter hugepage[HUGE_MAX_HSTATE];
+ /*
+ * the counter to account for hugepage reservations from hugetlb.
+ */
+ struct page_counter reserved_hugepage[HUGE_MAX_HSTATE];
};
#define MEMFILE_PRIVATE(x, val) (((x) << 16) | (val))
@@ -33,6 +37,14 @@ struct hugetlb_cgroup {
static struct hugetlb_cgroup *root_h_cgroup __read_mostly;
+static inline struct page_counter *
+hugetlb_cgroup_get_counter(struct hugetlb_cgroup *h_cg, int idx, bool reserved)
+{
+ if (reserved)
+ return &h_cg->reserved_hugepage[idx];
+ return &h_cg->hugepage[idx];
+}
+
static inline
struct hugetlb_cgroup *hugetlb_cgroup_from_css(struct cgroup_subsys_state *s)
{
@@ -254,30 +266,33 @@ void hugetlb_cgroup_uncharge_cgroup(int idx, unsigned long nr_pages,
return;
}
-enum {
- RES_USAGE,
- RES_LIMIT,
- RES_MAX_USAGE,
- RES_FAILCNT,
-};
-
static u64 hugetlb_cgroup_read_u64(struct cgroup_subsys_state *css,
struct cftype *cft)
{
struct page_counter *counter;
+ struct page_counter *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(css);
counter = &h_cg->hugepage[MEMFILE_IDX(cft->private)];
+ reserved_counter = &h_cg->reserved_hugepage[MEMFILE_IDX(cft->private)];
switch (MEMFILE_ATTR(cft->private)) {
- case RES_USAGE:
+ case HUGETLB_RES_USAGE:
return (u64)page_counter_read(counter) * PAGE_SIZE;
- case RES_LIMIT:
+ case HUGETLB_RES_RESERVATION_USAGE:
+ return (u64)page_counter_read(reserved_counter) * PAGE_SIZE;
+ case HUGETLB_RES_LIMIT:
return (u64)counter->max * PAGE_SIZE;
- case RES_MAX_USAGE:
+ case HUGETLB_RES_RESERVATION_LIMIT:
+ return (u64)reserved_counter->max * PAGE_SIZE;
+ case HUGETLB_RES_MAX_USAGE:
return (u64)counter->watermark * PAGE_SIZE;
- case RES_FAILCNT:
+ case HUGETLB_RES_RESERVATION_MAX_USAGE:
+ return (u64)reserved_counter->watermark * PAGE_SIZE;
+ case HUGETLB_RES_FAILCNT:
return counter->failcnt;
+ case HUGETLB_RES_RESERVATION_FAILCNT:
+ return reserved_counter->failcnt;
default:
BUG();
}
@@ -291,6 +306,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
int ret, idx;
unsigned long nr_pages;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
+ bool reserved = false;
if (hugetlb_cgroup_is_root(h_cg)) /* Can't set limit on root */
return -EINVAL;
@@ -304,9 +320,14 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));
switch (MEMFILE_ATTR(of_cft(of)->private)) {
- case RES_LIMIT:
+ case HUGETLB_RES_RESERVATION_LIMIT:
+ reserved = true;
+ /* Fall through. */
+ case HUGETLB_RES_LIMIT:
mutex_lock(&hugetlb_limit_mutex);
- ret = page_counter_set_max(&h_cg->hugepage[idx], nr_pages);
+ ret = page_counter_set_max(hugetlb_cgroup_get_counter(h_cg, idx,
+ reserved),
+ nr_pages);
mutex_unlock(&hugetlb_limit_mutex);
break;
default:
@@ -320,18 +341,26 @@ static ssize_t hugetlb_cgroup_reset(struct kernfs_open_file *of,
char *buf, size_t nbytes, loff_t off)
{
int ret = 0;
- struct page_counter *counter;
+ struct page_counter *counter, *reserved_counter;
struct hugetlb_cgroup *h_cg = hugetlb_cgroup_from_css(of_css(of));
counter = &h_cg->hugepage[MEMFILE_IDX(of_cft(of)->private)];
+ reserved_counter =
+ &h_cg->reserved_hugepage[MEMFILE_IDX(of_cft(of)->private)];
switch (MEMFILE_ATTR(of_cft(of)->private)) {
- case RES_MAX_USAGE:
+ case HUGETLB_RES_MAX_USAGE:
page_counter_reset_watermark(counter);
break;
- case RES_FAILCNT:
+ case HUGETLB_RES_RESERVATION_MAX_USAGE:
+ page_counter_reset_watermark(reserved_counter);
+ break;
+ case HUGETLB_RES_FAILCNT:
counter->failcnt = 0;
break;
+ case HUGETLB_RES_RESERVATION_FAILCNT:
+ reserved_counter->failcnt = 0;
+ break;
default:
ret = -EINVAL;
break;
@@ -357,37 +386,67 @@ static void __init __hugetlb_cgroup_file_init(int idx)
struct hstate *h = &hstates[idx];
/* format the size */
- mem_fmt(buf, 32, huge_page_size(h));
+ mem_fmt(buf, sizeof(buf), huge_page_size(h));
/* Add the limit file */
- cft = &h->cgroup_files[0];
+ cft = &h->cgroup_files[HUGETLB_RES_LIMIT];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.limit_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_LIMIT);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_LIMIT);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+ cft->write = hugetlb_cgroup_write;
+
+ /* Add the reservation limit file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_LIMIT];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_limit_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_LIMIT);
cft->read_u64 = hugetlb_cgroup_read_u64;
cft->write = hugetlb_cgroup_write;
/* Add the usage file */
- cft = &h->cgroup_files[1];
+ cft = &h->cgroup_files[HUGETLB_RES_USAGE];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.usage_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_USAGE);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_USAGE);
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the reservation usage file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_USAGE];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_usage_in_bytes",
+ buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_USAGE);
cft->read_u64 = hugetlb_cgroup_read_u64;
/* Add the MAX usage file */
- cft = &h->cgroup_files[2];
+ cft = &h->cgroup_files[HUGETLB_RES_MAX_USAGE];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.max_usage_in_bytes", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_MAX_USAGE);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_MAX_USAGE);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the MAX reservation usage file */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_MAX_USAGE];
+ snprintf(cft->name, MAX_CFTYPE_NAME,
+ "%s.reservation_max_usage_in_bytes", buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_MAX_USAGE);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
/* Add the failcntfile */
- cft = &h->cgroup_files[3];
+ cft = &h->cgroup_files[HUGETLB_RES_FAILCNT];
snprintf(cft->name, MAX_CFTYPE_NAME, "%s.failcnt", buf);
- cft->private = MEMFILE_PRIVATE(idx, RES_FAILCNT);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_FAILCNT);
+ cft->write = hugetlb_cgroup_reset;
+ cft->read_u64 = hugetlb_cgroup_read_u64;
+
+ /* Add the reservation failcntfile */
+ cft = &h->cgroup_files[HUGETLB_RES_RESERVATION_FAILCNT];
+ snprintf(cft->name, MAX_CFTYPE_NAME, "%s.reservation_failcnt", buf);
+ cft->private = MEMFILE_PRIVATE(idx, HUGETLB_RES_RESERVATION_FAILCNT);
cft->write = hugetlb_cgroup_reset;
cft->read_u64 = hugetlb_cgroup_read_u64;
/* NULL terminate the last cft */
- cft = &h->cgroup_files[4];
+ cft = &h->cgroup_files[HUGETLB_RES_NULL];
memset(cft, 0, sizeof(*cft));
WARN_ON(cgroup_add_legacy_cftypes(&hugetlb_cgrp_subsys,
--
2.23.0.700.g56cf767bdb-goog
The ifndef for SECCOMP_USER_NOTIF_FLAG_CONTINUE was placed under the
ifndef for the SECCOMP_FILTER_FLAG_NEW_LISTENER feature. This will not
work on systems that do support SECCOMP_FILTER_FLAG_NEW_LISTENER but do not
support SECCOMP_USER_NOTIF_FLAG_CONTINUE. So move the latter ifndef out of
the former ifndef's scope.
2019-10-20 11:14:01 make run_tests -C seccomp
make: Entering directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
gcc -Wl,-no-as-needed -Wall seccomp_bpf.c -lpthread -o seccomp_bpf
seccomp_bpf.c: In function ‘user_notification_continue’:
seccomp_bpf.c:3562:15: error: ‘SECCOMP_USER_NOTIF_FLAG_CONTINUE’ undeclared (first use in this function)
resp.flags = SECCOMP_USER_NOTIF_FLAG_CONTINUE;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
seccomp_bpf.c:3562:15: note: each undeclared identifier is reported only once for each function it appears in
Makefile:12: recipe for target 'seccomp_bpf' failed
make: *** [seccomp_bpf] Error 1
make: Leaving directory '/usr/src/perf_selftests-x86_64-rhel-7.6-0eebfed2954f152259cae0ad57b91d3ea92968e8/tools/testing/selftests/seccomp'
Reported-by: kernel test robot <rong.a.chen(a)intel.com>
Fixes: 0eebfed2954f ("seccomp: test SECCOMP_USER_NOTIF_FLAG_CONTINUE")
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 2519377ebda3..9669b81086cf 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -168,10 +168,6 @@ struct seccomp_metadata {
#define SECCOMP_RET_USER_NOTIF 0x7fc00000U
-#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
-#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
-#endif
-
#define SECCOMP_IOC_MAGIC '!'
#define SECCOMP_IO(nr) _IO(SECCOMP_IOC_MAGIC, nr)
#define SECCOMP_IOR(nr, type) _IOR(SECCOMP_IOC_MAGIC, nr, type)
@@ -205,6 +201,10 @@ struct seccomp_notif_sizes {
};
#endif
+#ifndef SECCOMP_USER_NOTIF_FLAG_CONTINUE
+#define SECCOMP_USER_NOTIF_FLAG_CONTINUE 0x00000001
+#endif
+
#ifndef seccomp
int seccomp(unsigned int op, unsigned int flags, void *args)
{
--
2.23.0
Hi,
Here are some patches to fix some warnings/issues on 32bit arch
(e.g. arm).
When I built kselftest on arm, I hit some 32bit-related warnings.
Here are the patches to fix those issues.
- [1/5] va_max was set to 2^32 even on 32bit arch. This can make
va_max == 0 and always fail. Make it 3GB on 32bit.
- [2/5] Some VM tests require 64bit user space, so they should
not run on 32bit arch.
- [3/5] For counting the size of a large file, we should use
size_t instead of unsigned long.
- [4/5] Gcc warns about the printf format for size_t and int64_t on
32bit arch. Use %llu and cast the argument.
- [5/5] Gcc warns about casts between __u64 and pointer types. The value
should first be converted to unsigned long.
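The pitfalls behind items [1/5], [4/5] and [5/5] can be sketched as follows. These are illustrative helpers with hypothetical names, not code from the patches; uint32_t stands in for a 32bit unsigned long and uint64_t for __u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* What 2^32 becomes once stored in a 32bit unsigned long. */
static uint32_t va_max_as_32bit(void)
{
	return (uint32_t)(UINT64_C(1) << 32);	/* truncates to 0 */
}

/* Print a size_t and an int64_t with formats that fit on any arch. */
static int format_sizes(char *buf, size_t len, size_t nbytes, int64_t delta)
{
	assert(buf != NULL && len > 0);

	return snprintf(buf, len, "%llu %lld",
			(unsigned long long)nbytes, (long long)delta);
}

/* Pointer -> 64bit field, going through unsigned long first. */
static uint64_t ptr_to_u64(const void *p)
{
	return (uint64_t)(unsigned long)p;
}

/* 64bit field -> pointer, again through unsigned long. */
static void *u64_to_ptr(uint64_t v)
{
	return (void *)(unsigned long)v;
}
```

The intermediate unsigned long cast relies on unsigned long being pointer-sized, which holds for Linux on both 32bit and 64bit arches; uintptr_t would be the more portable spelling.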
Thank you,
---
Masami Hiramatsu (5):
selftests: proc: Make va_max 3GB on 32bit arch
selftests: vm: Build/Run 64bit tests only on 64bit arch
selftests: net: Use size_t and ssize_t for counting file size
selftests: net: Fix printf format warnings on arm
selftests: sync: Fix cast warnings on arm
tools/testing/selftests/net/so_txtime.c | 4 ++--
tools/testing/selftests/net/tcp_mmap.c | 8 ++++----
tools/testing/selftests/net/udpgso.c | 3 ++-
tools/testing/selftests/net/udpgso_bench_tx.c | 3 ++-
.../selftests/proc/proc-self-map-files-002.c | 11 ++++++++++-
tools/testing/selftests/sync/sync.c | 6 +++---
tools/testing/selftests/vm/Makefile | 5 +++++
tools/testing/selftests/vm/run_vmtests | 10 ++++++++++
8 files changed, 38 insertions(+), 12 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
Hi All.
The patch 5c069b6dedef "selftests: Move test output to diagnostic lines"
from Apr 24, 2019,
leads to `make run_tests -C bpf` hanging forever.
The bpf selftests include many subtests. When `make run_tests -C bpf` reaches
test_lwt_seg6local.sh, the task hangs and runner.sh never runs the next task.
I checked ps aux; prefix.pl never exits.
```
91058 [ 811.451584] # [25] VAR __license type_id=24 linkage=1
91059 [ 811.451586]-
91060 [ 811.455365] # [26] DATASEC license size=0 vlen=1 size == 0
91061 [ 811.455367]-
91062 [ 811.457424] #-
91063 [ 811.457425]-
91064 [ 811.460912] # selftests: test_lwt_seg6local [PASS]
91065 [ 811.460914]-
91066 [ 3620.461986] Thu Oct 17 14:54:05 CST 2019 detected soft_timeout
```
If I skip test_lwt_seg6local and run `make run_tests -C bpf` again, the task
hangs on test_tc_tunnel.sh.
Kushwaha also hit this issue: `make run_tests -C bpf` hung on
test_lwt_ip_encap.sh (this test failed on my localhost).
--
Best regards.
Liu Yiding
Hi All,
I am trying to build kselftest on Linux-5.4-rc3+ on Ubuntu 18.04. I
installed LLVM-9.0.0 and Clang-9.0.0 from the links below, following the
steps from [1], because of discussion [2]:
https://releases.llvm.org/9.0.0/llvm-9.0.0.src.tar.xz
https://releases.llvm.org/9.0.0/clang-tools-extra-9.0.0.src.tar.xz
https://releases.llvm.org/9.0.0/cfe-9.0.0.src.tar.xz
After that I started getting this error.
make[2]: Leaving directory '/usr/src/tovards/linux/tools/lib/bpf'
(clang -I. -I./include/uapi -I../../../include/uapi
-I/usr/src/tovards/linux/tools/testing/selftests/bpf/../usr/include
-D__TARGET_ARCH_arm64 -idirafter /usr/local/include -idirafter
/usr/local/lib/clang/9.0.0/include -idirafter
/usr/include/aarch64-linux-gnu -idirafter /usr/include
-Wno-compare-distinct-pointer-types -O2 -target bpf -emit-llvm \
-c progs/test_core_reloc_ints.c -o - || echo "clang failed") | \
llc -march=arm64 -mcpu=generic -filetype=obj -o
/usr/src/tovards/linux/tools/testing/selftests/bpf/test_core_reloc_ints.o
progs/test_core_reloc_ints.c:32:6: error: using
builtin_preserve_access_index() without -g
if (BPF_CORE_READ(&out->u8_field, &in->u8_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:33:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->s8_field, &in->s8_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:34:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->u16_field, &in->u16_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:35:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->s16_field, &in->s16_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:36:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->u32_field, &in->u32_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:37:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->s32_field, &in->s32_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:38:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->u64_field, &in->u64_field) ||
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
progs/test_core_reloc_ints.c:39:6: error: using
builtin_preserve_access_index() without -g
BPF_CORE_READ(&out->s64_field, &in->s64_field))
^
./bpf_helpers.h:533:10: note: expanded from macro 'BPF_CORE_READ'
__builtin_preserve_access_index(src))
^
8 errors generated.
llc: error: llc: <stdin>:1:1: error: expected top-level entity
clang failed
In order to solve this error, I modified bpf/Makefile as follows:
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types -g
Now I am getting this error
(clang -I. -I./include/uapi -I../../../include/uapi
-I/usr/src/tovards/linux/tools/testing/selftests/bpf/../usr/include
-D__TARGET_ARCH_arm64 -idirafter /usr/local/include -idirafter
/usr/local/lib/clang/9.0.0/include -idirafter
/usr/include/aarch64-linux-gnu -idirafter /usr/include
-Wno-compare-distinct-pointer-types -g -O2 -target bpf -emit-llvm \
-c progs/test_core_reloc_ints.c -o - || echo "clang failed") | \
llc -march=arm64 -mcpu=generic -filetype=obj -o
/usr/src/tovards/linux/tools/testing/selftests/bpf/test_core_reloc_ints.o
LLVM ERROR: Cannot select: intrinsic %llvm.preserve.struct.access.index
Makefile:267: recipe for target
'/usr/src/tovards/linux/tools/testing/selftests/bpf/test_core_reloc_ints.o'
failed
make[1]: ***
[/usr/src/tovards/linux/tools/testing/selftests/bpf/test_core_reloc_ints.o]
Error 1
Please suggest!!
--prabhakar(pk)
[1]
https://stackoverflow.com/questions/47255526/how-to-build-the-latest-clang-…
[2] https://www.mail-archive.com/netdev@vger.kernel.org/msg315096.html
Linux top-commit
----------------
commit bc88f85c6c09306bd21917e1ae28205e9cd775a7 (HEAD -> master,
origin/master, origin/HEAD)
Author: Ben Dooks <ben.dooks(a)codethink.co.uk>
Date: Wed Oct 16 12:24:58 2019 +0100
kthread: make __kthread_queue_delayed_work static
The __kthread_queue_delayed_work is not exported so
make it static, to avoid the following sparse warning:
kernel/kthread.c:869:6: warning: symbol
'__kthread_queue_delayed_work' was not declared. Should it be static?
Signed-off-by: Ben Dooks <ben.dooks(a)codethink.co.uk>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Hey Knut and Shuah,
Following up on our offline discussion on Wednesday night:
We decided that it would make sense for Knut to try to implement Hybrid
Testing (testing that crosses the kernel userspace boundary) that he
introduced here[1] on top of the existing KUnit infrastructure.
We discussed several possible things in the kernel that Knut could test
with the new Hybrid Testing feature as an initial example. Those were
(in reverse order of expected difficulty):
1. RDS (Reliable Datagram Sockets) - We decided that, although this was
one of the more complicated subsystems to work with, it was probably
the best candidate for Knut to start with because it was in desperate
need of better testing, much of the testing would require crossing
the kernel userspace boundary to be effective, and Knut has access to
RDS (since he works at Oracle).
2. KMOD - Probably much simpler than RDS, and the maintainer, Luis
Chamberlain (CC'ed) would like to see better testing here, but
probably still not as good as RDS because it is in less dire need of
testing, collaboration on this would be more difficult, and Luis is
currently on an extended vacation. Luis and I had already been
discussing testing KMOD here[2].
3. IP over USB - Least desirable option, but still possible. More
complicated than KMOD, and not as easy to collaborate on as RDS.
I don't really think we discussed how this would work. I remember that I
mentioned that it would be easier if I sent out a patch that
centralizes where KUnit tests are dispatched from in the kernel; I will
try to get an RFC for that out, probably sometime next week. That should
provide a pretty straightforward place for Knut to move his work on top
of.
The next question is what the userspace component of this should look
like. To me it seems like we should probably have the kselftest test
runner manage when the test gets run, and collect and report the
result of the test, but I think Knut has thought more about this than I
have, and Shuah is the kselftest maintainer, so I am guessing this will
probably mostly be a discussion between the two of you.
So I think we have a couple of TODOs between us:
Brendan:
- Need to send out patch that provides a single place where all tests
are dispatched from.
Knut:
- Start splitting out the hybrid test stuff from the rest of the RFC
you sent previously.
Knut and Shuah:
- Start figuring out what the userspace component of this will look
like.
Cheers!
[1] https://lore.kernel.org/linux-kselftest/524b4e062500c6a240d4d7c0e1d0a299680…
[2] https://groups.google.com/forum/#!topic/kunit-dev/CdIytJtii00
Sorry for the delayed reply. I was on vacation.
On Fri, Oct 11, 2019 at 5:16 AM Theodore Ts'o <theodore.tso(a)gmail.com> wrote:
>
>
>
> On Friday, October 11, 2019 at 7:19:49 AM UTC-4, Brendan Higgins wrote:
>>
>>
>> Should we maybe drop `--build_dir` in favor of `O`?
>
>
> How about if "make kunit" results in "./tools/testing/kunit/kunit.py run --build_dir=/.kunit --allconfig"
>
> ... where --allconfig automatically creates kunitconfig but includes all of the CONFIG options which depend on CONFIG_KUNIT, so that all unit tests are run? That way, we make it super easy for people to run the unit tests. Since most users are used to using make targets, this I bet will significantly increase the number of developers using kunit, because it will be super-duper convenient for them.
>
> Also, it would be nice if kunit.py first looks for kunitconfig in build_dir, and then in the top-level of the kernel sources, and we put .kunit in .gitignore. That will make "git status" look nice and clean.
>
> What do folks think?
Having something like --allconfig is the ultimate goal. I had been
talking to Luis and Shuah about this for some time.
I think the best way to make this work would be for kunit_tool to be
able to detect all the tests with CONFIG_KUNIT as you suggest (or
something like it). Luis actually already suggested it; however, we
identified that this would likely not be as easy as it sounds as it is
possible to have mutually exclusive CONFIGs. Luis pointed out that
some researchers are currently working on a SAT solver for Kconfig
that we could use to potentially address this problem. Nevertheless, a
complete solution in this regard is actually somewhat difficult.
Shuah's solution was just to use CONFIG fragments in the meantime
similar to what kselftest already does. I was leaning in that
direction since kselftest already does that and we know that it works.
Shuah, Luis, does this still match what you have been thinking?
+open list:KERNEL SELFTEST FRAMEWORK In case anyone in kselftest has
any thoughts.
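For illustration, the kunitconfig lookup order Ted suggests (build directory first, then the top level of the kernel sources) could be sketched roughly as below. This is a minimal shell sketch, not what kunit.py currently does; find_kunitconfig and its two arguments are hypothetical names chosen for the example.

```sh
#!/bin/sh
# Hypothetical helper (not part of kunit.py): print the path of the first
# readable kunitconfig, checking the build directory before the top level
# of the kernel sources. Returns non-zero if neither location has one.
find_kunitconfig() {
    build_dir="$1"
    src_dir="$2"
    for dir in "$build_dir" "$src_dir"; do
        if [ -r "$dir/kunitconfig" ]; then
            echo "$dir/kunitconfig"
            return 0
        fi
    done
    return 1
}
```

Called as `find_kunitconfig .kunit .`, this would prefer a per-build-tree config and only fall back to the one in the source tree when the build directory has none.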
On Thu, Oct 10, 2019 at 7:05 PM Theodore Ts'o <theodore.tso(a)gmail.com> wrote:
>
> I've been experimenting with the ext4 kunit test case, and something that would be really helpful is if the default is to store the object files for the ARCH=um kernel and its .config file in the top-level directory .kunit. That is, that the default for --build_dir should be .kunit.
>
> Why is this important? Because the kernel developer will want to be running unit tests as well as building kernels that can be run under whatever architecture they are normally developing for (for example, an x86 kernel that can be run using kvm; or an arm64 kernel that gets run on an Android device by using the "fastboot" command). So that means we don't want to be overwriting the object files and .config files for building the kernel for x86 when building the kunit kernel using the um arch. For example, for ext4, my ideal workflow might go something like this:
That's a good point.
> <hack hack hack>
> % ./tools/testing/kunit/kunit.py run
> <watch to see that unit tests succeed, and since most of the object files have already been built for the kunit kernel and are stored in the .kunit directory, this will be fast, since only the modified files will need to be recompiled>
> % kbuild
> <this is a script that builds an x86 kernel in /build/ext4-64 that is designed to be run under either kvm or in a GCE VM; since the kunit object files are stored in /build/ext4-kunit, the pre-existing files when building for x86_64 haven't been disturbed, so this build is fast as well>
> % kvm-xfstests smoke
> <this will run xfstests using the kernel plucked from /build/ext4-64, using kvm>
>
> The point is when I'm developing an ext4 feature, or reviewing and merging ext4 commits, I need to be able to maintain separate build trees and separate config files for ARCH=um as well as ARCH=x86_64, and if the ARCH=um files are stored in the kernel sources, then building with O=... doesn't work:
>
> <tytso@lambda> {/usr/projects/linux/kunit} (kunit)
> 1084% make O=/build/test-dir
> make[1]: Entering directory '/build/test-dir'
> ***
> *** The source tree is not clean, please run 'make mrproper'
> *** in /usr/projects/linux/kunit
> ***
Should we maybe drop `--build_dir` in favor of `O`?
> Another reason why it would be good to use --build_dir by default is that building with a separate O= build directory would then be regularly tested. Right now, "kunit.py --build_dir=" seems to be broken.
Good point.
> % ./tools/testing/kunit/kunit.py run --build_dir=/build/ext4-kunit
> Generating .config ...
> [22:04:12] Building KUnit Kernel ...
> /usr/projects/linux/kunit/arch/x86/um/user-offsets.c:20:10: fatal error: asm/syscalls_64.h: No such file or directory
> 20 | #include <asm/syscalls_64.h>
> | ^~~~~~~~~~~~~~~~~~~
> compilation terminated.
>
> (This appears to be an ARCH=um bug, not a kunit bug, though.)
Yeah, I encountered this before. Some file is not getting properly
cleaned up by `make mrproper`. It works if you do `git clean -fdx` (I
know that's not a real solution for most people). Nevertheless, it
sounds like we need to sit down and actually solve this problem since
it is affecting users now.
I think you make a compelling argument. Anyone else have any thoughts on this?
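To make the failure mode above concrete, here is a rough shell sketch of the two ideas in play: a check for leftover in-tree build state (which is what makes "make O=..." refuse to run), and a per-architecture build directory so ARCH=um output never lands in the sources. The list of files tested in tree_is_clean is an assumption about what the top-level Makefile looks for, not a copy of its logic, and both function names are hypothetical.

```sh
#!/bin/sh
# Sketch only: the paths checked here (.config, include/config) are an
# assumption about what the kernel's top-level Makefile treats as "not
# clean"; consult the Makefile for the authoritative list.
tree_is_clean() {
    src_dir="$1"
    if [ -f "$src_dir/.config" ] || [ -d "$src_dir/include/config" ]; then
        return 1
    fi
    return 0
}

# Hypothetical wrapper: pick a separate build directory per architecture,
# so the ARCH=um (kunit) objects and the native objects never collide.
build_dir_for_arch() {
    case "$1" in
        um) echo ".kunit" ;;
        *)  echo "/build/$1" ;;
    esac
}
```

With something like this, a kunit run would use `--build_dir="$(build_dir_for_arch um)"` while a native build keeps using its own `O=` directory, and the source tree itself stays clean for both.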
From: Kees Cook <keescook(a)chromium.org>
[ Upstream commit 852c8cbf34d3b3130a05c38064dd98614f97d3a8 ]
Commit a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves the problem of kselftest_harness.h-using binary tests
possibly hanging forever. However, scripts and other binaries can still
hang forever. This adds a global timeout to each test script run.
To make this configurable (e.g. as needed in the "rtc" test case),
include a new per-test-directory "settings" file (similar to "config")
that can contain kselftest-specific settings. The first recognized field
is "timeout".
Additionally, this splits the reporting for timeouts into a specific
"TIMEOUT" not-ok (and adds exit code reporting in the remaining case).
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kselftest/runner.sh | 36 +++++++++++++++++++--
tools/testing/selftests/rtc/settings | 1 +
2 files changed, 34 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/rtc/settings
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 00c9020bdda8b..84de7bc74f2cf 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -3,9 +3,14 @@
#
# Runs a set of tests in a given subdirectory.
export skip_rc=4
+export timeout_rc=124
export logfile=/dev/stdout
export per_test_logging=
+# Defaults for "settings" file fields:
+# "timeout" how many seconds to let each test run before failing.
+export kselftest_default_timeout=45
+
# There isn't a shell-agnostic way to find the path of a sourced file,
# so we must rely on BASE_DIR being set to find other tools.
if [ -z "$BASE_DIR" ]; then
@@ -24,6 +29,16 @@ tap_prefix()
fi
}
+tap_timeout()
+{
+ # Make sure tests will time out if utility is available.
+ if [ -x /usr/bin/timeout ] ; then
+ /usr/bin/timeout "$kselftest_timeout" "$1"
+ else
+ "$1"
+ fi
+}
+
run_one()
{
DIR="$1"
@@ -32,6 +47,18 @@ run_one()
BASENAME_TEST=$(basename $TEST)
+ # Reset any "settings"-file variables.
+ export kselftest_timeout="$kselftest_default_timeout"
+ # Load per-test-directory kselftest "settings" file.
+ settings="$BASE_DIR/$DIR/settings"
+ if [ -r "$settings" ] ; then
+ while read line ; do
+ field=$(echo "$line" | cut -d= -f1)
+ value=$(echo "$line" | cut -d= -f2-)
+ eval "kselftest_$field"="$value"
+ done < "$settings"
+ fi
+
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
if [ ! -x "$TEST" ]; then
@@ -44,14 +71,17 @@ run_one()
echo "not ok $test_num $TEST_HDR_MSG"
else
cd `dirname $TEST` > /dev/null
- (((((./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
- (if [ $? -eq $skip_rc ]; then \
+ (rc=$?; \
+ if [ $rc -eq $skip_rc ]; then \
echo "not ok $test_num $TEST_HDR_MSG # SKIP"
+ elif [ $rc -eq $timeout_rc ]; then \
+ echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT"
else
- echo "not ok $test_num $TEST_HDR_MSG"
+ echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
fi)
cd - >/dev/null
fi
diff --git a/tools/testing/selftests/rtc/settings b/tools/testing/selftests/rtc/settings
new file mode 100644
index 0000000000000..ba4d85f74cd6b
--- /dev/null
+++ b/tools/testing/selftests/rtc/settings
@@ -0,0 +1 @@
+timeout=90
--
2.20.1
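As a reading aid for the patch above, the two behaviours it adds (per-directory "timeout" settings and distinct SKIP/TIMEOUT/exit reporting) can be sketched in isolation as follows. This is a simplified illustration, not the runner.sh code: it reads only the "timeout" field and deliberately avoids the eval used in the patch, and the function names settings_timeout and tap_result are hypothetical.

```sh
#!/bin/sh
# Values taken from the patch: exit code 4 means "skipped", 124 is what
# timeout(1) returns when the command ran out of time.
skip_rc=4
timeout_rc=124
default_timeout=45

# Hypothetical helper: print the timeout from a "settings" file of
# field=value lines, falling back to the default when the file is
# missing or has no "timeout" field. Unknown fields are ignored.
settings_timeout() {
    settings="$1"
    timeout="$default_timeout"
    if [ -r "$settings" ]; then
        while IFS='=' read -r field value; do
            case "$field" in
                timeout) timeout="$value" ;;
            esac
        done < "$settings"
    fi
    echo "$timeout"
}

# Hypothetical helper: turn a test's exit code into the TAP line that the
# patched runner would print for it.
tap_result() {
    rc="$1"
    num="$2"
    msg="$3"
    if [ "$rc" -eq 0 ]; then
        echo "ok $num $msg"
    elif [ "$rc" -eq "$skip_rc" ]; then
        echo "not ok $num $msg # SKIP"
    elif [ "$rc" -eq "$timeout_rc" ]; then
        echo "not ok $num $msg # TIMEOUT"
    else
        echo "not ok $num $msg # exit=$rc"
    fi
}
```

For the rtc directory, whose settings file contains timeout=90, settings_timeout would yield 90; a directory without a settings file gets the 45 second default.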
From: Cristian Marussi <cristian.marussi(a)arm.com>
[ Upstream commit 131b30c94fbc0adb15f911609884dd39dada8f00 ]
A TARGET which failed to be built/installed should not be included in the
runlist generated inside the run_kselftest.sh script.
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/Makefile | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 25b43a8c2b159..1779923d7a7b8 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -198,8 +198,12 @@ ifdef INSTALL_PATH
echo " cat /dev/null > \$$logfile" >> $(ALL_SCRIPT)
echo "fi" >> $(ALL_SCRIPT)
+ @# While building run_kselftest.sh skip also non-existent TARGET dirs:
+ @# they could be the result of a build failure and should NOT be
+ @# included in the generated runlist.
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
+ [ ! -d $$INSTALL_PATH/$$TARGET ] && echo "Skipping non-existent dir: $$TARGET" && continue; \
echo "[ -w /dev/kmsg ] && echo \"kselftest: Running tests in $$TARGET\" >> /dev/kmsg" >> $(ALL_SCRIPT); \
echo "cd $$TARGET" >> $(ALL_SCRIPT); \
echo -n "run_many" >> $(ALL_SCRIPT); \
--
2.20.1
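The filtering that the patch above adds to the run_kselftest.sh generation can be shown on its own with a small shell sketch. It assumes, as the patch does, that a TARGET which failed to build or install has no directory under the install path; existing_targets is a hypothetical name, and the real logic lives inline in the Makefile loop rather than in a function.

```sh
#!/bin/sh
# Hypothetical helper: given an install path and a list of targets, print
# only the targets that have a directory there. Skipped targets are
# reported on stderr so they do not end up in the generated runlist.
existing_targets() {
    install_path="$1"
    shift
    for target in "$@"; do
        if [ ! -d "$install_path/$target" ]; then
            echo "Skipping non-existent dir: $target" >&2
            continue
        fi
        echo "$target"
    done
}
```

For example, if only net and rtc were installed, `existing_targets "$INSTALL_PATH" net vm rtc` would list net and rtc and note on stderr that vm was skipped.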