The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch set as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 3.
Changes since v6:
- reintroduce kunit_test_suite() definition to handle users in other trees
not yet converted to using kunit_test_suites() (kbuild error when
applying patches to ext4/dev tree)
- modify drivers/base/power/qos-test.c to use kunit_test_suites()
to register suite. We do not convert it to support module build now as
the suite uses a few unexported function; see patch 3 for details.
Changes since v5:
- fixed fs/ext4/Makefile to remove unneeded conditional compilation
(Iurii, patch 3)
- added Reviewed-by, Acked-by to patches 3, 4, 5 and 6
Changes since v4:
- fixed signoff chain to use Co-developed-by: prior to Knut's signoff
(Stephen, all patches)
- added Reviewed-by, Tested-by for patches 1, 2, 4 and 6
- updated comment describing try-catch-impl.h (Stephen, patch 2)
- fixed MODULE_LICENSEs to be GPL v2 (Stephen, patches 3, 5)
- added __init to kunit_init() (Stephen, patch 5)
Changes since v3:
- removed symbol lookup patch for separate submission later
- removed use of sysctl_hung_task_timeout_seconds (patch 4, as discussed
with Brendan and Stephen)
- disabled build of string-stream-test when CONFIG_KUNIT_TEST=m; this
is to avoid having to deal with symbol lookup issues
- changed string-stream-impl.h back to string-stream.h (Brendan)
- added module build support to new list, ext4 tests
Changes since v2:
- moved string-stream.h header to lib/kunit/string-stream-impl.h (Brendan)
(patch 1)
- split out non-exported interfaces in try-catch-impl.h (Brendan)
(patch 2)
- added kunit_find_symbol() and KUNIT_INIT_SYMBOL to lookup non-exported
symbols (patches 3, 4)
- removed #ifdef MODULE around module licenses (Randy, Brendan, Andy)
(patch 4)
- replaced kunit_test_suite() with kunit_test_suites() rather than
supporting both (Brendan) (patch 4)
- lookup sysctl_hung_task_timeout_secs as kunit may be built as a module
and the symbol may not be available (patch 5)
Alan Maguire (6):
kunit: move string-stream.h to lib/kunit
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: allow kunit tests to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 3 ++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++
drivers/base/power/qos-test.c | 2 +-
fs/ext4/Kconfig | 2 +-
fs/ext4/Makefile | 3 +-
fs/ext4/inode-test.c | 4 ++-
include/kunit/assert.h | 3 +-
include/kunit/test.h | 37 ++++++++++++++++------
include/kunit/try-catch.h | 10 ------
kernel/sysctl-test.c | 4 ++-
lib/Kconfig.debug | 4 +--
lib/kunit/Kconfig | 6 ++--
lib/kunit/Makefile | 14 +++++---
lib/kunit/assert.c | 10 ++++++
lib/kunit/{example-test.c => kunit-example-test.c} | 4 ++-
lib/kunit/{test-test.c => kunit-test.c} | 7 ++--
lib/kunit/string-stream-test.c | 5 +--
lib/kunit/string-stream.c | 3 +-
{include => lib}/kunit/string-stream.h | 0
lib/kunit/test.c | 25 ++++++++++++++-
lib/kunit/try-catch-impl.h | 27 ++++++++++++++++
lib/kunit/try-catch.c | 37 +++++-----------------
lib/list-test.c | 4 ++-
24 files changed, 160 insertions(+), 73 deletions(-)
rename lib/kunit/{example-test.c => kunit-example-test.c} (97%)
rename lib/kunit/{test-test.c => kunit-test.c} (98%)
rename {include => lib}/kunit/string-stream.h (100%)
create mode 100644 lib/kunit/try-catch-impl.h
--
1.8.3.1
Hi Morimoto-san, Karl,
On Wed, Dec 18, 2019 at 6:22 AM Kuninori Morimoto
<kuninori.morimoto.gx(a)renesas.com> wrote:
> From: Kuninori Morimoto <kuninori.morimoto.gx(a)renesas.com>
>
> Current SH will get below warning at strncpy()
>
> In file included from ${LINUX}/arch/sh/include/asm/string.h:3,
> from ${LINUX}/include/linux/string.h:20,
> from ${LINUX}/include/linux/bitmap.h:9,
> from ${LINUX}/include/linux/nodemask.h:95,
> from ${LINUX}/include/linux/mmzone.h:17,
> from ${LINUX}/include/linux/gfp.h:6,
> from ${LINUX}/innclude/linux/slab.h:15,
> from ${LINUX}/linux/drivers/mmc/host/vub300.c:38:
> ${LINUX}/drivers/mmc/host/vub300.c: In function 'new_system_port_status':
> ${LINUX}/arch/sh/include/asm/string_32.h:51:42: warning: array subscript\
> 80 is above array bounds of 'char[26]' [-Warray-bounds]
> : "0" (__dest), "1" (__src), "r" (__src+__n)
> ~~~~~^~~~
>
> In general, strncpy() should behave like below.
>
> char dest[10];
> char *src = "12345";
>
> strncpy(dest, src, 10);
> // dest = {'1', '2', '3', '4', '5',
> '\0','\0','\0','\0','\0'}
>
> But, current SH strnpy() has 2 issues.
> 1st is it will access to out-of-memory (= src + 10).
I believe this is not correct: the code does not really access memory
beyond the end of the source string. (Recent) gcc just thinks so,
because "__src+__n" is used as a parameter to the routine.
> 2nd is it needs big fixup for it, and maintenance __asm__
> code is difficult.
Yeah, the padding is missing.
> To solve these issues, this patch simply uses generic strncpy()
> instead of architecture specific one.
That will definitely fix the issue, as we assume the generic
implementation is correct ;-)
Now, I've just tried, naively, to enable CONFIG_STRING_SELFTEST=y in my
rts7751r2d build (without your patch), and boot it in qemu:
String selftests succeeded
Woops, turns out lib/test_string.c does not have any testcases for
strncpy()...
So adding test code for the corner cases may be a valuable contribution.
Thanks!
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
The design of the original open_how struct layout was such that it
ensured that there would be no un-labelled (and thus potentially
non-zero) padding to avoid issues with struct expansion, as well as
providing a uniform representation on all architectures (to avoid
complications with OPEN_HOW_SIZE versioning).
However, there were a few other desirable features which were not
fulfilled by the previous struct layout:
* Adding new features (other than new flags) should always result in
the struct getting larger. However, by including a padding field, it
was possible for new fields to be added without expanding the
structure. This would somewhat complicate version-number based
checking of feature support.
* A non-zero bit in __padding yielded -EINVAL when it should arguably
have been -E2BIG (because the padding bits are effectively
yet-to-be-used fields). However, the semantics are not entirely clear
because userspace may expect -E2BIG to only signify that the
structure is too big. It's much simpler to just provide the guarantee
that new fields will always result in a struct size increase, and
-E2BIG indicates you're using a field that's too recent for an older
kernel.
* While the alignment for u64s was manually backed by extra padding
fields, some languages (such as Rust) do not currently support
enforcing alignment of struct field members.
* The padding wasted space needlessly, and would very likely not be
used up entirely by future extensions for a long time (because it
couldn't fit a u64).
While none of these outstanding issues are deal-breakers, we can iron
out these warts before openat2(2) lands in Linus's tree. Instead of
using alignment and padding, we simply pack the structure with
__attribute__((packed)). Rust supports #[repr(packed)] and it removes
all of the issues with having explicit padding.
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
fs/open.c | 2 --
include/uapi/linux/fcntl.h | 11 +++++------
tools/testing/selftests/openat2/helpers.h | 11 +++++------
tools/testing/selftests/openat2/openat2_test.c | 18 +-----------------
4 files changed, 11 insertions(+), 31 deletions(-)
diff --git a/fs/open.c b/fs/open.c
index 50a46501bcc9..8cdb2b675867 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -993,8 +993,6 @@ static inline int build_open_flags(const struct open_how *how,
return -EINVAL;
if (how->resolve & ~VALID_RESOLVE_FLAGS)
return -EINVAL;
- if (memchr_inv(how->__padding, 0, sizeof(how->__padding)))
- return -EINVAL;
/* Deal with the mode. */
if (WILL_CREATE(flags)) {
diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
index d886bdb585e4..0e070c7f568a 100644
--- a/include/uapi/linux/fcntl.h
+++ b/include/uapi/linux/fcntl.h
@@ -109,17 +109,16 @@
* O_TMPFILE} are set.
*
* @flags: O_* flags.
- * @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
+ * @mode: O_CREAT/O_TMPFILE file mode.
*/
struct open_how {
- __aligned_u64 flags;
+ __u64 flags;
+ __u64 resolve;
__u16 mode;
- __u16 __padding[3]; /* must be zeroed */
- __aligned_u64 resolve;
-};
+} __attribute__((packed));
-#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
+#define OPEN_HOW_SIZE_VER0 18 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
/* how->resolve flags for openat2(2). */
diff --git a/tools/testing/selftests/openat2/helpers.h b/tools/testing/selftests/openat2/helpers.h
index 43ca5ceab6e3..eb1535c8fa2e 100644
--- a/tools/testing/selftests/openat2/helpers.h
+++ b/tools/testing/selftests/openat2/helpers.h
@@ -32,17 +32,16 @@
* O_TMPFILE} are set.
*
* @flags: O_* flags.
- * @mode: O_CREAT/O_TMPFILE file mode.
* @resolve: RESOLVE_* flags.
+ * @mode: O_CREAT/O_TMPFILE file mode.
*/
struct open_how {
- __aligned_u64 flags;
+ __u64 flags;
+ __u64 resolve;
__u16 mode;
- __u16 __padding[3]; /* must be zeroed */
- __aligned_u64 resolve;
-};
+} __attribute__((packed));
-#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
+#define OPEN_HOW_SIZE_VER0 18 /* sizeof first published struct */
#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
bool needs_openat2(const struct open_how *how);
diff --git a/tools/testing/selftests/openat2/openat2_test.c b/tools/testing/selftests/openat2/openat2_test.c
index 0b64fedc008b..cbf95d160b1b 100644
--- a/tools/testing/selftests/openat2/openat2_test.c
+++ b/tools/testing/selftests/openat2/openat2_test.c
@@ -40,7 +40,7 @@ struct struct_test {
int err;
};
-#define NUM_OPENAT2_STRUCT_TESTS 10
+#define NUM_OPENAT2_STRUCT_TESTS 7
#define NUM_OPENAT2_STRUCT_VARIATIONS 13
void test_openat2_struct(void)
@@ -57,22 +57,6 @@ void test_openat2_struct(void)
.arg.inner.flags = O_RDONLY,
.size = sizeof(struct open_how_ext) },
- /* Normal struct with broken padding. */
- { .name = "normal struct (non-zero padding[0])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0xa0, 0x00, 0x00},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
- { .name = "normal struct (non-zero padding[1])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0x00, 0x1a, 0x00},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
- { .name = "normal struct (non-zero padding[2])",
- .arg.inner.flags = O_RDONLY,
- .arg.inner.__padding = {0x00, 0x00, 0xef},
- .size = sizeof(struct open_how_ext), .err = -EINVAL },
-
- /* TODO: Once expanded, check zero-padding. */
-
/* Smaller than version-0 struct. */
{ .name = "zero-sized 'struct'",
.arg.inner.flags = O_RDONLY, .size = 0, .err = -EINVAL },
base-commit: 912dfe068c43fa13c587b8d30e73d335c5ba7d44
--
2.24.0
livepatch test configures the system and debug environment to run
tests. Some of these actions fail without root access and test
dumps several permission denied messages before it exits.
Fix test-state.sh to call setup_config instead of set_dynamic_debug
as suggested by Petr Mladek <pmladek(a)suse.com>
Fix it to check root uid and exit with skip code instead.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/livepatch/functions.sh | 15 ++++++++++++++-
tools/testing/selftests/livepatch/test-state.sh | 3 +--
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 31eb09e38729..a6e3d5517a6f 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -7,6 +7,9 @@
MAX_RETRIES=600
RETRY_INTERVAL=".1" # seconds
+# Kselftest framework requirement - SKIP code is 4
+ksft_skip=4
+
# log(msg) - write message to kernel log
# msg - insightful words
function log() {
@@ -18,7 +21,16 @@ function log() {
function skip() {
log "SKIP: $1"
echo "SKIP: $1" >&2
- exit 4
+ exit $ksft_skip
+}
+
+# root test
+function is_root() {
+ uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
}
# die(msg) - game over, man
@@ -62,6 +74,7 @@ function set_ftrace_enabled() {
# for verbose livepatching output and turn on
# the ftrace_enabled sysctl.
function setup_config() {
+ is_root
push_config
set_dynamic_debug
set_ftrace_enabled 1
diff --git a/tools/testing/selftests/livepatch/test-state.sh b/tools/testing/selftests/livepatch/test-state.sh
index dc2908c22c26..a08212708115 100755
--- a/tools/testing/selftests/livepatch/test-state.sh
+++ b/tools/testing/selftests/livepatch/test-state.sh
@@ -8,8 +8,7 @@ MOD_LIVEPATCH=test_klp_state
MOD_LIVEPATCH2=test_klp_state2
MOD_LIVEPATCH3=test_klp_state3
-set_dynamic_debug
-
+setup_config
# TEST: Loading and removing a module that modifies the system state
--
2.20.1
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v8:
* Merged the "mm/gup: pass flags arg to __gup_device_* functions" patch
into the "mm/gup: track FOLL_PIN pages" patch, as requested by
Christoph and Jan.
* Changed void grab_page() to bool try_grab_page(), and handled errors
at the call sites. (From Jan's review comments.) try_grab_page()
attempts to avoid page refcount overflows, even when counting up with
GUP_PIN_COUNTING_BIAS increments.
* Fixed a bug that I'd introduced, when changing a BUG() to a WARN().
* Added Jan's reviewed-by tag to the " mm/gup: allow FOLL_FORCE for
get_user_pages_fast()" patch.
* Documentation: pin_user_pages.rst: fixed an incorrect gup_benchmark
invocation, left over from the pin_longterm days, spotted while preparing
this version.
* Rebased onto today's linux.git (-rc1), and re-tested.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (24):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 149 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 598 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 26 +-
mm/hugetlb.c | 25 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1109 insertions(+), 355 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a point of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
Changes since v9: Fixes resulting from Jan Kara's and Jonathan Corbet's
reviews:
* Removed reviewed-by tags from the "mm/gup: track FOLL_PIN pages" (those
were improperly inherited from the much smaller refactoring patch that
was merged into it).
* Made try_grab_compound_head() and try_grab_page() behavior similar in
their behavior with flags, in order to avoid "gotchas" later.
* follow_trans_huge_pmd(): moved the try_grab_page() to earlier in the
routine, in order to avoid having to undo mlock_vma_page().
* follow_hugetlb_page(): removed a refcount overflow check that is now
extraneous (and weaker than what try_grab_page() provides a few lines
further down).
* Fixed up two Documentation flaws, pointed out by Jonathan Corbet's
review.
Changes since v8:
* Merged the "mm/gup: pass flags arg to __gup_device_* functions" patch
into the "mm/gup: track FOLL_PIN pages" patch, as requested by
Christoph and Jan.
* Changed void grab_page() to bool try_grab_page(), and handled errors
at the call sites. (From Jan's review comments.) try_grab_page()
attempts to avoid page refcount overflows, even when counting up with
GUP_PIN_COUNTING_BIAS increments.
* Fixed a bug that I'd introduced, when changing a BUG() to a WARN().
* Added Jan's reviewed-by tag to the " mm/gup: allow FOLL_FORCE for
get_user_pages_fast()" patch.
* Documentation: pin_user_pages.rst: fixed an incorrect gup_benchmark
invocation, left over from the pin_longterm days, spotted while preparing
this version.
* Rebased onto today's linux.git (-rc1), and re-tested.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (24):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 232 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 149 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 626 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 45 +-
mm/hugetlb.c | 38 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1147 insertions(+), 377 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
On 2019-12-16, Florian Weimer <fweimer(a)redhat.com> wrote:
> > diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> > index 1d338357df8a..58c3a0e543c6 100644
> > --- a/include/uapi/linux/fcntl.h
> > +++ b/include/uapi/linux/fcntl.h
> > @@ -93,5 +93,40 @@
> >
> > #define AT_RECURSIVE 0x8000 /* Apply to the entire subtree */
> >
> > +/*
> > + * Arguments for how openat2(2) should open the target path. If @resolve is
> > + * zero, then openat2(2) operates very similarly to openat(2).
> > + *
> > + * However, unlike openat(2), unknown bits in @flags result in -EINVAL rather
> > + * than being silently ignored. @mode must be zero unless one of {O_CREAT,
> > + * O_TMPFILE} are set.
> > + *
> > + * @flags: O_* flags.
> > + * @mode: O_CREAT/O_TMPFILE file mode.
> > + * @resolve: RESOLVE_* flags.
> > + */
> > +struct open_how {
> > + __aligned_u64 flags;
> > + __u16 mode;
> > + __u16 __padding[3]; /* must be zeroed */
> > + __aligned_u64 resolve;
> > +};
> > +
> > +#define OPEN_HOW_SIZE_VER0 24 /* sizeof first published struct */
> > +#define OPEN_HOW_SIZE_LATEST OPEN_HOW_SIZE_VER0
> > +
> > +/* how->resolve flags for openat2(2). */
> > +#define RESOLVE_NO_XDEV 0x01 /* Block mount-point crossings
> > + (includes bind-mounts). */
> > +#define RESOLVE_NO_MAGICLINKS 0x02 /* Block traversal through procfs-style
> > + "magic-links". */
> > +#define RESOLVE_NO_SYMLINKS 0x04 /* Block traversal through all symlinks
> > + (implies OEXT_NO_MAGICLINKS) */
> > +#define RESOLVE_BENEATH 0x08 /* Block "lexical" trickery like
> > + "..", symlinks, and absolute
> > + paths which escape the dirfd. */
> > +#define RESOLVE_IN_ROOT 0x10 /* Make all jumps to "/" and ".."
> > + be scoped inside the dirfd
> > + (similar to chroot(2)). */
> >
> > #endif /* _UAPI_LINUX_FCNTL_H */
>
> Would it be possible to move these to a new UAPI header?
>
> In glibc, we currently do not #include <linux/fcntl.h>. We need some of
> the AT_* constants in POSIX mode, and the header is not necessarily
> namespace-clean. If there was a separate header for openat2 support, we
> could use that easily, and we would only have to maintain the baseline
> definitions (which never change).
Sure, (assuming nobody objects) I can move it to "linux/openat2.h".
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
When handling page faults for many vCPUs during demand paging, KVM's MMU
lock becomes highly contended. This series creates a test with a naive
userfaultfd based demand paging implementation to demonstrate that
contention. This test serves both as a functional test of userfaultfd
and a microbenchmark of demand paging performance with a variable number
of vCPUs and memory per vCPU.
The test creates N userfaultfd threads, N vCPUs, and a region of memory
with M pages per vCPU. The N userfaultfd polling threads are each set up
to serve faults on a region of memory corresponding to one of the vCPUs.
Each of the vCPUs is then started, and touches each page of its disjoint
memory region, sequentially. In response to faults, the userfaultfd
threads copy a static buffer into the guest's memory. This creates a
worst case for MMU lock contention as we have removed most of the
contention between the userfaultfd threads and there is no time required
to fetch the contents of guest memory.
This test was run successfully on Intel Haswell, Broadwell, and
Cascadelake hosts with a variety of vCPU counts and memory sizes.
This test was adapted from the dirty_log_test.
The series can also be viewed in Gerrit here:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/1464
(Thanks to Dmitry Vyukov <dvyukov(a)google.com> for setting up the Gerrit
instance)
Ben Gardon (9):
KVM: selftests: Create a demand paging test
KVM: selftests: Add demand paging content to the demand paging test
KVM: selftests: Add memory size parameter to the demand paging test
KVM: selftests: Pass args to vCPU instead of using globals
KVM: selftests: Support multiple vCPUs in demand paging test
KVM: selftests: Time guest demand paging
KVM: selftests: Add parameter to _vm_create for memslot 0 base paddr
KVM: selftests: Support large VMs in demand paging test
Add static flag
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 4 +-
.../selftests/kvm/demand_paging_test.c | 610 ++++++++++++++++++
tools/testing/selftests/kvm/dirty_log_test.c | 2 +-
.../testing/selftests/kvm/include/kvm_util.h | 3 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
6 files changed, 621 insertions(+), 6 deletions(-)
create mode 100644 tools/testing/selftests/kvm/demand_paging_test.c
--
2.23.0.444.g18eeb5a265-goog
Hi Linus,
Please pull the following Kselftest update for Linux 5.5-rc3.
This Kselftest fixes update for Linux 5.5-rc2 consists of
-- ftrace and safesetid test fixes from Masami Hiramatsu
-- Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim
-- Kselftest framework fixes from SeongJae Park and Michael Ellerman
I was planning to send this for rc2 and ran into kernel.org outage
and decided to wait on it.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit e42617b825f8073569da76dc4510bfa019b1c35a:
Linux 5.5-rc1 (2019-12-08 14:57:55 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.5-rc2
for you to fetch changes up to 4eac734486fd431e0756cc5e929f140911a36a53:
kselftest: Support old perl versions (2019-12-11 10:31:16 -0700)
----------------------------------------------------------------
linux-kselftest-5.5-rc2
This Kselftest fixes update for Linux 5.5-rc2 consists of
-- ftrace and safesetid test fixes from Masami Hiramatsu
-- Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim
-- Kselftest framework fixes from SeongJae Park and Michael Ellerman
----------------------------------------------------------------
Brendan Higgins (2):
Documentation: kunit: fix typos and gramatical errors
Documentation: kunit: add documentation for kunit_tool
Heidi Fahim (1):
kunit: testing kunit: Bug fix in test_run_timeout function
Iurii Zaikin (1):
fs/ext4/inode-test: Fix inode test on 32 bit platforms.
Masami Hiramatsu (7):
selftests/ftrace: Fix to check the existence of set_ftrace_filter
selftests/ftrace: Fix ftrace test cases to check unsupported
selftests/ftrace: Do not to use absolute debugfs path
selftests/ftrace: Fix multiple kprobe testcase
selftests: safesetid: Move link library to LDLIBS
selftests: safesetid: Check the return value of setuid/setgid
selftests: safesetid: Fix Makefile to set correct test program
Michael Ellerman (1):
selftests: Fix dangling documentation references to
kselftest_module.sh
SeongJae Park (2):
kselftest/runner: Print new line in print of timeout log
kselftest: Support old perl versions
Documentation/dev-tools/kselftest.rst | 8 +--
Documentation/dev-tools/kunit/index.rst | 1 +
Documentation/dev-tools/kunit/kunit-tool.rst | 57
++++++++++++++++++++++
Documentation/dev-tools/kunit/start.rst | 13 +++--
Documentation/dev-tools/kunit/usage.rst | 24 ++++-----
fs/ext4/inode-test.c | 2 +-
tools/testing/kunit/kunit_tool_test.py | 2 +-
.../ftrace/test.d/ftrace/func-filter-stacktrace.tc | 2 +
.../selftests/ftrace/test.d/ftrace/func_cpumask.tc | 5 ++
tools/testing/selftests/ftrace/test.d/functions | 5 +-
.../ftrace/test.d/kprobe/multiple_kprobes.tc | 6 +--
.../inter-event/trigger-action-hist-xfail.tc | 4 +-
.../inter-event/trigger-onchange-action-hist.tc | 2 +-
.../inter-event/trigger-snapshot-action-hist.tc | 4 +-
tools/testing/selftests/kselftest/module.sh | 2 +-
tools/testing/selftests/kselftest/prefix.pl | 1 +
tools/testing/selftests/kselftest/runner.sh | 1 +
tools/testing/selftests/safesetid/Makefile | 5 +-
tools/testing/selftests/safesetid/safesetid-test.c | 15 ++++--
19 files changed, 119 insertions(+), 40 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/kunit-tool.rst
----------------------------------------------------------------
livepatch test configures the system and debug environment to run
tests. Some of these actions fail without root access and test
dumps several permission denied messages before it exits.
Fix it to check root uid and exit with skip code instead.
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/livepatch/functions.sh | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index 31eb09e38729..014b587692f0 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -7,6 +7,9 @@
MAX_RETRIES=600
RETRY_INTERVAL=".1" # seconds
+# Kselftest framework requirement - SKIP code is 4
+ksft_skip=4
+
# log(msg) - write message to kernel log
# msg - insightful words
function log() {
@@ -18,7 +21,16 @@ function log() {
function skip() {
log "SKIP: $1"
echo "SKIP: $1" >&2
- exit 4
+ exit $ksft_skip
+}
+
+# root test
+function is_root() {
+ uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
}
# die(msg) - game over, man
@@ -45,6 +57,7 @@ function pop_config() {
}
function set_dynamic_debug() {
+ is_root
cat <<-EOF > /sys/kernel/debug/dynamic_debug/control
file kernel/livepatch/* +p
func klp_try_switch_task -p
@@ -62,6 +75,7 @@ function set_ftrace_enabled() {
# for verbose livepatching output and turn on
# the ftrace_enabled sysctl.
function setup_config() {
+ is_root
push_config
set_dynamic_debug
set_ftrace_enabled 1
--
2.20.1
This test only works when [1] is applied, which was rejected.
Basically, the errors are reported and cleared. In this particular case of
tls sockets, following reads will block.
The test case was originally submitted with the rejected patch, but, then,
was included as part of a different patchset, possibly by mistake.
[1] https://lore.kernel.org/netdev/20191007035323.4360-2-jakub.kicinski@netrono…
Thanks Paolo Pisati for pointing out the original patchset where this
appeared.
Cc: Jakub Kicinski <jakub.kicinski(a)netronome.com>
Fixes: 65190f77424d (selftests/tls: add a test for fragmented messages)
Reported-by: Paolo Pisati <paolo.pisati(a)canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
tools/testing/selftests/net/tls.c | 28 ----------------------------
1 file changed, 28 deletions(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 13e5ef615026..0ea44d975b6c 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -722,34 +722,6 @@ TEST_F(tls, recv_lowat)
EXPECT_EQ(memcmp(send_mem, recv_mem + 10, 5), 0);
}
-TEST_F(tls, recv_rcvbuf)
-{
- char send_mem[4096];
- char recv_mem[4096];
- int rcv_buf = 1024;
-
- memset(send_mem, 0x1c, sizeof(send_mem));
-
- EXPECT_EQ(setsockopt(self->cfd, SOL_SOCKET, SO_RCVBUF,
- &rcv_buf, sizeof(rcv_buf)), 0);
-
- EXPECT_EQ(send(self->fd, send_mem, 512, 0), 512);
- memset(recv_mem, 0, sizeof(recv_mem));
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), 512);
- EXPECT_EQ(memcmp(send_mem, recv_mem, 512), 0);
-
- if (self->notls)
- return;
-
- EXPECT_EQ(send(self->fd, send_mem, 4096, 0), 4096);
- memset(recv_mem, 0, sizeof(recv_mem));
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), -1);
- EXPECT_EQ(errno, EMSGSIZE);
-
- EXPECT_EQ(recv(self->cfd, recv_mem, sizeof(recv_mem), 0), -1);
- EXPECT_EQ(errno, EMSGSIZE);
-}
-
TEST_F(tls, bidir)
{
char const *test_str = "test_read";
--
2.24.0
firmware attempts to load test modules that require root access
and fail. Fix it to check for root uid and exit with skip code
instead.
Before this fix:
selftests: firmware: fw_run_tests.sh
modprobe: ERROR: could not insert 'test_firmware': Operation not permitted
You must have the following enabled in your kernel:
CONFIG_TEST_FIRMWARE=y
CONFIG_FW_LOADER=y
CONFIG_FW_LOADER_USER_HELPER=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
With this fix:
selftests: firmware: fw_run_tests.sh
skip all tests: must be run as root
not ok 1 selftests: firmware: fw_run_tests.sh # SKIP
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
---
tools/testing/selftests/firmware/fw_lib.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/firmware/fw_lib.sh b/tools/testing/selftests/firmware/fw_lib.sh
index b879305a766d..5b8c0fedee76 100755
--- a/tools/testing/selftests/firmware/fw_lib.sh
+++ b/tools/testing/selftests/firmware/fw_lib.sh
@@ -34,6 +34,12 @@ test_modprobe()
check_mods()
{
+ local uid=$(id -u)
+ if [ $uid -ne 0 ]; then
+ echo "skip all tests: must be run as root" >&2
+ exit $ksft_skip
+ fi
+
trap "test_modprobe" EXIT
if [ ! -d $DIR ]; then
modprobe test_firmware
--
2.20.1
From: SeongJae Park <sjpark(a)amazon.de>
This patchset contains trivial fixes for the kunit documentations and
the wrapper python scripts.
This patchset is based on 'kselftest/test' branch of linux-kselftest[1]
and depends on Heidi's patch[2]. A complete tree is available at my repo:
https://github.com/sjp38/linux/tree/kunit_fix/20191205_v5
Changes from v4
(https://lore.kernel.org/linux-doc/1575490683-13015-1-git-send-email-sj38.pa…):
- Rebased on Heidi Fahim's patch[2]
- Fix failing kunit_tool_test test
- Add 'build_dir' option test in 'kunit_tool_test.py'
Changes from v3
(https://lore.kernel.org/linux-kselftest/20191204192141.GA247851@google.com):
- Fix the 4th patch, "kunit: Place 'test.log' under the 'build_dir'" to
set default value of 'build_dir' as '' instead of NULL so that kunit
can run even though '--build_dir' option is not given.
Changes from v2
(https://lore.kernel.org/linux-kselftest/1575361141-6806-1-git-send-email-sj…):
- Make 'build_dir' if not exists (missed from v3 by mistake)
Changes from v1
(https://lore.kernel.org/linux-doc/1575242724-4937-1-git-send-email-sj38.par…):
- Remove "docs/kunit/start: Skip wrapper run command" (A similar
approach is ongoing)
- Make 'build_dir' if not exists
SeongJae Park (6):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
kunit/kunit_tool_test: Test '--build_dir' option run
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 18 +++++++++++-------
tools/testing/kunit/kunit_kernel.py | 10 +++++-----
tools/testing/kunit/kunit_tool_test.py | 10 +++++++++-
4 files changed, 30 insertions(+), 21 deletions(-)
--
[1] git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
[2] "kunit: testing kunit: Bug fix in test_run_timeout function",
https://lore.kernel.org/linux-kselftest/CAFd5g47a7a8q7by+1ALBtepeegLvfkgwvC…)
2.17.1
From: SeongJae Park <sjpark(a)amazon.de>
If a timeout failure occurs, kselftest kills the test process and prints
the timeout log. If the test process has killed while printing a log
that ends with new line, the timeout log can be printed in middle of the
test process output so that it can be seems like a comment, as below:
# test_process_log not ok 3 selftests: timers: nsleep-lat # TIMEOUT
This commit avoids such problem by printing one more line before the
TIMEOUT failure log.
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
tools/testing/selftests/kselftest/runner.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 84de7bc74f2c..a8d20cbb711c 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -79,6 +79,7 @@ run_one()
if [ $rc -eq $skip_rc ]; then \
echo "not ok $test_num $TEST_HDR_MSG # SKIP"
elif [ $rc -eq $timeout_rc ]; then \
+ echo "#"
echo "not ok $test_num $TEST_HDR_MSG # TIMEOUT"
else
echo "not ok $test_num $TEST_HDR_MSG # exit=$rc"
--
2.17.1
Commit c78fd76f2b67 ("selftests: Move kselftest_module.sh into
kselftest/") moved kselftest_module.sh but missed updating a few
references to the path in documentation.
Fixes: c78fd76f2b67 ("selftests: Move kselftest_module.sh into kselftest/")
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
---
Documentation/dev-tools/kselftest.rst | 8 ++++----
tools/testing/selftests/kselftest/module.sh | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index ecdfdc9d4b03..61ae13c44f91 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -203,12 +203,12 @@ Test Module
Kselftest tests the kernel from userspace. Sometimes things need
testing from within the kernel, one method of doing this is to create a
test module. We can tie the module into the kselftest framework by
-using a shell script test runner. ``kselftest_module.sh`` is designed
+using a shell script test runner. ``kselftest/module.sh`` is designed
to facilitate this process. There is also a header file provided to
assist writing kernel modules that are for use with kselftest:
- ``tools/testing/kselftest/kselftest_module.h``
-- ``tools/testing/kselftest/kselftest_module.sh``
+- ``tools/testing/kselftest/kselftest/module.sh``
How to use
----------
@@ -247,7 +247,7 @@ Example Module
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
- #include "../tools/testing/selftests/kselftest_module.h"
+ #include "../tools/testing/selftests/kselftest/module.h"
KSTM_MODULE_GLOBALS();
@@ -276,7 +276,7 @@ Example test script
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0+
- $(dirname $0)/../kselftest_module.sh "foo" test_foo
+ $(dirname $0)/../kselftest/module.sh "foo" test_foo
Test Harness
diff --git a/tools/testing/selftests/kselftest/module.sh b/tools/testing/selftests/kselftest/module.sh
index 18e1c7992d30..fb4733faff12 100755
--- a/tools/testing/selftests/kselftest/module.sh
+++ b/tools/testing/selftests/kselftest/module.sh
@@ -9,7 +9,7 @@
#
# #!/bin/sh
# SPDX-License-Identifier: GPL-2.0+
-# $(dirname $0)/../kselftest_module.sh "description" module_name
+# $(dirname $0)/../kselftest/module.sh "description" module_name
#
# Example: tools/testing/selftests/lib/printf.sh
--
2.21.0
Hi Mathieu,
I am seeing rseq test build failure on Linux 5.5-rc1.
gcc -O2 -Wall -g -I./ -I../../../../usr/include/ -L./ -Wl,-rpath=./
param_test.c -lpthread -lrseq -o ...tools/testing/selftests/rseq/param_test
param_test.c:18:21: error: static declaration of ‘gettid’ follows
non-static declaration
18 | static inline pid_t gettid(void)
| ^~~~~~
In file included from /usr/include/unistd.h:1170,
from param_test.c:11:
/usr/include/x86_64-linux-gnu/bits/unistd_ext.h:34:16: note: previous
declaration of ‘gettid’ was here
34 | extern __pid_t gettid (void) __THROW;
| ^~~~~~
make: *** [Makefile:28: ...tools/testing/selftests/rseq/param_test] Error 1
The following obvious change fixes it. However, there could be reason
why this was defined here. If you think this is the right fix, I can
send the patch. I started seeing this with gcc version 9.2.1 20191008
diff --git a/tools/testing/selftests/rseq/param_test.c
b/tools/testing/selftests/rseq/param_test.c
index eec2663261f2..18a0fa1235a7 100644
--- a/tools/testing/selftests/rseq/param_test.c
+++ b/tools/testing/selftests/rseq/param_test.c
@@ -15,11 +15,6 @@
#include <errno.h>
#include <stddef.h>
-static inline pid_t gettid(void)
-{
- return syscall(__NR_gettid);
-}
-
thanks,
-- Shuah
Hi,
This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also implements tracking of FOLL_PIN pages. It
extends that tracking to a few select subsystems. More subsystems will
be added in follow up work.
Christoph Hellwig, a couple of points of interest:
a) I've moved the bulk of the code out of the inline functions, as
requested, for the devmap changes (patch 4: "mm: devmap: refactor
1-based refcounting for ZONE_DEVICE pages").
b) Contrary to my earlier response to your review, I have not actually
merged patch 23 ("mm/gup: pass flags arg to __gup_device_*
functions") into patch 24 ("mm/gup: track FOLL_PIN pages"). This is
because I suspect that it's better to avoid making patch 24 any larger
and worse to review than it already is. But if you feel strongly
about it, I'll combine them anyway.
Changes since v7:
* Rebased onto Linux 5.5-rc1
* Reworked the grab_page() and try_grab_compound_head(), for API
consistency and less diffs (thanks to Jan Kara's reviews).
* Added Leon Romanovsky's reviewed-by tags for two of the IB-related
patches.
* patch 4 refactoring changes, as mentioned above.
There is a git repo and branch, for convenience:
git@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8
For the remaining list of "changes since version N", those are all in
v7, which is here:
https://lore.kernel.org/r/20191121071354.456618-1-jhubbard@nvidia.com
============================================================
Overview:
This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)
A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.
I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".
In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:
get_user_pages() (sets FOLL_GET)
put_page()
to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
============================================================
Testing:
* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've
been able to runtime test the core get_user_pages() and
pin_user_pages() and related routines, but not so much on several of
the call sites--but those are generally just a couple of lines
changed, each.
Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.
Runtime testing for the call sites so far is pretty light:
* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
passes.
* infiniband: ran "ib_write_bw", which exercises the umem.c changes,
but not the other changes.
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...
[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/
Dan Williams (1):
mm: Cleanup __put_devmap_managed_page() vs ->page_free()
John Hubbard (25):
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldish_pipe: rename local pin_user_pages() routine
mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM
vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call
mm/gup: allow FOLL_FORCE for get_user_pages_fast()
IB/umem: use get_user_pages_fast() to pin DMA pages
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
media/v4l2-core: set pages dirty upon releasing DMA buffers
media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_user_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm, tree-wide: rename put_user_page*() to unpin_user_page*()
mm/gup: pass flags arg to __gup_device_* functions
mm/gup: track FOLL_PIN pages
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 233 ++++++++
arch/powerpc/mm/book3s64/iommu_api.c | 12 +-
drivers/gpu/drm/via/via_dmablit.c | 6 +-
drivers/infiniband/core/umem.c | 19 +-
drivers/infiniband/core/umem_odp.c | 13 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 8 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 4 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 8 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 4 +-
drivers/infiniband/sw/siw/siw_mem.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 8 +-
drivers/nvdimm/pmem.c | 6 -
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 6 +-
include/linux/mm.h | 145 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 595 +++++++++++++++-----
mm/gup_benchmark.c | 74 ++-
mm/huge_memory.c | 23 +-
mm/hugetlb.c | 15 +-
mm/memremap.c | 76 ++-
mm/process_vm_access.c | 28 +-
mm/swap.c | 24 +
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 21 +-
tools/testing/selftests/vm/run_vmtests | 22 +
31 files changed, 1093 insertions(+), 354 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst
--
2.24.0
From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
[ Upstream commit c588146378962786ddeec817f7736a53298a7b01 ]
The "path" buf is supposed to contain path + printf msg up to 24 bytes.
It will be cut anyway, but compiler generates truncation warns like:
"
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In
function ‘setup_cgroup_environment’:
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34:
warning: ‘/cgroup.controllers’ directive output may be truncated
writing 19 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2:
note: ‘snprintf’ output between 20 and 4116 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34:
warning: ‘/cgroup.subtree_control’ directive output may be truncated
writing 23 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
^~~~~~~~~~~~~~~~~~~~~~~
cgroup_path);
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2:
note: ‘snprintf’ output between 24 and 4120 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
cgroup_path);
"
In order to avoid warns, lets decrease buf size for cgroup workdir on
24 bytes with assumption to include also "/cgroup.subtree_control" to
the address. The cut will never happen anyway.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.o…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index cf16948aad4ad..6af24f9a780de 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -44,7 +44,7 @@
*/
int setup_cgroup_environment(void)
{
- char cgroup_workdir[PATH_MAX + 1];
+ char cgroup_workdir[PATH_MAX - 24];
format_cgroup_path(cgroup_workdir, "");
--
2.20.1
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 2ea2612b987ad703235c92be21d4e98ee9c2c67c ]
Currently, with latest llvm trunk, selftest test_progs failed obj
file test_seg6_loop.o with the following error in verifier:
infinite loop detected at insn 76
The byte code sequence looks like below, and noted that alu32 has been
turned off by default for better generated codes in general:
48: w3 = 100
49: *(u32 *)(r10 - 68) = r3
...
; if (tlv.type == SR6_TLV_PADDING) {
76: if w3 == 5 goto -18 <LBB0_19>
...
85: r1 = *(u32 *)(r10 - 68)
; for (int i = 0; i < 100; i++) {
86: w1 += -1
87: if w1 == 0 goto +5 <LBB0_20>
88: *(u32 *)(r10 - 68) = r1
The main reason for verification failure is due to partial spills at
r10 - 68 for induction variable "i".
Current verifier only handles spills with 8-byte values. The above 4-byte
value spill to stack is treated to STACK_MISC and its content is not
saved. For the above example:
w3 = 100
R3_w=inv100 fp-64_w=inv1086626730498
*(u32 *)(r10 - 68) = r3
R3_w=inv100 fp-64_w=inv1086626730498
...
r1 = *(u32 *)(r10 - 68)
R1_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff))
fp-64=inv1086626730498
To resolve this issue, verifier needs to be extended to track sub-registers
in spilling, or llvm needs to enhanced to prevent sub-register spilling
in register allocation phase. The former will increase verifier complexity
and the latter will need some llvm "hacking".
Let us workaround this issue by declaring the induction variable as "long"
type so spilling will happen at non sub-register level. We can revisit this
later if sub-register spilling causes similar or other verification issues.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andriin(a)fb.com>
Link: https://lore.kernel.org/bpf/20191117214036.1309510-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/test_seg6_loop.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/test_seg6_loop.c b/tools/testing/selftests/bpf/progs/test_seg6_loop.c
index c4d104428643e..69880c1e7700c 100644
--- a/tools/testing/selftests/bpf/progs/test_seg6_loop.c
+++ b/tools/testing/selftests/bpf/progs/test_seg6_loop.c
@@ -132,8 +132,10 @@ static __always_inline int is_valid_tlv_boundary(struct __sk_buff *skb,
*pad_off = 0;
// we can only go as far as ~10 TLVs due to the BPF max stack size
+ // workaround: define induction variable "i" as "long" instead
+ // of "int" to prevent alu32 sub-register spilling.
#pragma clang loop unroll(disable)
- for (int i = 0; i < 100; i++) {
+ for (long i = 0; i < 100; i++) {
struct sr6_tlv_t tlv;
if (cur_off == *tlv_off)
--
2.20.1
From: Jiri Benc <jbenc(a)redhat.com>
[ Upstream commit 3b054b7133b4ad93671c82e8d6185258e3f1a7a5 ]
When run_kselftests.sh is run, it hangs after test_tc_tunnel.sh. The reason
is test_tc_tunnel.sh ensures the server ('nc -l') is run all the time,
starting it again every time it is expected to terminate. The exception is
the final client_connect: the server is not started anymore, which ensures
no process is kept running after the test is finished.
For a sit test, though, the script is terminated prematurely without the
final client_connect and the 'nc' process keeps running. This in turn causes
the run_one function in kselftest/runner.sh to hang forever, waiting for the
runaway process to finish.
Ensure a remaining server is terminated on cleanup.
Fixes: f6ad6accaa99 ("selftests/bpf: expand test_tc_tunnel with SIT encap")
Signed-off-by: Jiri Benc <jbenc(a)redhat.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Willem de Bruijn <willemb(a)google.com>
Link: https://lore.kernel.org/bpf/60919291657a9ee89c708d8aababc28ebe1420be.157382…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index ff0d31d38061f..7c76b841b17bb 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -62,6 +62,10 @@ cleanup() {
if [[ -f "${infile}" ]]; then
rm "${infile}"
fi
+
+ if [[ -n $server_pid ]]; then
+ kill $server_pid 2> /dev/null
+ fi
}
server_listen() {
@@ -77,6 +81,7 @@ client_connect() {
verify_data() {
wait "${server_pid}"
+ server_pid=
# sha1sum returns two fields [sha1] [filepath]
# convert to bash array and access first elem
insum=($(sha1sum ${infile}))
--
2.20.1
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit b7a0d65d80a0c5034b366392624397a0915b7556 ]
With latest llvm compiler, running test_progs will have the following
verifier failure for test_sysctl_loop1.o:
libbpf: load bpf program failed: Permission denied
libbpf: -- BEGIN DUMP LOG ---
libbpf:
invalid indirect read from stack var_off (0x0; 0xff)+196 size 7
...
libbpf: -- END LOG --
libbpf: failed to load program 'cgroup/sysctl'
libbpf: failed to load object 'test_sysctl_loop1.o'
The related bytecode looks as below:
0000000000000308 LBB0_8:
97: r4 = r10
98: r4 += -288
99: r4 += r7
100: w8 &= 255
101: r1 = r10
102: r1 += -488
103: r1 += r8
104: r2 = 7
105: r3 = 0
106: call 106
107: w1 = w0
108: w1 += -1
109: if w1 > 6 goto -24 <LBB0_5>
110: w0 += w8
111: r7 += 8
112: w8 = w0
113: if r7 != 224 goto -17 <LBB0_8>
And source code:
for (i = 0; i < ARRAY_SIZE(tcp_mem); ++i) {
ret = bpf_strtoul(value + off, MAX_ULONG_STR_LEN, 0,
tcp_mem + i);
if (ret <= 0 || ret > MAX_ULONG_STR_LEN)
return 0;
off += ret & MAX_ULONG_STR_LEN;
}
Current verifier is not able to conclude that register w0 before '+'
at insn 110 has a range of 1 to 7 and thinks it is from 0 - 255. This
leads to more conservative range for w8 at insn 112, and later verifier
complaint.
Let us workaround this issue until we found a compiler and/or verifier
solution. The workaround in this patch is to make variable 'ret' volatile,
which will force a reload and then '&' operation to ensure better value
range. With this patch, I got the below byte code for the loop:
0000000000000328 LBB0_9:
101: r4 = r10
102: r4 += -288
103: r4 += r7
104: w8 &= 255
105: r1 = r10
106: r1 += -488
107: r1 += r8
108: r2 = 7
109: r3 = 0
110: call 106
111: *(u32 *)(r10 - 64) = r0
112: r1 = *(u32 *)(r10 - 64)
113: if w1 s< 1 goto -28 <LBB0_5>
114: r1 = *(u32 *)(r10 - 64)
115: if w1 s> 7 goto -30 <LBB0_5>
116: r1 = *(u32 *)(r10 - 64)
117: w1 &= 7
118: w1 += w8
119: r7 += 8
120: w8 = w1
121: if r7 != 224 goto -21 <LBB0_9>
Insn 117 did the '&' operation and we got more precise value range
for 'w8' at insn 120. The test is happy then:
#3/17 test_sysctl_loop1.o:OK
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191107170045.2503480-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/test_sysctl_loop1.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
index 608a06871572d..d22e438198cf7 100644
--- a/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
+++ b/tools/testing/selftests/bpf/progs/test_sysctl_loop1.c
@@ -44,7 +44,10 @@ int sysctl_tcp_mem(struct bpf_sysctl *ctx)
unsigned long tcp_mem[TCP_MEM_LOOPS] = {};
char value[MAX_VALUE_STR_LEN];
unsigned char i, off = 0;
- int ret;
+ /* a workaround to prevent compiler from generating
+ * codes verifier cannot handle yet.
+ */
+ volatile int ret;
if (ctx->write)
return 0;
--
2.20.1
From: Masami Hiramatsu <mhiramat(a)kernel.org>
[ Upstream commit 2f3571ea71311bbb2cbb9c3bbefc9c1969a3e889 ]
Currently proc-self-map-files-002.c sets va_max (max test address
of user virtual address) to 4GB, but it is too big for 32bit
arch and 1UL << 32 is overflow on 32bit long.
Also since this value should be enough bigger than vm.mmap_min_addr
(64KB or 32KB by default), 1MB should be enough.
Make va_max 1MB unconditionally.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/proc/proc-self-map-files-002.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-self-map-files-002.c b/tools/testing/selftests/proc/proc-self-map-files-002.c
index 47b7473dedef7..e6aa00a183bcd 100644
--- a/tools/testing/selftests/proc/proc-self-map-files-002.c
+++ b/tools/testing/selftests/proc/proc-self-map-files-002.c
@@ -47,7 +47,11 @@ static void fail(const char *fmt, unsigned long a, unsigned long b)
int main(void)
{
const int PAGE_SIZE = sysconf(_SC_PAGESIZE);
- const unsigned long va_max = 1UL << 32;
+ /*
+ * va_max must be enough bigger than vm.mmap_min_addr, which is
+ * 64KB/32KB by default. (depends on CONFIG_LSM_MMAP_MIN_ADDR)
+ */
+ const unsigned long va_max = 1UL << 20;
unsigned long va;
void *p;
int fd;
--
2.20.1
From: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
[ Upstream commit c588146378962786ddeec817f7736a53298a7b01 ]
The "path" buf is supposed to contain path + printf msg up to 24 bytes.
It will be cut anyway, but compiler generates truncation warns like:
"
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c: In
function ‘setup_cgroup_environment’:
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:34:
warning: ‘/cgroup.controllers’ directive output may be truncated
writing 19 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:52:2:
note: ‘snprintf’ output between 20 and 4116 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.controllers", cgroup_path);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:34:
warning: ‘/cgroup.subtree_control’ directive output may be truncated
writing 23 bytes into a region of size between 1 and 4097
[-Wformat-truncation=]
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
^~~~~~~~~~~~~~~~~~~~~~~
cgroup_path);
samples/bpf/../../tools/testing/selftests/bpf/cgroup_helpers.c:72:2:
note: ‘snprintf’ output between 24 and 4120 bytes into a destination
of size 4097
snprintf(path, sizeof(path), "%s/cgroup.subtree_control",
cgroup_path);
"
In order to avoid warns, lets decrease buf size for cgroup workdir on
24 bytes with assumption to include also "/cgroup.subtree_control" to
the address. The cut will never happen anyway.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk(a)linaro.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Song Liu <songliubraving(a)fb.com>
Link: https://lore.kernel.org/bpf/20191002120404.26962-3-ivan.khoronzhuk@linaro.o…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/cgroup_helpers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/cgroup_helpers.c b/tools/testing/selftests/bpf/cgroup_helpers.c
index e95c33e333a40..b29a73fe64dbc 100644
--- a/tools/testing/selftests/bpf/cgroup_helpers.c
+++ b/tools/testing/selftests/bpf/cgroup_helpers.c
@@ -98,7 +98,7 @@ int enable_all_controllers(char *cgroup_path)
*/
int setup_cgroup_environment(void)
{
- char cgroup_workdir[PATH_MAX + 1];
+ char cgroup_workdir[PATH_MAX - 24];
format_cgroup_path(cgroup_workdir, "");
--
2.20.1
Currently, when some of the KSFT subsystems fails to build, the toplevel
KSFT Makefile just keeps carrying on with the build process.
This behaviour is expected and desirable especially in the context of a CI
system running KSelfTest, since it is not always easy to guarantee that the
most recent and esoteric dependencies are respected across all KSFT TARGETS
in a timely manner.
Unfortunately, as of now, this holds true only if the very last of the
built subsystems could have been successfully compiled: if the last of
those subsystem instead failed to build, such failure is taken as the whole
outcome of the Makefile target and the complete build/install process halts
even though many other preceding subsytems were in fact already built
successfully.
Fix the KSFT Makefile behaviour related to all/install targets in order
to fail as a whole only when the all/install targets have failed for all
of the requested TARGETS, while succeeding when at least one of TARGETS
has been successfully built.
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
---
This patch is based on ksft/fixes branch from:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
on top of commit (~5.5-rc1):
99e51aa8f701 Documentation: kunit: add documentation for kunit_tool
Building with either:
make kselftest-install \
KSFT_INSTALL_PATH=/tmp/KSFT \
TARGETS="exec arm64 bpf"
make -C tools/testing/selftests install \
KSFT_INSTALL_PATH=/tmp/KSFT \
TARGETS="exec arm64 bpf"
(with 'bpf' not building clean on my setup in the above case)
and veryfying that build/install completes if at least one of TARGETS can
be successfully built, and any successfully built subsystem is installed.
Changes:
-------
V1 --> V2
- rebased on 5.5-rc1
- rewording commit message
- dropped RFC tag
---
tools/testing/selftests/Makefile | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index b001c602414b..86b2a3fca04d 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -143,11 +143,13 @@ else
endif
all: khdr
- @for TARGET in $(TARGETS); do \
- BUILD_TARGET=$$BUILD/$$TARGET; \
- mkdir $$BUILD_TARGET -p; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET;\
- done;
+ @ret=1; \
+ for TARGET in $(TARGETS); do \
+ BUILD_TARGET=$$BUILD/$$TARGET; \
+ mkdir $$BUILD_TARGET -p; \
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET; \
+ ret=$$((ret * $$?)); \
+ done; exit $$ret;
run_tests: all
@for TARGET in $(TARGETS); do \
@@ -196,10 +198,12 @@ ifdef INSTALL_PATH
install -m 744 kselftest/module.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/runner.sh $(INSTALL_PATH)/kselftest/
install -m 744 kselftest/prefix.pl $(INSTALL_PATH)/kselftest/
- @for TARGET in $(TARGETS); do \
+ @ret=1; \
+ for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install; \
- done;
+ ret=$$((ret * $$?)); \
+ done; exit $$ret;
@# Ask all targets to emit their test scripts
echo "#!/bin/sh" > $(ALL_SCRIPT)
--
2.17.1
When reading the codes, I find the definitions of interrupt-window exiting and
nmi-window exiting don't match the names in latest intel SDM.
I have no idea whether it's the historical names, rename them to match the
latest SDM to avoid confusion.
CPU_BASED_USE_TSC_OFFSETING mis-spelling in Patch 3 is found by checkpatch.pl.
Xiaoyao Li (3):
KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW
KVM: VMX: Rename NMI_PENDING to NMI_WINDOW
KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING
arch/x86/include/asm/vmx.h | 6 ++--
arch/x86/include/uapi/asm/vmx.h | 4 +--
arch/x86/kvm/vmx/nested.c | 28 +++++++++----------
arch/x86/kvm/vmx/vmx.c | 20 ++++++-------
tools/arch/x86/include/uapi/asm/vmx.h | 4 +--
.../selftests/kvm/include/x86_64/vmx.h | 8 +++---
.../kvm/x86_64/vmx_tsc_adjust_test.c | 2 +-
7 files changed, 36 insertions(+), 36 deletions(-)
--
2.19.1
When using fragments with size 8 and payload larger than 8000, the backlog
might fill up and packets will be dropped, causing the test to fail. This
happens often enough when conntrack is on during the IPv6 test.
As the larger payload in the test is 10000, using a backlog of 1250 allow
the test to run repeatedly without failure. At least a 1000 runs were
possible with no failures, when usually less than 50 runs were good enough
for showing a failure.
As netdev_max_backlog is not a pernet setting, this sets the backlog to
1000 during exit to prevent disturbing following tests.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Fixes: 4c3510483d26 (selftests: net: ip_defrag: cover new IPv6 defrag behavior)
---
tools/testing/selftests/net/ip_defrag.sh | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/net/ip_defrag.sh b/tools/testing/selftests/net/ip_defrag.sh
index 15d3489ecd9c..c91cfecfa245 100755
--- a/tools/testing/selftests/net/ip_defrag.sh
+++ b/tools/testing/selftests/net/ip_defrag.sh
@@ -12,6 +12,8 @@ setup() {
ip netns add "${NETNS}"
ip -netns "${NETNS}" link set lo up
+ sysctl -w net.core.netdev_max_backlog=1250 >/dev/null 2>&1
+
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_high_thresh=9000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_low_thresh=7000000 >/dev/null 2>&1
ip netns exec "${NETNS}" sysctl -w net.ipv4.ipfrag_time=1 >/dev/null 2>&1
@@ -30,6 +32,7 @@ setup() {
cleanup() {
ip netns del "${NETNS}"
+ sysctl -w net.core.netdev_max_backlog=1000 >/dev/null 2>&1
}
trap cleanup EXIT
--
2.24.0
This patchset is being developed here:
<https://github.com/cyphar/linux/tree/openat2/master>
Patch changelog:
v18:
* Further fixups from Al Viro:
- Don't WARN_ON in complete_walk() check since it can be trivially
triggered by userspace. Also, improve the comment so the purpose of the
check is more clear.
- Avoid duplicate smp_rmb() when in handle_dots() by doing
__read_seqcount_retry().
- Drop vestigial UPGRADE_NO* flag definitions in uapi.
* Update non-zero __padding test to include all bytes of the padding.
v17: <https://lore.kernel.org/lkml/20191117011713.13032-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20191120050631.12816-1-cyphar@cyphar.com/>
v16: <https://lore.kernel.org/lkml/20191116002802.6663-1-cyphar@cyphar.com/>
v15: <https://lore.kernel.org/lkml/20191105090553.6350-1-cyphar@cyphar.com/>
v14: <https://lore.kernel.org/lkml/20191010054140.8483-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20191026185700.10708-1-cyphar@cyphar.com>
v13: <https://lore.kernel.org/lkml/20190930183316.10190-1-cyphar@cyphar.com/>
v12: <https://lore.kernel.org/lkml/20190904201933.10736-1-cyphar@cyphar.com/>
v11: <https://lore.kernel.org/lkml/20190820033406.29796-1-cyphar@cyphar.com/>
<https://lore.kernel.org/lkml/20190728010207.9781-1-cyphar@cyphar.com/>
v10: <https://lore.kernel.org/lkml/20190719164225.27083-1-cyphar@cyphar.com/>
v09: <https://lore.kernel.org/lkml/20190706145737.5299-1-cyphar@cyphar.com/>
v08: <https://lore.kernel.org/lkml/20190520133305.11925-1-cyphar@cyphar.com/>
v07: <https://lore.kernel.org/lkml/20190507164317.13562-1-cyphar@cyphar.com/>
v06: <https://lore.kernel.org/lkml/20190506165439.9155-1-cyphar@cyphar.com/>
v05: <https://lore.kernel.org/lkml/20190320143717.2523-1-cyphar@cyphar.com/>
v04: <https://lore.kernel.org/lkml/20181112142654.341-1-cyphar@cyphar.com/>
v03: <https://lore.kernel.org/lkml/20181009070230.12884-1-cyphar@cyphar.com/>
v02: <https://lore.kernel.org/lkml/20181009065300.11053-1-cyphar@cyphar.com/>
v01: <https://lore.kernel.org/lkml/20180929103453.12025-1-cyphar@cyphar.com/>
For a very long time, extending openat(2) with new features has been
incredibly frustrating. This stems from the fact that openat(2) is
possibly the most famous counter-example to the mantra "don't silently
accept garbage from userspace" -- it doesn't check whether unknown flags
are present[1].
This means that (generally) the addition of new flags to openat(2) has
been fraught with backwards-compatibility issues (O_TMPFILE has to be
defined as __O_TMPFILE|O_DIRECTORY|[O_RDWR or O_WRONLY] to ensure old
kernels gave errors, since it's insecure to silently ignore the
flag[2]). All new security-related flags therefore have a tough road to
being added to openat(2).
Furthermore, the need for some sort of control over VFS's path resolution (to
avoid malicious paths resulting in inadvertent breakouts) has been a very
long-standing desire of many userspace applications. This patchset is a revival
of Al Viro's old AT_NO_JUMPS[3] patchset (which was a variant of David
Drysdale's O_BENEATH patchset[4] which was a spin-off of the Capsicum
project[5]) with a few additions and changes made based on the previous
discussion within [6] as well as others I felt were useful.
In line with the conclusions of the original discussion of AT_NO_JUMPS, the
flag has been split up into separate flags. However, instead of being an
openat(2) flag it is provided through a new syscall openat2(2) which provides
several other improvements to the openat(2) interface (see the patch
description for more details). The following new LOOKUP_* flags are added:
* LOOKUP_NO_XDEV blocks all mountpoint crossings (upwards, downwards,
or through absolute links). Absolute pathnames alone in openat(2) do not
trigger this. Magic-link traversal which implies a vfsmount jump is also
blocked (though magic-link jumps on the same vfsmount are permitted).
* LOOKUP_NO_MAGICLINKS blocks resolution through /proc/$pid/fd-style
links. This is done by blocking the usage of nd_jump_link() during
resolution in a filesystem. The term "magic-links" is used to match
with the only reference to these links in Documentation/, but I'm
happy to change the name.
It should be noted that this is different to the scope of
~LOOKUP_FOLLOW in that it applies to all path components. However,
you can do openat2(NO_FOLLOW|NO_MAGICLINKS) on a magic-link and it
will *not* fail (assuming that no parent component was a
magic-link), and you will have an fd for the magic-link.
In order to correctly detect magic-links, the introduction of a new
LOOKUP_MAGICLINK_JUMPED state flag was required.
* LOOKUP_BENEATH disallows escapes to outside the starting dirfd's
tree, using techniques such as ".." or absolute links. Absolute
paths in openat(2) are also disallowed. Conceptually this flag is to
ensure you "stay below" a certain point in the filesystem tree --
but this requires some additional to protect against various races
that would allow escape using "..".
Currently LOOKUP_BENEATH implies LOOKUP_NO_MAGICLINKS, because it
can trivially beam you around the filesystem (breaking the
protection). In future, there might be similar safety checks done as
in LOOKUP_IN_ROOT, but that requires more discussion.
In addition, two new flags are added that expand on the above ideas:
* LOOKUP_NO_SYMLINKS does what it says on the tin. No symlink
resolution is allowed at all, including magic-links. Just as with
LOOKUP_NO_MAGICLINKS this can still be used with NOFOLLOW to open an
fd for the symlink as long as no parent path had a symlink
component.
* LOOKUP_IN_ROOT is an extension of LOOKUP_BENEATH that, rather than
blocking attempts to move past the root, forces all such movements
to be scoped to the starting point. This provides chroot(2)-like
protection but without the cost of a chroot(2) for each filesystem
operation, as well as being safe against race attacks that chroot(2)
is not.
If a race is detected (as with LOOKUP_BENEATH) then an error is
generated, and similar to LOOKUP_BENEATH it is not permitted to cross
magic-links with LOOKUP_IN_ROOT.
The primary need for this is from container runtimes, which
currently need to do symlink scoping in userspace[7] when opening
paths in a potentially malicious container. There is a long list of
CVEs that could have bene mitigated by having RESOLVE_THIS_ROOT
(such as CVE-2017-1002101, CVE-2017-1002102, CVE-2018-15664, and
CVE-2019-5736, just to name a few).
In order to make all of the above more usable, I'm working on
libpathrs[8] which is a C-friendly library for safe path resolution. It
features a userspace-emulated backend if the kernel doesn't support
openat2(2). Hopefully we can get userspace to switch to using it, and
thus get openat2(2) support for free once it's ready.
Future work would include implementing things like RESOLVE_NO_AUTOMOUNT and
possibly a RESOLVE_NO_REMOTE (to allow programs to be sure they don't hit DoSes
though stale NFS handles).
[1]: https://lwn.net/Articles/588444/
[2]: https://lore.kernel.org/lkml/CA+55aFyyxJL1LyXZeBsf2ypriraj5ut1XkNDsunRBqgVj…
[3]: https://lore.kernel.org/lkml/20170429220414.GT29622@ZenIV.linux.org.uk
[4]: https://lore.kernel.org/lkml/1415094884-18349-1-git-send-email-drysdale@goo…
[5]: https://lore.kernel.org/lkml/1404124096-21445-1-git-send-email-drysdale@goo…
[6]: https://lwn.net/Articles/723057/
[7]: https://github.com/cyphar/filepath-securejoin
[8]: https://github.com/openSUSE/libpathrs
The current draft of the openat2(2) man-page is included below.
--8<---------------------------------------------------------------------------
OPENAT2(2) Linux Programmer's Manual OPENAT2(2)
NAME
openat2 - open and possibly create a file (extended)
SYNOPSIS
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
int openat2(int dirfd, const char *pathname, struct open_how *how, size_t size);
Note: There is no glibc wrapper for this system call; see NOTES.
DESCRIPTION
The openat2() system call opens the file specified by pathname. If the specified file does not exist, it may
optionally (if O_CREAT is specified in how.flags) be created by openat2().
As with openat(2), if pathname is relative, then it is interpreted relative to the directory referred to by
the file descriptor dirfd (or the current working directory of the calling process, if dirfd is the special
value AT_FDCWD.) If pathname is absolute, then dirfd is ignored (unless how.resolve contains RESOLVE_IN_ROOT,
in which case pathname is resolved relative to dirfd.)
The openat2() system call is an extension of openat(2) and provides a superset of its functionality. Rather
than taking a single flag argument, an extensible structure (how) is passed instead to allow for future exten-
sions. size must be set to sizeof(struct open_how), to facilitate future extensions (see the "Extensibility"
section of the NOTES for more detail on how extensions are handled.)
The open_how structure
The following structure indicates how pathname should be opened, and acts as a superset of the flag and mode
arguments to openat(2).
struct open_how {
__aligned_u64 flags; /* O_* flags. */
__u16 mode; /* Mode for O_{CREAT,TMPFILE}. */
__u16 __padding[3]; /* Must be zeroed. */
__aligned_u64 resolve; /* RESOLVE_* flags. */
};
Any future extensions to openat2() will be implemented as new fields appended to the above structure (or
through reuse of pre-existing padding space), with the zero value of the new fields acting as though the ex-
tension were not present.
The meaning of each field is as follows:
flags
The file creation and status flags to use for this operation. All of the O_* flags defined for
openat(2) are valid openat2() flag values.
Unlike openat(2), it is an error to provide openat2() unknown or conflicting flags in flags.
mode
File mode for the new file, with identical semantics to the mode argument to openat(2). How-
ever, unlike openat(2), it is an error to provide openat2() with a mode which contains bits
other than 0777.
It is an error to provide openat2() a non-zero mode if flags does not contain O_CREAT or O_TMP-
FILE.
resolve
Change how the components of pathname will be resolved (see path_resolution(7) for background
information.) The primary use case for these flags is to allow trusted programs to restrict how
untrusted paths (or paths inside untrusted directories) are resolved. The full list of resolve
flags is given below.
RESOLVE_NO_XDEV
Disallow traversal of mount points during path resolution (including all bind mounts).
Users of this flag are encouraged to make its use configurable (unless it is used for a
specific security purpose), as bind mounts are very widely used by end-users. Setting
this flag indiscrimnately for all uses of openat2() may result in spurious errors on pre-
viously-functional systems.
RESOLVE_NO_SYMLINKS
Disallow resolution of symbolic links during path resolution. This option implies RE-
SOLVE_NO_MAGICLINKS.
If the trailing component is a symbolic link, and flags contains both O_PATH and O_NOFOL-
LOW, then an O_PATH file descriptor referencing the symbolic link will be returned.
Users of this flag are encouraged to make its use configurable (unless it is used for a
specific security purpose), as symbolic links are very widely used by end-users. Setting
this flag indiscrimnately for all uses of openat2() may result in spurious errors on pre-
viously-functional systems.
RESOLVE_NO_MAGICLINKS
Disallow all magic link resolution during path resolution.
If the trailing component is a magic link, and flags contains both O_PATH and O_NOFOLLOW,
then an O_PATH file descriptor referencing the magic link will be returned.
Magic-links are symbolic link-like objects that are most notably found in proc(5) (exam-
ples include /proc/[pid]/exe and /proc/[pid]/fd/*.) Due to the potential danger of un-
knowingly opening these magic links, it may be preferable for users to disable their res-
olution entirely (see symboliclink(7) for more details.)
RESOLVE_BENEATH
Do not permit the path resolution to succeed if any component of the resolution is not a
descendant of the directory indicated by dirfd. This results in absolute symbolic links
(and absolute values of pathname) to be rejected.
Currently, this flag also disables magic link resolution. However, this may change in
the future. The caller should explicitly specify RESOLVE_NO_MAGICLINKS to ensure that
magic links are not resolved.
RESOLVE_IN_ROOT
Treat dirfd as the root directory while resolving pathname (as though the user called ch-
root(2) with dirfd as the argument.) Absolute symbolic links and ".." path components
will be scoped to dirfd. If pathname is an absolute path, it is also treated relative to
dirfd.
However, unlike chroot(2) (which changes the filesystem root permanently for a process),
RESOLVE_IN_ROOT allows a program to efficiently restrict path resolution for only certain
operations. It also has several hardening features (such detecting escape attempts dur-
ing .. resolution) which chroot(2) does not.
Currently, this flag also disables magic link resolution. However, this may change in
the future. The caller should explicitly specify RESOLVE_NO_MAGICLINKS to ensure that
magic links are not resolved.
It is an error to provide openat2() unknown flags in resolve.
RETURN VALUE
On success, a new file descriptor is returned. On error, -1 is returned, and errno is set appropriately.
ERRORS
The set of errors returned by openat2() includes all of the errors returned by openat(2), as well as the fol-
lowing additional errors:
EINVAL An unknown flag or invalid value was specified in how.
EINVAL mode is non-zero, but flags does not contain O_CREAT or O_TMPFILE.
EINVAL size was smaller than any known version of struct open_how.
E2BIG An extension was specified in how, which the current kernel does not support (see the "Extensibility"
section of the NOTES for more detail on how extensions are handled.)
EAGAIN resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and the kernel could not ensure that a ".."
component didn't escape (due to a race condition or potential attack.) Callers may choose to retry the
openat2() call.
EXDEV resolve contains either RESOLVE_IN_ROOT or RESOLVE_BENEATH, and an escape from the root during path
resolution was detected.
EXDEV resolve contains RESOLVE_NO_XDEV, and a path component attempted to cross a mount point.
ELOOP resolve contains RESOLVE_NO_SYMLINKS, and one of the path components was a symbolic link (or magic
link).
ELOOP resolve contains RESOLVE_NO_MAGICLINKS, and one of the path components was a magic link.
VERSIONS
openat2() first appeared in Linux 5.6.
CONFORMING TO
This system call is Linux-specific.
The semantics of RESOLVE_BENEATH were modelled after FreeBSD's O_BENEATH.
NOTES
Glibc does not provide a wrapper for this system call; call it using syscall(2).
Extensibility
In order to allow for struct open_how to be extended in future kernel revisions, openat2() requires userspace
to specify the size of struct open_how structure they are passing. By providing this information, it is pos-
sible for openat2() to provide both forwards- and backwards-compatibility — with size acting as an implicit
version number (because new extension fields will always be appended, the size will always increase.) This
extensibility design is very similar to other system calls such as perf_setattr(2), perf_event_open(2), and
clone(3).
If we let usize be the size of the structure according to userspace and ksize be the size of the structure
which the kernel supports, then there are only three cases to consider:
* If ksize equals usize, then there is no version mismatch and how can be used verbatim.
* If ksize is larger than usize, then there are some extensions the kernel supports which the
userspace program is unaware of. Because all extensions must have their zero values be a no-op, the
kernel treats all of the extension fields not set by userspace to have zero values. This provides
backwards-compatibility.
* If ksize is smaller than usize, then there are some extensions which the userspace program is aware
of but the kernel does not support. Because all extensions must have their zero values be a no-op,
the kernel can safely ignore the unsupported extension fields if they are all-zero. If any unsup-
ported extension fields are non-zero, then -1 is returned and errno is set to E2BIG. This provides
forwards-compatibility.
Therefore, most userspace programs will not need to have any special handling of extensions. However, if a
userspace program wishes to determine what extensions the running kernel supports, they may conduct a binary
search on size (to find the largest value which doesn't produce an error of E2BIG.)
SEE ALSO
openat(2), path_resolution(7), symlink(7)
Linux 2019-11-05 OPENAT2(2)
--8<---------------------------------------------------------------------------
Aleksa Sarai (13):
namei: only return -ECHILD from follow_dotdot_rcu()
nsfs: clean-up ns_get_path() signature to return int
namei: allow nd_jump_link() to produce errors
namei: allow set_root() to produce errors
namei: LOOKUP_NO_SYMLINKS: block symlink resolution
namei: LOOKUP_NO_MAGICLINKS: block magic-link resolution
namei: LOOKUP_NO_XDEV: block mountpoint crossing
namei: LOOKUP_BENEATH: O_BENEATH-like scoped resolution
namei: LOOKUP_IN_ROOT: chroot-like scoped resolution
namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution
open: introduce openat2(2) syscall
selftests: add openat2(2) selftests
Documentation: path-lookup: include new LOOKUP flags
CREDITS | 4 +-
Documentation/filesystems/path-lookup.rst | 68 ++-
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
fs/namei.c | 199 +++++--
fs/nsfs.c | 29 +-
fs/open.c | 149 +++--
fs/proc/base.c | 3 +-
fs/proc/namespaces.c | 20 +-
include/linux/fcntl.h | 12 +-
include/linux/namei.h | 12 +-
include/linux/proc_ns.h | 4 +-
include/linux/syscalls.h | 3 +
include/uapi/asm-generic/unistd.h | 5 +-
include/uapi/linux/fcntl.h | 35 ++
kernel/bpf/offload.c | 12 +-
kernel/events/core.c | 2 +-
security/apparmor/apparmorfs.c | 6 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/openat2/.gitignore | 1 +
tools/testing/selftests/openat2/Makefile | 8 +
tools/testing/selftests/openat2/helpers.c | 109 ++++
tools/testing/selftests/openat2/helpers.h | 107 ++++
.../testing/selftests/openat2/openat2_test.c | 320 +++++++++++
.../selftests/openat2/rename_attack_test.c | 160 ++++++
.../testing/selftests/openat2/resolve_test.c | 523 ++++++++++++++++++
42 files changed, 1697 insertions(+), 115 deletions(-)
create mode 100644 tools/testing/selftests/openat2/.gitignore
create mode 100644 tools/testing/selftests/openat2/Makefile
create mode 100644 tools/testing/selftests/openat2/helpers.c
create mode 100644 tools/testing/selftests/openat2/helpers.h
create mode 100644 tools/testing/selftests/openat2/openat2_test.c
create mode 100644 tools/testing/selftests/openat2/rename_attack_test.c
create mode 100644 tools/testing/selftests/openat2/resolve_test.c
base-commit: 219d54332a09e8d8741c1e1982f5eae56099de85
--
2.24.0
The current kunit execution model is to provide base kunit functionality
and tests built-in to the kernel. The aim of this series is to allow
building kunit itself and tests as modules. This in turn allows a
simple form of selective execution; load the module you wish to test.
In doing so, kunit itself (if also built as a module) will be loaded as
an implicit dependency.
Because this requires a core API modification - if a module delivers
multiple suites, they must be declared with the kunit_test_suites()
macro - we're proposing this patch set as a candidate to be applied to the
test tree before too many kunit consumers appear. We attempt to deal
with existing consumers in patch 3.
Changes since v4:
- fixed signoff chain to use Co-developed-by: prior to Knut's signoff
(Stephen, all patches)
- added Reviewed-by, Tested-by for patches 1, 2, 4 and 6
- updated comment describing try-catch-impl.h (Stephen, patch 2)
- fixed MODULE_LICENSEs to be GPL v2 (Stephen, patches 3, 5)
- added __init to kunit_init() (Stephen, patch 5)
Changes since v3:
- removed symbol lookup patch for separate submission later
- removed use of sysctl_hung_task_timeout_seconds (patch 4, as discussed
with Brendan and Stephen)
- disabled build of string-stream-test when CONFIG_KUNIT_TEST=m; this
is to avoid having to deal with symbol lookup issues
- changed string-stream-impl.h back to string-stream.h (Brendan)
- added module build support to new list, ext4 tests
Changes since v2:
- moved string-stream.h header to lib/kunit/string-stream-impl.h (Brendan)
(patch 1)
- split out non-exported interfaces in try-catch-impl.h (Brendan)
(patch 2)
- added kunit_find_symbol() and KUNIT_INIT_SYMBOL to lookup non-exported
symbols (patches 3, 4)
- removed #ifdef MODULE around module licenses (Randy, Brendan, Andy)
(patch 4)
- replaced kunit_test_suite() with kunit_test_suites() rather than
supporting both (Brendan) (patch 4)
- lookup sysctl_hung_task_timeout_secs as kunit may be built as a module
and the symbol may not be available (patch 5)
Alan Maguire (6):
kunit: move string-stream.h to lib/kunit
kunit: hide unexported try-catch interface in try-catch-impl.h
kunit: allow kunit tests to be loaded as a module
kunit: remove timeout dependence on sysctl_hung_task_timeout_seconds
kunit: allow kunit to be loaded as a module
kunit: update documentation to describe module-based build
Documentation/dev-tools/kunit/faq.rst | 3 +-
Documentation/dev-tools/kunit/index.rst | 3 ++
Documentation/dev-tools/kunit/usage.rst | 16 ++++++++++
fs/ext4/Kconfig | 2 +-
fs/ext4/Makefile | 5 +++
fs/ext4/inode-test.c | 4 ++-
include/kunit/assert.h | 3 +-
include/kunit/test.h | 35 ++++++++++++++------
include/kunit/try-catch.h | 10 ------
kernel/sysctl-test.c | 4 ++-
lib/Kconfig.debug | 4 +--
lib/kunit/Kconfig | 6 ++--
lib/kunit/Makefile | 14 +++++---
lib/kunit/assert.c | 10 ++++++
lib/kunit/{example-test.c => kunit-example-test.c} | 4 ++-
lib/kunit/{test-test.c => kunit-test.c} | 7 ++--
lib/kunit/string-stream-test.c | 5 +--
lib/kunit/string-stream.c | 3 +-
{include => lib}/kunit/string-stream.h | 0
lib/kunit/test.c | 25 ++++++++++++++-
lib/kunit/try-catch-impl.h | 27 ++++++++++++++++
lib/kunit/try-catch.c | 37 +++++-----------------
lib/list-test.c | 4 ++-
23 files changed, 160 insertions(+), 71 deletions(-)
rename lib/kunit/{example-test.c => kunit-example-test.c} (97%)
rename lib/kunit/{test-test.c => kunit-test.c} (98%)
rename {include => lib}/kunit/string-stream.h (100%)
create mode 100644 lib/kunit/try-catch-impl.h
--
1.8.3.1
Hi,
Here is the v2 series to fix build warnings and erorrs on
kselftest safesetid.
This version includes a fix for a runtime error.
Thank you,
---
Masami Hiramatsu (3):
selftests: safesetid: Move link library to LDLIBS
selftests: safesetid: Check the return value of setuid/setgid
selftests: safesetid: Fix Makefile to set correct test program
tools/testing/selftests/safesetid/Makefile | 5 +++--
tools/testing/selftests/safesetid/safesetid-test.c | 15 ++++++++++-----
2 files changed, 13 insertions(+), 7 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
Hi,
Here are the patches to fix build warnings and erorrs on
kselftest safesetid.
Thank you,
---
Masami Hiramatsu (2):
selftests: safesetid: Move link library to LDLIBS
selftests: safesetid: Check the return value of setuid/setgid
tools/testing/selftests/safesetid/Makefile | 3 ++-
tools/testing/selftests/safesetid/safesetid-test.c | 15 ++++++++++-----
2 files changed, 12 insertions(+), 6 deletions(-)
--
Masami Hiramatsu (Linaro) <mhiramat(a)kernel.org>
This patchset contains trivial fixes for the kunit documentations and the
wrapper python scripts.
Changes from v2 (https://lore.kernel.org/linux-kselftest/1575361141-6806-1-git-send-email-sj…):
- Make 'build_dir' if not exists (missed from v3 by mistake)
SeongJae Park (5):
docs/kunit/start: Use in-tree 'kunit_defconfig'
kunit: Remove duplicated defconfig creation
kunit: Create default config in '--build_dir'
kunit: Place 'test.log' under the 'build_dir'
kunit: Rename 'kunitconfig' to '.kunitconfig'
Documentation/dev-tools/kunit/start.rst | 13 +++++--------
tools/testing/kunit/kunit.py | 16 ++++++++++------
tools/testing/kunit/kunit_kernel.py | 8 ++++----
3 files changed, 19 insertions(+), 18 deletions(-)
--
2.7.4
Since single_step_syscall.c depends on sys/syscall.h and
its include, asm/unistd.h, we should check the availability
of those headers.
Without this fix, if gcc-multilib is not installed but
libc6-dev-i386 is installed, kselftest tries to build 32bit
binary and failed with following error message.
In file included from single_step_syscall.c:18:
/usr/include/sys/syscall.h:24:10: fatal error: asm/unistd.h: No such file or directory
#include <asm/unistd.h>
^~~~~~~~~~~~~~
compilation terminated.
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
---
.../testing/selftests/x86/trivial_32bit_program.c | 1 +
.../testing/selftests/x86/trivial_64bit_program.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/testing/selftests/x86/trivial_32bit_program.c b/tools/testing/selftests/x86/trivial_32bit_program.c
index aa1f58c2f71c..6b455eda24f7 100644
--- a/tools/testing/selftests/x86/trivial_32bit_program.c
+++ b/tools/testing/selftests/x86/trivial_32bit_program.c
@@ -8,6 +8,7 @@
# error wrong architecture
#endif
+#include <sys/syscall.h>
#include <stdio.h>
int main()
diff --git a/tools/testing/selftests/x86/trivial_64bit_program.c b/tools/testing/selftests/x86/trivial_64bit_program.c
index 39f4b84fbf15..07ae86df18ff 100644
--- a/tools/testing/selftests/x86/trivial_64bit_program.c
+++ b/tools/testing/selftests/x86/trivial_64bit_program.c
@@ -8,6 +8,7 @@
# error wrong architecture
#endif
+#include <sys/syscall.h>
#include <stdio.h>
int main()