Fix the following coccicheck warnings:
./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A
&& B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 427ca00..eefd445 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -732,7 +732,7 @@ static int sendmsg_test(struct sockmap_options *opt)
* socket is not a valid test. So in this case lets not
* enable kTLS but still run the test.
*/
- if (!txmsg_redir || (txmsg_redir && txmsg_ingress)) {
+ if (!txmsg_redir || txmsg_ingress) {
err = sockmap_init_ktls(opt->verbose, rx_fd);
if (err)
return err;
--
1.8.3.1
On Tue, Mar 02, 2021 at 01:49:41PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: cfe92ab6a3ea700c08ba673b46822d51f38d6b40 ("[PATCH v5 2/8] security/brute: Define a LSM and manage statistical data")
> url: https://github.com/0day-ci/linux/commits/John-Wood/Fork-brute-force-attack-…
> base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> with following parameters:
>
> group: ["group-00", "group-01", "group-02", "group-03", "group-04"]
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +-------------------------------------------------------------------------+------------+------------+
> | | 1d53b7aac6 | cfe92ab6a3 |
> +-------------------------------------------------------------------------+------------+------------+
> | WARNING:inconsistent_lock_state | 0 | 6 |
> | inconsistent{IN-SOFTIRQ-W}->{SOFTIRQ-ON-W}usage | 0 | 6 |
> +-------------------------------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang(a)intel.com>
>
>
> [ 116.852721] ================================
> [ 116.853120] WARNING: inconsistent lock state
> [ 116.853120] 5.11.0-rc7-00013-gcfe92ab6a3ea #1 Tainted: G S
> [ 116.853120] --------------------------------
>
> [...]
Thanks for the report. I will work on this for the next version.
> Thanks,
> Oliver Sang
Thanks,
John Wood
Hi,
This v3 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
---
Yanan Wang (7):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++-
tools/testing/selftests/kvm/lib/test_util.c | 92 +++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
12 files changed, 628 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v4 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v4" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
The first argument to namedtuple() should match the name of the type,
which wasn't the case for KconfigEntryBase.
Fixing this is enough to make mypy show no python typing errors again.
Fixes 97752c39bd ("kunit: kunit_tool: Allow .kunitconfig to disable config items")
Signed-off-by: David Gow <davidgow(a)google.com>
---
tools/testing/kunit/kunit_config.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_config.py b/tools/testing/kunit/kunit_config.py
index 0b550cbd667d..1e2683dcc0e7 100644
--- a/tools/testing/kunit/kunit_config.py
+++ b/tools/testing/kunit/kunit_config.py
@@ -13,7 +13,7 @@ from typing import List, Set
CONFIG_IS_NOT_SET_PATTERN = r'^# CONFIG_(\w+) is not set$'
CONFIG_PATTERN = r'^CONFIG_(\w+)=(\S+|".*")$'
-KconfigEntryBase = collections.namedtuple('KconfigEntry', ['name', 'value'])
+KconfigEntryBase = collections.namedtuple('KconfigEntryBase', ['name', 'value'])
class KconfigEntry(KconfigEntryBase):
--
2.30.0.617.g56c4b15f3c-goog
kunit_tool maintains a list of config options which are broken under
UML, which we exclude from an otherwise 'make ARCH=um allyesconfig'
build used to run all tests with the --alltests option.
Something in UML allyesconfig is causing segfaults when page poisining
is enabled (and is poisoning with a non-zero value). Previously, this
didn't occur, as allyesconfig enabled the CONFIG_PAGE_POISONING_ZERO
option, which worked around the problem by zeroing memory. This option
has since been removed, and memory is now poisoned with 0xAA, which
triggers segfaults in many different codepaths, preventing UML from
booting.
Note that we have to disable both CONFIG_PAGE_POISONING and
CONFIG_DEBUG_PAGEALLOC, as the latter will 'select' the former on
architectures (such as UML) which don't implement __kernel_map_pages().
Ideally, we'd fix this properly by tracking down the real root cause,
but since this is breaking KUnit's --alltests feature, it's worth
disabling there in the meantime so the kernel can boot to the point
where tests can actually run.
Fixes: f289041ed4 ("mm, page_poison: remove CONFIG_PAGE_POISONING_ZERO")
Signed-off-by: David Gow <davidgow(a)google.com>
---
As described above, 'make ARCH=um allyesconfig' is broken. KUnit has
been maintaining a list of configs to force-disable for this in
tools/testing/kunit/configs/broken_on_uml.config. The kernels we've
built with this have broken since CONFIG_PAGE_POISONING_ZERO was
removed, panic-ing on startup with:
<0>[ 0.100000][ T11] Kernel panic - not syncing: Segfault with no mm
<4>[ 0.100000][ T11] CPU: 0 PID: 11 Comm: kdevtmpfs Not tainted 5.11.0-rc7-00003-g63381dc6f5f1-dirty #4
<4>[ 0.100000][ T11] Stack:
<4>[ 0.100000][ T11] 677d3d40 677d3f10 0000000e 600c0bc0
<4>[ 0.100000][ T11] 677d3d90 603c99be 677d3d90 62529b93
<4>[ 0.100000][ T11] 603c9ac0 677d3f10 62529b00 603c98a0
<4>[ 0.100000][ T11] Call Trace:
<4>[ 0.100000][ T11] [<600c0bc0>] ? set_signals+0x0/0x60
<4>[ 0.100000][ T11] [<603c99be>] lookup_mnt+0x11e/0x220
<4>[ 0.100000][ T11] [<62529b93>] ? down_write+0x93/0x180
<4>[ 0.100000][ T11] [<603c9ac0>] ? lock_mount+0x0/0x160
<4>[ 0.100000][ T11] [<62529b00>] ? down_write+0x0/0x180
<4>[ 0.100000][ T11] [<603c98a0>] ? lookup_mnt+0x0/0x220
<4>[ 0.100000][ T11] [<603c8160>] ? namespace_unlock+0x0/0x1a0
<4>[ 0.100000][ T11] [<603c9b25>] lock_mount+0x65/0x160
<4>[ 0.100000][ T11] [<6012f360>] ? up_write+0x0/0x40
<4>[ 0.100000][ T11] [<603cbbd2>] do_new_mount_fc+0xd2/0x220
<4>[ 0.100000][ T11] [<603eb560>] ? vfs_parse_fs_string+0x0/0xa0
<4>[ 0.100000][ T11] [<603cbf04>] do_new_mount+0x1e4/0x260
<4>[ 0.100000][ T11] [<603ccae9>] path_mount+0x1c9/0x6e0
<4>[ 0.100000][ T11] [<603a9f4f>] ? getname_kernel+0xaf/0x1a0
<4>[ 0.100000][ T11] [<603ab280>] ? kern_path+0x0/0x60
<4>[ 0.100000][ T11] [<600238ee>] 0x600238ee
<4>[ 0.100000][ T11] [<62523baa>] devtmpfsd+0x52/0xb8
<4>[ 0.100000][ T11] [<62523b58>] ? devtmpfsd+0x0/0xb8
<4>[ 0.100000][ T11] [<600fffd8>] kthread+0x1d8/0x200
<4>[ 0.100000][ T11] [<600a4ea6>] new_thread_handler+0x86/0xc0
Disabling PAGE_POISONING fixes this. The issue can't be repoduced with
just PAGE_POISONING, there's clearly something (or several things) also
enabled by allyesconfig which contribute. Ideally, we'd track these down
and fix this at its root cause, but in the meantime it'd be nice to
disable PAGE_POISONING so we can at least get the kernel to boot. One
way would be to add a 'depends on !UML' or similar, but since
PAGE_POISONING does seem to work in the non-allyesconfig case, adding it
to our list of broken configs seemed the better choice.
Thoughts?
(Note that to reproduce this, you'll want to run
./tools/testing/kunit/kunit.py run --alltests --raw_output
It also depends on a couple of other fixes which are not upstream yet:
https://www.spinics.net/lists/linux-rtc/msg08294.htmlhttps://lore.kernel.org/linux-i3c/20210127040636.1535722-1-davidgow@google.…
Cheers,
-- David
tools/testing/kunit/configs/broken_on_uml.config | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/kunit/configs/broken_on_uml.config b/tools/testing/kunit/configs/broken_on_uml.config
index a7f0603d33f6..690870043ac0 100644
--- a/tools/testing/kunit/configs/broken_on_uml.config
+++ b/tools/testing/kunit/configs/broken_on_uml.config
@@ -40,3 +40,5 @@
# CONFIG_RESET_BRCMSTB_RESCAL is not set
# CONFIG_RESET_INTEL_GW is not set
# CONFIG_ADI_AXI_ADC is not set
+# CONFIG_DEBUG_PAGEALLOC is not set
+# CONFIG_PAGE_POISONING is not set
--
2.30.0.478.g8a0d178c01-goog
Hi,
This v2 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
Yanan Wang (7):
tools include: sync head files of mmap flag encodings about hugetlb
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 58 +--
tools/testing/selftests/kvm/lib/test_util.c | 92 +++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
10 files changed, 623 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
Base
====
This series is based on top of my series which adds minor fault handling for
hugetlbfs [1]. (And, therefore, it is based on linux-next/akpm and Peter Xu's
series for disabling huge pmd sharing as well.)
[1] https://lore.kernel.org/patchwork/cover/1384095/
Overview
========
See my original series linked above for a detailed overview of minor fault
handling in general. The feature in this series works exactly like the
hugetblfs version (from userspace's perspective).
I'm sending this as a separate series because:
- The original minor fault handling series has been through several rounds of
review and seems close to being merged, so it seems reasonable to start
looking at this next step.
- shmem is different enough that this series may require some additional work
before it's ready, and I don't want to delay the original series
unnecessarily by bundling them together.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android JVM garbage collector using this feature (a paper
describing a somewhat similar approach: https://arxiv.org/pdf/1902.04738.pdf).
Axel Rasmussen (5):
userfaultfd: support minor fault handling for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
fs/userfaultfd.c | 6 +-
include/linux/shmem_fs.h | 26 +-
include/uapi/linux/userfaultfd.h | 4 +-
mm/memory.c | 8 +-
mm/shmem.c | 88 +++----
mm/userfaultfd.c | 27 +-
tools/testing/selftests/vm/userfaultfd.c | 322 +++++++++++++++--------
7 files changed, 293 insertions(+), 188 deletions(-)
--
2.30.0.617.g56c4b15f3c-goog
Hi,
This patch series mainly fixes race condition issues, explains specific
lock rules and improves code related to concurrent calls of
hook_sb_delete() and release_inode(). I exploited these races to
validate the fixes. Userspace tests are also improved along with some
commit messages and comments. Serge Hallyn's review is taken into
account and his Acked-by are added to the corresponding patches.
The SLOC count is 1328 for security/landlock/ and 2539 for
tools/testing/selftest/landlock/ . Test coverage for security/landlock/
is 93.6% of lines. The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions. This series is
being fuzzed by syzkaller (which may cover internal kernel errors), and
patches are on their way: https://github.com/google/syzkaller/pull/2380
The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v29/userspace-api/landlock.html
This series can be applied on top of v5.11-7592-g1a3a9ffb27bb (Linus's
master branch from Sunday). This can be tested with
CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM. This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v29
This patch series seems ready for upstream and I would really appreciate
final reviews.
# Landlock LSM
The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock is a
stackable LSM [1], it makes possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.
Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantic.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].
Previous version:
https://lore.kernel.org/lkml/20210202162710.657398-1-mic@digikod.net/
[1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler…
[2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n…
Casey Schaufler (1):
LSM: Infrastructure management of the superblock
Mickaël Salaün (11):
landlock: Add object management
landlock: Add ruleset and domain management
landlock: Set up the security framework and manage credentials
landlock: Add ptrace restrictions
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
selftests/landlock: Add user space tests
samples/landlock: Add a sandbox manager example
landlock: Add user and kernel documentation
Documentation/security/index.rst | 1 +
Documentation/security/landlock.rst | 79 +
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/landlock.rst | 307 ++
MAINTAINERS | 15 +
arch/Kconfig | 7 +
arch/alpha/kernel/syscalls/syscall.tbl | 3 +
arch/arm/tools/syscall.tbl | 3 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 6 +
arch/ia64/kernel/syscalls/syscall.tbl | 3 +
arch/m68k/kernel/syscalls/syscall.tbl | 3 +
arch/microblaze/kernel/syscalls/syscall.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +
arch/parisc/kernel/syscalls/syscall.tbl | 3 +
arch/powerpc/kernel/syscalls/syscall.tbl | 3 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 3 +
arch/sparc/kernel/syscalls/syscall.tbl | 3 +
arch/um/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 3 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +
arch/xtensa/kernel/syscalls/syscall.tbl | 3 +
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/landlock.h | 128 +
kernel/sys_ni.c | 5 +
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 13 +
samples/landlock/sandboxer.c | 238 ++
security/Kconfig | 11 +-
security/Makefile | 2 +
security/landlock/Kconfig | 21 +
security/landlock/Makefile | 4 +
security/landlock/common.h | 20 +
security/landlock/cred.c | 46 +
security/landlock/cred.h | 58 +
security/landlock/fs.c | 686 +++++
security/landlock/fs.h | 56 +
security/landlock/limits.h | 21 +
security/landlock/object.c | 67 +
security/landlock/object.h | 91 +
security/landlock/ptrace.c | 120 +
security/landlock/ptrace.h | 14 +
security/landlock/ruleset.c | 473 +++
security/landlock/ruleset.h | 165 +
security/landlock/setup.c | 40 +
security/landlock/setup.h | 18 +
security/landlock/syscalls.c | 445 +++
security/security.c | 51 +-
security/selinux/hooks.c | 58 +-
security/selinux/include/objsec.h | 6 +
security/selinux/ss/services.c | 3 +-
security/smack/smack.h | 6 +
security/smack/smack_lsm.c | 35 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 183 ++
tools/testing/selftests/landlock/config | 7 +
tools/testing/selftests/landlock/fs_test.c | 2724 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 337 ++
tools/testing/selftests/landlock/true.c | 5 +
72 files changed, 6827 insertions(+), 77 deletions(-)
create mode 100644 Documentation/security/landlock.rst
create mode 100644 Documentation/userspace-api/landlock.rst
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/common.h
create mode 100644 security/landlock/cred.c
create mode 100644 security/landlock/cred.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h
create mode 100644 security/landlock/ptrace.c
create mode 100644 security/landlock/ptrace.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h
create mode 100644 security/landlock/setup.c
create mode 100644 security/landlock/setup.h
create mode 100644 security/landlock/syscalls.c
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c
base-commit: 31caf8b2a847214be856f843e251fc2ed2cd1075
--
2.30.0
The top-level kselftest directory is not called kselftest, but
selftests.
Signed-off-by: Antonio Terceiro <antonio.terceiro(a)linaro.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
---
Documentation/dev-tools/kselftest.rst | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/Documentation/dev-tools/kselftest.rst b/Documentation/dev-tools/kselftest.rst
index a901def730d9..dcefee707ccd 100644
--- a/Documentation/dev-tools/kselftest.rst
+++ b/Documentation/dev-tools/kselftest.rst
@@ -239,8 +239,8 @@ using a shell script test runner. ``kselftest/module.sh`` is designed
to facilitate this process. There is also a header file provided to
assist writing kernel modules that are for use with kselftest:
-- ``tools/testing/kselftest/kselftest_module.h``
-- ``tools/testing/kselftest/kselftest/module.sh``
+- ``tools/testing/selftests/kselftest_module.h``
+- ``tools/testing/selftests/kselftest/module.sh``
How to use
----------
--
2.30.1
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v5 changes:
* Remove the two patches with no actual value
* Typo fix in commit message
v4 changes:
* Improvements suggested by review
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (3):
selftests/x86: Use getauxval() to simplify the code in sgx
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++----
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
3 files changed, 9 insertions(+), 24 deletions(-)
--
2.19.1.3.ge56e4f7
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v3 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v3" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
@Andrew, this is based on v5.11-rc5-mmotm-2021-01-27-23-30, with secretmem
and related patches dropped from there, I can rebase whatever way you
prefer.
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants
mappings.
Additionally, in the future the secret mappings may be used as a mean to
protect guest memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.
In addition, there is also a long term goal to improve management of the
direct map.
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…
v17:
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default
v16:
* Fix memory leak intorduced in v15
* Clean the data left from previous page user before handing the page to
the userspace
v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
* Add riscv/Kconfig update to disable set_memory operations for nommu
builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
(patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems
v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
* Finally s/mod_node_page_state/mod_lruvec_page_state/
v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
* Added Reviewed-by, thanks Catalin and David
* s/mod_node_page_state/mod_lruvec_page_state/ as Shakeel suggested
Older history:
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…
Arnd Bergmann (1):
arm64: kfence: fix header inclusion
Mike Rapoport (9):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow set_direct_map_*_noflush() for multiple pages
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 23 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/secretmem.h | 30 +++
include/linux/set_memory.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 3 +
mm/Makefile | 1 +
mm/gup.c | 10 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 261 +++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests | 17 ++
39 files changed, 726 insertions(+), 53 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
--
2.28.0
Hi Linus,
Please pull the following Kselftest update for Linux 5.12-rc1.
This Kselftest update for Linux 5.12-rc1 consists of:
- dmabuf-heaps test fixes and cleanups from John Stultz.
- seccomp test fix to accept any valid fd in user_notification_addfd.
- Minor fixes to breakpoints and vDSO tests.
- Minor code cleanups to ipc and x86 tests.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3:
Linux 5.11-rc7 (2021-02-07 13:57:38 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-next-5.12-rc1
for you to fetch changes up to e0c0840a46db9d50ba7391082d665d74f320c39f:
selftests/seccomp: Accept any valid fd in user_notification_addfd
(2021-02-09 17:39:01 -0700)
----------------------------------------------------------------
linux-kselftest-next-5.12-rc1
This Kselftest update for Linux 5.12-rc1 consists of:
- dmabuf-heaps test fixes and cleanups from John Stultz.
- seccomp test fix to accept any valid fd in user_notification_addfd.
- Minor fixes to breakpoints and vDSO tests.
- Minor code cleanups to ipc and x86 tests.
----------------------------------------------------------------
John Stultz (5):
kselftests: dmabuf-heaps: Fix Makefile's inclusion of the
kernel's usr/include dir
kselftests: dmabuf-heaps: Add clearer checks on DMABUF_BEGIN/END_SYNC
kselftests: dmabuf-heaps: Softly fail if don't find a vgem device
kselftests: dmabuf-heaps: Cleanup test output
kselftests: dmabuf-heaps: Add extra checking that allocated
buffers are zeroed
Seth Forshee (1):
selftests/seccomp: Accept any valid fd in user_notification_addfd
Tiezhu Yang (1):
selftests: breakpoints: Use correct error messages in
breakpoint_test_arm64.c
Tobias Klauser (2):
selftests/vDSO: fix ABI selftest on riscv
selftests/timens: add futex binary to .gitignore
Yang Li (2):
selftests/ipc: remove unneeded semicolon
selftests/x86/ldt_gdt: remove unneeded semicolon
.../selftests/breakpoints/breakpoint_test_arm64.c | 4 +-
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 149
++++++++++++++++-----
tools/testing/selftests/ipc/msgque.c | 6 +-
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 +-
tools/testing/selftests/timens/.gitignore | 1 +
tools/testing/selftests/vDSO/vdso_config.h | 4 +-
tools/testing/selftests/x86/ldt_gdt.c | 2 +-
8 files changed, 132 insertions(+), 44 deletions(-)
----------------------------------------------------------------
Hi Linus,
Please pull the following KUnit update for Linux 5.12-rc1.
This KUnit update for Linux 5.12-rc1 consists of consists of:
-- support for filtering test suites using glob from Daniel Latypov.
"kunit_filter.glob" command line option is passed to the UML
kernel, which currently only supports filtering by suite name.
This support allows running different subsets of tests, e.g.
$ ./tools/testing/kunit/kunit.py build
$ ./tools/testing/kunit/kunit.py exec 'list*'
$ ./tools/testing/kunit/kunit.py exec 'kunit*'
-- several fixes and cleanups also from Daniel Latypov.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 92bf22614b21a2706f4993b278017e437f7785b3:
Linux 5.11-rc7 (2021-02-07 13:57:38 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-5.12-rc1
for you to fetch changes up to 7af29141a31a2a2350589471c8979ff5f22fb9b7:
kunit: tool: fix unintentional statefulness in run_kernel()
(2021-02-08 16:10:22 -0700)
----------------------------------------------------------------
linux-kselftest-kunit-5.12-rc1
This KUnit update for Linux 5.12-rc1 consists of consists of:
-- support for filtering test suites using glob from Daniel Latypov.
"kunit_filter.glob" command line option is passed to the UML
kernel, which currently only supports filtering by suite name.
This support allows running different subsets of tests, e.g.
$ ./tools/testing/kunit/kunit.py build
$ ./tools/testing/kunit/kunit.py exec 'list*'
$ ./tools/testing/kunit/kunit.py exec 'kunit*'
-- several fixes and cleanups also from Daniel Latypov.
----------------------------------------------------------------
Daniel Latypov (12):
kunit: tool: fix unit test cleanup handling
kunit: tool: stop using bare asserts in unit test
kunit: tool: use `with open()` in unit test
minor: kunit: tool: fix unit test so it can run from non-root dir
kunit: tool: simplify kconfig is_subset_of() logic
KUnit: Docs: make start.rst example Kconfig follow style.rst
Documentation: kunit: add tips.rst for small examples
kunit: make kunit_tool accept optional path to .kunitconfig fragment
kunit: don't show `1 == 1` in failed assertion messages
kunit: add kunit.filter_glob cmdline option to filter suites
kunit: tool: add support for filtering suites by glob
kunit: tool: fix unintentional statefulness in run_kernel()
Documentation/dev-tools/kunit/index.rst | 2 +
Documentation/dev-tools/kunit/start.rst | 7 +-
Documentation/dev-tools/kunit/tips.rst | 115 ++++++++++++++++++
lib/kunit/Kconfig | 1 +
lib/kunit/assert.c | 39 +++++-
lib/kunit/executor.c | 93 +++++++++++++--
tools/testing/kunit/kunit.py | 30 +++--
tools/testing/kunit/kunit_config.py | 13 +-
tools/testing/kunit/kunit_kernel.py | 18 ++-
tools/testing/kunit/kunit_tool_test.py | 204
+++++++++++++++++---------------
10 files changed, 390 insertions(+), 132 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/tips.rst
----------------------------------------------------------------
Hi,
This patch series fixes a corner-case with non-overlapping access rights
coming from different layers. This is now handled in a generic way and
verified with new tests. A stricter check is enforced for
landlock_add_rule(2) to forbid useless rules. Finally, the previous
landlock_enforce_ruleset_self(2) is renamed to
landlock_restrict_self(2), which is more consistent.
The SLOC count is 1314 for security/landlock/ and 2484 for
tools/testing/selftest/landlock/ . Test coverage for security/landlock/
is 94.7% of lines. The code not covered only deals with internal kernel
errors (e.g. memory allocation) and race conditions. This series is
being fuzzed by syzkaller, and patches are on their way:
https://github.com/google/syzkaller/pull/2380
The compiled documentation is available here:
https://landlock.io/linux-doc/landlock-v28/userspace-api/landlock.html
This series can be applied on top of v5.11-rc6 . This can be tested
with CONFIG_SECURITY_LANDLOCK, CONFIG_SAMPLE_LANDLOCK and by prepending
"landlock," to CONFIG_LSM. This patch series can be found in a Git
repository here:
https://github.com/landlock-lsm/linux/commits/landlock-v28
This patch series seems ready for upstream and I would really appreciate
final reviews.
# Landlock LSM
The goal of Landlock is to enable to restrict ambient rights (e.g.
global filesystem access) for a set of processes. Because Landlock is a
stackable LSM [1], it makes possible to create safe security sandboxes
as new security layers in addition to the existing system-wide
access-controls. This kind of sandbox is expected to help mitigate the
security impact of bugs or unexpected/malicious behaviors in user-space
applications. Landlock empowers any process, including unprivileged
ones, to securely restrict themselves.
Landlock is inspired by seccomp-bpf but instead of filtering syscalls
and their raw arguments, a Landlock rule can restrict the use of kernel
objects like file hierarchies, according to the kernel semantic.
Landlock also takes inspiration from other OS sandbox mechanisms: XNU
Sandbox, FreeBSD Capsicum or OpenBSD Pledge/Unveil.
In this current form, Landlock misses some access-control features.
This enables to minimize this patch series and ease review. This series
still addresses multiple use cases, especially with the combined use of
seccomp-bpf: applications with built-in sandboxing, init systems,
security sandbox tools and security-oriented APIs [2].
Previous version:
https://lore.kernel.org/lkml/20210121205119.793296-1-mic@digikod.net/
[1] https://lore.kernel.org/lkml/50db058a-7dde-441b-a7f9-f6837fe8b69f@schaufler…
[2] https://lore.kernel.org/lkml/f646e1c7-33cf-333f-070c-0a40ad0468cd@digikod.n…
Casey Schaufler (1):
LSM: Infrastructure management of the superblock
Mickaël Salaün (11):
landlock: Add object management
landlock: Add ruleset and domain management
landlock: Set up the security framework and manage credentials
landlock: Add ptrace restrictions
fs,security: Add sb_delete hook
landlock: Support filesystem access-control
landlock: Add syscall implementations
arch: Wire up Landlock syscalls
selftests/landlock: Add user space tests
samples/landlock: Add a sandbox manager example
landlock: Add user and kernel documentation
Documentation/security/index.rst | 1 +
Documentation/security/landlock.rst | 79 +
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/landlock.rst | 307 ++
MAINTAINERS | 15 +
arch/Kconfig | 7 +
arch/alpha/kernel/syscalls/syscall.tbl | 3 +
arch/arm/tools/syscall.tbl | 3 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 6 +
arch/ia64/kernel/syscalls/syscall.tbl | 3 +
arch/m68k/kernel/syscalls/syscall.tbl | 3 +
arch/microblaze/kernel/syscalls/syscall.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 3 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 3 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 3 +
arch/parisc/kernel/syscalls/syscall.tbl | 3 +
arch/powerpc/kernel/syscalls/syscall.tbl | 3 +
arch/s390/kernel/syscalls/syscall.tbl | 3 +
arch/sh/kernel/syscalls/syscall.tbl | 3 +
arch/sparc/kernel/syscalls/syscall.tbl | 3 +
arch/um/Kconfig | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 3 +
arch/x86/entry/syscalls/syscall_64.tbl | 3 +
arch/xtensa/kernel/syscalls/syscall.tbl | 3 +
fs/super.c | 1 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 3 +
include/linux/security.h | 4 +
include/linux/syscalls.h | 7 +
include/uapi/asm-generic/unistd.h | 8 +-
include/uapi/linux/landlock.h | 128 +
kernel/sys_ni.c | 5 +
samples/Kconfig | 7 +
samples/Makefile | 1 +
samples/landlock/.gitignore | 1 +
samples/landlock/Makefile | 13 +
samples/landlock/sandboxer.c | 238 ++
security/Kconfig | 11 +-
security/Makefile | 2 +
security/landlock/Kconfig | 21 +
security/landlock/Makefile | 4 +
security/landlock/common.h | 20 +
security/landlock/cred.c | 46 +
security/landlock/cred.h | 58 +
security/landlock/fs.c | 627 ++++
security/landlock/fs.h | 56 +
security/landlock/limits.h | 21 +
security/landlock/object.c | 67 +
security/landlock/object.h | 91 +
security/landlock/ptrace.c | 120 +
security/landlock/ptrace.h | 14 +
security/landlock/ruleset.c | 473 +++
security/landlock/ruleset.h | 165 +
security/landlock/setup.c | 40 +
security/landlock/setup.h | 18 +
security/landlock/syscalls.c | 444 +++
security/security.c | 51 +-
security/selinux/hooks.c | 58 +-
security/selinux/include/objsec.h | 6 +
security/selinux/ss/services.c | 3 +-
security/smack/smack.h | 6 +
security/smack/smack_lsm.c | 35 +-
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/landlock/.gitignore | 2 +
tools/testing/selftests/landlock/Makefile | 24 +
tools/testing/selftests/landlock/base_test.c | 219 ++
tools/testing/selftests/landlock/common.h | 169 ++
tools/testing/selftests/landlock/config | 6 +
tools/testing/selftests/landlock/fs_test.c | 2664 +++++++++++++++++
.../testing/selftests/landlock/ptrace_test.c | 314 ++
tools/testing/selftests/landlock/true.c | 5 +
72 files changed, 6668 insertions(+), 77 deletions(-)
create mode 100644 Documentation/security/landlock.rst
create mode 100644 Documentation/userspace-api/landlock.rst
create mode 100644 include/uapi/linux/landlock.h
create mode 100644 samples/landlock/.gitignore
create mode 100644 samples/landlock/Makefile
create mode 100644 samples/landlock/sandboxer.c
create mode 100644 security/landlock/Kconfig
create mode 100644 security/landlock/Makefile
create mode 100644 security/landlock/common.h
create mode 100644 security/landlock/cred.c
create mode 100644 security/landlock/cred.h
create mode 100644 security/landlock/fs.c
create mode 100644 security/landlock/fs.h
create mode 100644 security/landlock/limits.h
create mode 100644 security/landlock/object.c
create mode 100644 security/landlock/object.h
create mode 100644 security/landlock/ptrace.c
create mode 100644 security/landlock/ptrace.h
create mode 100644 security/landlock/ruleset.c
create mode 100644 security/landlock/ruleset.h
create mode 100644 security/landlock/setup.c
create mode 100644 security/landlock/setup.h
create mode 100644 security/landlock/syscalls.c
create mode 100644 tools/testing/selftests/landlock/.gitignore
create mode 100644 tools/testing/selftests/landlock/Makefile
create mode 100644 tools/testing/selftests/landlock/base_test.c
create mode 100644 tools/testing/selftests/landlock/common.h
create mode 100644 tools/testing/selftests/landlock/config
create mode 100644 tools/testing/selftests/landlock/fs_test.c
create mode 100644 tools/testing/selftests/landlock/ptrace_test.c
create mode 100644 tools/testing/selftests/landlock/true.c
base-commit: 1048ba83fb1c00cd24172e23e8263972f6b5d9ac
--
2.30.0
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept then, given the
multiplexed interface full of legacy features and tricky to do big
changes. Some problems that people tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** The 32bit sized futex problem
Embedded systems or anything with memory constrains would benefit of
using smaller sizes for the futex userspace integer. Also, a mutex
implementation can be done using just three values, so 8 bits is enough
for various scenarios.
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running
over Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
allow the maintainability of the API if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
* The interface
The new interface can be seen in details in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits and 32bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describes the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 6
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2
- Patch 1: Implements wait/wake, and the basics foundations of futex2
- Patches 2-4: Implement the remaining features (shared, waitv, requeue).
- Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 6: Add a documentation file which details the interface and
the internal implementation.
- Patches 7-13: Selftests for all operations along with perf
support for futex2.
- Patch 14: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread's programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test and using 8, 80, 800,
8000 for the number of threads, Note that for this test, I'm not using
patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained
at "The patchset" section.
- For the first three ones, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA and FUTEX_8/16?"
The current code is already complex enough to take some time for
review, so I believe it's better to split that work out to a future
iteration of this patchset. Besides that, this RFC is the core part of the
infrastructure, and the following features will not pose big design
changes to it, the work will be more about wiring up the flags and
modifying some functions.
** "And what's about FUTEX_64?"
By supporting 64 bit futexes, the kernel structure for futex would
need to have a 64 bit field for the value, and that could defeat one of
the purposes of having different sized futexes in the first place:
supporting smaller ones to decrease memory usage. This might be
something that could be disabled for 32bit archs (and even for
CONFIG_BASE_SMALL).
Which use case would benefit for FUTEX_64? Does it worth the trade-offs?
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
André Almeida (13):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 4 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 23 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 18 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 56 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1255 +++++++++++++++++
kernel/sys_ni.c | 6 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 3 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 8 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_wait.c | 209 +++
.../selftests/futex/functional/futex2_waitv.c | 157 +++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 121 ++
38 files changed, 2563 insertions(+), 53 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.30.1
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
v2 -> v3:
Fix kunit_fail_current_test() so it works w/ CONFIG_KUNIT=m
s/_/__ on the helper func to match others in test.c
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 37 +++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 66 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be
--
2.30.0.478.g8a0d178c01-goog
pipe() and socketpair() have different behavior wrt edge-triggered
read epoll, in that no event is generated when data is written into
a non-empty pipe, but an event is generated if socketpair() is used
instead.
This simple modification of the epoll2 testlet from
tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c
(it just adds a second write) shows the different behavior.
The testlet passes with pipe() but fails with socketpair() with 5.10.
They both fail with 4.19.
Is it fair to assume that 5.10 pipe's behavior is the correct one?
Thanks,
Francesco Ruggeri
/*
* t0
* | (ew)
* e0
* | (et)
* s0
*/
TEST(epoll2)
{
int efd;
int sfd[2];
struct epoll_event e;
ASSERT_EQ(socketpair(AF_UNIX, SOCK_STREAM, 0, sfd), 0);
//ASSERT_EQ(pipe(sfd), 0);
efd = epoll_create(1);
ASSERT_GE(efd, 0);
e.events = EPOLLIN | EPOLLET;
ASSERT_EQ(epoll_ctl(efd, EPOLL_CTL_ADD, sfd[0], &e), 0);
ASSERT_EQ(write(sfd[1], "w", 1), 1);
EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 1);
ASSERT_EQ(write(sfd[1], "w", 1), 1);
EXPECT_EQ(epoll_wait(efd, &e, 1, 0), 0);
close(efd);
close(sfd[0]);
close(sfd[1]);
}
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (5):
selftests/x86: Simplify the code to get vdso base address in sgx
x86/sgx: Optimize the locking range in sgx_sanitize_section()
x86/sgx: Optimize the free_cnt count in sgx_epc_section
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 9 +++++----
arch/x86/kernel/cpu/sgx/main.c | 13 +++++--------
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
4 files changed, 15 insertions(+), 32 deletions(-)
--
2.19.1.3.ge56e4f7
Changelog
---------
v11
- Another build fix reported by robot on i386: moved is_pinnable_page()
below set_page_section() in linux/mm.h
v10
- Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub.
v9
- Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer
Chen for noticing.
- Fixed warning reported scripts/checkpatch.pl:
"Logical continuations should be on the previous line"
v8
- Added reviewed by's from John Hubbard
- Fixed subjects for selftests patches
- Moved zero page check inside is_pinnable_page() as requested by Jason
Gunthorpe.
v7
- Added reviewed-by's
- Fixed a compile bug on non-mmu builds reported by robot
v6
Small update, but I wanted to send it out quicker, as it removes a
controversial patch and replaces it with something sane.
- Removed forcing FOLL_WRITE for longterm gup, instead added a patch to
skip zero pages during migration.
- Added reviewed-by's and minor log changes.
v5
- Added the following patches to the beginning of series, which are fixes
to the other existing problems with CMA migration code:
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors also at the beginning of series
mm/gup: do not allow zero page for pinned pages
- remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c
- update movable zone header comment in patch 8 instead of patch 3, fix
the comment
- Added acked, sign-offs
- Updated commit logs based on feedback
- Addressed issues reported by Michal and Jason.
- Remove:
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100
Instead: fail on the first migration failure, and retry isolation
forever as their failures are transient.
- In self-set addressed some of the comments from John Hubbard, updated
commit logs, and added comments. Renamed gup->flags with gup->test_flags.
v4
- Address page migration comments. New patch:
mm/gup: limit number of gup migration failures, honor failures
Implements the limiting number of retries for migration failures, and
also check for isolation failures.
Added a test case into gup_test to verify that pages never long-term
pinned in a movable zone, and also added tests to fault both in kernel
and in userland.
v3
- Merged with linux-next, which contains clean-up patch from Jason,
therefore this series is reduced by two patches which did the same
thing.
v2
- Addressed all review comments
- Added Reviewed-by's.
- Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN
- Added is_pinnable_page() to check if page can be longterm pinned
- Fixed gup fast path by checking is_in_pinnable_zone()
- rename cma_page_list to movable_page_list
- add a admin-guide note about handling pinned pages in ZONE_MOVABLE,
updated caveat about pinned pages from linux/mmzone.h
- Move current_gfp_context() to fast-path
---------
When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.
This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().
However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.
This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.
It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.
For more information read the discussion [1] about this problem.
[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS…
Previous versions:
v1
https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.…
v2
https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c…
v3
https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.…
v4
https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen…
v5
https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.…
v6
https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.…
v7
https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.…
v8
https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen…
v9
https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.…
v10
https://lore.kernel.org/lkml/20210211162427.618913-1-pasha.tatashin@soleen.…
Pavel Tatashin (14):
mm/gup: don't pin migrated cma pages in movable zone
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
mm: apply per-task gfp constraints in fast path
mm: honor PF_MEMALLOC_PIN for all movable pages
mm/gup: do not migrate zero page
mm/gup: migrate pinned pages out of movable zone
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
mm/gup: change index type to long as it counts pages
mm/gup: longterm pin migration cleanup
selftests/vm: gup_test: fix test flag
selftests/vm: gup_test: test faulting in kernel, and verify pinnable
pages
.../admin-guide/mm/memory-hotplug.rst | 9 +
include/linux/migrate.h | 1 +
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 13 +-
include/linux/pgtable.h | 12 ++
include/linux/sched.h | 2 +-
include/linux/sched/mm.h | 27 +--
include/trace/events/migrate.h | 3 +-
mm/gup.c | 174 ++++++++----------
mm/gup_test.c | 29 +--
mm/gup_test.h | 3 +-
mm/hugetlb.c | 4 +-
mm/page_alloc.c | 33 ++--
tools/testing/selftests/vm/gup_test.c | 36 +++-
14 files changed, 208 insertions(+), 157 deletions(-)
--
2.25.1
Musl libc does not define the glibc specific macro __GLIBC_PREREQ(), but
it has the clock_adjtime() function. Assume that a libc implementation
which does not define __GLIBC_PREREQ at all still implements
clock_adjtime().
This fixes a build problem with musl libc because the __GLIBC_PREREQ()
macro is missing.
Fixes: 42e1358e103d ("ptp: In the testptp utility, use clock_adjtime from glibc when available")
Signed-off-by: Hauke Mehrtens <hauke(a)hauke-m.de>
---
tools/testing/selftests/ptp/testptp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/ptp/testptp.c b/tools/testing/selftests/ptp/testptp.c
index f7911aaeb007..ecffe2c78543 100644
--- a/tools/testing/selftests/ptp/testptp.c
+++ b/tools/testing/selftests/ptp/testptp.c
@@ -38,6 +38,7 @@
#define NSEC_PER_SEC 1000000000LL
/* clock_adjtime is not available in GLIBC < 2.14 */
+#ifdef __GLIBC_PREREQ
#if !__GLIBC_PREREQ(2, 14)
#include <sys/syscall.h>
static int clock_adjtime(clockid_t id, struct timex *tx)
@@ -45,6 +46,7 @@ static int clock_adjtime(clockid_t id, struct timex *tx)
return syscall(__NR_clock_adjtime, id, tx);
}
#endif
+#endif /* __GLIBC_PREREQ */
static void show_flag_test(int rq_index, unsigned int flags, int err)
{
--
2.20.1
This is an optimization of a set of sgx-related codes, each of which
is independent of the patch. Because the second and third patches have
conflicting dependencies, these patches are put together.
---
v4 changes:
* Improvements suggested by review
v3 changes:
* split free_cnt count and spin lock optimization into two patches
v2 changes:
* review suggested changes
Tianjia Zhang (5):
selftests/x86: Use getauxval() to simplify the code in sgx
x86/sgx: Reduce the locking range in sgx_sanitize_section()
x86/sgx: Optimize the free_cnt count in sgx_epc_section
x86/sgx: Allows ioctl PROVISION to execute before CREATE
x86/sgx: Remove redundant if conditions in sgx_encl_create
arch/x86/kernel/cpu/sgx/driver.c | 1 +
arch/x86/kernel/cpu/sgx/ioctl.c | 8 ++++----
arch/x86/kernel/cpu/sgx/main.c | 13 +++++--------
tools/testing/selftests/sgx/main.c | 24 ++++--------------------
4 files changed, 14 insertions(+), 32 deletions(-)
--
2.19.1.3.ge56e4f7
Changelog
---------
v10
- Fixed !CONFIG_MMU compiler issues by adding is_zero_pfn() stub.
v9
- Renamed gpf_to_alloc_flags() to gfp_to_alloc_flags_cma(); thanks Lecopzer
Chen for noticing.
- Fixed warning reported scripts/checkpatch.pl:
"Logical continuations should be on the previous line"
v8
- Added reviewed by's from John Hubbard
- Fixed subjects for selftests patches
- Moved zero page check inside is_pinnable_page() as requested by Jason
Gunthorpe.
v7
- Added reviewed-by's
- Fixed a compile bug on non-mmu builds reported by robot
v6
Small update, but I wanted to send it out quicker, as it removes a
controversial patch and replaces it with something sane.
- Removed forcing FOLL_WRITE for longterm gup, instead added a patch to
skip zero pages during migration.
- Added reviewed-by's and minor log changes.
v5
- Added the following patches to the beginning of series, which are fixes
to the other existing problems with CMA migration code:
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors also at the beginning of series
mm/gup: do not allow zero page for pinned pages
- remove .gfp_mask/.reclaim_idx changes from mm/vmscan.c
- update movable zone header comment in patch 8 instead of patch 3, fix
the comment
- Added acked, sign-offs
- Updated commit logs based on feedback
- Addressed issues reported by Michal and Jason.
- Remove:
#define PINNABLE_MIGRATE_MAX 10
#define PINNABLE_ISOLATE_MAX 100
Instead: fail on the first migration failure, and retry isolation
forever as their failures are transient.
- In self-set addressed some of the comments from John Hubbard, updated
commit logs, and added comments. Renamed gup->flags with gup->test_flags.
v4
- Address page migration comments. New patch:
mm/gup: limit number of gup migration failures, honor failures
Implements the limiting number of retries for migration failures, and
also check for isolation failures.
Added a test case into gup_test to verify that pages never long-term
pinned in a movable zone, and also added tests to fault both in kernel
and in userland.
v3
- Merged with linux-next, which contains clean-up patch from Jason,
therefore this series is reduced by two patches which did the same
thing.
v2
- Addressed all review comments
- Added Reviewed-by's.
- Renamed PF_MEMALLOC_NOMOVABLE to PF_MEMALLOC_PIN
- Added is_pinnable_page() to check if page can be longterm pinned
- Fixed gup fast path by checking is_in_pinnable_zone()
- rename cma_page_list to movable_page_list
- add a admin-guide note about handling pinned pages in ZONE_MOVABLE,
updated caveat about pinned pages from linux/mmzone.h
- Move current_gfp_context() to fast-path
---------
When page is pinned it cannot be moved and its physical address stays
the same until pages is unpinned.
This is useful functionality to allows userland to implementation DMA
access. For example, it is used by vfio in vfio_pin_pages().
However, this functionality breaks memory hotplug/hotremove assumptions
that pages in ZONE_MOVABLE can always be migrated.
This patch series fixes this issue by forcing new allocations during
page pinning to omit ZONE_MOVABLE, and also to migrate any existing
pages from ZONE_MOVABLE during pinning.
It uses the same scheme logic that is currently used by CMA, and extends
the functionality for all allocations.
For more information read the discussion [1] about this problem.
[1] https://lore.kernel.org/lkml/CA+CK2bBffHBxjmb9jmSKacm0fJMinyt3Nhk8Nx6iudcQS…
Previous versions:
v1
https://lore.kernel.org/lkml/20201202052330.474592-1-pasha.tatashin@soleen.…
v2
https://lore.kernel.org/lkml/20201210004335.64634-1-pasha.tatashin@soleen.c…
v3
https://lore.kernel.org/lkml/20201211202140.396852-1-pasha.tatashin@soleen.…
v4
https://lore.kernel.org/lkml/20201217185243.3288048-1-pasha.tatashin@soleen…
v5
https://lore.kernel.org/lkml/20210119043920.155044-1-pasha.tatashin@soleen.…
v6
https://lore.kernel.org/lkml/20210120014333.222547-1-pasha.tatashin@soleen.…
v7
https://lore.kernel.org/lkml/20210122033748.924330-1-pasha.tatashin@soleen.…
v8
https://lore.kernel.org/lkml/20210125194751.1275316-1-pasha.tatashin@soleen…
v9
https://lore.kernel.org/lkml/20210201153827.444374-1-pasha.tatashin@soleen.…
Pavel Tatashin (14):
mm/gup: don't pin migrated cma pages in movable zone
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
mm cma: rename PF_MEMALLOC_NOCMA to PF_MEMALLOC_PIN
mm: apply per-task gfp constraints in fast path
mm: honor PF_MEMALLOC_PIN for all movable pages
mm/gup: do not migrate zero page
mm/gup: migrate pinned pages out of movable zone
memory-hotplug.rst: add a note about ZONE_MOVABLE and page pinning
mm/gup: change index type to long as it counts pages
mm/gup: longterm pin migration cleanup
selftests/vm: gup_test: fix test flag
selftests/vm: gup_test: test faulting in kernel, and verify pinnable
pages
.../admin-guide/mm/memory-hotplug.rst | 9 +
include/linux/migrate.h | 1 +
include/linux/mm.h | 19 ++
include/linux/mmzone.h | 13 +-
include/linux/pgtable.h | 12 ++
include/linux/sched.h | 2 +-
include/linux/sched/mm.h | 27 +--
include/trace/events/migrate.h | 3 +-
mm/gup.c | 174 ++++++++----------
mm/gup_test.c | 29 +--
mm/gup_test.h | 3 +-
mm/hugetlb.c | 4 +-
mm/page_alloc.c | 33 ++--
tools/testing/selftests/vm/gup_test.c | 36 +++-
14 files changed, 208 insertions(+), 157 deletions(-)
--
2.25.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b53c73580c5..28bb26cd1f67 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 3b53c73580c5..28bb26cd1f67 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3434,8 +3434,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
From: Willem de Bruijn <willemb(a)google.com>
FIXTURE_VARIANT data is passed to FIXTURE_SETUP and TEST_F as variant.
In some cases, the variant will change the setup, such that expections
also change on teardown. Also pass variant to FIXTURE_TEARDOWN.
The new FIXTURE_TEARDOWN logic is identical to that in FIXTURE_SETUP,
right above.
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
For one use of this see tentative
tools/testing/selftests/filesystems/selectpoll.c kselftest at
https://github.com/wdebruij/linux-next-mirror/commit/12b4d183ac9140c1360637…
---
tools/testing/selftests/kselftest_harness.h | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index f19804df244c..6a27e79278e8 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -283,7 +283,9 @@
#define FIXTURE_TEARDOWN(fixture_name) \
void fixture_name##_teardown( \
struct __test_metadata __attribute__((unused)) *_metadata, \
- FIXTURE_DATA(fixture_name) __attribute__((unused)) *self)
+ FIXTURE_DATA(fixture_name) __attribute__((unused)) *self, \
+ const FIXTURE_VARIANT(fixture_name) \
+ __attribute__((unused)) *variant)
/**
* FIXTURE_VARIANT(fixture_name) - Optionally called once per fixture
@@ -298,9 +300,9 @@
* ...
* };
*
- * Defines type of constant parameters provided to FIXTURE_SETUP() and TEST_F()
- * as *variant*. Variants allow the same tests to be run with different
- * arguments.
+ * Defines type of constant parameters provided to FIXTURE_SETUP(), TEST_F() and
+ * FIXTURE_TEARDOWN as *variant*. Variants allow the same tests to be run with
+ * different arguments.
*/
#define FIXTURE_VARIANT(fixture_name) struct _fixture_variant_##fixture_name
@@ -382,7 +384,7 @@
if (!_metadata->passed) \
return; \
fixture_name##_##test_name(_metadata, &self, variant->data); \
- fixture_name##_teardown(_metadata, &self); \
+ fixture_name##_teardown(_metadata, &self, variant->data); \
} \
static struct __test_metadata \
_##fixture_name##_##test_name##_object = { \
--
2.29.2.576.ga3fc446d84-goog
We're working on a user space control plane for the BPF sk_lookup
hook [1]. The hook attaches to a network namespace and allows
control over which socket receives a new connection / packet.
Roughly, applications can give a socket to our user space component
to participate in custom bind semantics. This creates an edge case
where an application can provide us with a socket that lives in
a different network namespace than our BPF sk_lookup program.
We'd like to return an error in this case.
Additionally, we have some user space state that is tied to the
network namespace. We currently use the inode of the nsfs entry
in a directory name, but this is suffers from inode reuse.
I'm proposing to fix both of these issues by adding a new
SO_NETNS_COOKIE socket option as well as a NS_GET_COOKIE ioctl.
Using these we get a stable, unique identifier for a network
namespace and check whether a socket belongs to the "correct"
namespace.
NS_GET_COOKIE could be renamed to NS_GET_NET_COOKIE. I kept the
name generic because it seems like other namespace types could
benefit from a cookie as well.
I'm trying to land this via the bpf tree since this is where the
netns cookie originated, please let me know if this isn't
appropriate.
1: https://www.kernel.org/doc/html/latest/bpf/prog_sk_lookup.html
Cc: bpf(a)vger.kernel.org
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-api(a)vger.kernel.org
Cc: linux-arch(a)vger.kernel.org
Cc: linux-fsdevel(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: netdev(a)vger.kernel.org
Cc: sparclinux(a)vger.kernel.org
Lorenz Bauer (4):
net: add SO_NETNS_COOKIE socket option
nsfs: add an ioctl to discover the network namespace cookie
tools/testing: add test for NS_GET_COOKIE
tools/testing: add a selftest for SO_NETNS_COOKIE
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
fs/nsfs.c | 9 +++
include/linux/sock_diag.h | 20 ++++++
include/net/net_namespace.h | 11 ++++
include/uapi/asm-generic/socket.h | 2 +
include/uapi/linux/nsfs.h | 2 +
net/core/filter.c | 9 ++-
net/core/sock.c | 7 +++
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/so_netns_cookie.c | 61 +++++++++++++++++++
tools/testing/selftests/nsfs/.gitignore | 1 +
tools/testing/selftests/nsfs/Makefile | 2 +-
tools/testing/selftests/nsfs/netns.c | 57 +++++++++++++++++
17 files changed, 185 insertions(+), 7 deletions(-)
create mode 100644 tools/testing/selftests/net/so_netns_cookie.c
create mode 100644 tools/testing/selftests/nsfs/netns.c
--
2.27.0
Hi,
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm.
The following explains what we can exactly do through this test.
And a RFC is sent for comments, thanks.
The function guest_code() is designed to cover conditions where a single vcpu
or multiple vcpus access guest pages within the same memory range, in three
VM stages(before dirty-logging, during dirty-logging, after dirty-logging).
Besides, the backing source memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings or
block mappings can be chosen by users to be created in the test.
If use of ANONYMOUS memory is specified, kvm will create page mappings for the
tested memory region before dirty-logging, and update attributes of the page
mappings from RO to RW during dirty-logging. If use of THP/HUGETLB memory is
specified, kvm will create block mappings for the tested memory region before
dirty-logging, and split the blcok mappings into page mappings during
dirty-logging, and coalesce the page mappings back into block mappings after
dirty-logging is stopped.
So in summary, as a performance tester, this test can present the performance
of kvm creating/updating normal page mappings, or the performance of kvm
creating/splitting/recovering block mappings, through execution time.
When we need to coalesce the page mappings back to block mappings after dirty
logging is stopped, we have to firstly invalidate *all* the TLB entries for the
page mappings right before installation of the block entry, because a TLB conflict
abort error could occur if we can't invalidate the TLB entries fully. We have
hit this TLB conflict twice on aarch64 software implementation and fixed it.
As this test can imulate process from dirty-logging enabled to dirty-logging
stopped of a VM with block mappings, so it can also reproduce this TLB conflict
abort due to inadequate TLB invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Here are some test examples of this test:
platform: HiSilicon Kunpeng920 (aarch64, FWB not supported)
host kernel: Linux mainline
(1) Based on v5.11-rc6
cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 1
(1 vcpu, 1G memory, page mappings(granule 4K))
KVM_CREATE_MAPPINGS: 0.8196s 0.8260s 0.8258s 0.8169s 0.8190s
KVM_UPDATE_MAPPINGS: 1.1930s 1.1949s 1.1940s 1.1934s 1.1946s
cmdline: ./kvm_page_table_test -m 4 -t 0 -g 4K -s 1G -v 20
(20 vcpus, 1G memory, page mappings(granule 4K))
KVM_CREATE_MAPPINGS: 23.4028s 23.8015s 23.6702s 23.9437s 22.1646s
KVM_UPDATE_MAPPINGS: 16.9550s 16.4734s 16.8300s 16.9621s 16.9402s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1
(1 vcpu, 20G memory, block mappings(granule 1G))
KVM_CREATE_MAPPINGS: 3.7040s 3.7053s 3.7047s 3.7061s 3.7068s
KVM_ADJUST_MAPPINGS: 2.8264s 2.8266s 2.8272s 2.8259s 2.8283s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings(granule 1G))
KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
(2) I have post a patch series to improve efficiency of stage2 page table code,
so test the performance changes.
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
After patch: KVM_CREATE_MAPPINGS: 3.7022s 3.7031s 3.7028s 3.7012s 3.7024s
Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
After patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
(40 vcpus, 20G memory, block mappings(granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s
After patch: KVM_CREATE_MAPPINGS: 3.7011s 3.7103s 3.7005s 3.7024s 3.7106s
Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s
After patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s
---
Yanan Wang (2):
KVM: selftests: Add a macro to get string of vm_mem_backing_src_type
KVM: selftests: Add a test for kvm page table code
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_page_table_test.c | 518 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 8 +
4 files changed, 532 insertions(+)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.23.0
Only older versions of the RISC-V GCC toolchain define __riscv__. Check
for __riscv as well, which is used by newer GCC toolchains. Also set
VDSO_32BIT based on __riscv_xlen.
Before (on riscv64):
$ ./vdso_test_abi
[vDSO kselftest] VDSO_VERSION: LINUX_4
Could not find __vdso_gettimeofday
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_REALTIME [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_BOOTTIME [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_TAI [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_REALTIME_COARSE [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC_RAW [PASS]
Could not find __vdso_clock_gettime
Could not find __vdso_clock_getres
clock_id: CLOCK_MONOTONIC_COARSE [PASS]
Could not find __vdso_time
After (on riscv32):
$ ./vdso_test_abi
[vDSO kselftest] VDSO_VERSION: LINUX_4.15
The time is 1612449376.015086
The time is 1612449376.18340784
The resolution is 0 1
clock_id: CLOCK_REALTIME [PASS]
The time is 774.842586182
The resolution is 0 1
clock_id: CLOCK_BOOTTIME [PASS]
The time is 1612449376.22536565
The resolution is 0 1
clock_id: CLOCK_TAI [PASS]
The time is 1612449376.20885172
The resolution is 0 4000000
clock_id: CLOCK_REALTIME_COARSE [PASS]
The time is 774.845491269
The resolution is 0 1
clock_id: CLOCK_MONOTONIC [PASS]
The time is 774.849534200
The resolution is 0 1
clock_id: CLOCK_MONOTONIC_RAW [PASS]
The time is 774.842139684
The resolution is 0 4000000
clock_id: CLOCK_MONOTONIC_COARSE [PASS]
Could not find __vdso_time
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/vDSO/vdso_config.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vDSO/vdso_config.h b/tools/testing/selftests/vDSO/vdso_config.h
index 6a6fe8d4ff55..6188b16827d1 100644
--- a/tools/testing/selftests/vDSO/vdso_config.h
+++ b/tools/testing/selftests/vDSO/vdso_config.h
@@ -47,10 +47,12 @@
#elif defined(__x86_64__)
#define VDSO_VERSION 0
#define VDSO_NAMES 1
-#elif defined(__riscv__)
+#elif defined(__riscv__) || defined(__riscv)
#define VDSO_VERSION 5
#define VDSO_NAMES 1
+#if __riscv_xlen == 32
#define VDSO_32BIT 1
+#endif
#else /* nds32 */
#define VDSO_VERSION 4
#define VDSO_NAMES 1
--
2.30.0
This test expects fds to have specific values, which works fine
when the test is run standalone. However, the kselftest runner
consumes a couple of extra fds for redirection when running
tests, so the test fails when run via kselftest.
Change the test to pass on any valid fd number.
Signed-off-by: Seth Forshee <seth.forshee(a)canonical.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 26c72f2b61b1..9338df6f4ca8 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4019,18 +4019,14 @@ TEST(user_notification_addfd)
/* Verify we can set an arbitrary remote fd */
fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD, &addfd);
- /*
- * The child has fds 0(stdin), 1(stdout), 2(stderr), 3(memfd),
- * 4(listener), so the newly allocated fd should be 5.
- */
- EXPECT_EQ(fd, 5);
+ EXPECT_GE(fd, 0);
EXPECT_EQ(filecmp(getpid(), pid, memfd, fd), 0);
/* Verify we can set an arbitrary remote fd with large size */
memset(&big, 0x0, sizeof(big));
big.addfd = addfd;
fd = ioctl(listener, SECCOMP_IOCTL_NOTIF_ADDFD_BIG, &big);
- EXPECT_EQ(fd, 6);
+ EXPECT_GE(fd, 0);
/* Verify we can set a specific remote fd */
addfd.newfd = 42;
--
2.29.2
v1 by Uriel is here: [1].
Since it's been a while, I've dropped the Reviewed-By's.
It depended on commit 83c4e7a0363b ("KUnit: KASAN Integration") which
hadn't been merged yet, so that caused some kerfuffle with applying them
previously and the series was reverted.
This revives the series but makes the kunit_fail_current_test() function
take a format string and logs the file and line number of the failing
code, addressing Alan Maguire's comments on the previous version.
As a result, the patch that makes UBSAN errors was tweaked slightly to
include an error message.
[1] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguaja…
Uriel Guajardo (2):
kunit: support failure from dynamic analysis tools
kunit: ubsan integration
include/kunit/test-bug.h | 30 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 36 ++++++++++++++++++++++++++++++++----
lib/ubsan.c | 3 +++
3 files changed, 65 insertions(+), 4 deletions(-)
create mode 100644 include/kunit/test-bug.h
base-commit: 1e0d27fce010b0a4a9e595506b6ede75934c31be
--
2.30.0.478.g8a0d178c01-goog
Sincerely hope I did this right; first time contributor, learning the
patch workflow. I took the opportunity to fix two .gitignore files that
were leaving stale worktree/index outputs after running `make kselftest`
against recent mainline (76c057c84d286140c6c416c3b4ba832cd1d8984e).
Thanks for your time!