The patch itself is straightforward thanks to the infrastructure that is
already in-place.
The tests follows the other '*_map_batch_ops' tests with minor tweaks.
Pedro Tammela (2):
bpf: add support for batched operations in LPM trie maps
bpf: selftests: add tests for batched ops in LPM trie maps
kernel/bpf/lpm_trie.c | 3 +
.../map_tests/lpm_trie_map_batch_ops.c (new) | 158 ++++++++++++++++++
2 files changed, 161 insertions(+)
create mode 100644 tools/testing/selftests/bpf/map_tests/lpm_trie_map_batch_ops.c
--
2.25.1
s/verfied/verified/
Signed-off-by: Bhaskar Chowdhury <unixbhaskar(a)gmail.com>
---
tools/testing/selftests/net/forwarding/fib_offload_lib.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/fib_offload_lib.sh b/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
index 66496659bea7..e134a5f529c9 100644
--- a/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
+++ b/tools/testing/selftests/net/forwarding/fib_offload_lib.sh
@@ -224,7 +224,7 @@ fib_ipv4_plen_test()
ip -n $ns link set dev dummy1 up
# Add two routes with the same key and different prefix length and
- # make sure both are in hardware. It can be verfied that both are
+ # make sure both are in hardware. It can be verified that both are
# sharing the same leaf by checking the /proc/net/fib_trie
ip -n $ns route add 192.0.2.0/24 dev dummy1
ip -n $ns route add 192.0.2.0/25 dev dummy1
--
2.26.2
On Tue, Mar 16, 2021 at 09:42:50PM +0100, Mickaël Salaün wrote:
> From: Mickaël Salaün <mic(a)linux.microsoft.com>
>
> Test all Landlock system calls, ptrace hooks semantic and filesystem
> access-control with multiple layouts.
>
> Test coverage for security/landlock/ is 93.6% of lines. The code not
> covered only deals with internal kernel errors (e.g. memory allocation)
> and race conditions.
>
> Cc: James Morris <jmorris(a)namei.org>
> Cc: Jann Horn <jannh(a)google.com>
> Cc: Kees Cook <keescook(a)chromium.org>
> Cc: Serge E. Hallyn <serge(a)hallyn.com>
> Cc: Shuah Khan <shuah(a)kernel.org>
> Signed-off-by: Mickaël Salaün <mic(a)linux.microsoft.com>
> Reviewed-by: Vincent Dagonneau <vincent.dagonneau(a)ssi.gouv.fr>
> Link: https://lore.kernel.org/r/20210316204252.427806-11-mic@digikod.net
This is terrific. I love the coverage. How did you measure this, BTW?
To increase it into memory allocation failures, have you tried
allocation fault injection:
https://www.kernel.org/doc/html/latest/fault-injection/fault-injection.html
> [...]
> +TEST(inconsistent_attr) {
> + const long page_size = sysconf(_SC_PAGESIZE);
> + char *const buf = malloc(page_size + 1);
> + struct landlock_ruleset_attr *const ruleset_attr = (void *)buf;
> +
> + ASSERT_NE(NULL, buf);
> +
> + /* Checks copy_from_user(). */
> + ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 0, 0));
> + /* The size if less than sizeof(struct landlock_attr_enforce). */
> + ASSERT_EQ(EINVAL, errno);
> + ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 1, 0));
> + ASSERT_EQ(EINVAL, errno);
Almost everywhere you're using ASSERT instead of EXPECT. Is this correct
(in the sense than as soon as an ASSERT fails the rest of the test is
skipped)? I do see you using EXPECT is some places, but I figured I'd
ask about the intention here.
> +/*
> + * TEST_F_FORK() is useful when a test drop privileges but the corresponding
> + * FIXTURE_TEARDOWN() requires them (e.g. to remove files from a directory
> + * where write actions are denied). For convenience, FIXTURE_TEARDOWN() is
> + * also called when the test failed, but not when FIXTURE_SETUP() failed. For
> + * this to be possible, we must not call abort() but instead exit smoothly
> + * (hence the step print).
> + */
Hm, interesting. I think this should be extracted into a separate patch
and added to the test harness proper.
Could this be solved with TEARDOWN being called on SETUP failure?
> +#define TEST_F_FORK(fixture_name, test_name) \
> + static void fixture_name##_##test_name##_child( \
> + struct __test_metadata *_metadata, \
> + FIXTURE_DATA(fixture_name) *self, \
> + const FIXTURE_VARIANT(fixture_name) *variant); \
> + TEST_F(fixture_name, test_name) \
> + { \
> + int status; \
> + const pid_t child = fork(); \
> + if (child < 0) \
> + abort(); \
> + if (child == 0) { \
> + _metadata->no_print = 1; \
> + fixture_name##_##test_name##_child(_metadata, self, variant); \
> + if (_metadata->skip) \
> + _exit(255); \
> + if (_metadata->passed) \
> + _exit(0); \
> + _exit(_metadata->step); \
> + } \
> + if (child != waitpid(child, &status, 0)) \
> + abort(); \
> + if (WIFSIGNALED(status) || !WIFEXITED(status)) { \
> + _metadata->passed = 0; \
> + _metadata->step = 1; \
> + return; \
> + } \
> + switch (WEXITSTATUS(status)) { \
> + case 0: \
> + _metadata->passed = 1; \
> + break; \
> + case 255: \
> + _metadata->passed = 1; \
> + _metadata->skip = 1; \
> + break; \
> + default: \
> + _metadata->passed = 0; \
> + _metadata->step = WEXITSTATUS(status); \
> + break; \
> + } \
> + } \
This looks like a subset of __wait_for_test()? Could __TEST_F_IMPL() be
updated instead to do this? (Though the fork overhead might not be great
for everyone.)
--
Kees Cook
From: Dave Hansen <dave.hansen(a)linux.intel.com>
The SGX device file (/dev/sgx_enclave) is unusual in that it requires
execute permissions. It has to be both "chmod +x" *and* be on a
filesystem without 'noexec'.
In the future, udev and systemd should get updates to set up systems
automatically. But, for now, nobody's systems do this automatically,
and everybody gets error messages like this when running ./test_sgx:
0x0000000000000000 0x0000000000002000 0x03
0x0000000000002000 0x0000000000001000 0x05
0x0000000000003000 0x0000000000003000 0x03
mmap() failed, errno=1.
That isn't very user friendly, even for forgetful kernel developers.
Further, the test case is rather haphazard about its use of fprintf()
versus perror().
Improve the error messages. Use perror() where possible. Lastly,
do some sanity checks on opening and mmap()ing the device file so
that we can get a decent error message out to the user.
Now, if your user doesn't have permission, you'll get the following:
$ ls -l /dev/sgx_enclave
crw------- 1 root root 10, 126 Mar 18 11:29 /dev/sgx_enclave
$ ./test_sgx
Unable to open /dev/sgx_enclave: Permission denied
If you then 'chown dave:dave /dev/sgx_enclave' (or whatever), but
you leave execute permissions off, you'll get:
$ ls -l /dev/sgx_enclave
crw------- 1 dave dave 10, 126 Mar 18 11:29 /dev/sgx_enclave
$ ./test_sgx
no execute permissions on device file
If you fix that with "chmod ug+x /dev/sgx" but you leave /dev as
noexec, you'll get this:
$ mount | grep "/dev .*noexec"
udev on /dev type devtmpfs (rw,nosuid,noexec,...)
$ ./test_sgx
ERROR: mmap for exec: Operation not permitted
mmap() succeeded for PROT_READ, but failed for PROT_EXEC
check that user has execute permissions on /dev/sgx_enclave and
that /dev does not have noexec set: 'mount | grep "/dev .*noexec"'
That can be fixed with:
mount -o remount,noexec /devESC
Hopefully, the combination of better error messages and the search
engines indexing this message will help people fix their systems
until we do this properly.
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Jarkko Sakkinen <jarkko(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: x86(a)kernel.org
Cc: linux-sgx(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
---
b/tools/testing/selftests/sgx/load.c | 66 +++++++++++++++++++++++++++--------
b/tools/testing/selftests/sgx/main.c | 2 -
2 files changed, 53 insertions(+), 15 deletions(-)
diff -puN tools/testing/selftests/sgx/load.c~sgx-selftest-err-rework tools/testing/selftests/sgx/load.c
--- a/tools/testing/selftests/sgx/load.c~sgx-selftest-err-rework 2021-03-18 12:18:38.649828215 -0700
+++ b/tools/testing/selftests/sgx/load.c 2021-03-18 12:40:46.388824904 -0700
@@ -45,19 +45,19 @@ static bool encl_map_bin(const char *pat
fd = open(path, O_RDONLY);
if (fd == -1) {
- perror("open()");
+ perror("enclave executable open()");
return false;
}
ret = stat(path, &sb);
if (ret) {
- perror("stat()");
+ perror("enclave executable stat()");
goto err;
}
bin = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (bin == MAP_FAILED) {
- perror("mmap()");
+ perror("enclave executable mmap()");
goto err;
}
@@ -90,8 +90,7 @@ static bool encl_ioc_create(struct encl
ioc.src = (unsigned long)secs;
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_CREATE, &ioc);
if (rc) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_CREATE failed: errno=%d\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_CREATE failed");
munmap((void *)secs->base, encl->encl_size);
return false;
}
@@ -116,31 +115,69 @@ static bool encl_ioc_add_pages(struct en
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ioc);
if (rc < 0) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_ADD_PAGES failed: errno=%d.\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_ADD_PAGES failed");
return false;
}
return true;
}
+
+
bool encl_load(const char *path, struct encl *encl)
{
+ const char device_path[] = "/dev/sgx_enclave";
Elf64_Phdr *phdr_tbl;
off_t src_offset;
Elf64_Ehdr *ehdr;
+ struct stat sb;
+ void *ptr;
int i, j;
int ret;
+ int fd = -1;
memset(encl, 0, sizeof(*encl));
- ret = open("/dev/sgx_enclave", O_RDWR);
- if (ret < 0) {
- fprintf(stderr, "Unable to open /dev/sgx_enclave\n");
+ fd = open(device_path, O_RDWR);
+ if (fd < 0) {
+ perror("Unable to open /dev/sgx_enclave");
+ goto err;
+ }
+
+ ret = stat(device_path, &sb);
+ if (ret) {
+ perror("device file stat()");
+ goto err;
+ }
+
+ /*
+ * This just checks if the /dev file has these permission
+ * bits set. It does not check that the current user is
+ * the owner or in the owning group.
+ */
+ if (!(sb.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH))) {
+ fprintf(stderr, "no execute permissions on device file\n");
+ goto err;
+ }
+
+ ptr = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, fd, 0);
+ if (ptr == (void *)-1) {
+ perror("mmap for read");
+ goto err;
+ }
+ munmap(ptr, PAGE_SIZE);
+
+ ptr = mmap(NULL, PAGE_SIZE, PROT_EXEC, MAP_SHARED, fd, 0);
+ if (ptr == (void *)-1) {
+ perror("ERROR: mmap for exec");
+ fprintf(stderr, "mmap() succeeded for PROT_READ, but failed for PROT_EXEC\n");
+ fprintf(stderr, "check that user has execute permissions on %s and\n", device_path);
+ fprintf(stderr, "that /dev does not have noexec set: 'mount | grep \"/dev .*noexec\"'\n");
goto err;
}
+ munmap(ptr, PAGE_SIZE);
- encl->fd = ret;
+ encl->fd = fd;
if (!encl_map_bin(path, encl))
goto err;
@@ -217,6 +254,8 @@ bool encl_load(const char *path, struct
return true;
err:
+ if (fd != -1)
+ close(fd);
encl_delete(encl);
return false;
}
@@ -229,7 +268,7 @@ static bool encl_map_area(struct encl *e
area = mmap(NULL, encl_size * 2, PROT_NONE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (area == MAP_FAILED) {
- perror("mmap");
+ perror("reservation mmap()");
return false;
}
@@ -268,8 +307,7 @@ bool encl_build(struct encl *encl)
ioc.sigstruct = (uint64_t)&encl->sigstruct;
ret = ioctl(encl->fd, SGX_IOC_ENCLAVE_INIT, &ioc);
if (ret) {
- fprintf(stderr, "SGX_IOC_ENCLAVE_INIT failed: errno=%d\n",
- errno);
+ perror("SGX_IOC_ENCLAVE_INIT failed");
return false;
}
diff -puN tools/testing/selftests/sgx/main.c~sgx-selftest-err-rework tools/testing/selftests/sgx/main.c
--- a/tools/testing/selftests/sgx/main.c~sgx-selftest-err-rework 2021-03-18 12:18:38.652828215 -0700
+++ b/tools/testing/selftests/sgx/main.c 2021-03-18 12:18:38.657828215 -0700
@@ -195,7 +195,7 @@ int main(int argc, char *argv[], char *e
addr = mmap((void *)encl.encl_base + seg->offset, seg->size,
seg->prot, MAP_SHARED | MAP_FIXED, encl.fd, 0);
if (addr == MAP_FAILED) {
- fprintf(stderr, "mmap() failed, errno=%d.\n", errno);
+ perror("mmap() segment failed");
exit(KSFT_FAIL);
}
}
_
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 07e644885bf6727a48db109fad053cb43f3c9859 ]
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210309190304.39169-1-broonie@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.30.1
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 07e644885bf6727a48db109fad053cb43f3c9859 ]
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Link: https://lore.kernel.org/r/20210309190304.39169-1-broonie@kernel.org
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.30.1
This patchset extends IOCTL interface to retrieve KVM statistics data
in aggregated binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for userspace telemetry applications to pull the
statistics data periodically for large scale systems.
The capability is indicated by KVM_CAP_STATS_BINARY_FORM.
Ioctl KVM_STATS_GET_INFO is used to get the information about VM or
vCPU statistics data (The number of supported statistics data which is
used for buffer allocation).
Ioctl KVM_STATS_GET_NAMES is used to get the list of name strings of
all supported statistics data.
Ioctl KVM_STATS_GET_DATA is used to get the aggregated statistics data
per VM or vCPU in the same order as the list of name strings. This is
the ioctl which would be called periodically to retrieve statistics
data per VM or vCPU.
Jing Zhang (4):
KVM: stats: Separate statistics name strings from debugfs code
KVM: stats: Define APIs for aggregated stats retrieval in binary
format
KVM: stats: Add ioctl commands to pull statistics in binary format
KVM: selftests: Add selftest for KVM binary form statistics interface
Documentation/virt/kvm/api.rst | 79 +++++
arch/arm64/kvm/guest.c | 47 ++-
arch/mips/kvm/mips.c | 114 +++++--
arch/powerpc/kvm/book3s.c | 107 ++++--
arch/powerpc/kvm/booke.c | 84 +++--
arch/s390/kvm/kvm-s390.c | 320 ++++++++++++------
arch/x86/kvm/x86.c | 127 ++++---
include/linux/kvm_host.h | 30 +-
include/uapi/linux/kvm.h | 60 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 89 +++++
virt/kvm/kvm_main.c | 115 +++++++
13 files changed, 935 insertions(+), 241 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: 357ad203d45c0f9d76a8feadbd5a1c5d460c638b
--
2.30.1.766.gb4fecdf3b7-goog
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v5 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v4" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
We track if sve-ptrace encountered a failure in a variable but don't
actually use that value when we exit the program, do so.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-ptrace.c b/tools/testing/selftests/arm64/fp/sve-ptrace.c
index b2282be6f938..612d3899614a 100644
--- a/tools/testing/selftests/arm64/fp/sve-ptrace.c
+++ b/tools/testing/selftests/arm64/fp/sve-ptrace.c
@@ -332,5 +332,5 @@ int main(void)
ksft_print_cnts();
- return 0;
+ return ret;
}
--
2.20.1
Base
====
This series is based on top of my series which adds minor fault handling for
hugetlbfs [1]. (And, therefore, it is based on 5.12-rc1 and Peter Xu's series
for disabling huge pmd sharing as well.)
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Changelog
=========
v1->v2:
- For UFFDIO_CONTINUE, don't mess with page flags. Just use find_lock_page to
get a locked page from the page cache, instead of doing __SetPageLocked.
This fixes a VM_BUG_ON v1 hit when handling minor faults for THP-backed
shmem (a tmpfs mounted with huge=always).
Overview
========
See my original series linked above for a detailed overview of minor fault
handling in general. The feature in this series works exactly like the
hugetblfs version (from userspace's perspective).
I'm sending this as a separate series because:
- The original minor fault handling series has a full set of R-Bs, and seems
close to being merged. So, it seems reasonable to start looking at this next
step, which extends the basic functionality.
- shmem is different enough that this series may require some additional work
before it's ready, and I don't want to delay the original series
unnecessarily by bundling them together.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
Axel Rasmussen (5):
userfaultfd: support minor fault handling for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
fs/userfaultfd.c | 6 +-
include/linux/shmem_fs.h | 26 +-
include/uapi/linux/userfaultfd.h | 4 +-
mm/memory.c | 8 +-
mm/shmem.c | 92 +++----
mm/userfaultfd.c | 27 +-
tools/testing/selftests/vm/userfaultfd.c | 322 +++++++++++++++--------
7 files changed, 295 insertions(+), 190 deletions(-)
--
2.30.1.766.gb4fecdf3b7-goog
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept then, given the
multiplexed interface full of legacy features and tricky to do big
changes. Some problems that people tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
At the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty when doing it.
** The 32bit sized futex problem
Embedded systems or anything with memory constrains would benefit of
using smaller sizes for the futex userspace integer. Also, a mutex
implementation can be done using just three values, so 8 bits is enough
for various scenarios.
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signal,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of those running over Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
allow the maintainability of the API if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
* The interface
The new interface can be seen in details in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similarly as FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits and 32bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describes the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 6
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements wait/wake, and the basics foundations of futex2
- Patches 2-4: Implement the remaining features (shared, waitv, requeue).
- Patch 5: Adds the x86_x32 ABI handling. I kept it in a separated
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 6: Add a documentation file which details the interface and
the internal implementation.
- Patches 7-13: Selftests for all operations along with perf
support for futex2.
- Patch 14: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU's loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread's programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test and using 8, 80, 800,
8000 for the number of threads, Note that for this test, I'm not using
patch 14 ("kernel: Enable waitpid() for futex2") , for reasons explained
at "The patchset" section.
- For the first three ones, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA and FUTEX_8/16?"
The current code is already complex enough to take some time for
review, so I believe it's better to split that work out to a future
iteration of this patchset. Besides that, this RFC is the core part of the
infrastructure, and the following features will not pose big design
changes to it, the work will be more about wiring up the flags and
modifying some functions.
** "And what's about FUTEX_64?"
By supporting 64 bit futexes, the kernel structure for futex would
need to have a 64 bit field for the value, and that could defeat one of
the purposes of having different sized futexes in the first place:
supporting smaller ones to decrease memory usage. This might be
something that could be disabled for 32bit archs (and even for
CONFIG_BASE_SMALL).
Which use case would benefit for FUTEX_64? Does it worth the trade-offs?
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
* Changelog
Changes from v1:
- Unified futex_set_timer_and_wait and __futex_wait code
- Dropped _carefull from linked list function calls
- Fixed typos on docs patch
- uAPI flags are now added as features are introduced, instead of all flags
in patch 1
- Removed struct futex_single_waiter in favor of an anon struct
v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.…
André Almeida (13):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 8 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 23 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 18 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 31 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1239 +++++++++++++++++
kernel/sys_ni.c | 6 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 4 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 8 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_wait.c | 209 +++
.../selftests/futex/functional/futex2_waitv.c | 157 +++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 121 ++
38 files changed, 2527 insertions(+), 53 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.30.1
From: Aaron Lewis <aaronlewis(a)google.com>
[ Upstream commit 6528fc0a11de3d16339cf17639e2f69a68fcaf4d ]
The vcpu mmap area may consist of more than just the kvm_run struct.
Allocate enough space for the entire vcpu mmap area. Without this, on
x86, the PIO page, for example, will be missing. This is problematic
when dealing with an unhandled exception from the guest as the exception
vector will be incorrectly reported as 0x0.
Message-Id: <20210210165035.3712489-1-aaronlewis(a)google.com>
Reviewed-by: Andrew Jones <drjones(a)redhat.com>
Co-developed-by: Steve Rutherford <srutherford(a)google.com>
Signed-off-by: Aaron Lewis <aaronlewis(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index fa5a90e6c6f0..859a0b57c683 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -21,6 +21,8 @@
#define KVM_UTIL_PGS_PER_HUGEPG 512
#define KVM_UTIL_MIN_PFN 2
+static int vcpu_mmap_sz(void);
+
/* Aligns x up to the next multiple of size. Size must be a power of 2. */
static void *align(void *x, size_t size)
{
@@ -509,7 +511,7 @@ static void vm_vcpu_rm(struct kvm_vm *vm, struct vcpu *vcpu)
vcpu->dirty_gfns = NULL;
}
- ret = munmap(vcpu->state, sizeof(*vcpu->state));
+ ret = munmap(vcpu->state, vcpu_mmap_sz());
TEST_ASSERT(ret == 0, "munmap of VCPU fd failed, rc: %i "
"errno: %i", ret, errno);
close(vcpu->fd);
@@ -978,7 +980,7 @@ void vm_vcpu_add(struct kvm_vm *vm, uint32_t vcpuid)
TEST_ASSERT(vcpu_mmap_sz() >= sizeof(*vcpu->state), "vcpu mmap size "
"smaller than expected, vcpu_mmap_sz: %i expected_min: %zi",
vcpu_mmap_sz(), sizeof(*vcpu->state));
- vcpu->state = (struct kvm_run *) mmap(NULL, sizeof(*vcpu->state),
+ vcpu->state = (struct kvm_run *) mmap(NULL, vcpu_mmap_sz(),
PROT_READ | PROT_WRITE, MAP_SHARED, vcpu->fd, 0);
TEST_ASSERT(vcpu->state != MAP_FAILED, "mmap vcpu_state failed, "
"vcpu id: %u errno: %i", vcpuid, errno);
--
2.30.1
Fix the following coccicheck warnings:
./tools/testing/selftests/bpf/test_sockmap.c:735:35-37: WARNING !A || A
&& B is equivalent to !A || B.
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/bpf/test_sockmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/test_sockmap.c b/tools/testing/selftests/bpf/test_sockmap.c
index 427ca00..eefd445 100644
--- a/tools/testing/selftests/bpf/test_sockmap.c
+++ b/tools/testing/selftests/bpf/test_sockmap.c
@@ -732,7 +732,7 @@ static int sendmsg_test(struct sockmap_options *opt)
* socket is not a valid test. So in this case lets not
* enable kTLS but still run the test.
*/
- if (!txmsg_redir || (txmsg_redir && txmsg_ingress)) {
+ if (!txmsg_redir || txmsg_ingress) {
err = sockmap_init_ktls(opt->verbose, rx_fd);
if (err)
return err;
--
1.8.3.1
On Tue, Mar 02, 2021 at 01:49:41PM +0800, kernel test robot wrote:
>
>
> Greeting,
>
> FYI, we noticed the following commit (built with gcc-9):
>
> commit: cfe92ab6a3ea700c08ba673b46822d51f38d6b40 ("[PATCH v5 2/8] security/brute: Define a LSM and manage statistical data")
> url: https://github.com/0day-ci/linux/commits/John-Wood/Fork-brute-force-attack-…
> base: https://git.kernel.org/cgit/linux/kernel/git/shuah/linux-kselftest.git next
>
> in testcase: trinity
> version: trinity-static-i386-x86_64-f93256fb_2019-08-28
> with following parameters:
>
> group: ["group-00", "group-01", "group-02", "group-03", "group-04"]
>
> test-description: Trinity is a linux system call fuzz tester.
> test-url: http://codemonkey.org.uk/projects/trinity/
>
>
> on test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 8G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +-------------------------------------------------------------------------+------------+------------+
> | | 1d53b7aac6 | cfe92ab6a3 |
> +-------------------------------------------------------------------------+------------+------------+
> | WARNING:inconsistent_lock_state | 0 | 6 |
> | inconsistent{IN-SOFTIRQ-W}->{SOFTIRQ-ON-W}usage | 0 | 6 |
> +-------------------------------------------------------------------------+------------+------------+
>
>
> If you fix the issue, kindly add following tag
> Reported-by: kernel test robot <oliver.sang(a)intel.com>
>
>
> [ 116.852721] ================================
> [ 116.853120] WARNING: inconsistent lock state
> [ 116.853120] 5.11.0-rc7-00013-gcfe92ab6a3ea #1 Tainted: G S
> [ 116.853120] --------------------------------
>
> [...]
Thanks for the report. I will work on this for the next version.
> Thanks,
> Oliver Sang
Thanks,
John Wood
Hi,
This v3 series can mainly include two parts.
Based on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
Links of v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
Links of v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types(anonumous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), it gives guidance for the
people trying to make some improvement for kvm. And the following explains
what we can exactly do through this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages(before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type(ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the blcok mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we have to firstly invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on aarch64 software
implementation and fixed it. As this test can imulate process from dirty
logging enabled to dirty logging stopped of a VM with block mappings,
so it can also reproduce this TLB conflict abort due to inadequate TLB
invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic micro to get hugetlb page sizes
- Some changes for suggestions about v2 series
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
---
Yanan Wang (7):
tools headers: sync headers of asm-generic/hugetlb_encode.h
tools headers: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 476 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++-
tools/testing/selftests/kvm/lib/test_util.c | 92 +++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
12 files changed, 628 insertions(+), 60 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.19.1
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
From: John Stultz <john.stultz(a)linaro.org>
[ Upstream commit 64ba3d591c9d2be2a9c09e99b00732afe002ad0d ]
Copied in from somewhere else, the makefile was including
the kerne's usr/include dir, which caused the asm/ioctl.h file
to be used.
Unfortunately, that file has different values for _IOC_SIZEBITS
and _IOC_WRITE than include/uapi/asm-generic/ioctl.h which then
causes the _IOCW macros to give the wrong ioctl numbers,
specifically for DMA_BUF_IOCTL_SYNC.
This patch simply removes the extra include from the Makefile
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Brian Starkey <brian.starkey(a)arm.com>
Cc: Sumit Semwal <sumit.semwal(a)linaro.org>
Cc: Laura Abbott <labbott(a)kernel.org>
Cc: Hridya Valsaraju <hridya(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Sandeep Patil <sspatil(a)google.com>
Cc: Daniel Mentz <danielmentz(a)google.com>
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kselftest(a)vger.kernel.org
Fixes: a8779927fd86c ("kselftests: Add dma-heap test")
Signed-off-by: John Stultz <john.stultz(a)linaro.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/dmabuf-heaps/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/Makefile b/tools/testing/selftests/dmabuf-heaps/Makefile
index 607c2acd20829..604b43ece15f5 100644
--- a/tools/testing/selftests/dmabuf-heaps/Makefile
+++ b/tools/testing/selftests/dmabuf-heaps/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS += -static -O3 -Wl,-no-as-needed -Wall -I../../../../usr/include
+CFLAGS += -static -O3 -Wl,-no-as-needed -Wall
TEST_GEN_PROGS = dmabuf-heap
--
2.27.0
Attacks against vulnerable userspace applications with the purpose to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible since when creating a
new process using fork its memory contents are the same as those of the
parent process (the process that called the fork system call). So, the
attacker can test the memory infinite times to find the correct memory
values or the correct memory addresses without worrying about crashing the
application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch serie. Specifically the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is got (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is got (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the commented bounds.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
This v4 version has changed a lot from the v2. Basically the application
crash period is now compute on an on-going basis using an exponential
moving average (EMA), a detection of a brute force attack through the
"execve" system call has been added and the crossing of the commented
privilege bounds are taken into account. Also, the fine tune has also been
removed and now, all this kind of attacks are detected without
administrator intervention.
In the v2 version Kees Cook suggested to study if the statistical data
shared by all the fork hierarchy processes can be tracked in some other
way. Specifically the question was if this info can be hold by the family
hierarchy of the mm struct. After studying this hierarchy I think it is not
suitable for the Brute LSM since they are totally copied on fork() and in
this case we want that they are shared. So I leave this road.
So, knowing all this information I will explain now the different patches:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and manages the statistical data shared by
all the fork hierarchy processes.
The 3/8 patch detects a fork/exec brute force attack.
The 4/8 patch narrows the detection taken into account the privilege
boundary crossing.
The 5/8 patch mitigates a brute force attack.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch serie is a task of the KSPP [1] and can also be accessed from my
github tree [2] in the "brute_v4" branch.
[1] https://github.com/KSPP/linux/issues/39
[2] https://github.com/johwood/linux/
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect an slow brute force attack (Randy Dunlap).
- Fine tuning the detection taken into account privilege boundary crossing
(Kees Cook).
- Taken into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tuning the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Any constructive comments are welcome.
Thanks.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and manage statistical data
securtiy/brute: Detect a brute force attack
security/brute: Fine tuning the attack detection
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 224 +++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 4 +
security/brute/Kconfig | 13 +
security/brute/Makefile | 2 +
security/brute/brute.c | 1102 ++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 44 +
tools/testing/selftests/brute/test.c | 507 ++++++++++
tools/testing/selftests/brute/test.sh | 226 +++++
20 files changed, 2160 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1