Some environments (e.g. kernelci.org) do not set $SHELL for their test
environment. There's no need to use $SHELL here anyway, so just replace
it with a hard-coded /bin/sh instead. Without this, the LKDTM tests would
never actually run on kernelci.org.
Fixes: 46d1a0f03d66 ("selftests/lkdtm: Add tests for LKDTM targets")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
---
tools/testing/selftests/lkdtm/run.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/lkdtm/run.sh b/tools/testing/selftests/lkdtm/run.sh
index bb7a1775307b..968ff3cf5667 100755
--- a/tools/testing/selftests/lkdtm/run.sh
+++ b/tools/testing/selftests/lkdtm/run.sh
@@ -79,7 +79,7 @@ dmesg > "$DMESG"
# Most shells yell about signals and we're expecting the "cat" process
# to usually be killed by the kernel. So we have to run it in a sub-shell
# and silence errors.
-($SHELL -c 'cat <(echo '"$test"') >'"$TRIGGER" 2>/dev/null) || true
+(/bin/sh -c 'cat <(echo '"$test"') >'"$TRIGGER" 2>/dev/null) || true
# Record and dump the results
dmesg | comm --nocheck-order -13 "$DMESG" - > "$LOG" || true
--
2.25.1
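A small sketch of the failure mode the LKDTM change above addresses, assuming a POSIX /bin/sh is available. The run_trigger() helper and the "echo hi" payload are made up for illustration and are not part of run.sh. With SHELL unset, "$SHELL -c ..." expands to "-c ..." and the shell tries to run a command named "-c"; in run.sh that error was hidden by "2>/dev/null" and "|| true".

```python
import subprocess

def run_trigger(shell_word: str) -> subprocess.CompletedProcess:
    """Run `<shell_word> -c 'echo hi'` from /bin/sh with SHELL explicitly unset.

    Hypothetical helper: it only imitates the shape of the run.sh line.
    """
    script = "unset SHELL; " + shell_word + " -c 'echo hi'"
    return subprocess.run(["/bin/sh", "-c", script],
                          capture_output=True, text=True)

# $SHELL expands to nothing, so "-c" becomes the command name and fails.
broken = run_trigger("$SHELL")
# A hard-coded /bin/sh does not depend on the environment.
fixed = run_trigger("/bin/sh")

print("with $SHELL :", broken.returncode, repr(broken.stdout))
print("with /bin/sh:", fixed.returncode, repr(fixed.stdout))
```

The first call exits non-zero without producing any output, which is why the trigger silently never fired in environments that leave SHELL unset.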
The bidirectional test attempts to change the cipher to
TLS_CIPHER_AES_GCM_128. The test fixture setup will have already set
the cipher to be tested, and if it differs from the one set by the
bidir test, setsockopt() will fail because rx and tx would then use
different ciphers, causing the test to fail.
Forcing the use of GCM when testing ChaCha doesn't make sense anyway,
so just use the cipher configured by the test fixture setup.
Fixes: 4f336e88a870 ("selftests/tls: add CHACHA20-POLY1305 to tls selftests")
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
---
tools/testing/selftests/net/tls.c | 17 -----------------
1 file changed, 17 deletions(-)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index 426d07875a48..9f4c87f4ce1e 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -831,23 +831,6 @@ TEST_F(tls, bidir)
char const *test_str = "test_read";
int send_len = 10;
char buf[10];
- int ret;
-
- if (!self->notls) {
- struct tls12_crypto_info_aes_gcm_128 tls12;
-
- memset(&tls12, 0, sizeof(tls12));
- tls12.info.version = variant->tls_version;
- tls12.info.cipher_type = TLS_CIPHER_AES_GCM_128;
-
- ret = setsockopt(self->fd, SOL_TLS, TLS_RX, &tls12,
- sizeof(tls12));
- ASSERT_EQ(ret, 0);
-
- ret = setsockopt(self->cfd, SOL_TLS, TLS_TX, &tls12,
- sizeof(tls12));
- ASSERT_EQ(ret, 0);
- }
ASSERT_EQ(strlen(test_str) + 1, send_len);
--
2.31.1
SRv6 End.DT46 Behavior is defined in the IETF RFC 8986 [1] along with SRv6
End.DT4 and End.DT6 Behaviors.
The proposed End.DT46 implementation is meant to support the decapsulation
of both IPv4 and IPv6 traffic coming from a *single* SRv6 tunnel.
The SRv6 End.DT46 Behavior greatly simplifies the setup and operations of
SRv6 VPNs in the Linux kernel.
- patch 1/2 is the core patch that adds support for the SRv6 End.DT46
Behavior;
- patch 2/2 adds the selftest for SRv6 End.DT46 Behavior.
The patch introducing the new SRv6 End.DT46 Behavior in iproute2 will
follow shortly.
Comments, suggestions and improvements are very welcome as always!
Thanks,
Andrea
RFC -> v1
patch 1/2, seg6: add support for SRv6 End.DT46 Behavior
- add Reviewed-by, thanks to David Ahern.
patch 2/2, selftests: seg6: add selftest for SRv6 End.DT46 Behavior
- add Acked-by, thanks to David Ahern.
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
Andrea Mayer (2):
seg6: add support for SRv6 End.DT46 Behavior
selftests: seg6: add selftest for SRv6 End.DT46 Behavior
include/uapi/linux/seg6_local.h | 2 +
net/ipv6/seg6_local.c | 94 ++-
.../selftests/net/srv6_end_dt46_l3vpn_test.sh | 573 ++++++++++++++++++
3 files changed, 647 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
--
2.20.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistic is treated as having the following
attributes:
* architecture dependent or generic
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous, peak
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
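To make the attribute list above a bit more concrete, here is a rough sketch of how a user space reader might model a descriptor and decode values from one snapshot of the data. Everything in it is an assumption for illustration: the StatDesc fields, the stat names, the little-endian u64 layout and the read_stats() helper are hypothetical and do not describe the actual binary format defined by this series.

```python
import struct
from dataclasses import dataclass
from enum import Enum

class StatType(Enum):
    CUMULATIVE = "cumulative"
    INSTANTANEOUS = "instantaneous"
    PEAK = "peak"

@dataclass(frozen=True)
class StatDesc:
    name: str        # hypothetical stat name
    type: StatType   # cumulative, instantaneous or peak
    unit: str        # e.g. "none", "nanosecond", "Byte"
    offset: int      # byte offset of the value in the data block (assumed layout)

def read_stats(descs, data: bytes) -> dict:
    """Decode one u64 per descriptor from a snapshot of the data block.

    Each value is decoded independently; nothing here makes the set of
    values mutually consistent, matching the lock-free caveat above.
    """
    return {d.name: struct.unpack_from("<Q", data, d.offset)[0] for d in descs}

# Hypothetical descriptors, deliberately not listed in offset order.
descs = [
    StatDesc("halt_poll_ns", StatType.CUMULATIVE, "nanosecond", 8),
    StatDesc("exits", StatType.CUMULATIVE, "none", 0),
]
snapshot = struct.pack("<QQ", 42, 1500)  # fake data block: two u64 values
print(read_stats(descs, snapshot))
```

Because each descriptor carries its own offset in this sketch, the order of the descriptor list does not have to match the order of the values.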
---
* v10 -> v11
- Rebase to kvm/queue, commit f1b832550832
(KVM: x86/mmu: Fix TDP MMU page table level)
- Separate binary stats implementation commit
- Use flexible length array member field in API structure instead of
zero-length array member field
- Move major binary stats reading function in a separate source file
- Move stats id string into vm/vcpu structures
- Add some detailed comments and update commit messages
- Addressed some other review comments from Greg K.H. and Paolo.
* v9 -> v10
- Relocate vcpu stat in vcpu's slab's usercopy region
- Fix test issue for capability checking
- Update commit message to explain why/how we need to add this new
API for KVM statistics
* v8 -> v9
- Rebase to commit 8331a2bc0898
(KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall)
- Reduce code duplication between binary and debugfs interface
- Add field "offset" in stats descriptor to let us define stats
descriptors in any order (not necessarily in the order of stats
defined in vm/vcpu stats structures)
- Add static check to make sure the number of stats descriptors
is the same as the number of stats defined in vm/vcpu stats
structures
- Fix missing/mismatched stats descriptor definition caused by
rebase
* v7 -> v8
- Rebase to kvm/queue, commit c1dc20e254b4 ("KVM: switch per-VM
stats to u64")
- Revise code to reflect the per-VM stats type from ulong to u64
- Addressed some other nits
* v6 -> v7
- Improve file descriptor allocation function per Krish's suggestion
- Use "generic stats" instead of "common stats" as Krish suggested
- Addressed some other nits from Krish and David Matlack
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
[7] https://lore.kernel.org/kvm/20210603211426.790093-1-jingzhangos@google.com
[8] https://lore.kernel.org/kvm/20210611124624.1404010-1-jingzhangos@google.com
[9] https://lore.kernel.org/kvm/20210614212155.1670777-1-jingzhangos@google.com
[10] https://lore.kernel.org/kvm/20210617044146.2667540-1-jingzhangos@google.com
---
Jing Zhang (7):
KVM: stats: Separate generic stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Support binary stats retrieval for a VM
KVM: stats: Support binary stats retrieval for a VCPU
KVM: stats: Add documentation for binary statistics interface
KVM: selftests: Add selftest for KVM statistics data binary interface
KVM: stats: Remove code duplication for binary and debugfs stats
Documentation/virt/kvm/api.rst | 176 +++++++++++++-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/guest.c | 46 ++--
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/Makefile | 2 +-
arch/mips/kvm/mips.c | 88 ++++---
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/Makefile | 2 +-
arch/powerpc/kvm/book3s.c | 89 ++++---
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 74 ++++--
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/Makefile | 3 +-
arch/s390/kvm/kvm-s390.c | 230 ++++++++++--------
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/Makefile | 2 +-
arch/x86/kvm/x86.c | 107 ++++----
include/linux/kvm_host.h | 182 ++++++++++++--
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 44 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_binary_stats_test.c | 225 +++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/binary_stats.c | 130 ++++++++++
virt/kvm/kvm_main.c | 218 ++++++++++++++---
30 files changed, 1355 insertions(+), 357 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
create mode 100644 virt/kvm/binary_stats.c
base-commit: f1b8325508327a302f1d5cd8a4bf51e2c9c72fa9
--
2.32.0.288.g62a8d224e6-goog
I've observed that the tls multi_chunk_sendfile selftest hangs during
recv() and ultimately times out, and it seems to have done so even when
the test was first introduced. Reading through the commit message when
it was added (0e6fbe39bdf7 "net/tls(TLS_SW): Add selftest for 'chunked'
sendfile test") I get the impression that the test is meant to
demonstrate a problem with ktls, but there's no indication that the
problem has been fixed.
Am I right that the expectation is that this test will fail? If that's
the case, shouldn't this test be removed until the problem is fixed?
Thanks,
Seth
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistic is treated as having the following
attributes:
* architecture dependent or generic
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous, peak
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
---
* v9 -> v10
- Relocate vcpu stat in vcpu's slab's usercopy region
- Fix test issue for capability checking
- Update commit message to explain why/how we need to add this new
API for KVM statistics
* v8 -> v9
- Rebase to commit 8331a2bc0898
(KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall)
- Reduce code duplication between binary and debugfs interface
- Add field "offset" in stats descriptor to let us define stats
descriptors in any order (not necessarily in the order of stats
defined in vm/vcpu stats structures)
- Add static check to make sure the number of stats descriptors
is the same as the number of stats defined in vm/vcpu stats
structures
- Fix missing/mismatched stats descriptor definition caused by
rebase
* v7 -> v8
- Rebase to kvm/queue, commit c1dc20e254b4 ("KVM: switch per-VM
stats to u64")
- Revise code to reflect the per-VM stats type from ulong to u64
- Addressed some other nits
* v6 -> v7
- Improve file descriptor allocation function per Krish's suggestion
- Use "generic stats" instead of "common stats" as Krish suggested
- Addressed some other nits from Krish and David Matlack
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
[7] https://lore.kernel.org/kvm/20210603211426.790093-1-jingzhangos@google.com
[8] https://lore.kernel.org/kvm/20210611124624.1404010-1-jingzhangos@google.com
[9] https://lore.kernel.org/kvm/20210614212155.1670777-1-jingzhangos@google.com
---
Jing Zhang (5):
KVM: stats: Separate generic stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for binary statistics interface
KVM: selftests: Add selftest for KVM statistics data binary interface
KVM: stats: Remove code duplication for binary and debugfs stats
Documentation/virt/kvm/api.rst | 177 +++++++++++-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 50 +++-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 92 +++---
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 93 ++++--
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 78 +++--
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 234 ++++++++-------
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 111 ++++---
include/linux/kvm_host.h | 180 +++++++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 48 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_binary_stats_test.c | 225 +++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 270 +++++++++++++++---
24 files changed, 1299 insertions(+), 351 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
base-commit: 8331a2bc089881d7fd2fc9a6658f39780817e4e0
--
2.32.0.272.g935e593368-goog
This patchset makes the following two major changes to the cpuset v2 code:
Patch 2: Add a new partition state "root-nolb" to create a partition
root with load balancing disabled. This is for handling intermittent
workloads that have a strict low latency requirement.
Patch 3: Allow partition roots that are not the top cpuset to distribute
all their CPUs to child partitions as long as there is no task associated
with that partition root. This allows more flexibility for middleware
to manage multiple partitions.
Patch 4 updates the cgroup-v2.rst file accordingly. Patch 5 adds a test
to test the new cpuset partition code.
Waiman Long (5):
cgroup/cpuset: Don't call validate_change() for some flag changes
cgroup/cpuset: Add new cpus.partition type with no load balancing
cgroup/cpuset: Allow non-top parent partition root to distribute out
all CPUs
cgroup/cpuset: Update description of cpuset.cpus.partition in
cgroup-v2.rst
kselftest/cgroup: Add cpuset v2 partition root state test
Documentation/admin-guide/cgroup-v2.rst | 19 ++-
kernel/cgroup/cpuset.c | 124 +++++++++++----
tools/testing/selftests/cgroup/Makefile | 2 +-
.../selftests/cgroup/test_cpuset_prs.sh | 141 ++++++++++++++++++
4 files changed, 247 insertions(+), 39 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
--
2.18.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistic is treated as having the following
attributes:
* architecture dependent or generic
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous, peak
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, Clock Cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
---
* v8 -> v9
- Rebase to commit 8331a2bc0898
(KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall)
- Reduce code duplication between binary and debugfs interface
- Add field "offset" in stats descriptor to let us define stats
descriptors in any order (not necessarily in the order of stats
defined in vm/vcpu stats structures)
- Add static check to make sure the number of stats descriptors
is the same as the number of stats defined in vm/vcpu stats
structures
- Fix missing/mismatched stats descriptor definition caused by
rebase
* v7 -> v8
- Rebase to kvm/queue, commit c1dc20e254b4 ("KVM: switch per-VM
stats to u64")
- Revise code to reflect the per-VM stats type from ulong to u64
- Addressed some other nits
* v6 -> v7
- Improve file descriptor allocation function per Krish's suggestion
- Use "generic stats" instead of "common stats" as Krish suggested
- Addressed some other nits from Krish and David Matlack
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
[7] https://lore.kernel.org/kvm/20210603211426.790093-1-jingzhangos@google.com
[8] https://lore.kernel.org/kvm/20210611124624.1404010-1-jingzhangos@google.com
---
Jing Zhang (5):
KVM: stats: Separate generic stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
KVM: stats: Remove code duplication for binary and debugfs stats
Documentation/virt/kvm/api.rst | 177 +++++++++++-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 50 +++-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 92 +++---
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 93 ++++---
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 78 ++++--
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 234 +++++++++-------
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 111 +++++---
include/linux/kvm_host.h | 178 +++++++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 48 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_binary_stats_test.c | 225 +++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 261 +++++++++++++++---
24 files changed, 1293 insertions(+), 346 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
base-commit: 8331a2bc089881d7fd2fc9a6658f39780817e4e0
--
2.32.0.272.g935e593368-goog
Note: this does not change the parser behavior at all (except for making
one error message more useful). This is just an internal refactor.
The TAP output parser currently operates over a List[str].
This works, but we only ever need to "peek" at the current line and
"pop" it off.
Also, using a List means we need to wait for all the output before we
can start parsing. While this is not an issue for most tests which are
really lightweight, we do have some longer (~5 minutes) tests.
This patch introduces a LineStream wrapper class that
* Exposes a peek()/pop() interface instead of manipulating an array
* this allows us to more easily add debugging code [1]
* Can consume an input from a generator
* we can now parse results as tests are running (the parser code
currently doesn't print until the end, so no impact yet).
* Tracks the current line number to print better error messages
* Would allow us to add additional features more easily, e.g. storing
N previous lines so we can print out invalid lines in context, etc.
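As a rough illustration of the peek()/pop() interface described above, here is a minimal, self-contained sketch. The class mirrors the LineStream added by this patch (minus __iter__); the numbered() generator and the sample TAP lines are made up for the example.

```python
from typing import Iterable, Iterator, Tuple

class LineStream:
    """peek()/pop() over an iterator of (line number, text), as in the patch."""

    def __init__(self, lines: Iterator[Tuple[int, str]]):
        self._lines = lines
        self._done = False
        self._next = (0, '')
        self._get_next()  # eagerly fetch the first line so peek() works

    def _get_next(self) -> None:
        try:
            self._next = next(self._lines)
        except StopIteration:
            self._done = True

    def peek(self) -> str:
        return self._next[1]

    def pop(self) -> str:
        n = self._next
        self._get_next()
        return n[1]

    def __bool__(self) -> bool:
        return not self._done

    def line_number(self) -> int:
        return self._next[0]

def numbered(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    # Hypothetical generator source: lines are produced lazily, one at a time.
    for num, text in enumerate(lines, start=1):
        yield num, text

stream = LineStream(numbered(['TAP version 14', '1..1', 'ok 1 - example']))
seen = []
while stream:
    # Record the current line and its number before consuming it.
    seen.append((stream.line_number(), stream.peek()))
    stream.pop()
print(seen)
```

Since the source is a generator, lines are only pulled as the consumer pops them, which is what would let parsing start before all output is available.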
[1] The parsing logic is currently quite fragile.
E.g. it'll often say the kernel "CRASHED" if there's something slightly
wrong with the output format. When debugging a test that had some memory
corruption issues, it resulted in very misleading errors from the parser.
Now we could easily add this to trace all the lines consumed and why:
+import inspect
...
def pop(self) -> str:
n = self._next
+ print(f'popping {n[0]}: {n[1].ljust(40, " ")}| caller={inspect.stack()[1].function}')
Example output:
popping 77: TAP version 14 | caller=parse_tap_header
popping 78: 1..1 | caller=parse_test_plan
popping 79: # Subtest: kunit_executor_test | caller=parse_subtest_header
popping 80: 1..2 | caller=parse_subtest_plan
popping 81: ok 1 - parse_filter_test | caller=parse_ok_not_ok_test_case
popping 82: ok 2 - filter_subsuite_test | caller=parse_ok_not_ok_test_case
popping 83: ok 1 - kunit_executor_test | caller=parse_ok_not_ok_test_suite
If we introduce an invalid line, we can see the parser go down the wrong path:
popping 77: TAP version 14 | caller=parse_tap_header
popping 78: 1..1 | caller=parse_test_plan
popping 79: # Subtest: kunit_executor_test | caller=parse_subtest_header
popping 80: 1..2 | caller=parse_subtest_plan
popping 81: 1..2 # this is invalid! | caller=parse_ok_not_ok_test_case
popping 82: ok 1 - parse_filter_test | caller=parse_ok_not_ok_test_case
popping 83: ok 2 - filter_subsuite_test | caller=parse_ok_not_ok_test_case
popping 84: ok 1 - kunit_executor_test | caller=parse_ok_not_ok_test_case
[ERROR] ran out of lines before end token
Signed-off-by: Daniel Latypov <dlatypov@google.com>
Reviewed-by: David Gow <davidgow@google.com>
---
v1 -> v2:
* class Input => class LineStream
* get_input() => extract_tap_lines()
---
tools/testing/kunit/kunit_parser.py | 136 ++++++++++++++++---------
tools/testing/kunit/kunit_tool_test.py | 18 ++--
2 files changed, 99 insertions(+), 55 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index e8bcc139702e..370c0862cc1e 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -47,22 +47,63 @@ class TestStatus(Enum):
NO_TESTS = auto()
FAILURE_TO_PARSE_TESTS = auto()
+class LineStream:
+ """Provides a peek()/pop() interface over an iterator of (line#, text)."""
+ _lines: Iterator[Tuple[int, str]]
+ _next: Tuple[int, str]
+ _done: bool
+
+ def __init__(self, lines: Iterator[Tuple[int, str]]):
+ self._lines = lines
+ self._done = False
+ self._next = (0, '')
+ self._get_next()
+
+ def _get_next(self) -> None:
+ try:
+ self._next = next(self._lines)
+ except StopIteration:
+ self._done = True
+
+ def peek(self) -> str:
+ return self._next[1]
+
+ def pop(self) -> str:
+ n = self._next
+ self._get_next()
+ return n[1]
+
+ def __bool__(self) -> bool:
+ return not self._done
+
+ # Only used by kunit_tool_test.py.
+ def __iter__(self) -> Iterator[str]:
+ while bool(self):
+ yield self.pop()
+
+ def line_number(self) -> int:
+ return self._next[0]
+
kunit_start_re = re.compile(r'TAP version [0-9]+$')
kunit_end_re = re.compile('(List of all partitions:|'
'Kernel panic - not syncing: VFS:)')
-def isolate_kunit_output(kernel_output) -> Iterator[str]:
- started = False
- for line in kernel_output:
- line = line.rstrip() # line always has a trailing \n
- if kunit_start_re.search(line):
- prefix_len = len(line.split('TAP version')[0])
- started = True
- yield line[prefix_len:] if prefix_len > 0 else line
- elif kunit_end_re.search(line):
- break
- elif started:
- yield line[prefix_len:] if prefix_len > 0 else line
+def extract_tap_lines(kernel_output: Iterable[str]) -> LineStream:
+ def isolate_kunit_output(kernel_output: Iterable[str]) -> Iterator[Tuple[int, str]]:
+ line_num = 0
+ started = False
+ for line in kernel_output:
+ line_num += 1
+ line = line.rstrip() # line always has a trailing \n
+ if kunit_start_re.search(line):
+ prefix_len = len(line.split('TAP version')[0])
+ started = True
+ yield line_num, line[prefix_len:]
+ elif kunit_end_re.search(line):
+ break
+ elif started:
+ yield line_num, line[prefix_len:]
+ return LineStream(lines=isolate_kunit_output(kernel_output))
def raw_output(kernel_output) -> None:
for line in kernel_output:
@@ -97,14 +138,14 @@ def print_log(log) -> None:
TAP_ENTRIES = re.compile(r'^(TAP|[\s]*ok|[\s]*not ok|[\s]*[0-9]+\.\.[0-9]+|[\s]*#).*$')
-def consume_non_diagnostic(lines: List[str]) -> None:
- while lines and not TAP_ENTRIES.match(lines[0]):
- lines.pop(0)
+def consume_non_diagnostic(lines: LineStream) -> None:
+ while lines and not TAP_ENTRIES.match(lines.peek()):
+ lines.pop()
-def save_non_diagnostic(lines: List[str], test_case: TestCase) -> None:
- while lines and not TAP_ENTRIES.match(lines[0]):
- test_case.log.append(lines[0])
- lines.pop(0)
+def save_non_diagnostic(lines: LineStream, test_case: TestCase) -> None:
+ while lines and not TAP_ENTRIES.match(lines.peek()):
+ test_case.log.append(lines.peek())
+ lines.pop()
OkNotOkResult = namedtuple('OkNotOkResult', ['is_ok','description', 'text'])
@@ -112,18 +153,18 @@ OK_NOT_OK_SUBTEST = re.compile(r'^[\s]+(ok|not ok) [0-9]+ - (.*)$')
OK_NOT_OK_MODULE = re.compile(r'^(ok|not ok) ([0-9]+) - (.*)$')
-def parse_ok_not_ok_test_case(lines: List[str], test_case: TestCase) -> bool:
+def parse_ok_not_ok_test_case(lines: LineStream, test_case: TestCase) -> bool:
save_non_diagnostic(lines, test_case)
if not lines:
test_case.status = TestStatus.TEST_CRASHED
return True
- line = lines[0]
+ line = lines.peek()
match = OK_NOT_OK_SUBTEST.match(line)
while not match and lines:
- line = lines.pop(0)
+ line = lines.pop()
match = OK_NOT_OK_SUBTEST.match(line)
if match:
- test_case.log.append(lines.pop(0))
+ test_case.log.append(lines.pop())
test_case.name = match.group(2)
if test_case.status == TestStatus.TEST_CRASHED:
return True
@@ -138,14 +179,14 @@ def parse_ok_not_ok_test_case(lines: List[str], test_case: TestCase) -> bool:
SUBTEST_DIAGNOSTIC = re.compile(r'^[\s]+# (.*)$')
DIAGNOSTIC_CRASH_MESSAGE = re.compile(r'^[\s]+# .*?: kunit test case crashed!$')
-def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
+def parse_diagnostic(lines: LineStream, test_case: TestCase) -> bool:
save_non_diagnostic(lines, test_case)
if not lines:
return False
- line = lines[0]
+ line = lines.peek()
match = SUBTEST_DIAGNOSTIC.match(line)
if match:
- test_case.log.append(lines.pop(0))
+ test_case.log.append(lines.pop())
crash_match = DIAGNOSTIC_CRASH_MESSAGE.match(line)
if crash_match:
test_case.status = TestStatus.TEST_CRASHED
@@ -153,7 +194,7 @@ def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
else:
return False
-def parse_test_case(lines: List[str]) -> Optional[TestCase]:
+def parse_test_case(lines: LineStream) -> Optional[TestCase]:
test_case = TestCase()
save_non_diagnostic(lines, test_case)
while parse_diagnostic(lines, test_case):
@@ -165,24 +206,24 @@ def parse_test_case(lines: List[str]) -> Optional[TestCase]:
SUBTEST_HEADER = re.compile(r'^[\s]+# Subtest: (.*)$')
-def parse_subtest_header(lines: List[str]) -> Optional[str]:
+def parse_subtest_header(lines: LineStream) -> Optional[str]:
consume_non_diagnostic(lines)
if not lines:
return None
- match = SUBTEST_HEADER.match(lines[0])
+ match = SUBTEST_HEADER.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return match.group(1)
else:
return None
SUBTEST_PLAN = re.compile(r'[\s]+[0-9]+\.\.([0-9]+)')
-def parse_subtest_plan(lines: List[str]) -> Optional[int]:
+def parse_subtest_plan(lines: LineStream) -> Optional[int]:
consume_non_diagnostic(lines)
- match = SUBTEST_PLAN.match(lines[0])
+ match = SUBTEST_PLAN.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return int(match.group(1))
else:
return None
@@ -199,17 +240,17 @@ def max_status(left: TestStatus, right: TestStatus) -> TestStatus:
else:
return TestStatus.SUCCESS
-def parse_ok_not_ok_test_suite(lines: List[str],
+def parse_ok_not_ok_test_suite(lines: LineStream,
test_suite: TestSuite,
expected_suite_index: int) -> bool:
consume_non_diagnostic(lines)
if not lines:
test_suite.status = TestStatus.TEST_CRASHED
return False
- line = lines[0]
+ line = lines.peek()
match = OK_NOT_OK_MODULE.match(line)
if match:
- lines.pop(0)
+ lines.pop()
if match.group(1) == 'ok':
test_suite.status = TestStatus.SUCCESS
else:
@@ -231,7 +272,7 @@ def bubble_up_test_case_errors(test_suite: TestSuite) -> TestStatus:
max_test_case_status = bubble_up_errors(x.status for x in test_suite.cases)
return max_status(max_test_case_status, test_suite.status)
-def parse_test_suite(lines: List[str], expected_suite_index: int) -> Optional[TestSuite]:
+def parse_test_suite(lines: LineStream, expected_suite_index: int) -> Optional[TestSuite]:
if not lines:
return None
consume_non_diagnostic(lines)
@@ -257,26 +298,26 @@ def parse_test_suite(lines: List[str], expected_suite_index: int) -> Optional[Te
print_with_timestamp(red('[ERROR] ') + 'ran out of lines before end token')
return test_suite
else:
- print('failed to parse end of suite' + lines[0])
+ print(f'failed to parse end of suite "{name}", at line {lines.line_number()}: {lines.peek()}')
return None
TAP_HEADER = re.compile(r'^TAP version 14$')
-def parse_tap_header(lines: List[str]) -> bool:
+def parse_tap_header(lines: LineStream) -> bool:
consume_non_diagnostic(lines)
- if TAP_HEADER.match(lines[0]):
- lines.pop(0)
+ if TAP_HEADER.match(lines.peek()):
+ lines.pop()
return True
else:
return False
TEST_PLAN = re.compile(r'[0-9]+\.\.([0-9]+)')
-def parse_test_plan(lines: List[str]) -> Optional[int]:
+def parse_test_plan(lines: LineStream) -> Optional[int]:
consume_non_diagnostic(lines)
- match = TEST_PLAN.match(lines[0])
+ match = TEST_PLAN.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return int(match.group(1))
else:
return None
@@ -284,7 +325,7 @@ def parse_test_plan(lines: List[str]) -> Optional[int]:
def bubble_up_suite_errors(test_suites: Iterable[TestSuite]) -> TestStatus:
return bubble_up_errors(x.status for x in test_suites)
-def parse_test_result(lines: List[str]) -> TestResult:
+def parse_test_result(lines: LineStream) -> TestResult:
consume_non_diagnostic(lines)
if not lines or not parse_tap_header(lines):
return TestResult(TestStatus.NO_TESTS, [], lines)
@@ -338,11 +379,12 @@ def print_and_count_results(test_result: TestResult) -> Tuple[int, int, int]:
print_with_timestamp('')
return total_tests, failed_tests, crashed_tests
-def parse_run_tests(kernel_output) -> TestResult:
+def parse_run_tests(kernel_output: Iterable[str]) -> TestResult:
total_tests = 0
failed_tests = 0
crashed_tests = 0
- test_result = parse_test_result(list(isolate_kunit_output(kernel_output)))
+ lines = extract_tap_lines(kernel_output)
+ test_result = parse_test_result(lines)
if test_result.status == TestStatus.NO_TESTS:
print(red('[ERROR] ') + yellow('no tests run!'))
elif test_result.status == TestStatus.FAILURE_TO_PARSE_TESTS:
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 2e809dd956a7..433cd41d951c 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -11,6 +11,7 @@ from unittest import mock
import tempfile, shutil # Handling test_tmpdir
+import itertools
import json
import signal
import os
@@ -92,17 +93,18 @@ class KconfigTest(unittest.TestCase):
class KUnitParserTest(unittest.TestCase):
- def assertContains(self, needle, haystack):
- for line in haystack:
+ def assertContains(self, needle: str, haystack: kunit_parser.LineStream):
+ # Clone the iterator so we can print the contents on failure.
+ copy, backup = itertools.tee(haystack)
+ for line in copy:
if needle in line:
return
- raise AssertionError('"' +
- str(needle) + '" not found in "' + str(haystack) + '"!')
+ raise AssertionError(f'"{needle}" not found in {list(backup)}!')
def test_output_isolated_correctly(self):
log_path = test_data_path('test_output_isolated_correctly.log')
with open(log_path) as file:
- result = kunit_parser.isolate_kunit_output(file.readlines())
+ result = kunit_parser.extract_tap_lines(file.readlines())
self.assertContains('TAP version 14', result)
self.assertContains(' # Subtest: example', result)
self.assertContains(' 1..2', result)
@@ -113,7 +115,7 @@ class KUnitParserTest(unittest.TestCase):
def test_output_with_prefix_isolated_correctly(self):
log_path = test_data_path('test_pound_sign.log')
with open(log_path) as file:
- result = kunit_parser.isolate_kunit_output(file.readlines())
+ result = kunit_parser.extract_tap_lines(file.readlines())
self.assertContains('TAP version 14', result)
self.assertContains(' # Subtest: kunit-resource-test', result)
self.assertContains(' 1..5', result)
@@ -159,7 +161,7 @@ class KUnitParserTest(unittest.TestCase):
empty_log = test_data_path('test_is_test_passed-no_tests_run.log')
with open(empty_log) as file:
result = kunit_parser.parse_run_tests(
- kunit_parser.isolate_kunit_output(file.readlines()))
+ kunit_parser.extract_tap_lines(file.readlines()))
self.assertEqual(0, len(result.suites))
self.assertEqual(
kunit_parser.TestStatus.NO_TESTS,
@@ -170,7 +172,7 @@ class KUnitParserTest(unittest.TestCase):
print_mock = mock.patch('builtins.print').start()
with open(crash_log) as file:
result = kunit_parser.parse_run_tests(
- kunit_parser.isolate_kunit_output(file.readlines()))
+ kunit_parser.extract_tap_lines(file.readlines()))
print_mock.assert_any_call(StrContains('no tests run!'))
print_mock.stop()
file.close()
base-commit: c3d0e3fd41b7f0f5d5d5b6022ab7e813f04ea727
--
2.31.1.818.g46aad6cb9e-goog
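The parser changes above replace lines[0] and lines.pop(0) on a list with
peek() and pop() on a LineStream. The class itself is not shown in this part
of the patch, so the following is only a minimal sketch of an object with
that interface, assuming it wraps an iterator of (line number, text) pairs.
The attribute names and the error handling are assumptions made for this
example, and the real implementation may differ.

```python
from typing import Iterable, Iterator, Optional, Tuple


class LineStream:
    """Hypothetical sketch of a peekable stream of numbered lines."""

    def __init__(self, lines: Iterable[Tuple[int, str]]) -> None:
        self._lines: Iterator[Tuple[int, str]] = iter(lines)
        self._next: Optional[Tuple[int, str]] = None
        self._done = False

    def _fill(self) -> None:
        # Read one item ahead so that peek() does not consume it.
        if self._next is None and not self._done:
            try:
                self._next = next(self._lines)
            except StopIteration:
                self._done = True

    def peek(self) -> str:
        """Return the next line without consuming it."""
        self._fill()
        if self._next is None:
            raise IndexError('peek() on an empty LineStream')
        return self._next[1]

    def pop(self) -> str:
        """Return the next line and advance the stream."""
        text = self.peek()
        self._next = None
        return text

    def line_number(self) -> int:
        """Return the number attached to the line peek() would return."""
        self._fill()
        if self._next is None:
            raise IndexError('line_number() on an empty LineStream')
        return self._next[0]

    def __bool__(self) -> bool:
        # Lets callers keep writing "if not lines:" as they did with a list.
        self._fill()
        return self._next is not None


stream = LineStream(enumerate(['TAP version 14', '1..1'], start=1))
while stream:
    print(stream.line_number(), stream.pop())
```

Reading lazily like this means the whole kernel log does not have to be
turned into a list first, and the stream can still report which line a
parse error happened on.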
Make the default .kunitconfig (specified in
arch/um/configs/kunit_defconfig) specify CONFIG_KUNIT_ALL_TESTS by
default. KUNIT_ALL_TESTS runs all tests which have satisfied
dependencies in the current .config (which would be the architecture
defconfig).
Currently, the default .kunitconfig enables only the example tests and
KUnit's own tests. While this does provide a good example of what a
.kunitconfig for running a few individual tests should look like, it
does mean that kunit_tool runs a pretty paltry collection of tests by
default.
The example tests' config entry (CONFIG_KUNIT_EXAMPLE_TEST=y) continues
to be included -- despite now being redundant -- to provide an example
of how tests are enabled when KUNIT_ALL_TESTS is disabled.
A default run of ./tools/testing/kunit/kunit.py run now runs 70 tests
instead of 14.
Signed-off-by: David Gow <davidgow(a)google.com>
Acked-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210518035825.1885357-1-davidgow@g…
- Keep the KUNIT_EXAMPLE_TEST entry as an example.
- Move (in patches 2,3) kunit_defconfig to tools/testing/kunit/configs
and replace all_tests.config.
arch/um/configs/kunit_defconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/um/configs/kunit_defconfig b/arch/um/configs/kunit_defconfig
index 9235b7d42d38..e67af7b9f1bb 100644
--- a/arch/um/configs/kunit_defconfig
+++ b/arch/um/configs/kunit_defconfig
@@ -1,3 +1,3 @@
CONFIG_KUNIT=y
-CONFIG_KUNIT_TEST=y
CONFIG_KUNIT_EXAMPLE_TEST=y
+CONFIG_KUNIT_ALL_TESTS=y
--
2.31.1.818.g46aad6cb9e-goog
udpgro_fwd.sh contains many bash-specific constructs ("[[", "local -r"),
but it uses /bin/sh; on some distros /bin/sh is mapped to /bin/dash,
which doesn't support such constructs.
Force the test to use /bin/bash explicitly and prevent false positive
test failures.
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
tools/testing/selftests/net/udpgro_fwd.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/udpgro_fwd.sh b/tools/testing/selftests/net/udpgro_fwd.sh
index a8fa64136282..7f26591f236b 100755
--- a/tools/testing/selftests/net/udpgro_fwd.sh
+++ b/tools/testing/selftests/net/udpgro_fwd.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
readonly BASE="ns-$(mktemp -u XXXXXX)"
--
2.31.1
veth.sh is a shell script that uses /bin/sh; some distros (Ubuntu, for
example) use dash as /bin/sh, and in this case the test reports the
following error:
# ./veth.sh: 21: local: -r: bad variable name
# ./veth.sh: 21: local: -r: bad variable name
This happens because dash doesn't support the option "-r" with local.
Moreover, if the bpf object is missing, the script exits with -1, which
is an illegal number for dash:
exit: Illegal number: -1
Change the script to be compatible with both bash and dash and prevent
the errors above.
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
tools/testing/selftests/net/veth.sh | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/veth.sh b/tools/testing/selftests/net/veth.sh
index 2fedc0781ce8..11d7cdb898c0 100755
--- a/tools/testing/selftests/net/veth.sh
+++ b/tools/testing/selftests/net/veth.sh
@@ -18,7 +18,8 @@ ret=0
cleanup() {
local ns
- local -r jobs="$(jobs -p)"
+ local jobs
+ readonly jobs="$(jobs -p)"
[ -n "${jobs}" ] && kill -1 ${jobs} 2>/dev/null
rm -f $STATS
@@ -108,7 +109,7 @@ chk_gro() {
if [ ! -f ../bpf/xdp_dummy.o ]; then
echo "Missing xdp_dummy helper. Build bpf selftest first"
- exit -1
+ exit 1
fi
create_ns
--
2.31.1
The use of typecheck() in KUNIT_EXPECT_EQ() and friends is causing more
problems than I think it's worth. Things like enums need to have their
values explicitly cast, and literals all need to be very precisely
typed, else a large warning will be printed.
While typechecking does have its uses, the additional overhead of having
lots of needless casts -- combined with the awkward error messages which
don't mention which types are involved -- makes tests less readable and
more difficult to write.
By removing the typecheck() call, the two arguments still need to be of
compatible types, but don't need to be of exactly the same type, which
seems a less confusing and more useful compromise.
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210507050908.1008686-1-davidgow@g…
- Tidy up the patch description to note that a warning was being
produced, not an error.
- Add additional patches to remove many of the now unnecessary casts.
include/kunit/test.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..4c56ffcb7403 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -775,7 +775,6 @@ void kunit_do_assertion(struct kunit *test,
do { \
typeof(left) __left = (left); \
typeof(right) __right = (right); \
- ((void)__typecheck(__left, __right)); \
\
KUNIT_ASSERTION(test, \
__left op __right, \
--
2.31.1.751.gd2f1c929bd-goog
As discussed at:
https://lore.kernel.org/linux-doc/871r9k6rmy.fsf@meer.lwn.net/
It is better to avoid using :doc:`foo` to refer to Documentation/foo.rst, as the
automarkup.py extension should handle it automatically in most cases.
There are a couple of exceptions to this rule:
1. when the :doc: tag is used to point to a kernel-doc DOC: markup;
2. when it is used with a named tag, e.g. :doc:`some name <foo>`.
It should also be noted that automarkup.py currently has an issue:
if one uses markup like:
Documentation/dev-tools/kunit/api/test.rst
- documents all of the standard testing API excluding mocking
or mocking related features.
or, even:
Documentation/dev-tools/kunit/api/test.rst
documents all of the standard testing API excluding mocking
or mocking related features.
automarkup.py will simply ignore it. Not sure why. This patch series
avoids the above patterns (which are present in only 4 files), but it would be
nice to have a follow-up patch fixing the issue in automarkup.py.
In this series:
Patch 1 manually adjusts the references inside driver-api/pm/devices.rst,
as it uses :file:`foo` there to refer to some Documentation/ files;
Patch 2 converts a table in Documentation/dev-tools/kunit/api/index.rst
into a list, carefully avoiding the automarkup.py bug;
Patch 3 converts the cross-references in the media documentation, also
avoiding the automarkup.py bug;
Patches 4-34 convert the other occurrences via a replace script. They were
manually edited in order to honour the 80-column limit where possible.
I did a diff between the Sphinx 2.4.4 output before and after this patch
series in order to double-check that all converted Documentation/
references will produce <a href=<foo>.rst>foo title</a> tags.
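The replace script used for patches 4-34 is not included in this cover
letter. As a rough illustration only, a conversion of that kind could look
like the Python sketch below. The function name, the regular expression and
the path handling are all assumptions made for this example. It resolves a
target relative to the directory of the .rst file being edited (a leading
"/" is taken as relative to Documentation/) and leaves named references such
as :doc:`some name <foo>` untouched. The kernel-doc ":doc:" option has no
backquotes, so it is not matched either, but real conversions would still
need the manual review described above.

```python
import posixpath
import re

# Matches :doc:`target` but not the named form :doc:`title <target>`.
DOC_ROLE = re.compile(r':doc:`([^`<>]+)`')


def convert_doc_roles(text: str, rst_path: str) -> str:
    """Rewrite :doc:`foo` as a plain Documentation/.../foo.rst path.

    rst_path is the path of the file being edited, relative to the top of
    the kernel tree (for example 'Documentation/dev-tools/kunit/faq.rst').
    """
    base_dir = posixpath.dirname(rst_path)

    def replace(match: re.Match) -> str:
        target = match.group(1).strip()
        if target.startswith('/'):
            # A leading slash means "relative to the documentation root".
            path = posixpath.join('Documentation', target.lstrip('/'))
        else:
            # Otherwise the target is relative to the current document.
            path = posixpath.join(base_dir, target)
        return posixpath.normpath(path) + '.rst'

    return DOC_ROLE.sub(replace, text)


print(convert_doc_roles('See :doc:`usage` for details.',
                        'Documentation/dev-tools/kunit/faq.rst'))
```

The output of such a script would still need to be checked by hand, both for
line length and for the list/indentation patterns that automarkup.py ignores.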
Mauro Carvalho Chehab (34):
docs: devices.rst: better reference documentation docs
docs: dev-tools: kunit: don't use a table for docs name
media: docs: */media/index.rst: don't use ReST doc:`foo`
media: userspace-api: avoid using ReST :doc:`foo` markup
media: driver-api: drivers: avoid using ReST :doc:`foo` markup
media: admin-guide: avoid using ReST :doc:`foo` markup
docs: admin-guide: pm: avoid using ReSt :doc:`foo` markup
docs: admin-guide: hw-vuln: avoid using ReST :doc:`foo` markup
docs: admin-guide: sysctl: avoid using ReST :doc:`foo` markup
docs: block: biodoc.rst: avoid using ReSt :doc:`foo` markup
docs: bpf: bpf_lsm.rst: avoid using ReSt :doc:`foo` markup
docs: core-api: avoid using ReSt :doc:`foo` markup
docs: dev-tools: testing-overview.rst: avoid using ReSt :doc:`foo`
markup
docs: dev-tools: kunit: avoid using ReST :doc:`foo` markup
docs: devicetree: bindings: submitting-patches.rst: avoid using ReSt
:doc:`foo` markup
docs: doc-guide: avoid using ReSt :doc:`foo` markup
docs: driver-api: avoid using ReSt :doc:`foo` markup
docs: driver-api: gpio: using-gpio.rst: avoid using ReSt :doc:`foo`
markup
docs: driver-api: surface_aggregator: avoid using ReSt :doc:`foo`
markup
docs: driver-api: usb: avoid using ReSt :doc:`foo` markup
docs: firmware-guide: acpi: avoid using ReSt :doc:`foo` markup
docs: hwmon: adm1177.rst: avoid using ReSt :doc:`foo` markup
docs: i2c: avoid using ReSt :doc:`foo` markup
docs: kernel-hacking: hacking.rst: avoid using ReSt :doc:`foo` markup
docs: networking: devlink: avoid using ReSt :doc:`foo` markup
docs: PCI: endpoint: pci-endpoint-cfs.rst: avoid using ReSt :doc:`foo`
markup
docs: PCI: pci.rst: avoid using ReSt :doc:`foo` markup
docs: process: submitting-patches.rst: avoid using ReSt :doc:`foo`
markup
docs: security: landlock.rst: avoid using ReSt :doc:`foo` markup
docs: trace: coresight: coresight.rst: avoid using ReSt :doc:`foo`
markup
docs: trace: ftrace.rst: avoid using ReSt :doc:`foo` markup
docs: userspace-api: landlock.rst: avoid using ReSt :doc:`foo` markup
docs: virt: kvm: s390-pv-boot.rst: avoid using ReSt :doc:`foo` markup
docs: x86: avoid using ReSt :doc:`foo` markup
.../PCI/endpoint/pci-endpoint-cfs.rst | 2 +-
Documentation/PCI/pci.rst | 6 +--
.../special-register-buffer-data-sampling.rst | 3 +-
Documentation/admin-guide/media/bt8xx.rst | 15 ++++----
Documentation/admin-guide/media/bttv.rst | 21 ++++++-----
Documentation/admin-guide/media/index.rst | 12 +++---
Documentation/admin-guide/media/saa7134.rst | 3 +-
Documentation/admin-guide/pm/intel_idle.rst | 16 +++++---
Documentation/admin-guide/pm/intel_pstate.rst | 9 +++--
Documentation/admin-guide/sysctl/abi.rst | 2 +-
Documentation/admin-guide/sysctl/kernel.rst | 37 ++++++++++---------
Documentation/block/biodoc.rst | 2 +-
Documentation/bpf/bpf_lsm.rst | 13 ++++---
.../core-api/bus-virt-phys-mapping.rst | 2 +-
Documentation/core-api/dma-api.rst | 5 ++-
Documentation/core-api/dma-isa-lpc.rst | 2 +-
Documentation/core-api/index.rst | 4 +-
Documentation/dev-tools/kunit/api/index.rst | 8 ++--
Documentation/dev-tools/kunit/faq.rst | 2 +-
Documentation/dev-tools/kunit/index.rst | 14 +++----
Documentation/dev-tools/kunit/start.rst | 6 +--
Documentation/dev-tools/kunit/tips.rst | 5 ++-
Documentation/dev-tools/kunit/usage.rst | 8 ++--
Documentation/dev-tools/testing-overview.rst | 16 ++++----
.../bindings/submitting-patches.rst | 11 +++---
Documentation/doc-guide/contributing.rst | 8 ++--
Documentation/driver-api/gpio/using-gpio.rst | 4 +-
Documentation/driver-api/ioctl.rst | 2 +-
.../driver-api/media/drivers/bttv-devel.rst | 2 +-
Documentation/driver-api/media/index.rst | 10 +++--
Documentation/driver-api/pm/devices.rst | 8 ++--
.../surface_aggregator/clients/index.rst | 3 +-
.../surface_aggregator/internal.rst | 15 ++++----
.../surface_aggregator/overview.rst | 6 ++-
Documentation/driver-api/usb/dma.rst | 6 +--
.../acpi/dsd/data-node-references.rst | 3 +-
.../firmware-guide/acpi/dsd/graph.rst | 2 +-
.../firmware-guide/acpi/enumeration.rst | 7 ++--
Documentation/hwmon/adm1177.rst | 3 +-
Documentation/i2c/instantiating-devices.rst | 2 +-
Documentation/i2c/old-module-parameters.rst | 3 +-
Documentation/i2c/smbus-protocol.rst | 4 +-
Documentation/kernel-hacking/hacking.rst | 4 +-
.../networking/devlink/devlink-region.rst | 2 +-
.../networking/devlink/devlink-trap.rst | 4 +-
Documentation/process/submitting-patches.rst | 32 ++++++++--------
Documentation/security/landlock.rst | 3 +-
Documentation/trace/coresight/coresight.rst | 8 ++--
Documentation/trace/ftrace.rst | 2 +-
Documentation/userspace-api/landlock.rst | 11 +++---
.../userspace-api/media/glossary.rst | 2 +-
Documentation/userspace-api/media/index.rst | 12 +++---
Documentation/virt/kvm/s390-pv-boot.rst | 2 +-
Documentation/x86/boot.rst | 4 +-
Documentation/x86/mtrr.rst | 2 +-
55 files changed, 217 insertions(+), 183 deletions(-)
--
2.31.1
Hi,
This patch converts the existing UUID runtime test to use the KUnit framework.
Below, there's a comparison between the old output format and the new
one. Keep in mind that even if KUnit seems very verbose, this is the
corner case where _every_ test has failed.
* This is what the current output looks like on success:
test_uuid: all 18 tests passed
* And when it fails:
test_uuid: conversion test #1 failed on LE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #2 failed on LE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #2 actual data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: conversion test #3 failed on BE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #4 failed on BE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #4 actual data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: conversion test #5 failed on LE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #6 failed on LE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #6 actual data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: conversion test #7 failed on BE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #8 failed on BE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #8 actual data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: conversion test #9 failed on LE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #10 failed on LE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #10 actual data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: conversion test #11 failed on BE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #12 failed on BE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #12 actual data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: negative test #13 passed on wrong LE data: 'c33f4995-3701-450e-9fbf206a2e98e576 '
test_uuid: negative test #14 passed on wrong BE data: 'c33f4995-3701-450e-9fbf206a2e98e576 '
test_uuid: negative test #15 passed on wrong LE data: '64b4371c-77c1-48f9-8221-29f054XX023b'
test_uuid: negative test #16 passed on wrong BE data: '64b4371c-77c1-48f9-8221-29f054XX023b'
test_uuid: negative test #17 passed on wrong LE data: '0cb4ddff-a545-4401-9d06-688af53e'
test_uuid: negative test #18 passed on wrong BE data: '0cb4ddff-a545-4401-9d06-688af53e'
test_uuid: failed 18 out of 18 tests
* Now, here's what it looks like with KUnit:
======== [PASSED] uuid ========
[PASSED] uuid_correct_be
[PASSED] uuid_correct_le
[PASSED] uuid_wrong_be
[PASSED] uuid_wrong_le
* And if every test fails with KUnit:
======== [FAILED] uuid ========
[FAILED] uuid_correct_be
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse 'c33f4995-3701-450e-9fbf-206a2e98e576'
# uuid_correct_be: not ok 1 - c33f4995-3701-450e-9fbf-206a2e98e576
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse '64b4371c-77c1-48f9-8221-29f054fc023b'
# uuid_correct_be: not ok 2 - 64b4371c-77c1-48f9-8221-29f054fc023b
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse '0cb4ddff-a545-4401-9d06-688af53e7f84'
# uuid_correct_be: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e7f84
not ok 1 - uuid_correct_be
[FAILED] uuid_correct_le
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse 'c33f4995-3701-450e-9fbf-206a2e98e576'
# uuid_correct_le: not ok 1 - c33f4995-3701-450e-9fbf-206a2e98e576
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse '64b4371c-77c1-48f9-8221-29f054fc023b'
# uuid_correct_le: not ok 2 - 64b4371c-77c1-48f9-8221-29f054fc023b
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse '0cb4ddff-a545-4401-9d06-688af53e7f84'
# uuid_correct_le: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e7f84
not ok 2 - uuid_correct_le
[FAILED] uuid_wrong_be
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of 'c33f4995-3701-450e-9fbf206a2e98e576 ' should've failed
# uuid_wrong_be: not ok 1 - c33f4995-3701-450e-9fbf206a2e98e576
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of '64b4371c-77c1-48f9-8221-29f054XX023b' should've failed
# uuid_wrong_be: not ok 2 - 64b4371c-77c1-48f9-8221-29f054XX023b
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of '0cb4ddff-a545-4401-9d06-688af53e' should've failed
# uuid_wrong_be: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e
not ok 3 - uuid_wrong_be
[FAILED] uuid_wrong_le
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of 'c33f4995-3701-450e-9fbf206a2e98e576 ' should've failed
# uuid_wrong_le: not ok 1 - c33f4995-3701-450e-9fbf206a2e98e576
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of '64b4371c-77c1-48f9-8221-29f054XX023b' should've failed
# uuid_wrong_le: not ok 2 - 64b4371c-77c1-48f9-8221-29f054XX023b
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of '0cb4ddff-a545-4401-9d06-688af53e' should've failed
# uuid_wrong_le: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e
not ok 4 - uuid_wrong_le
Changes from v2:
- Clarify in commit message the new test cases setup
v2: https://lore.kernel.org/lkml/20210609233730.164082-1-andrealmeid@collabora.…
Changes from v1:
- Test suite name: uuid_test -> uuid
- Config name: TEST_UUID -> UUID_KUNIT_TEST
- Config entry in the Kconfig file left where it is
- Converted tests to use _MSG variant
v1: https://lore.kernel.org/lkml/20210605215215.171165-1-andrealmeid@collabora.…
André Almeida (1):
lib: Convert UUID runtime test to KUnit
lib/Kconfig.debug | 11 +++-
lib/Makefile | 2 +-
lib/test_uuid.c | 137 +++++++++++++++++++---------------------------
3 files changed, 67 insertions(+), 83 deletions(-)
--
2.31.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is treated as having the following
attributes:
* architecture dependent or generic
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
---
* v6 -> v7
- Improve the file descriptor allocation function per Krish's suggestion
- Use "generic stats" instead of "common stats" as Krish suggested
- Addressed some other nits from Krish and David Matlack
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate generic stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 180 +++++++++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 38 +++-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 64 +++++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 64 +++++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 59 ++++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 129 ++++++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 67 +++++-
include/linux/kvm_host.h | 141 +++++++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 50 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_binary_stats_test.c | 215 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 169 +++++++++++++-
24 files changed, 1178 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
--
2.32.0.rc1.229.g3e70b5a671-goog
When one parameter of a parameterised test failed, its failure would be
propagated to the overall test, but not to the suite result (unless it
was the last parameter).
This is because test_case->success was being reset to the test->success
result after each parameter was used, so a failing test's result would
be overwritten by a non-failing result. The overall test result was
handled in a third variable, test_success, but this was discarded after
the status line was printed.
Instead, just propagate the result after each parameter run.
Signed-off-by: David Gow <davidgow(a)google.com>
Fixes: fadb08e7c750 ("kunit: Support for Parameterized Testing")
---
This is fixing quite a serious bug where some test suites would appear
to succeed even if some of their component tests failed. It'd be nice to
get this into kunit-fixes ASAP.
(This will require a rework of some of the skip tests work, for which
I'll send out a new version soon.)
Cheers,
-- David
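The effect of the bug can be modelled outside the kernel. The Python sketch
below is only an illustration (the function names and the list of
per-parameter results are invented for this example, and it is not KUnit
code): it contrasts overwriting the case result after each parameter run
with AND-ing each result into it, which mirrors the change from "=" to "&="
in the diff below.

```python
from typing import Iterable


def case_result_overwrite(param_results: Iterable[bool]) -> bool:
    """Old behaviour: each parameter run overwrites the case result."""
    success = True
    for result in param_results:
        success = result
    return success


def case_result_accumulate(param_results: Iterable[bool]) -> bool:
    """New behaviour: a single failing parameter makes the case fail."""
    success = True
    for result in param_results:
        success &= result
    return success


# The second of three parameters fails, the last one passes.
runs = [True, False, True]
print('overwrite: ', case_result_overwrite(runs))
print('accumulate:', case_result_accumulate(runs))
```

With the overwriting version the failure is only visible when the failing
parameter happens to be the last one, which matches the symptom described in
the commit message.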
lib/kunit/test.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 2f6cc0123232..17973a4a44c2 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -376,7 +376,7 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
context.test_case = test_case;
kunit_try_catch_run(try_catch, &context);
- test_case->success = test->success;
+ test_case->success &= test->success;
}
int kunit_run_tests(struct kunit_suite *suite)
@@ -388,7 +388,7 @@ int kunit_run_tests(struct kunit_suite *suite)
kunit_suite_for_each_test_case(suite, test_case) {
struct kunit test = { .param_value = NULL, .param_index = 0 };
- bool test_success = true;
+ test_case->success = true;
if (test_case->generate_params) {
/* Get initial param. */
@@ -398,7 +398,6 @@ int kunit_run_tests(struct kunit_suite *suite)
do {
kunit_run_case_catch_errors(suite, test_case, &test);
- test_success &= test_case->success;
if (test_case->generate_params) {
if (param_desc[0] == '\0') {
@@ -420,7 +419,7 @@ int kunit_run_tests(struct kunit_suite *suite)
}
} while (test.param_value);
- kunit_print_ok_not_ok(&test, true, test_success,
+ kunit_print_ok_not_ok(&test, true, test_case->success,
kunit_test_case_num(suite, test_case),
test_case->name);
}
--
2.32.0.272.g935e593368-goog
Attacks against vulnerable userspace applications that aim to break
ASLR or bypass canaries traditionally use some level of brute force with
the help of the fork system call. This is possible because, when a new
process is created using fork, its memory contents are the same as those of
the parent process (the process that called the fork system call). So, the
attacker can test the memory an unlimited number of times to find the correct
memory values or the correct memory addresses without worrying about crashing
the application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch series. Specifically, the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is obtained (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is obtained (e.g. what CTFs do for simple
network services).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the boundaries mentioned above.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
It is important to mention that the v7 and v8 versions have changed the
method used to track the information related to the application crashes.
Prior to these versions, a pointer per process (in the task_struct structure)
held a reference to the shared statistical data. In other words, these
stats were shared by all the processes in the fork hierarchy. But this has an
important drawback: a brute force attack that happens through the execve
system call loses the fault info, since these statistics are freed when the
fork hierarchy disappears. So, the solution adopted in the v6 version was
to use an upper fork hierarchy to track the info for this attack type. But,
as Valdis Kletnieks pointed out during this discussion [1], this method can
be easily bypassed using a double exec (well, this was the method used in
the kselftest to avoid the detection ;) ). So, in this version, the extended
attributes feature for executable files is used to track all the
statistical data (info related to application crashes). The xattr is
also used to mark the executables as "not allowed" when an attack is
detected. Then, the execve system call relies on this flag to prevent
further executions of this file.
[1] https://lore.kernel.org/kernelnewbies/20210330173459.GA3163@ubuntu/
Moreover, I think this solves another problem pointed out by Andi Kleen
during the v5 review [2] related to the possibility that a supervisor
respawns processes killed by the Brute LSM. He suggested adding some way so
a supervisor can know that a process has been killed by Brute and then
decide to respawn or not. So, now, the supervisor can read the brute xattr
of one executable and know if it is blocked by Brute and why (using the
statistical data).
[2] https://lore.kernel.org/kernel-hardening/878s78dnrm.fsf@linux.intel.com/
Although the xattr of the executable is accessible from userspace, in
complex daemons this file may not be visible directly by the supervisor as
it may be run through some wrapper. So, an extension to the waitid() system
call has been added in this version. This was suggested by Andi Kleen [3]
during the v7 review. (The case with supervisors using cgroups is not yet
tested).
[3] https://lore.kernel.org/kernel-hardening/19903478-52e0-3829-0515-3e17669108…
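For context, this is roughly how a supervisor already learns that a child
was killed by a signal through the standard waitid() interface. The sketch
only uses the existing CLD_KILLED code; the new value added by this series
is not shown because it is not quoted in this letter. The helper name
reap_killed_child() is made up for the example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that kills itself with @sig, then reaps it with waitid().
 * Returns the si_code reported for the child, or -1 on error.
 * On success *status receives si_status (the signal number for CLD_KILLED).
 */
int reap_killed_child(int sig, int *status)
{
	siginfo_t info = { 0 };
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		raise(sig);
		_exit(0);	/* only reached if the signal did not kill us */
	}
	if (waitid(P_PID, (id_t)pid, &info, WEXITED) < 0)
		return -1;
	*status = info.si_status;
	return info.si_code;
}
```

A supervisor would inspect si_code in the same way to decide whether
respawning the child makes sense.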
With all this information in mind, here is what each patch does:
The 1/8 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/8 patch defines a new LSM and the necessary sysctl attributes to
fine-tune the attack detection.
The 3/8 patch detects a fork/exec brute force attack and narrows the
possible cases by taking into account the privilege boundary crossing.
The 4/8 patch mitigates a brute force attack.
The 5/8 patch adds the extension to the waitid system call to notify
userspace that a task has been killed by the Brute LSM when an attack is
mitigated.
The 6/8 patch adds self-tests to validate the Brute LSM expectations.
The 7/8 patch adds the documentation to explain this implementation.
The 8/8 patch updates the maintainers file.
This patch series is a task of the KSPP [4] and can also be accessed from my
github tree [5] in the "brute_v8" branch.
[4] https://github.com/KSPP/linux/issues/39
[5] https://github.com/johwood/linux/
When I ran the "checkpatch" script I got the following errors, but I think
they are false positives as I follow the same coding style as the other
extended attribute suffixes.
----------------------------------------------------------------------------
../patches/brute_v8/v8-0003-security-brute-Detect-a-brute-force-attack.patch
----------------------------------------------------------------------------
ERROR: Macros with complex values should be enclosed in parentheses
89: FILE: include/uapi/linux/xattr.h:80:
+#define XATTR_NAME_BRUTE XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX
-----------------------------------------------------------------------------
../patches/brute_v8/v8-0006-selftests-brute-Add-tests-for-the-Brute-LSM.patch
-----------------------------------------------------------------------------
ERROR: Macros with complex values should be enclosed in parentheses
159: FILE: tools/testing/selftests/brute/rmxattr.c:18:
+#define XATTR_NAME_BRUTE XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX
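To show what these macros do: adjacent string literals are concatenated by
the compiler, which is the same pattern the existing XATTR_NAME_* macros in
include/uapi/linux/xattr.h use. In the sketch below the value "brute" for
XATTR_BRUTE_SUFFIX is an assumption made for illustration, and the helper
name is made up.

```c
#include <string.h>

#define XATTR_SECURITY_PREFIX "security."
#define XATTR_BRUTE_SUFFIX    "brute"	/* assumed value, for illustration */

/* Adjacent string literals are merged by the compiler into one literal. */
#define XATTR_NAME_BRUTE XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX

/* Returns 1 when the composed name matches the expected full string. */
int brute_name_is_composed(void)
{
	return strcmp(XATTR_NAME_BRUTE, "security.brute") == 0;
}
```

Wrapping the macro body in parentheses would still compile here, but the
result could no longer be concatenated with further adjacent literals.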
When I ran the "kernel-doc" script with the following parameters:
./scripts/kernel-doc --none -v security/brute/brute.c
I got the following warning:
security/brute/brute.c:118: warning: contents before sections
But I don't understand why it is complaining. Could it be a false positive?
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Version 5
https://lore.kernel.org/kernel-hardening/20210227153013.6747-1-john.wood@gm…
Version 6
https://lore.kernel.org/kernel-hardening/20210307113031.11671-1-john.wood@g…
Version 7
https://lore.kernel.org/kernel-hardening/20210521172414.69456-1-john.wood@g…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect a slow brute force attack (Randy Dunlap).
- Fine-tune the detection taking into account privilege boundary crossing
(Kees Cook).
- Take into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine-tune the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Changelog v5 -> v6
------------------
- Fix a reported deadlock (kernel test robot).
- Add high level details to the documentation (Andi Kleen).
Changelog v6 -> v7
------------------
- Add the "Reviewed-by:" tag to the first patch.
- Rearrange the brute LSM between lockdown and yama (Kees Cook).
- Split subdir and obj in security/Makefile (Kees Cook).
- Reduce the number of header files included (Kees Cook).
- Print the pid when an attack is detected (Kees Cook).
- Use the socket_accept LSM hook instead of socket_sock_rcv_skb hook to
avoid running a hook on every incoming network packet (Kees Cook).
- Update the documentation and fix it to render it properly (Jonathan
Corbet).
- Manage correctly an exec brute force attack avoiding the bypass (Valdis
Kletnieks).
- Other minor changes and cleanups.
Changelog v7 -> v8
------------------
- Rebase against v5.13-rc4.
- Fix a build error if CONFIG_IPV6 and/or CONFIG_SECURITY_NETWORK is not
set (kernel test robot).
- Notify userspace that a task has been killed by the Brute LSM (Andi
Kleen).
- Add a new test to verify that the userspace notification is working.
- Update the documentation according to this new feature.
- Other minor changes and cleanups.
Any constructive comments are welcome.
Thanks in advance.
John Wood (8):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and add sysctl attributes
security/brute: Detect a brute force attack
security/brute: Mitigate a brute force attack
security/brute: Notify to userspace "task killed"
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 359 ++++++++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 8 +
arch/x86/kernel/signal_compat.c | 2 +-
include/brute/brute.h | 16 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
include/uapi/asm-generic/siginfo.h | 3 +-
include/uapi/linux/xattr.h | 3 +
kernel/exit.c | 6 +-
kernel/signal.c | 5 +-
security/Kconfig | 11 +-
security/Makefile | 2 +
security/brute/Kconfig | 15 +
security/brute/Makefile | 2 +
security/brute/brute.c | 795 +++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 3 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/exec.c | 46 ++
tools/testing/selftests/brute/rmxattr.c | 34 +
tools/testing/selftests/brute/test.c | 507 +++++++++++++++
tools/testing/selftests/brute/test.sh | 269 ++++++++
26 files changed, 2099 insertions(+), 9 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 include/brute/brute.h
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/exec.c
create mode 100644 tools/testing/selftests/brute/rmxattr.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is treated as having the following
attributes:
* architecture dependent or generic
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
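As a sketch of how a telemetry application might consume a cumulative
statistic pulled periodically (this is a hypothetical userspace helper, not
code from the patchset), the rate is derived from two samples of the same
counter, so it does not depend on the other statistics being read at the
same instant:

```c
#include <stdint.h>

/*
 * Events per second for a cumulative counter sampled twice.
 * Unsigned subtraction keeps the delta correct across a u64 wrap.
 * Returns 0 when the interval is zero.
 */
double counter_rate_per_sec(uint64_t prev, uint64_t cur, uint64_t interval_ns)
{
	uint64_t delta = cur - prev;

	if (interval_ns == 0)
		return 0.0;
	return (double)delta * 1e9 / (double)interval_ns;
}
```

An instantaneous statistic would be reported as read, without taking a
difference between samples.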
---
* v7-> v8
- Rebase to kvm/queue, commit c1dc20e254b4 ("KVM: switch per-VM
stats to u64")
- Revise code to reflect the per-VM stats type from ulong to u64
- Addressed some other nits
* v6 -> v7
- Improve file descriptor allocation function as Krish suggested
- Use "generic stats" instead of "common stats" as Krish suggested
- Addressed some other nits from Krish and David Matlack
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
[6] https://lore.kernel.org/kvm/20210524151828.4113777-1-jingzhangos@google.com
[7] https://lore.kernel.org/kvm/20210603211426.790093-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate generic stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 174 +++++++++++++-
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 46 +++-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 71 +++++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 72 +++++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 67 +++++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 137 ++++++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 75 +++++-
include/linux/kvm_host.h | 138 ++++++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 46 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_binary_stats_test.c | 218 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 157 ++++++++++++-
24 files changed, 1202 insertions(+), 91 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_binary_stats_test.c
base-commit: c1dc20e254b421a2463da7f053b37d822788224a
--
2.32.0.272.g935e593368-goog
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations from the advertised latency and residency
values.
The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latencies
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the latencies achieved must be viewed relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/:
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer is to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference between the actual and the expected timer duration
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - Snooze: 3265
# Observed Average IPI latency(ns) - Stop0_lite: 3507
# Observed Average IPI latency(ns) - Stop0: 3739
# Observed Average IPI latency(ns) - Stop2: 3807
# Observed Average IPI latency(ns) - Stop4: 17070
# Observed Average IPI latency(ns) - Stop5: 1038174
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - Snooze: 1640
# Observed Average timeout diff(ns) - Stop0_lite: 1764
# Observed Average timeout diff(ns) - Stop0: 1715
# Observed Average timeout diff(ns) - Stop2: 1845
# Observed Average timeout diff(ns) - Stop4: 16581
# Observed Average timeout diff(ns) - Stop5: 939977
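Reading these numbers relative to the baseline amounts to a subtraction. A
small sketch, with a helper name made up for this example:

```c
#include <stdint.h>

/* Extra wakeup latency of an idle state over the busy-CPU baseline, in ns. */
int64_t latency_over_baseline(uint64_t observed_ns, uint64_t baseline_ns)
{
	return (int64_t)observed_ns - (int64_t)baseline_ns;
}
```

With the sample above, Stop4 adds 17070 - 3114 = 13956 ns to an IPI wakeup,
while Snooze adds only 3265 - 3114 = 151 ns.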
Pratik R. Sampat (2):
powerpc/cpuidle: Extract IPI based and timer based wakeup latency from
idle states
powerpc/selftest: Add support for cpuidle latency measurement
arch/powerpc/kernel/Makefile | 1 +
arch/powerpc/kernel/test-cpuidle_latency.c | 157 +++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/powerpc/Makefile | 1 +
.../powerpc/cpuidle_latency/.gitignore | 2 +
.../powerpc/cpuidle_latency/Makefile | 6 +
.../cpuidle_latency/cpuidle_latency.sh | 419 ++++++++++++++++++
.../powerpc/cpuidle_latency/settings | 1 +
8 files changed, 597 insertions(+)
create mode 100644 arch/powerpc/kernel/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
create mode 100755 tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings
--
2.17.1
Use ARRAY_SIZE instead of dividing the sizeof the array by the sizeof an
element.
Clean up the following coccicheck warning:
./tools/testing/selftests/x86/syscall_numbering.c:316:35-36: WARNING:
Use ARRAY_SIZE.
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
Changes in v2:
-Add ARRAY_SIZE definition.
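For reference, a small standalone sketch of the macro in use. The array
contents and the names demo_msbs and sum_demo_msbs are hypothetical; the
real msbs[] values are not quoted in this patch.

```c
#include <stddef.h>

#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical values; the real msbs[] contents are not shown here. */
static const int demo_msbs[] = { 0, 1, -1 };

/* Sums the array, using ARRAY_SIZE for the loop bound. */
int sum_demo_msbs(void)
{
	int sum = 0;

	for (size_t i = 0; i < ARRAY_SIZE(demo_msbs); i++)
		sum += demo_msbs[i];
	return sum;
}
```

Note that this simple form is only correct for true arrays. Applied to a
pointer, it silently yields sizeof(pointer) / sizeof(element).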
tools/testing/selftests/x86/syscall_numbering.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/syscall_numbering.c b/tools/testing/selftests/x86/syscall_numbering.c
index 9915917..ef30218 100644
--- a/tools/testing/selftests/x86/syscall_numbering.c
+++ b/tools/testing/selftests/x86/syscall_numbering.c
@@ -40,6 +40,7 @@
#define X32_WRITEV 516
#define X32_BIT 0x40000000
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
static int nullfd = -1; /* File descriptor for /dev/null */
static bool with_x32; /* x32 supported on this kernel? */
@@ -313,7 +314,7 @@ static void test_syscall_numbering(void)
* The MSB is supposed to be ignored, so we loop over a few
* to test that out.
*/
- for (size_t i = 0; i < sizeof(msbs)/sizeof(msbs[0]); i++) {
+ for (size_t i = 0; i < ARRAY_SIZE(msbs); i++) {
int msb = msbs[i];
run("Checking system calls with msb = %d (0x%x)\n",
msb, msb);
--
1.8.3.1
Hi,
This patch converts the existing UUID runtime test to use the KUnit framework.
Below, there's a comparison between the old output format and the new
one. Keep in mind that even if KUnit seems very verbose, this is the
corner case where _every_ test has failed.
* This is what the current output looks like on success:
test_uuid: all 18 tests passed
* And when it fails:
test_uuid: conversion test #1 failed on LE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #2 failed on LE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #2 actual data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: conversion test #3 failed on BE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #4 failed on BE data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: cmp test #4 actual data: 'c33f4995-3701-450e-9fbf-206a2e98e576'
test_uuid: conversion test #5 failed on LE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #6 failed on LE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #6 actual data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: conversion test #7 failed on BE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #8 failed on BE data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: cmp test #8 actual data: '64b4371c-77c1-48f9-8221-29f054fc023b'
test_uuid: conversion test #9 failed on LE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #10 failed on LE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #10 actual data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: conversion test #11 failed on BE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #12 failed on BE data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: cmp test #12 actual data: '0cb4ddff-a545-4401-9d06-688af53e7f84'
test_uuid: negative test #13 passed on wrong LE data: 'c33f4995-3701-450e-9fbf206a2e98e576 '
test_uuid: negative test #14 passed on wrong BE data: 'c33f4995-3701-450e-9fbf206a2e98e576 '
test_uuid: negative test #15 passed on wrong LE data: '64b4371c-77c1-48f9-8221-29f054XX023b'
test_uuid: negative test #16 passed on wrong BE data: '64b4371c-77c1-48f9-8221-29f054XX023b'
test_uuid: negative test #17 passed on wrong LE data: '0cb4ddff-a545-4401-9d06-688af53e'
test_uuid: negative test #18 passed on wrong BE data: '0cb4ddff-a545-4401-9d06-688af53e'
test_uuid: failed 18 out of 18 tests
* Now, here's what it looks like with KUnit:
======== [PASSED] uuid ========
[PASSED] uuid_correct_be
[PASSED] uuid_correct_le
[PASSED] uuid_wrong_be
[PASSED] uuid_wrong_le
* And if every test fails with KUnit:
======== [FAILED] uuid ========
[FAILED] uuid_correct_be
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse 'c33f4995-3701-450e-9fbf-206a2e98e576'
# uuid_correct_be: not ok 1 - c33f4995-3701-450e-9fbf-206a2e98e576
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse '64b4371c-77c1-48f9-8221-29f054fc023b'
# uuid_correct_be: not ok 2 - 64b4371c-77c1-48f9-8221-29f054fc023b
# uuid_correct_be: ASSERTION FAILED at lib/test_uuid.c:57
Expected uuid_parse(data->uuid, &be) == 1, but
uuid_parse(data->uuid, &be) == 0
failed to parse '0cb4ddff-a545-4401-9d06-688af53e7f84'
# uuid_correct_be: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e7f84
not ok 1 - uuid_correct_be
[FAILED] uuid_correct_le
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse 'c33f4995-3701-450e-9fbf-206a2e98e576'
# uuid_correct_le: not ok 1 - c33f4995-3701-450e-9fbf-206a2e98e576
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse '64b4371c-77c1-48f9-8221-29f054fc023b'
# uuid_correct_le: not ok 2 - 64b4371c-77c1-48f9-8221-29f054fc023b
# uuid_correct_le: ASSERTION FAILED at lib/test_uuid.c:46
Expected guid_parse(data->uuid, &le) == 1, but
guid_parse(data->uuid, &le) == 0
failed to parse '0cb4ddff-a545-4401-9d06-688af53e7f84'
# uuid_correct_le: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e7f84
not ok 2 - uuid_correct_le
[FAILED] uuid_wrong_be
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of 'c33f4995-3701-450e-9fbf206a2e98e576 ' should've failed
# uuid_wrong_be: not ok 1 - c33f4995-3701-450e-9fbf206a2e98e576
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of '64b4371c-77c1-48f9-8221-29f054XX023b' should've failed
# uuid_wrong_be: not ok 2 - 64b4371c-77c1-48f9-8221-29f054XX023b
# uuid_wrong_be: ASSERTION FAILED at lib/test_uuid.c:77
Expected uuid_parse(*data, &be) == 0, but
uuid_parse(*data, &be) == -22
parsing of '0cb4ddff-a545-4401-9d06-688af53e' should've failed
# uuid_wrong_be: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e
not ok 3 - uuid_wrong_be
[FAILED] uuid_wrong_le
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of 'c33f4995-3701-450e-9fbf206a2e98e576 ' should've failed
# uuid_wrong_le: not ok 1 - c33f4995-3701-450e-9fbf206a2e98e576
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of '64b4371c-77c1-48f9-8221-29f054XX023b' should've failed
# uuid_wrong_le: not ok 2 - 64b4371c-77c1-48f9-8221-29f054XX023b
# uuid_wrong_le: ASSERTION FAILED at lib/test_uuid.c:68
Expected guid_parse(*data, &le) == 0, but
guid_parse(*data, &le) == -22
parsing of '0cb4ddff-a545-4401-9d06-688af53e' should've failed
# uuid_wrong_le: not ok 3 - 0cb4ddff-a545-4401-9d06-688af53e
not ok 4 - uuid_wrong_le
Changes from v1:
- Test suite name: uuid_test -> uuid
- Config name: TEST_UUID -> UUID_KUNIT_TEST
- Config entry in the Kconfig file left where it is
- Converted tests to use _MSG variant
André Almeida (1):
lib: Convert UUID runtime test to KUnit
lib/Kconfig.debug | 11 +++-
lib/Makefile | 2 +-
lib/test_uuid.c | 137 +++++++++++++++++++---------------------------
3 files changed, 67 insertions(+), 83 deletions(-)
--
2.31.1
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept them, given that the
multiplexed interface is full of legacy features and makes big changes
tricky. Some problems that people tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
In the current implementation, all futex kernel side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node will have a memory access
penalty.
** The 32bit sized futex problem
Futexes are used to implement atomic operations in userspace.
Supporting 8, 16, 32 and 64 bit sized futexes allows user libraries to
implement all those sizes in a performant way. Thanks to the Boost devs for
the feedback: https://lists.boost.org/Archives/boost/2021/05/251508.php
Embedded systems or anything with memory constraints could benefit from
using smaller sizes for the futex userspace integer.
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitForMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signals,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side, and quickly go to sleep on the
consumer side is essential for good performance of applications running on
Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
keep the API maintainable if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
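To make the type checking point concrete, here is a sketch of the existing
multiplexed futex() call hidden behind two thin wrappers. This uses only
the current syscall, not the futex2 API from this series; the wrapper names
demo_futex_wait() and demo_futex_wake() are made up for the example.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical typed wrapper: sleep while *uaddr still equals expected. */
long demo_futex_wait(uint32_t *uaddr, uint32_t expected,
		     const struct timespec *timeout)
{
	/* Same syscall as wake; only the op code and argument meaning change. */
	return syscall(SYS_futex, uaddr, FUTEX_WAIT, expected, timeout, NULL, 0);
}

/* Hypothetical typed wrapper: wake at most nr_wake waiters on uaddr. */
long demo_futex_wake(uint32_t *uaddr, int nr_wake)
{
	/* The third argument is now a waiter count, not an expected value. */
	return syscall(SYS_futex, uaddr, FUTEX_WAKE, nr_wake, NULL, NULL, 0);
}
```

With one entry point, the compiler cannot tell a value from a count or a
timeout from a second address. One syscall per operation lets the prototype
carry that information.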
* The interface
The new interface can be seen in detail in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similar to FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node they would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describe the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 7
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can be also found at my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements wait/wake, and the basic foundations of futex2
- Patches 2-5: Implement the remaining features (shared, waitv,
requeue, sizes).
- Patch 6: Adds the x86_x32 ABI handling. I kept it in a separate
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 7: Adds a documentation file which details the interface and
the internal implementation.
- Patches 8-14: Selftests for all operations along with perf
support for futex2.
- Patch 15: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_SETTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test, using 8, 80, 800 and
8000 for the number of threads. Note that for this test, I'm not using
patch 15 ("kernel: Enable waitpid() for futex2"), for reasons explained
in "The patchset" section.
- For the first three, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA?"
NUMA will be implemented in future versions of this patch, and like the
size feature, it will require work with users of futex to get feedback
about it.
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
* Changelog
Changes from v3:
- Implemented variable sized futexes
v3: https://lore.kernel.org/lkml/20210427231248.220501-1-andrealmeid@collabora.…
Changes from v2:
- API now supports 64bit futexes, in addition to 8, 16 and 32.
- This API change will break the glibc[4] and Proton[5] ports for now.
- Refactored futex2_wait and futex2_waitv selftests
v2: https://lore.kernel.org/lkml/20210304004219.134051-1-andrealmeid@collabora.…
Changes from v1:
- Unified futex_set_timer_and_wait and __futex_wait code
- Dropped _carefull from linked list function calls
- Fixed typos on docs patch
- uAPI flags are now added as features are introduced, instead of all flags
in patch 1
- Removed struct futex_single_waiter in favor of an anon struct
v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.…
André Almeida (15):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Implement support for different futex sizes
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
selftests: futex2: Add futex sizes test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 8 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 26 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 17 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 34 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1289 +++++++++++++++++
kernel/sys_ni.c | 9 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 4 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 4 +
.../selftests/futex/functional/Makefile | 7 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_sizes.c | 146 ++
.../selftests/futex/functional/futex2_wait.c | 195 +++
.../selftests/futex/functional/futex2_waitv.c | 154 ++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 113 ++
39 files changed, 2707 insertions(+), 52 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_sizes.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.31.1
From: Oliver Glitta <glittao(a)gmail.com>
Add documentation for a KUnit test for SLUB debugging functionality.
Signed-off-by: Oliver Glitta <glittao(a)gmail.com>
---
Documentation/vm/slub.rst | 104 ++++++++++++++++++++++++++++++++++++++
1 file changed, 104 insertions(+)
diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 03f294a638bd..ca82fc1649ee 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -384,5 +384,109 @@ c) Execute ``slabinfo-gnuplot.sh`` in '-t' mode, passing all of the
40,60`` range will plot only samples collected between 40th and
60th seconds).
+KUnit tests for SLUB debugging functionality
+============================================
+
+These KUnit tests are used to test some of the SLUB debugging
+functionalities.
+
+KUnit tests are used for unit testing in the Linux kernel and are easy to
+run, so KUnit is probably the best choice for this type of test.
+
+There are tests which corrupt the redzone, the free objects and the freelist.
+The tests corrupt specific bytes in the cache and check whether validation
+finds the expected number of bugs. Bug reports are silenced.
+
+Config option
+
+In order to build and then run these tests you need to switch the
+option SLUB_KUNIT_TEST on. It is a tristate option, so it can also
+be built as a module. This option depends on the SLUB_DEBUG and
+KUNIT options. By default it is on with all KUnit tests.
+
+Error counting
+
+The KUnit test API kunit_resource is used to count the errors discovered in
+SLUB. In test_init, a reference to the integer variable slab_errors is added
+to the resources of the test.
+
+During SLUB cache checking, whenever a bug would be reported or fixed, the
+function slab_add_kunit_errors() is called. This function finds the resource
+of the KUnit test and increments the value of its data, which is the
+slab_errors variable.
+
+Silence bug reports
+
+The function slab_add_kunit_errors() returns a bool, which is true if a KUnit test
+with the correct kunit_resource is running; this is used to silence bug reports so
+they are not printed. We do not want to correct errors, only to know that they
+occurred, so these reports are unnecessary.
+
+KASAN option
+
+Only 2 out of 5 tests run when the KASAN option is on.
+The other three tests deliberately modify non-allocated objects, and KASAN
+does not detect some errors in the same way as SLUB_DEBUG. So these tests
+do not run when the KASAN option is on.
+
+TESTS
+
+1. test_clobber_zone
+
+ A SLUB cache with the SLAB_RED_ZONE flag can detect writes past the end of
+ an object. This functionality is tested here on allocated memory.
+
+ First, memory is allocated with SLAB_RED_ZONE and then the first byte
+ after the allocated space is modified. Validation finds 2 errors, one for
+ the bug and one for the fix of the memory.
+
+
+2. test_next_pointer
+
+ SLUB has a list of free objects, and the address of the next free object
+ is always saved in the free object at the offset specified in the variable
+ offset in struct kmem_cache. This test tries to corrupt this freelist and
+ then correct it.
+
+ First, memory is allocated and freed to get a pointer to a free object.
+ After that, the pointer to the next free object is corrupted. The first validation finds
+ 3 errors: one for the corrupted freechain, the second for the wrong count of objects
+ in use and the third for fixing the issue. This fix only sets the number of objects
+ in use to the number of all objects minus 1, because the first free object
+ was corrupted.
+
+ Then the free pointer is restored to its previous value. The second validation finds
+ 2 errors: one for the wrong count of objects in use and one for fixing this error.
+
+ The last validation is used to check that all errors were corrected, so no error
+ is found.
+
+3. test_first_word
+
+ A SLUB cache with the SLAB_POISON flag can detect writes to poisoned free
+ objects. This functionality is tested in this test. The test tries to
+ corrupt the first byte in freed memory.
+
+ First of all, memory is allocated and freed to get a pointer to a free object
+ and then the first byte is corrupted. After that, validation finds 2 errors,
+ one for the bug and the other one for the fix of the memory.
+
+4. test_clobber_50th_byte
+
+ In this test SLAB_POISON functionality is tested. The test tries to
+ corrupt the 50th byte in freed memory.
+
+ First, a pointer to free memory is acquired by allocating and freeing memory.
+ Then the 50th byte is corrupted and validation finds 2 errors, one for the
+ bug and one for the fix of the memory.
+
+5. test_clobber_redzone_free
+
+ This test tests redzone functionality of SLUB cache on a freed object.
+
+ First, it gets a pointer to a free object by allocating and freeing memory,
+ and then corrupts the first byte after the freed object. Validation finds
+ 2 errors, one for the bug and one for the fix of the memory.
+
Christoph Lameter, May 30, 2007
Sergey Senozhatsky, October 23, 2015
--
2.31.1.272.g89b43f80a5
Use ARRAY_SIZE instead of dividing the sizeof the array by the sizeof an
element.
Clean up the following coccicheck warning:
./tools/testing/selftests/x86/syscall_numbering.c:316:35-36: WARNING:
Use ARRAY_SIZE.
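
For illustration, the pattern this cleanup moves to can be sketched as
below. This is a minimal, standalone sketch: the ARRAY_SIZE definition here
is a simplified local one (the in-tree macro may carry extra compile-time
checks), and the msbs values and the sum_msbs() helper are hypothetical,
not taken from syscall_numbering.c.

```c
#include <stddef.h>

/* Simplified local definition, assumed for this sketch only. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical values; the real test uses its own table. */
static const int msbs[] = { 0, 1, -1 };

/* Walk the array using ARRAY_SIZE instead of an open-coded division. */
static long sum_msbs(void)
{
	long sum = 0;

	for (size_t i = 0; i < ARRAY_SIZE(msbs); i++)
		sum += msbs[i];
	return sum;
}
```

The macro only works on true arrays; applied to a pointer it would silently
compute the wrong count, which is why a single shared definition is
preferable to repeating the division by hand.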
Reported-by: Abaci Robot <abaci(a)linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
---
tools/testing/selftests/x86/syscall_numbering.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/syscall_numbering.c b/tools/testing/selftests/x86/syscall_numbering.c
index 9915917..7d5e246 100644
--- a/tools/testing/selftests/x86/syscall_numbering.c
+++ b/tools/testing/selftests/x86/syscall_numbering.c
@@ -313,7 +313,7 @@ static void test_syscall_numbering(void)
* The MSB is supposed to be ignored, so we loop over a few
* to test that out.
*/
- for (size_t i = 0; i < sizeof(msbs)/sizeof(msbs[0]); i++) {
+ for (size_t i = 0; i < ARRAY_SIZE(msbs); i++) {
int msb = msbs[i];
run("Checking system calls with msb = %d (0x%x)\n",
msb, msb);
--
1.8.3.1
Commit 39fe2fc96694 ("selftests: kvm: make allocation of extra memory take effect")
changed the meaning of extra_mem_pages and treated it as the slot0 memory size.
In fact extra_mem_pages is used for the non-slot0 memory size; there is no custom
slot0 memory size support. See the discussion in https://lkml.org/lkml/2021/6/3/551
for more details.
This patchset restores extra_mem_pages's original meaning and adds support for
custom slot0 memory with a new parameter slot0_mem_pages.
With the command below, all 39 tests passed.
# make -C tools/testing/selftests/ TARGETS=kvm run_tests
Zhenzhong Duan (3):
Revert "selftests: kvm: make allocation of extra memory take effect"
Revert "selftests: kvm: fix overlapping addresses in
memslot_perf_test"
selftests: kvm: Add support for customized slot0 memory size
.../testing/selftests/kvm/include/kvm_util.h | 7 +--
.../selftests/kvm/kvm_page_table_test.c | 2 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 47 +++++++++++++++----
.../selftests/kvm/lib/perf_test_util.c | 2 +-
.../testing/selftests/kvm/memslot_perf_test.c | 2 +-
5 files changed, 45 insertions(+), 15 deletions(-)
--
2.25.1
SRv6 End.DT46 Behavior is defined in the IETF RFC 8986 [1] along with SRv6
End.DT4 and End.DT6 Behaviors.
The proposed End.DT46 implementation is meant to support the decapsulation
of both IPv4 and IPv6 traffic coming from a *single* SRv6 tunnel.
The SRv6 End.DT46 Behavior greatly simplifies the setup and operations of
SRv6 VPNs in the Linux kernel.
- patch 1/2 is the core patch that adds support for the SRv6 End.DT46
Behavior;
- patch 2/2 adds the selftest for SRv6 End.DT46 Behavior.
The patch introducing the new SRv6 End.DT46 Behavior in iproute2 will
follow shortly.
Comments, suggestions and improvements are very welcome as always!
Thanks,
Andrea
[1] https://www.rfc-editor.org/rfc/rfc8986.html#name-enddt46-decapsulation-and-s
Andrea Mayer (2):
seg6: add support for SRv6 End.DT46 Behavior
selftests: seg6: add selftest for SRv6 End.DT46 Behavior
include/uapi/linux/seg6_local.h | 2 +
net/ipv6/seg6_local.c | 94 ++-
.../selftests/net/srv6_end_dt46_l3vpn_test.sh | 573 ++++++++++++++++++
3 files changed, 647 insertions(+), 22 deletions(-)
create mode 100755 tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
--
2.20.1
Hello,
I plan to contribute to n_gsm for better support of 3GPP TS 27.010.
Is there any test framework/suite for this module that I can use to avoid regressions?
I could not find anything within the kselftest framework.
And if there is nothing already available: What would be the best place for this?
With best regards,
Daniel Starke
Subject: Re: [PATCH v4 07/15] docs: locking: futex2: Add documentation
In-Reply-To: <20210603195924.361327-8-andrealmeid(a)collabora.com>
On Thu, 03 Jun 2021, André Almeida wrote:
>Add a new documentation file specifying both userspace API and internal
>implementation details of futex2 syscalls.
I think equally important would be to provide a manpage for each new
syscall you are introducing, and keep mtk in the loop as in the past he
extensively documented and improved futex manpages, and overall has a
lot of experience with dealing with kernel interfaces.
Thanks,
Davidlohr
>
>Signed-off-by: André Almeida <andrealmeid(a)collabora.com>
>---
> Documentation/locking/futex2.rst | 198 +++++++++++++++++++++++++++++++
> Documentation/locking/index.rst | 1 +
> 2 files changed, 199 insertions(+)
> create mode 100644 Documentation/locking/futex2.rst
>
>diff --git a/Documentation/locking/futex2.rst b/Documentation/locking/futex2.rst
>new file mode 100644
>index 000000000000..2f74d7c97a55
>--- /dev/null
>+++ b/Documentation/locking/futex2.rst
>@@ -0,0 +1,198 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+======
>+futex2
>+======
>+
>+:Author: André Almeida <andrealmeid(a)collabora.com>
>+
>+futex, or fast user mutex, is a set of syscalls to allow userspace to create
>+performant synchronization mechanisms, such as mutexes, semaphores and
>+condition variables in userspace. C standard libraries, like glibc, use it
>+as a means to implement higher-level interfaces like pthreads.
>+
>+The interface
>+=============
>+
>+uAPI functions
>+--------------
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>+
>+uAPI structures
>+---------------
>+
>+.. kernel-doc:: include/uapi/linux/futex.h
>+
>+The ``flag`` argument
>+---------------------
>+
>+The flag is used to specify the size of the futex word
>+(FUTEX_[8, 16, 32, 64]). It's mandatory to define one, since there's no
>+default size.
>+
>+By default, the timeout uses a monotonic clock, but can be used as a realtime
>+one by using the FUTEX_REALTIME_CLOCK flag.
>+
>+By default, futexes are of the private type, that means that this user address
>+will be accessed by threads that share the same memory region. This allows for
>+some internal optimizations, so they are faster. However, if the address needs
>+to be shared with different processes (like using ``mmap()`` or ``shm()``), they
>+need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that.
>+
>+By default, the operation has no NUMA-awareness, meaning that the user can't
>+choose the memory node where the kernel side futex data will be stored. The
>+user can choose the node where it wants to operate by setting the
>+FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, 32 or
>+64)::
>+
>+ struct futexX_numa {
>+ __uX value;
>+ __sX hint;
>+ };
>+
>+This structure should be passed at the ``void *uaddr`` of futex functions. The
>+address of the structure will be used for waiting and waking, and the
>+``value`` will be compared to ``val`` as usual. The ``hint`` member is used to
>+define which node the futex will use. When waiting, the futex will be
>+registered on a kernel-side table stored on that node; when waking, the futex
>+will be searched for on that given table. That means that there's no redundancy
>+between tables, and the wrong ``hint`` value will lead to undesired behavior.
>+Userspace is responsible for dealing with node migrations issues that may
>+occur. ``hint`` can range from [0, MAX_NUMA_NODES), for specifying a node, or
>+-1, to use the same node the current process is using.
>+
>+When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a
>+global table allocated on the first node.
>+
>+The ``timo`` argument
>+---------------------
>+
>+As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout
>+options known to be buggy. Given that, ``timo`` should be a 64-bit timeout at
>+all platforms, using an absolute timeout value.
>+
>+Implementation
>+==============
>+
>+The internal implementation follows a similar design to the original futex.
>+Given that we want to replicate the same external behavior of current futex,
>+this should be somewhat expected.
>+
>+Waiting
>+-------
>+
>+For the wait operations, they are all treated as if you want to wait on N
>+futexes, so the path for futex_wait and futex_waitv is basically the same.
>+For both syscalls, the first step is to prepare an internal list for the list
>+of futexes to wait for (using struct futexv_head). For futex_wait() calls, this
>+list will have a single object.
>+
>+We have a hash table, where waiters register themselves before sleeping. Then
>+the wake function checks this table looking for waiters at uaddr. The hash
>+bucket to be used is determined by a struct futex_key, that stores information
>+to uniquely identify an address from a given process. Given the huge address
>+space, there'll be hash collisions, so we store information to be later used on
>+collision treatment.
>+
>+First, for every futex we want to wait on, we check if (``*uaddr == val``).
>+This check is done holding the bucket lock, so we are correctly serialized with
>+any futex_wake() calls. If any waiter fails the check above, we dequeue all
>+futexes. The check (``*uaddr == val``) can fail for two reasons:
>+
>+- The values are different, and we return -EAGAIN. However, if while
>+ dequeueing we found that some futexes were awakened, we prioritize this
>+ and return success.
>+
>+- When trying to access the user address, we do so with page faults
>+ disabled because we are holding a bucket's spin lock (and can't sleep
>+ while holding a spin lock). If there's an error, it might be a page
>+ fault, or an invalid address. We release the lock, dequeue everyone
>+ (because it's illegal to sleep while there are futexes enqueued, we
>+ could lose wakeups) and try again with page fault enabled. If we
>+ succeed, this means that the address is valid, but we need to do
>+ all the work again. For serialization reasons, we need to have the
>+ spin lock when getting the user value. Additionally, for shared
>+ futexes, we also need to recalculate the hash, since the underlying
>+ mapping mechanisms could have changed when dealing with page fault.
>+ If, even with page fault enabled, we can't access the address, it
>+ means it's an invalid user address, and we return -EFAULT. For this
>+ case, we prioritize the error, even if some futexes were awakened.
>+
>+If the check is OK, they are enqueued on a linked list in our bucket, and
>+proceed to the next one. If all waiters succeed, we put the thread to sleep
>+until a futex_wake() call, timeout expires or we get a signal. After waking up,
>+we dequeue everyone, and check if some futex was awakened. This dequeue is done
>+by iteratively walking at each element of struct futex_head list.
>+
>+All enqueuing/dequeuing operations require holding the bucket lock, to avoid
>+racing while modifying the list.
>+
>+Waking
>+------
>+
>+We get the bucket that's storing the waiters at uaddr, and wake the required
>+number of waiters, checking for hash collision.
>+
>+There's an optimization that makes futex_wake() not take the bucket lock if
>+there's no one to be woken on that bucket. It checks an atomic counter that each
>+bucket has, if it says 0, then the syscall exits. In order for this to work, the
>+waiter thread increases it before taking the lock, so the wake thread will
>+correctly see that there's someone waiting and will continue the path to take
>+the bucket lock. To get the correct serialization, the waiter issues a memory
>+barrier after increasing the bucket counter and the waker issues a memory
>+barrier before checking it.
>+
>+Requeuing
>+---------
>+
>+The requeue path first checks for each struct futex_requeue and their flags.
>+Then, it will compare the expected value with the one at uaddr1::uaddr.
>+Following the same serialization explained at Waking_, we increase the atomic
>+counter for the bucket of uaddr2 before taking the lock. We need to have both
>+bucket locks at the same time so we don't race with other futex operations. To
>+ensure the locks are taken in the same order for all threads (and thus avoiding
>+deadlocks), every requeue operation takes the "smaller" bucket first, when
>+comparing both addresses.
>+
>+If the compare with user value succeeds, we proceed by waking ``nr_wake``
>+futexes, and then requeuing ``nr_requeue`` from bucket of uaddr1 to the uaddr2.
>+This consists in a simple list deletion/addition and replacing the old futex key
>+with the new one.
>+
>+Futex keys
>+----------
>+
>+There are two types of futexes: private and shared ones. The private are futexes
>+meant to be used by threads that share the same memory space, are easier to be
>+uniquely identified and thus can have some performance optimization. The
>+elements for identifying one are: the start address of the page where the
>+address is, the address offset within the page and the current->mm pointer.
>+
>+Now, for uniquely identifying a shared futex:
>+
>+- If the page containing the user address is an anonymous page, we can
>+ just use the same data used for private futexes (the start address of
>+ the page, the address offset within the page and the current->mm
>+ pointer); that will be enough for uniquely identifying such futex. We
>+ also set one bit at the key to differentiate if a private futex is
>+ used on the same address (mixing shared and private calls does not
>+ work).
>+
>+- If the page is file-backed, current->mm maybe isn't the same one for
>+ every user of this futex, so we need to use other data: the
>+ page->index, a UUID for the struct inode and the offset within the
>+ page.
>+
>+Note that members of futex_key don't have any particular meaning after they
>+are part of the struct - they are just bytes to identify a futex. Given that,
>+we don't need to use a particular name or type that matches the original data,
>+we only need to care about the bitsize of each component and make both private
>+and shared fit in the same memory space.
>+
>+Source code documentation
>+=========================
>+
>+.. kernel-doc:: kernel/futex2.c
>+ :no-identifiers: sys_futex_wait sys_futex_wake sys_futex_waitv sys_futex_requeue
>diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
>index 7003bd5aeff4..9bf03c7fa1ec 100644
>--- a/Documentation/locking/index.rst
>+++ b/Documentation/locking/index.rst
>@@ -24,6 +24,7 @@ locking
> percpu-rw-semaphore
> robust-futexes
> robust-futex-ABI
>+ futex2
>
> .. only:: subproject and html
>
>--
>2.31.1
>
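
As an aside on the Requeuing section quoted above, the rule of always taking
the "smaller" bucket first can be sketched in userspace as follows. This is
only an illustration of address-ordered locking under stated assumptions:
struct bucket, the atomic_flag spin lock and all function names are
hypothetical and are not taken from kernel/futex2.c.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a futex hash bucket; only the lock matters. */
struct bucket {
	atomic_flag locked;
};

static void bucket_lock(struct bucket *b)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&b->locked,
						 memory_order_acquire))
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->locked, memory_order_release);
}

/*
 * Lock two buckets, always taking the one at the lower address first, so
 * that two threads locking the same pair cannot deadlock on each other.
 */
static void bucket_lock_pair(struct bucket *b1, struct bucket *b2)
{
	if (b1 == b2) {
		/* Both addresses hashed to the same bucket: lock it once. */
		bucket_lock(b1);
		return;
	}
	if ((uintptr_t)b1 > (uintptr_t)b2) {
		struct bucket *tmp = b1;

		b1 = b2;
		b2 = tmp;
	}
	bucket_lock(b1);
	bucket_lock(b2);
}

static void bucket_unlock_pair(struct bucket *b1, struct bucket *b2)
{
	bucket_unlock(b1);
	if (b1 != b2)
		bucket_unlock(b2);
}
```

Unlock order does not matter for deadlock avoidance; only the acquisition
order has to be the same in every thread.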
Add a libbpf dumper function that supports dumping a representation
of data passed in using the BTF id associated with the data in a
manner similar to the bpf_snprintf_btf helper.
Default output format is identical to that dumped by bpf_snprintf_btf()
(bar using tabs instead of spaces for indentation, but the indent string
can be customized also); for example, a "struct sk_buff" representation
would look like this:
(struct sk_buff){
(union){
(struct){
.next = (struct sk_buff *)0xffffffffffffffff,
.prev = (struct sk_buff *)0xffffffffffffffff,
(union){
.dev = (struct net_device *)0xffffffffffffffff,
.dev_scratch = (long unsigned int)18446744073709551615,
},
},
...
Patch 1 implements the dump functionality in a manner similar
to that in kernel/bpf/btf.c, but with a view to fitting into
libbpf more naturally. For example, rather than using flags,
boolean dump options are used to control output. In addition,
rather than combining checks for display (such as is this
field zero?) and actual display - as is done for the kernel
code - the code is organized to separate zero and overflow
checks from type display.
Patch 2 consists of selftests that utilize a dump printf function
to snprintf the dump output to a string for comparison with
expected output. Tests deliberately mirror those in
snprintf_btf helper test to keep output consistent, but
also cover overflow handling, var/section display.
Changes since v3 [1]
- Retained separation of emitting of type name cast prefixing
type values from existing functionality such as btf_dump_emit_type_chain()
since initial code-shared version had so many exceptions it became
hard to read. For example, we don't emit a type name if the type
to be displayed is an array member, we also always emit "forward"
definitions for structs/unions that aren't really forward definitions
(we just want a "struct foo" output for "(struct foo){.bar = ...").
We also always ignore modifiers const/volatile/restrict as they
clutter output when emitting large types.
- Added configurable 4-char indent string option; defaults to tab
(Andrii)
- Added support for BTF_KIND_FLOAT and associated tests (Andrii)
- Added support for BTF_KIND_FUNC_PROTO function pointers to
improve output of "ops" structures; for example:
(struct file_operations){
.owner = (struct module *)0xffffffffffffffff,
.llseek = (loff_t(*)(struct file *, loff_t, int))0xffffffffffffffff,
...
Added associated test also (Andrii)
- Added handling for enum bitfields and associated test (Andrii)
- Allocation of "struct btf_dump_data" done on-demand (Andrii)
- Removed ".field = " output from function emitting type name and
into caller (Andrii)
- Removed BTF_INT_OFFSET() support (Andrii)
- Use libbpf_err() to set errno for error cases (Andrii)
- btf_dump_dump_type_data() returns size written, which is used
when returning successfully from btf_dump__dump_type_data()
(Andrii)
Changes since v2 [2]
- Renamed function to btf_dump__dump_type_data, reorganized
arguments such that opts are last (Andrii)
- Modified code to separate questions about display such
as have we overflowed?/is this field zero? from actual
display of typed data, such that we ask those questions
separately from the code that actually displays typed data
(Andrii)
- Reworked code to handle overflow - where we do not provide
enough data for the type we wish to display - by returning
-E2BIG and attempting to present as much data as possible.
Such a mode of operation allows for tracers which retrieve
partial data (such as first 1024 bytes of a
"struct task_struct" say), and want to display that partial
data, while also knowing that it is not the full type.
Such tracers can then denote this (perhaps via "..." or
similar).
- Explored reusing existing type emit functions, such as
passing in a type id stack with a single type id to
btf_dump_emit_type_chain() to support the display of
typed data where a "cast" is prepended to the data to
denote its type; "(int)1", "(struct foo){", etc.
However the task of emitting a
".field_name = (typecast)" did not match well with model
of walking the stack to display innermost types first
and made the resultant code harder to read. Added a
dedicated btf_dump_emit_type_name() function instead which
is only ~70 lines (Andrii)
- Various cleanups around bitfield macros, unneeded member
iteration macros, avoiding compiler complaints when
displaying int data by casting to long long, etc (Andrii)
- Use DECLARE_LIBBPF_OPTS() in defining opts for tests (Andrii)
- Added more type tests, overflow tests, var tests and
section tests.
Changes since RFC [3]
- The initial approach explored was to share the kernel code
with libbpf using #defines to paper over the different needs;
however it makes more sense to try and fit in with libbpf
code style for maintenance. A comment in the code points at
the implementation in kernel/bpf/btf.c and notes that any
issues found in it should be fixed there or vice versa;
mirroring the tests should help with this also
(Andrii)
[1] https://lore.kernel.org/bpf/1622131170-8260-1-git-send-email-alan.maguire@o…
[2] https://lore.kernel.org/bpf/1610921764-7526-1-git-send-email-alan.maguire@o…
[3] https://lore.kernel.org/bpf/1610386373-24162-1-git-send-email-alan.maguire@…
Alan Maguire (2):
libbpf: BTF dumper support for typed data
selftests/bpf: add dump type data tests to btf dump tests
tools/lib/bpf/btf.h | 22 +
tools/lib/bpf/btf_dump.c | 1008 ++++++++++++++++++++-
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 638 +++++++++++++
4 files changed, 1667 insertions(+), 2 deletions(-)
--
1.8.3.1
This test requires /dev/rtc0, the default RTC device, or one specified
by the user, to run. Since the default RTC is not guaranteed to exist
on all devices, check its existence first and otherwise skip this test
with the kselftest skip code 4.
Without this patch this test will fail like this on a s390x zVM:
# selftests: timers: rtcpie
# /dev/rtc0: No such file or directory
not ok 1 selftests: timers: rtcpie # exit=22
With this patch:
# selftests: timers: rtcpie
# Default RTC /dev/rtc0 does not exist. Test Skipped!
not ok 9 selftests: timers: rtcpie # SKIP
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/timers/rtcpie.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/timers/rtcpie.c b/tools/testing/selftests/timers/rtcpie.c
index 47b5bad..4ef2184 100644
--- a/tools/testing/selftests/timers/rtcpie.c
+++ b/tools/testing/selftests/timers/rtcpie.c
@@ -18,6 +18,8 @@
#include <stdlib.h>
#include <errno.h>
+#include "../kselftest.h"
+
/*
* This expects the new RTC class driver framework, working with
* clocks that will often not be clones of what the PC-AT had.
@@ -35,8 +37,14 @@ int main(int argc, char **argv)
switch (argc) {
case 2:
rtc = argv[1];
- /* FALLTHROUGH */
+ break;
case 1:
+ fd = open(default_rtc, O_RDONLY);
+ if (fd == -1) {
+ printf("Default RTC %s does not exist. Test Skipped!\n", default_rtc);
+ exit(KSFT_SKIP);
+ }
+ close(fd);
break;
default:
fprintf(stderr, "usage: rtctest [rtcdev] [d]\n");
--
2.7.4
This patchset expands test coverage for futex, implementing two new
selftests: one for testing different types of futexes and one for the
requeue operation.
André Almeida (2):
selftests: futex: Add futex wait test
selftests: futex: Add futex compare requeue test
.../selftests/futex/functional/.gitignore | 2 +
.../selftests/futex/functional/Makefile | 4 +-
.../futex/functional/futex_requeue.c | 145 ++++++++++++++
.../selftests/futex/functional/futex_wait.c | 180 ++++++++++++++++++
.../testing/selftests/futex/functional/run.sh | 6 +
5 files changed, 336 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/futex/functional/futex_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex_wait.c
--
2.31.1
Currently the vlan modification action checks for the existence of a vlan
priority by comparing it to 0. Therefore it is impossible to modify an
existing vlan tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
The same flag is used to avoid dumping unset vlan priority.
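The difference between the two checks can be sketched in plain userspace
C. This is only an illustration of the idea described above, not the
act_vlan code: the struct, the function names and the SKETCH_* macros are
hypothetical stand-ins, and only the 802.1Q TCI layout (3 priority bits
above a 12-bit vlan id) is taken as given.

```c
#include <stdbool.h>
#include <stdint.h>

/* 802.1Q TCI layout: priority in bits 15..13, vlan id in bits 11..0. */
#define SKETCH_PRIO_SHIFT 13
#define SKETCH_PRIO_MASK  0xe000
#define SKETCH_VID_MASK   0x0fff

/* Hypothetical, simplified stand-in for struct tcf_vlan_params. */
struct vlan_params_sketch {
	uint16_t push_vid;
	uint8_t push_prio;
	bool push_prio_exists;	/* same idea as tcfv_push_prio_exists */
};

/* Old check: priority 0 cannot be told apart from "priority not given". */
static uint16_t modify_tci_old(uint16_t tci, const struct vlan_params_sketch *p)
{
	tci = (uint16_t)((tci & ~SKETCH_VID_MASK) | p->push_vid);
	if (p->push_prio) {
		tci &= (uint16_t)~SKETCH_PRIO_MASK;
		tci |= (uint16_t)(p->push_prio << SKETCH_PRIO_SHIFT);
	}
	return tci;
}

/* New check: an explicit flag records whether the user supplied a priority. */
static uint16_t modify_tci_new(uint16_t tci, const struct vlan_params_sketch *p)
{
	tci = (uint16_t)((tci & ~SKETCH_VID_MASK) | p->push_vid);
	if (p->push_prio_exists) {
		tci &= (uint16_t)~SKETCH_PRIO_MASK;
		tci |= (uint16_t)(p->push_prio << SKETCH_PRIO_SHIFT);
	}
	return tci;
}
```

With the packet from the example (vlan 200, p 4, i.e. TCI 0x80c8) and
"id 300 priority 0", modify_tci_old() keeps p 4 while modify_tci_new()
yields vlan 300 with p 0.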
Change Log:
v3 -> v4:
- revert tcf_vlan_get_fill_size change: total size calculation may race vs dump
v2 -> v3:
- Push assumes that the priority is being set
- tcf_vlan_get_fill_size accounts for priority existence
v1 -> v2:
- Do not dump unset priority and fix tests accordingly
- Test for priority 0 modification
Boris Sukholitko (3):
net/sched: act_vlan: Fix modify to allow 0
net/sched: act_vlan: No dump for unset priority
net/sched: act_vlan: Test priority 0 modification
include/net/tc_act/tc_vlan.h | 1 +
net/sched/act_vlan.c | 11 +++++---
.../tc-testing/tc-tests/actions/vlan.json | 28 +++++++++++++++++--
3 files changed, 34 insertions(+), 6 deletions(-)
--
2.29.3
Currently vlan modification action checks existence of vlan priority by
comparing it to 0. Therefore it is impossible to modify existing vlan
tag to have priority 0.
For example, the following tc command will change the vlan id but will
not affect vlan priority:
tc filter add dev eth1 ingress matchall action vlan modify id 300 \
priority 0 pipe mirred egress redirect dev eth2
The incoming packet on eth1:
ethertype 802.1Q (0x8100), vlan 200, p 4, ethertype IPv4
will be changed to:
ethertype 802.1Q (0x8100), vlan 300, p 4, ethertype IPv4
although the user has intended to have p == 0.
The fix is to add tcfv_push_prio_exists flag to struct tcf_vlan_params
and rely on it when deciding to set the priority.
The same flag is used to avoid dumping unset vlan priority.
Change Log:
v2 -> v3:
- Push assumes that the priority is being set
- tcf_vlan_get_fill_size accounts for priority existence
v1 -> v2:
- Do not dump unset priority and fix tests accordingly
- Test for priority 0 modification
Boris Sukholitko (3):
net/sched: act_vlan: Fix modify to allow 0
net/sched: act_vlan: No dump for unset priority
net/sched: act_vlan: Test priority 0 modification
include/net/tc_act/tc_vlan.h | 1 +
net/sched/act_vlan.c | 26 ++++++++++++-----
.../tc-testing/tc-tests/actions/vlan.json | 28 +++++++++++++++++--
3 files changed, 46 insertions(+), 9 deletions(-)
--
2.29.3
From: Colin Ian King <colin.king(a)canonical.com>
There is a spelling mistake in a debug message. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/kvm/demand_paging_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c
index b74704305835..874dc8f7248f 100644
--- a/tools/testing/selftests/kvm/demand_paging_test.c
+++ b/tools/testing/selftests/kvm/demand_paging_test.c
@@ -230,7 +230,7 @@ static void setup_demand_paging(struct kvm_vm *vm,
PER_PAGE_DEBUG("Userfaultfd %s mode, faults resolved with %s\n",
is_minor ? "MINOR" : "MISSING",
- is_minor ? "UFFDIO_CONINUE" : "UFFDIO_COPY");
+ is_minor ? "UFFDIO_CONTINUE" : "UFFDIO_COPY");
/* In order to get minor faults, prefault via the alias. */
if (is_minor) {
--
2.31.1
Hi,
There is a possible ptrace-related bug in
tools/testing/selftests/ptrace/vmaccess.c: the test times out every time
I run it, on every kernel I have tried, in a VM or on bare metal.
Does it run correctly for anybody?
Example vmaccess from v5.12 source tree run on 5.13.0+rc2 on bare metal
Supermicro server (AMD EPYC):
5.10.35-rt-alt1.rt39:~# /usr/lib/kselftests/ptrace/vmaccess
TAP version 13
1..2
# Starting 2 tests from 1 test cases.
# RUN global.vmaccess ...
# OK global.vmaccess
ok 1 global.vmaccess
# RUN global.attach ...
# attach: Test terminated by timeout
# FAIL global.attach
not ok 2 global.attach
# FAILED: 1 / 2 tests passed.
# Totals: pass:1 fail:1 xfail:0 xpass:0 skip:0 error:0
Just to confirm for the latest kernel, it behaves the same [for drm-tip
based] on 5.13rc2. Also, just tested on 5.11.21 with the same failure.
Other ptrace tests pass.
Thanks,
TL;DR: Add support to kunit_tool to dispatch tests via QEMU. Also add
support to immediately shut down a kernel after running KUnit tests.
Background
----------
KUnit has supported running on all architectures for quite some time;
however, kunit_tool - the script commonly used to invoke KUnit tests -
has only fully supported running KUnit on UML. Its functionality has been
broken up for some time to separate the configure, build, run, and parse
phases making it possible to be used in part on other architectures to a
small extent. Nevertheless, kunit_tool has not supported running tests
on other architectures.
What this patchset does
-----------------------
This patchset introduces first class support to kunit_tool for KUnit to
be run on many popular architectures via QEMU. It does this by adding
two new flags: `--arch` and `--cross_compile`.
`--arch` allows an architecture to be specified by the name the
architecture is given in `arch/`. It uses the specified architecture to
select a minimal amount of Kconfigs and QEMU configs needed for the
architecture to run in QEMU and provide a console from which KTAP
results can be scraped.
`--cross_compile` allows a toolchain prefix to be passed to make,
similar to how `CROSS_COMPILE` is used.
Additionally, this patchset revives the previously considered "kunit:
tool: add support for QEMU"[1] patches. The motivation for the new
kernel command line flag, `kunit_shutdown`, is to better support
running KUnit tests inside of QEMU. For most popular architectures, QEMU
can be made to terminate when the Linux kernel that is being run is
rebooted, halted, or powered off. As Kees pointed out in a previous
discussion[2], it is possible to make a kernel initrd that can reboot
the kernel immediately, but doing this for every architecture would
likely be infeasible. Instead, just having an option for the kernel to
shut down when it is done with testing seems a lot simpler, especially
since it is an option which would only be available in testing
configurations of the kernel anyway.
Changes since last revision
---------------------------
I pulled out the QemuConfigs into their own files; the way in which I
did this also allows new QemuConfigs to be added without making any
changes to kunit_tool.
I changed how Kconfigs are loaded; they are now merged inside of
kunit_tool instead of letting Kbuild do it.
I also made numerous nit fixes.
Finally, I added a new section to the kunit_tool documentation to
document the new command line flags I added.
[1] http://patches.linaro.org/patch/208336/
[2] https://lkml.org/lkml/2020/6/26/988
Brendan Higgins (3):
Documentation: Add kunit_shutdown to kernel-parameters.txt
kunit: tool: add support for QEMU
Documentation: kunit: document support for QEMU in kunit_tool
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 8 +
Documentation/dev-tools/kunit/kunit-tool.rst | 48 +++++
Documentation/dev-tools/kunit/usage.rst | 50 +++--
lib/kunit/executor.c | 20 ++
tools/testing/kunit/kunit.py | 57 +++++-
tools/testing/kunit/kunit_config.py | 7 +-
tools/testing/kunit/kunit_kernel.py | 177 +++++++++++++++---
tools/testing/kunit/kunit_parser.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 18 +-
tools/testing/kunit/qemu_config.py | 16 ++
tools/testing/kunit/qemu_configs/alpha.py | 10 +
tools/testing/kunit/qemu_configs/arm.py | 13 ++
tools/testing/kunit/qemu_configs/arm64.py | 12 ++
tools/testing/kunit/qemu_configs/i386.py | 10 +
tools/testing/kunit/qemu_configs/powerpc.py | 12 ++
tools/testing/kunit/qemu_configs/riscv.py | 31 +++
tools/testing/kunit/qemu_configs/s390.py | 14 ++
tools/testing/kunit/qemu_configs/sparc.py | 10 +
tools/testing/kunit/qemu_configs/x86_64.py | 10 +
19 files changed, 471 insertions(+), 54 deletions(-)
create mode 100644 tools/testing/kunit/qemu_config.py
create mode 100644 tools/testing/kunit/qemu_configs/alpha.py
create mode 100644 tools/testing/kunit/qemu_configs/arm.py
create mode 100644 tools/testing/kunit/qemu_configs/arm64.py
create mode 100644 tools/testing/kunit/qemu_configs/i386.py
create mode 100644 tools/testing/kunit/qemu_configs/powerpc.py
create mode 100644 tools/testing/kunit/qemu_configs/riscv.py
create mode 100644 tools/testing/kunit/qemu_configs/s390.py
create mode 100644 tools/testing/kunit/qemu_configs/sparc.py
create mode 100644 tools/testing/kunit/qemu_configs/x86_64.py
base-commit: d7eab3df8f39b116d934bc17f8070861e18cfb62
--
2.31.1.818.g46aad6cb9e-goog
Add a libbpf dumper function that supports dumping a representation
of data passed in using the BTF id associated with the data in a
manner similar to the bpf_snprintf_btf helper.
Default output format is identical to that dumped by bpf_snprintf_btf()
(bar using tabs instead of spaces for indentation); for example,
a "struct sk_buff" representation would look like this:
(struct sk_buff){
(union){
(struct){
.next = (struct sk_buff *)0xffffffffffffffff,
.prev = (struct sk_buff *)0xffffffffffffffff,
(union){
.dev = (struct net_device *)0xffffffffffffffff,
.dev_scratch = (long unsigned int)18446744073709551615,
},
},
...
Patch 1 implements the dump functionality in a manner similar
to that in kernel/bpf/btf.c, but with a view to fitting into
libbpf more naturally. For example, rather than using flags,
boolean dump options are used to control output. In addition,
rather than combining checks for display (such as is this
field zero?) and actual display - as is done for the kernel
code - the code is organized to separate zero and overflow
checks from type display.
Patch 2 consists of selftests that utilize a dump printf function
to snprintf the dump output to a string for comparison with
expected output. Tests deliberately mirror those in
snprintf_btf helper test to keep output consistent, but
also cover overflow handling, var/section display.
Apologies for the long time lag between v2 and this revision.
Changes since v2 [1]
- Renamed function to btf_dump__dump_type_data, reorganized
arguments such that opts are last (Andrii)
- Modified code to separate questions about display such
as have we overflowed?/is this field zero? from actual
display of typed data, such that we ask those questions
separately from the code that actually displays typed data
(Andrii)
- Reworked code to handle overflow - where we do not provide
enough data for the type we wish to display - by returning
-E2BIG and attempting to present as much data as possible.
Such a mode of operation allows for tracers which retrieve
partial data (such as first 1024 bytes of a
"struct task_struct" say), and want to display that partial
data, while also knowing that it is not the full type.
Such tracers can then denote this (perhaps via "..." or
similar).
- Explored reusing existing type emit functions, such as
passing in a type id stack with a single type id to
btf_dump_emit_type_chain() to support the display of
typed data where a "cast" is prepended to the data to
denote its type; "(int)1", "(struct foo){", etc.
However the task of emitting a
".field_name = (typecast)" did not match well with model
of walking the stack to display innermost types first
and made the resultant code harder to read. Added a
dedicated btf_dump_emit_type_name() function instead which
is only ~70 lines (Andrii)
- Various cleanups around bitfield macros, unneeded member
iteration macros, avoiding compiler complaints when
displaying int data by casting to long long, etc (Andrii)
- Use DECLARE_LIBBPF_OPTS() in defining opts for tests (Andrii)
- Added more type tests, overflow tests, var tests and
section tests.
Changes since RFC [2]
- The initial approach explored was to share the kernel code
with libbpf using #defines to paper over the different needs;
however it makes more sense to try and fit in with libbpf
code style for maintenance. A comment in the code points at
the implementation in kernel/bpf/btf.c and notes that any
issues found in it should be fixed there or vice versa;
mirroring the tests should help with this also
(Andrii)
[1] https://lore.kernel.org/bpf/1610921764-7526-1-git-send-email-alan.maguire@o…
[2] https://lore.kernel.org/bpf/1610386373-24162-1-git-send-email-alan.maguire@…
Alan Maguire (2):
libbpf: BTF dumper support for typed data
selftests/bpf: add dump type data tests to btf dump tests
tools/lib/bpf/btf.h | 17 +
tools/lib/bpf/btf_dump.c | 901 ++++++++++++++++++++++
tools/lib/bpf/libbpf.map | 5 +
tools/testing/selftests/bpf/prog_tests/btf_dump.c | 524 +++++++++++++
4 files changed, 1447 insertions(+)
--
1.8.3.1
The kunit_mark_skipped() macro marks the current test as "skipped", with
the provided reason. The kunit_skip() macro will mark the test as
skipped, and abort the test.
The TAP specification supports this "SKIP directive" as a comment after
the "ok" / "not ok" for a test. See the "Directives" section of the TAP
spec for details:
https://testanything.org/tap-specification.html#directives
The 'success' field for KUnit tests is replaced with a kunit_status
enum, which can be SUCCESS, FAILURE, or SKIPPED, combined with a
'status_comment' containing information on why a test was skipped.
A new 'kunit_status' test suite is added to test this.
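As a rough userspace illustration of the status handling described
above, the sketch below models how a three-state status maps onto TAP's
"ok" / "not ok" and how a suite result could be derived from its test
cases. It follows the logic of the diff in this patch, but all sketch_*
names are hypothetical and none of this is the kernel code itself.

```c
#include <stddef.h>

/* Hypothetical mirror of the three states of enum kunit_status. */
enum sketch_status {
	SKETCH_SUCCESS,
	SKETCH_FAILURE,
	SKETCH_SKIPPED,
};

/* TAP reports a skipped test as "ok"; the "# SKIP" directive carries the rest. */
static const char *sketch_status_to_string(enum sketch_status status)
{
	switch (status) {
	case SKETCH_SKIPPED:
	case SKETCH_SUCCESS:
		return "ok";
	case SKETCH_FAILURE:
		return "not ok";
	}
	return "invalid";
}

/*
 * Suite result: any failure fails the suite, any success (without a
 * failure) makes it a success, and it is skipped only if every case
 * was skipped (which includes a suite with no cases).
 */
static enum sketch_status sketch_suite_status(const enum sketch_status *cases,
					      size_t count)
{
	enum sketch_status status = SKETCH_SKIPPED;
	size_t i;

	for (i = 0; i < count; i++) {
		if (cases[i] == SKETCH_FAILURE)
			return SKETCH_FAILURE;
		if (cases[i] == SKETCH_SUCCESS)
			status = SKETCH_SUCCESS;
	}
	return status;
}
```

One consequence of starting from the skipped state is that a suite with
a mix of passing and skipped cases is reported as a success, not as
skipped.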
Signed-off-by: David Gow <davidgow(a)google.com>
---
This change depends on the assertion typechecking fix here:
https://lore.kernel.org/linux-kselftest/20210513193204.816681-1-davidgow@go…
Only the first two patches in the series are required.
This is the long-awaited follow-up to the skip tests RFC:
https://lore.kernel.org/linux-kselftest/20200513042956.109987-1-davidgow@go…
There are quite a few changes since that version, principally:
- A kunit_status enum is now used, with SKIPPED a distinct state
- The kunit_mark_skipped() and kunit_skip() macros now take printf-style
format strings.
- There is now a kunit_status test suite providing basic tests of this
functionality.
- The kunit_tool changes have been split into a separate commit.
- The example skipped tests have been expanded and moved to their own
suite, which is not enabled by KUNIT_ALL_TESTS.
- A number of other fixes and changes here and there.
Cheers,
-- David
include/kunit/test.h | 68 ++++++++++++++++++++++++++++++++++++++----
lib/kunit/kunit-test.c | 42 +++++++++++++++++++++++++-
lib/kunit/test.c | 51 ++++++++++++++++++-------------
3 files changed, 134 insertions(+), 27 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index b68c61348121..40b536da027e 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -105,6 +105,18 @@ struct kunit;
#define KUNIT_SUBTEST_INDENT " "
#define KUNIT_SUBSUBTEST_INDENT " "
+/**
+ * enum kunit_status - Type of result for a test or test suite
+ * @KUNIT_SUCCESS: Denotes the test suite has not failed nor been skipped
+ * @KUNIT_FAILURE: Denotes the test has failed.
+ * @KUNIT_SKIPPED: Denotes the test has been skipped.
+ */
+enum kunit_status {
+ KUNIT_SUCCESS,
+ KUNIT_FAILURE,
+ KUNIT_SKIPPED,
+};
+
/**
* struct kunit_case - represents an individual test case.
*
@@ -148,13 +160,20 @@ struct kunit_case {
const void* (*generate_params)(const void *prev, char *desc);
/* private: internal use only. */
- bool success;
+ enum kunit_status status;
char *log;
};
-static inline char *kunit_status_to_string(bool status)
+static inline char *kunit_status_to_string(enum kunit_status status)
{
- return status ? "ok" : "not ok";
+ switch (status) {
+ case KUNIT_SKIPPED:
+ case KUNIT_SUCCESS:
+ return "ok";
+ case KUNIT_FAILURE:
+ return "not ok";
+ }
+ return "invalid";
}
/**
@@ -212,6 +231,7 @@ struct kunit_suite {
struct kunit_case *test_cases;
/* private: internal use only */
+ char status_comment[256];
struct dentry *debugfs;
char *log;
};
@@ -245,19 +265,21 @@ struct kunit {
* be read after the test case finishes once all threads associated
* with the test case have terminated.
*/
- bool success; /* Read only after test_case finishes! */
spinlock_t lock; /* Guards all mutable test state. */
+ enum kunit_status status; /* Read only after test_case finishes! */
/*
* Because resources is a list that may be updated multiple times (with
* new resources) from any thread associated with a test case, we must
* protect it with some type of lock.
*/
struct list_head resources; /* Protected by lock. */
+
+ char status_comment[256];
};
static inline void kunit_set_failure(struct kunit *test)
{
- WRITE_ONCE(test->success, false);
+ WRITE_ONCE(test->status, KUNIT_FAILURE);
}
void kunit_init_test(struct kunit *test, const char *name, char *log);
@@ -348,7 +370,7 @@ static inline int kunit_run_all_tests(void)
#define kunit_suite_for_each_test_case(suite, test_case) \
for (test_case = suite->test_cases; test_case->run_case; test_case++)
-bool kunit_suite_has_succeeded(struct kunit_suite *suite);
+enum kunit_status kunit_suite_has_succeeded(struct kunit_suite *suite);
/*
* Like kunit_alloc_resource() below, but returns the struct kunit_resource
@@ -612,6 +634,40 @@ void kunit_cleanup(struct kunit *test);
void kunit_log_append(char *log, const char *fmt, ...);
+/**
+ * kunit_mark_skipped() - Marks @test_or_suite as skipped
+ *
+ * @test_or_suite: The test context object.
+ * @fmt: A printk() style format string.
+ *
+ * Marks the test as skipped. @fmt is given output as the test status
+ * comment, typically the reason the test was skipped.
+ *
+ * Test execution continues after kunit_mark_skipped() is called.
+ */
+#define kunit_mark_skipped(test_or_suite, fmt, ...) \
+ do { \
+ WRITE_ONCE((test_or_suite)->status, KUNIT_SKIPPED); \
+ scnprintf((test_or_suite)->status_comment, 256, fmt, ##__VA_ARGS__); \
+ } while (0)
+
+/**
+ * kunit_skip() - Marks @test_or_suite as skipped
+ *
+ * @test_or_suite: The test context object.
+ * @fmt: A printk() style format string.
+ *
+ * Skips the test. @fmt is given output as the test status
+ * comment, typically the reason the test was skipped.
+ *
+ * Test execution is halted after kunit_skip() is called.
+ */
+#define kunit_skip(test_or_suite, fmt, ...) \
+ do { \
+ kunit_mark_skipped((test_or_suite), fmt, ##__VA_ARGS__);\
+ kunit_try_catch_throw(&((test_or_suite)->try_catch)); \
+ } while (0)
+
/*
* printk and log to per-test or per-suite log buffer. Logging only done
* if CONFIG_KUNIT_DEBUGFS is 'y'; if it is 'n', no log is allocated/used.
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index 69f902440a0e..d69efcbed624 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -437,7 +437,47 @@ static void kunit_log_test(struct kunit *test)
#endif
}
+static void kunit_status_set_failure_test(struct kunit *test)
+{
+ struct kunit fake;
+
+ kunit_init_test(&fake, "fake test", NULL);
+
+ KUNIT_EXPECT_EQ(test, fake.status, (enum kunit_status)KUNIT_SUCCESS);
+ kunit_set_failure(&fake);
+ KUNIT_EXPECT_EQ(test, fake.status, (enum kunit_status)KUNIT_FAILURE);
+}
+
+static void kunit_status_mark_skipped_test(struct kunit *test)
+{
+ struct kunit fake;
+
+ kunit_init_test(&fake, "fake test", NULL);
+
+ /* Before: Should be SUCCESS with no comment. */
+ KUNIT_EXPECT_EQ(test, fake.status, KUNIT_SUCCESS);
+ KUNIT_EXPECT_STREQ(test, fake.status_comment, "");
+
+ /* Mark the test as skipped. */
+ kunit_mark_skipped(&fake, "Accepts format string: %s", "YES");
+
+ /* After: Should be SKIPPED with our comment. */
+ KUNIT_EXPECT_EQ(test, fake.status, (enum kunit_status)KUNIT_SKIPPED);
+ KUNIT_EXPECT_STREQ(test, fake.status_comment, "Accepts format string: YES");
+}
+
+static struct kunit_case kunit_status_test_cases[] = {
+ KUNIT_CASE(kunit_status_set_failure_test),
+ KUNIT_CASE(kunit_status_mark_skipped_test),
+ {}
+};
+
+static struct kunit_suite kunit_status_test_suite = {
+ .name = "kunit_status",
+ .test_cases = kunit_status_test_cases,
+};
+
kunit_test_suites(&kunit_try_catch_test_suite, &kunit_resource_test_suite,
- &kunit_log_test_suite);
+ &kunit_log_test_suite, &kunit_status_test_suite);
MODULE_LICENSE("GPL v2");
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 2f6cc0123232..0ee07705d2b0 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -98,12 +98,14 @@ static void kunit_print_subtest_start(struct kunit_suite *suite)
static void kunit_print_ok_not_ok(void *test_or_suite,
bool is_test,
- bool is_ok,
+ enum kunit_status status,
size_t test_number,
- const char *description)
+ const char *description,
+ const char *directive)
{
struct kunit_suite *suite = is_test ? NULL : test_or_suite;
struct kunit *test = is_test ? test_or_suite : NULL;
+ const char *directive_header = (status == KUNIT_SKIPPED) ? " # SKIP " : "";
/*
* We do not log the test suite results as doing so would
@@ -114,25 +116,31 @@ static void kunit_print_ok_not_ok(void *test_or_suite,
* representation.
*/
if (suite)
- pr_info("%s %zd - %s\n",
- kunit_status_to_string(is_ok),
- test_number, description);
+ pr_info("%s %zd - %s%s%s\n",
+ kunit_status_to_string(status),
+ test_number, description,
+ directive_header, directive ? directive : "");
else
- kunit_log(KERN_INFO, test, KUNIT_SUBTEST_INDENT "%s %zd - %s",
- kunit_status_to_string(is_ok),
- test_number, description);
+ kunit_log(KERN_INFO, test,
+ KUNIT_SUBTEST_INDENT "%s %zd - %s%s%s",
+ kunit_status_to_string(status),
+ test_number, description,
+ directive_header, directive ? directive : "");
}
-bool kunit_suite_has_succeeded(struct kunit_suite *suite)
+enum kunit_status kunit_suite_has_succeeded(struct kunit_suite *suite)
{
const struct kunit_case *test_case;
+ enum kunit_status status = KUNIT_SKIPPED;
kunit_suite_for_each_test_case(suite, test_case) {
- if (!test_case->success)
- return false;
+ if (test_case->status == KUNIT_FAILURE)
+ return KUNIT_FAILURE;
+ else if (test_case->status == KUNIT_SUCCESS)
+ status = KUNIT_SUCCESS;
}
- return true;
+ return status;
}
EXPORT_SYMBOL_GPL(kunit_suite_has_succeeded);
@@ -143,7 +151,8 @@ static void kunit_print_subtest_end(struct kunit_suite *suite)
kunit_print_ok_not_ok((void *)suite, false,
kunit_suite_has_succeeded(suite),
kunit_suite_counter++,
- suite->name);
+ suite->name,
+ suite->status_comment);
}
unsigned int kunit_test_case_num(struct kunit_suite *suite,
@@ -252,7 +261,8 @@ void kunit_init_test(struct kunit *test, const char *name, char *log)
test->log = log;
if (test->log)
test->log[0] = '\0';
- test->success = true;
+ test->status = KUNIT_SUCCESS;
+ test->status_comment[0] = '\0';
}
EXPORT_SYMBOL_GPL(kunit_init_test);
@@ -376,7 +386,8 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
context.test_case = test_case;
kunit_try_catch_run(try_catch, &context);
- test_case->success = test->success;
+ test_case->status = test->status;
+
}
int kunit_run_tests(struct kunit_suite *suite)
@@ -388,7 +399,6 @@ int kunit_run_tests(struct kunit_suite *suite)
kunit_suite_for_each_test_case(suite, test_case) {
struct kunit test = { .param_value = NULL, .param_index = 0 };
- bool test_success = true;
if (test_case->generate_params) {
/* Get initial param. */
@@ -398,7 +408,6 @@ int kunit_run_tests(struct kunit_suite *suite)
do {
kunit_run_case_catch_errors(suite, test_case, &test);
- test_success &= test_case->success;
if (test_case->generate_params) {
if (param_desc[0] == '\0') {
@@ -410,7 +419,7 @@ int kunit_run_tests(struct kunit_suite *suite)
KUNIT_SUBTEST_INDENT
"# %s: %s %d - %s",
test_case->name,
- kunit_status_to_string(test.success),
+ kunit_status_to_string(test.status),
test.param_index + 1, param_desc);
/* Get next param. */
@@ -420,9 +429,10 @@ int kunit_run_tests(struct kunit_suite *suite)
}
} while (test.param_value);
- kunit_print_ok_not_ok(&test, true, test_success,
+ kunit_print_ok_not_ok(&test, true, test_case->status,
kunit_test_case_num(suite, test_case),
- test_case->name);
+ test_case->name,
+ test.status_comment);
}
kunit_print_subtest_end(suite);
@@ -434,6 +444,7 @@ EXPORT_SYMBOL_GPL(kunit_run_tests);
static void kunit_init_suite(struct kunit_suite *suite)
{
kunit_debugfs_create_suite(suite);
+ suite->status_comment[0] = '\0';
}
int __kunit_test_suites_init(struct kunit_suite * const * const suites)
--
2.31.1.818.g46aad6cb9e-goog
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is treated as having the following
attributes:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative or instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, consistency between all
the statistics data is not guaranteed. That is, the statistics
are not all read out at exactly the same time, because they
are still being updated by KVM subsystems while they are read out.
---
* v5 -> v6
- Use designated initializers for STATS_DESC
- Change KVM_STATS_SCALE... to KVM_STATS_BASE...
- Use a common function for kvm_[vm|vcpu]_stats_read
- Fix some documentation errors/omissions
- Use TEST_ASSERT in selftest
- Use a common function for [vm|vcpu]_stats_test in selftest
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
[5] https://lore.kernel.org/kvm/20210517145314.157626-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 179 +++++++++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 38 ++-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 64 +++++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 64 +++++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 59 ++++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 129 ++++++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 67 +++++-
include/linux/kvm_host.h | 141 +++++++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 50 ++++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 216 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 179 ++++++++++++++-
24 files changed, 1188 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
--
2.31.1.818.g46aad6cb9e-goog
Resctrl test suite accepts command line argument "-t" to specify the
unit tests to run in the test list (e.g., -t mbm,mba,cmt,cat) as
documented in the help.
When calling strtok() to parse the option, the incorrect delimiters
argument ":\t" is used. As a result, passing "-t mbm,mba,cmt,cat" throws
an invalid option error.
Fix this by using delimiters argument "," instead of ":\t" for parsing
of unit tests list. At the same time, remove the unnecessary "spaces"
between the unit tests in help documentation to prevent confusion.
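The effect of the delimiter set can be reproduced with a small
standalone sketch. The split_list() helper below is hypothetical and is
not part of resctrl_tests.c; it only shows how strtok() treats the same
input under the two delimiter strings mentioned above.

```c
#include <string.h>

/*
 * Hypothetical helper: split a writable string in place with strtok()
 * and store up to max token pointers in out. Returns the token count.
 */
static int split_list(char *list, const char *delims, char **out, int max)
{
	int n = 0;
	char *token = strtok(list, delims);

	while (token && n < max) {
		out[n++] = token;
		token = strtok(NULL, delims);
	}
	return n;
}
```

Splitting "mbm,mba,cmt,cat" with "," gives four tokens, one per test
name. With ":\t" the string contains no delimiter at all, so strtok()
returns the whole list as a single token, which does not match any known
test name.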
Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
Fixes: 78941183d1b1 ("selftests/resctrl: Add Cache QoS Monitoring (CQM) selftest")
Fixes: ecdbb911f22d ("selftests/resctrl: Add MBM test")
Fixes: 034c7678dd2c ("selftests/resctrl: Add README for resctrl tests")
Cc: stable(a)vger.kernel.org
Signed-off-by: Xiaochen Shen <xiaochen.shen(a)intel.com>
Reviewed-by: Tony Luck <tony.luck(a)intel.com>
---
tools/testing/selftests/resctrl/README | 2 +-
tools/testing/selftests/resctrl/resctrl_tests.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/resctrl/README b/tools/testing/selftests/resctrl/README
index 4b36b25b6ac0..3d2bbd4fa3aa 100644
--- a/tools/testing/selftests/resctrl/README
+++ b/tools/testing/selftests/resctrl/README
@@ -47,7 +47,7 @@ Parameter '-h' shows usage information.
usage: resctrl_tests [-h] [-b "benchmark_cmd [options]"] [-t test list] [-n no_of_bits]
-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CMT default benchmark is builtin fill_buf
- -t test list: run tests specified in the test list, e.g. -t mbm, mba, cmt, cat
+ -t test list: run tests specified in the test list, e.g. -t mbm,mba,cmt,cat
-n no_of_bits: run cache tests using specified no of bits in cache bit mask
-p cpu_no: specify CPU number to run the test. 1 is default
-h: help
diff --git a/tools/testing/selftests/resctrl/resctrl_tests.c b/tools/testing/selftests/resctrl/resctrl_tests.c
index f51b5fc066a3..973f09a66e1e 100644
--- a/tools/testing/selftests/resctrl/resctrl_tests.c
+++ b/tools/testing/selftests/resctrl/resctrl_tests.c
@@ -40,7 +40,7 @@ static void cmd_help(void)
printf("\t-b benchmark_cmd [options]: run specified benchmark for MBM, MBA and CMT\n");
printf("\t default benchmark is builtin fill_buf\n");
printf("\t-t test list: run tests specified in the test list, ");
- printf("e.g. -t mbm, mba, cmt, cat\n");
+ printf("e.g. -t mbm,mba,cmt,cat\n");
printf("\t-n no_of_bits: run cache tests using specified no of bits in cache bit mask\n");
printf("\t-p cpu_no: specify CPU number to run the test. 1 is default\n");
printf("\t-h: help\n");
@@ -173,7 +173,7 @@ int main(int argc, char **argv)
return -1;
}
- token = strtok(NULL, ":\t");
+ token = strtok(NULL, ",");
}
break;
case 'p':
--
1.8.3.1
Hi All,
We have been noticing protection key test failures on our systems here.
Shggy (Dave Kleikamp) from Oracle reported this problem a couple of weeks
ago. Here is the failure.
# ./protection_keys_64
has pkeys: 1
startup pkey_reg: 0000000055555550
test 0 PASSED (iteration 1)
test 1 PASSED (iteration 1)
test 2 PASSED (iteration 1)
test 3 PASSED (iteration 1)
test 4 PASSED (iteration 1)
test 5 PASSED (iteration 1)
test 6 PASSED (iteration 1)
test 7 PASSED (iteration 1)
test 8 PASSED (iteration 1)
test 9 PASSED (iteration 1)
test 10 PASSED (iteration 1)
test 11 PASSED (iteration 1)
test 12 PASSED (iteration 1)
test 13 PASSED (iteration 1)
protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
`pkey_reg == shadow_pkey_reg' failed.
Aborted (core dumped)
The test that is failing is "test_ptrace_of_child".
Sometimes it fails even earlier. Sometimes (very rarely) it passes
when I insert a few printfs. Most often it fails in
test_ptrace_of_child.
In the test "test_ptrace_of_child", the parent thread attaches to the
child and modifies the data structure associated with the protection key.
It then verifies the access results based on the key permissions. While
running the test, the tool finds that the key permissions changed out of
nowhere. The test fails with "assert(pkey_reg == shadow_pkey_reg);".
Investigation so far.
1. The test fails on AMD (Milan) systems. The test appears to pass on
Intel systems; I tested on an old Intel system available here.
2. I was trying to see if the hardware (or firmware) is corrupting the pkru
register value. At this point, it does not appear that way. I was able to
trace all the MSR writes to the application or kernel, and it does not
appear to me to be a hardware issue. The kernel appears to do
save/restore properly during the context switch. The value stays at the
default (0x55555554) in most cases unless the application changes the
default permissions. The only application that changes it here is
"protection_keys".
3. I played with the test tool a little bit. The behavior changes
drastically when I make minor changes.
For example, in the following code:
void setup_handlers(void)
{
	signal(SIGCHLD, &sig_chld);
	setup_sigsegv_handler();
}
Just commenting out the first line "signal(SIGCHLD, &sig_chld);" changes the
behavior drastically. I see some tests PASS after this change. The first
line appears not to do anything except some printing.
I still have not figured out completely what is going on with
setup_sigsegv_handler(). It seems very odd to modify some structures in
the signal handler. I am not sure about some of the offsets it is trying
to modify. I feel it may be messing something up there. I am not sure
though. Will have to investigate.
4. I took traces (x86_fpu) while running the test. They appear to show that
some of the feature headers are not copied during the test, but I could not
figure out why. Specifically, not all the feature headers appear to be
copied when the new child is created. When I printed the
feature bits, they all appeared to be intact. Here is the trace.
protection_keys-17350 [035] 59275.833511: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu:
0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17351 [040] 59275.834274: x86_fpu_regs_activated:
x86/fpu: 0xffff93d722877800 load: 1 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834283: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17351 [040] 59275.834289: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834296: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834298: x86_fpu_regs_activated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834406: x86_fpu_before_save: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834406: x86_fpu_after_save: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834408: x86_fpu_before_save: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834408: x86_fpu_after_save: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834654: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834654: x86_fpu_dropped: x86/fpu:
0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
auditd-1834 [036] 59275.834655:
x86_fpu_regs_activated: x86/fpu: 0xffff93d713fef800 load: 1 xfeatures: 202
xcomp_bv: 8000000000000207
protection_keys-17350 [035] 59275.834665: x86_fpu_regs_deactivated:
x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
<...>-17285 [041] 59275.834679: x86_fpu_regs_activated:
x86/fpu: 0xffff93d716d0df40 load: 1 xfeatures: 202 xcomp_bv: 8000000000000207
5. I instrumented the parent-child interactions using a separate standalone
application (without the signal handler), and it appears to work just fine. I
see the PKRU register staying intact when switching from parent to child.
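As an aid to reading the trace in item 4, the sketch below decodes the
xfeatures and xcomp_bv values, assuming they are hexadecimal XSAVE bitmaps
using the usual x86 state-component numbering (bit 0 x87 FP, bit 1 SSE,
bit 2 AVX/YMM, bit 9 PKRU, and bit 63 of xcomp_bv marking the compacted
format). The helper name is made up for illustration.

```python
# Illustrative only: decode the xfeatures / xcomp_bv masks from the trace.
# decode_xfeatures() is a hypothetical helper; only the components that
# show up in this trace are named, anything else is reported as "bitN".

XFEATURE_NAMES = {
    0: "FP",
    1: "SSE",
    2: "YMM",
    9: "PKRU",
}
XCOMP_BV_COMPACTED = 1 << 63  # format flag, not a state component


def decode_xfeatures(mask: int) -> list:
    """Return the names of the state components set in an XSAVE bitmap."""
    mask &= ~XCOMP_BV_COMPACTED
    names = []
    bit = 0
    while mask:
        if mask & 1:
            names.append(XFEATURE_NAMES.get(bit, f"bit{bit}"))
        mask >>= 1
        bit += 1
    return names


if __name__ == "__main__":
    for label, value in (("copy_src xfeatures", 0x202),
                         ("copy_dst xfeatures", 0x2),
                         ("xcomp_bv", 0x8000000000000207)):
        print(f"{label:<20} {value:#x}: {decode_xfeatures(value)}")
```

Under that reading, the x86_fpu_copy_src event has the PKRU bit set (0x202)
while the x86_fpu_copy_dst event for the child does not (0x2), which is the
difference item 4 refers to.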
My suspicion at this point is towards the selftest tool protection_keys.c.
I will keep looking. Any feedback to help debug further would be much appreciated.
Shaggy, feel free to add anything I missed.
Thanks
Babu
Base
====
These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-13-17-23
tag. This is because this series depends on:
- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
- UFFD minor fault support for shmem (in Andrew's tree) [2]
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
[2] https://lore.kernel.org/patchwork/cover/1420967/
Changelog
=========
v1->v2:
- Picked up Reviewed-by's.
- Change backing_src_is_shared() to check the flags, instead of the type. This
makes it robust to adding new backing source types in the future.
- Add another commit which refactors setup_demand_paging() error handling.
- Print UFFD ioctl type once in setup_demand_paging, instead of on every page-in
operation.
- Expand comment on why we use MFD_HUGETLB instead of MAP_HUGETLB.
- Reworded comment on addr_gpa2alias.
- Moved demand_paging_test.c timing calls outside of the if (), deduplicating
them.
- Split trivial comment / logging fixups into a separate commit.
- Add another commit which prints a clarifying message on test skip.
- Split the commit allowing backing src_type to be modified in two.
- Split the commit adding the shmem backing type in two.
- Rebased onto v5.13-rc1-mmots-2021-05-13-17-23.
Overview
========
Minor fault handling is a new userfaultfd feature whose goal is generally to
improve performance. In particular, it is intended for use with demand paging.
There are more details in the cover letters for this new feature (linked above),
but at a high level the idea is that we think of these three phases of live
migration of a VM:
1. Precopy, where we copy "some" pages from the source to the target, while the
VM is still running on the source machine.
2. Blackout, where execution stops on the source, and begins on the target.
3. Postcopy, where the VM is running on the target, some pages are already up
to date, and others are not (because they weren't copied, or were modified
after being copied).
During postcopy, the first time the guest touches memory, we intercept a minor
fault. Userspace checks whether or not the page is already up to date. If
needed, we copy the final version of the page from the source machine. This
could be done with RDMA for example, to do it truly in place / with no copying.
At this point, all that's left is to set up PTEs for the guest: so we issue
UFFDIO_CONTINUE. No copying or page allocation needed.
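The postcopy handling described above can be sketched as a toy model. This
is a simulation only: it makes no real userfaultfd calls, and the class,
the "stale" set and the method names are all hypothetical, chosen to mirror
the steps in the text.

```python
# Toy model of the postcopy minor-fault flow; no real userfaultfd calls.
# PostcopyTarget, fetch_from_source() and uffdio_continue() are
# hypothetical names used only for illustration.

class PostcopyTarget:
    def __init__(self, source_pages: dict, page_cache: dict, stale: set):
        self.source_pages = source_pages   # final page contents on the source
        self.page_cache = dict(page_cache) # pages already present on the target
        self.stale = set(stale)            # pages modified after being copied
        self.mapped = set()                # pages with PTEs installed for the guest
        self.fetched = []                  # pages that had to be refreshed

    def fetch_from_source(self, page: int) -> None:
        # Stand-in for e.g. an RDMA read that updates the page in place.
        self.page_cache[page] = self.source_pages[page]
        self.stale.discard(page)
        self.fetched.append(page)

    def uffdio_continue(self, page: int) -> None:
        # Stand-in for UFFDIO_CONTINUE: installs the mapping, copies nothing.
        self.mapped.add(page)

    def handle_minor_fault(self, page: int) -> bytes:
        if page in self.stale:             # userspace "is it up to date?" check
            self.fetch_from_source(page)
        self.uffdio_continue(page)
        return self.page_cache[page]


if __name__ == "__main__":
    source = {0: b"clean", 1: b"final"}
    target = PostcopyTarget(source, {0: b"clean", 1: b"old"}, stale={1})
    for page in (0, 1):
        print(page, target.handle_minor_fault(page))
    print("refreshed pages:", target.fetched)
```

The point of the model is that a page which is already up to date goes
straight to the "continue" step, and only stale pages involve a transfer.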
Because of this use case, it's useful to exercise this as part of the demand
paging test. It lets us ensure the use case works correctly end-to-end, and also
gives us an in-tree way to profile the end-to-end flow for future performance
improvements.
Axel Rasmussen (10):
KVM: selftests: trivial comment/logging fixes
KVM: selftests: simplify setup_demand_paging error handling
KVM: selftests: print a message when skipping KVM tests
KVM: selftests: compute correct demand paging size
KVM: selftests: allow different backing source types
KVM: selftests: refactor vm_mem_backing_src_type flags
KVM: selftests: add shmem backing source type
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: add shared hugetlbfs backing source type
.../selftests/kvm/demand_paging_test.c | 175 +++++++++++-------
.../testing/selftests/kvm/include/kvm_util.h | 1 +
.../testing/selftests/kvm/include/test_util.h | 12 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 84 ++++++++-
.../selftests/kvm/lib/kvm_util_internal.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 51 +++--
6 files changed, 238 insertions(+), 87 deletions(-)
--
2.31.1.751.gd2f1c929bd-goog
Note: this does not change the parser behavior at all (except for making
one error message more useful). This is just an internal refactor.
The TAP output parser currently operates over a List[str].
This works, but we only ever need to be able to "peek" at the current
line and the ability to "pop" it off.
Also, using a List means we need to wait for all the output before we
can start parsing. While this is not an issue for most tests which are
really lightweight, we do have some longer (~5 minutes) tests.
This patch introduces an Input wrapper class that
* Exposes a peek()/pop() interface instead of manipulating an array
* this allows us to more easily add debugging code [1]
* Can consume an input from a generator
* we can now parse results as tests are running (the parser code
currently doesn't print until the end, so no impact yet).
* Tracks the current line number to print better error messages
* Would allow us to add additional features more easily, e.g. storing
N previous lines so we can print out invalid lines in context, etc.
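As a standalone illustration of the peek()/pop() idea (a simplified
stand-in written for this note, not the class from the diff below; the
names LineInput and numbered are hypothetical), a wrapper over a generator
might look like this:

```python
from typing import Iterable, Iterator, Tuple


class LineInput:
    """Simplified stand-in for a peek()/pop() wrapper over a line generator."""

    def __init__(self, lines: Iterator[Tuple[int, str]]):
        self._lines = lines
        self._done = False
        self._next = (0, '')
        self._advance()

    def _advance(self) -> None:
        # Pull exactly one item from the generator; nothing is buffered
        # beyond the current line.
        try:
            self._next = next(self._lines)
        except StopIteration:
            self._done = True

    def peek(self) -> str:
        return self._next[1]

    def pop(self) -> str:
        current = self._next
        self._advance()
        return current[1]

    def line_number(self) -> int:
        return self._next[0]

    def __bool__(self) -> bool:
        return not self._done


def numbered(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Lazily pair each line with its 1-based line number."""
    for number, line in enumerate(lines, start=1):
        yield number, line.rstrip()


if __name__ == "__main__":
    stream = LineInput(numbered(["TAP version 14\n", "1..1\n", "ok 1 - example\n"]))
    while stream:
        print(f"{stream.line_number()}: {stream.pop()}")
```

Because the wrapper only ever holds the current line, the source can be any
iterator, including one that is still producing output from a running test.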
[1] The parsing logic is currently quite fragile.
E.g. it'll often say the kernel "CRASHED" if there's something slightly
wrong with the output format. When debugging a test that had some memory
corruption issues, it resulted in very misleading errors from the parser.
Now we could easily add this to trace all the lines consumed and why:
  def pop(self) -> str:
          n = self._next
+         print(f'popping {n[0]}: {n[1].ljust(40, " ")}| caller={inspect.stack()[1].function}')
Example output:
popping 77: TAP version 14 | caller=parse_tap_header
popping 78: 1..1 | caller=parse_test_plan
popping 79: # Subtest: kunit_executor_test | caller=parse_subtest_header
popping 80: 1..2 | caller=parse_subtest_plan
popping 81: ok 1 - parse_filter_test | caller=parse_ok_not_ok_test_case
popping 82: ok 2 - filter_subsuite_test | caller=parse_ok_not_ok_test_case
popping 83: ok 1 - kunit_executor_test | caller=parse_ok_not_ok_test_suite
If we introduce an invalid line, we can see the parser go down the wrong path:
popping 77: TAP version 14 | caller=parse_tap_header
popping 78: 1..1 | caller=parse_test_plan
popping 79: # Subtest: kunit_executor_test | caller=parse_subtest_header
popping 80: 1..2 | caller=parse_subtest_plan
popping 81: 1..2 # this is invalid! | caller=parse_ok_not_ok_test_case
popping 82: ok 1 - parse_filter_test | caller=parse_ok_not_ok_test_case
popping 83: ok 2 - filter_subsuite_test | caller=parse_ok_not_ok_test_case
popping 84: ok 1 - kunit_executor_test | caller=parse_ok_not_ok_test_case
[ERROR] ran out of lines before end token
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit_parser.py | 136 ++++++++++++++++---------
tools/testing/kunit/kunit_tool_test.py | 18 ++--
2 files changed, 99 insertions(+), 55 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index e8bcc139702e..65adb386364a 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -47,22 +47,63 @@ class TestStatus(Enum):
NO_TESTS = auto()
FAILURE_TO_PARSE_TESTS = auto()
+class Input:
+	"""Provides a more convenient interface over isolate_kunit_output()."""
+	_lines: Iterator[Tuple[int, str]]
+	_next: Tuple[int, str]
+	_done: bool
+
+	def __init__(self, lines: Iterator[Tuple[int, str]]):
+		self._lines = lines
+		self._done = False
+		self._next = (0, '')
+		self._get_next()
+
+	def _get_next(self) -> None:
+		try:
+			self._next = next(self._lines)
+		except StopIteration:
+			self._done = True
+
+	def peek(self) -> str:
+		return self._next[1]
+
+	def pop(self) -> str:
+		n = self._next
+		self._get_next()
+		return n[1]
+
+	def __bool__(self) -> bool:
+		return not self._done
+
+	# Only used by kunit_tool_test.py.
+	def __iter__(self) -> Iterator[str]:
+		while bool(self):
+			yield self.pop()
+
+	def line_number(self) -> int:
+		return self._next[0]
+
+
kunit_start_re = re.compile(r'TAP version [0-9]+$')
kunit_end_re = re.compile('(List of all partitions:|'
'Kernel panic - not syncing: VFS:)')
-def isolate_kunit_output(kernel_output) -> Iterator[str]:
-	started = False
-	for line in kernel_output:
-		line = line.rstrip() # line always has a trailing \n
-		if kunit_start_re.search(line):
-			prefix_len = len(line.split('TAP version')[0])
-			started = True
-			yield line[prefix_len:] if prefix_len > 0 else line
-		elif kunit_end_re.search(line):
-			break
-		elif started:
-			yield line[prefix_len:] if prefix_len > 0 else line
+def get_input(kernel_output: Iterable[str]) -> Input:
+	def isolate_kunit_output(kernel_output: Iterable[str]) -> Iterator[Tuple[int, str]]:
+		line_num = 0
+		started = False
+		for line in kernel_output:
+			line_num += 1
+			line = line.rstrip() # line always has a trailing \n
+			if kunit_start_re.search(line):
+				prefix_len = len(line.split('TAP version')[0])
+				started = True
+				yield line_num, line[prefix_len:]
+			elif kunit_end_re.search(line):
+				break
+			elif started:
+				yield line_num, line[prefix_len:]
+	return Input(lines=isolate_kunit_output(kernel_output))
def raw_output(kernel_output) -> None:
for line in kernel_output:
@@ -97,14 +138,14 @@ def print_log(log) -> None:
TAP_ENTRIES = re.compile(r'^(TAP|[\s]*ok|[\s]*not ok|[\s]*[0-9]+\.\.[0-9]+|[\s]*#).*$')
-def consume_non_diagnostic(lines: List[str]) -> None:
- while lines and not TAP_ENTRIES.match(lines[0]):
- lines.pop(0)
+def consume_non_diagnostic(lines: Input) -> None:
+ while lines and not TAP_ENTRIES.match(lines.peek()):
+ lines.pop()
-def save_non_diagnostic(lines: List[str], test_case: TestCase) -> None:
- while lines and not TAP_ENTRIES.match(lines[0]):
- test_case.log.append(lines[0])
- lines.pop(0)
+def save_non_diagnostic(lines: Input, test_case: TestCase) -> None:
+ while lines and not TAP_ENTRIES.match(lines.peek()):
+ test_case.log.append(lines.peek())
+ lines.pop()
OkNotOkResult = namedtuple('OkNotOkResult', ['is_ok','description', 'text'])
@@ -112,18 +153,18 @@ OK_NOT_OK_SUBTEST = re.compile(r'^[\s]+(ok|not ok) [0-9]+ - (.*)$')
OK_NOT_OK_MODULE = re.compile(r'^(ok|not ok) ([0-9]+) - (.*)$')
-def parse_ok_not_ok_test_case(lines: List[str], test_case: TestCase) -> bool:
+def parse_ok_not_ok_test_case(lines: Input, test_case: TestCase) -> bool:
save_non_diagnostic(lines, test_case)
if not lines:
test_case.status = TestStatus.TEST_CRASHED
return True
- line = lines[0]
+ line = lines.peek()
match = OK_NOT_OK_SUBTEST.match(line)
while not match and lines:
- line = lines.pop(0)
+ line = lines.pop()
match = OK_NOT_OK_SUBTEST.match(line)
if match:
- test_case.log.append(lines.pop(0))
+ test_case.log.append(lines.pop())
test_case.name = match.group(2)
if test_case.status == TestStatus.TEST_CRASHED:
return True
@@ -138,14 +179,14 @@ def parse_ok_not_ok_test_case(lines: List[str], test_case: TestCase) -> bool:
SUBTEST_DIAGNOSTIC = re.compile(r'^[\s]+# (.*)$')
DIAGNOSTIC_CRASH_MESSAGE = re.compile(r'^[\s]+# .*?: kunit test case crashed!$')
-def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
+def parse_diagnostic(lines: Input, test_case: TestCase) -> bool:
save_non_diagnostic(lines, test_case)
if not lines:
return False
- line = lines[0]
+ line = lines.peek()
match = SUBTEST_DIAGNOSTIC.match(line)
if match:
- test_case.log.append(lines.pop(0))
+ test_case.log.append(lines.pop())
crash_match = DIAGNOSTIC_CRASH_MESSAGE.match(line)
if crash_match:
test_case.status = TestStatus.TEST_CRASHED
@@ -153,7 +194,7 @@ def parse_diagnostic(lines: List[str], test_case: TestCase) -> bool:
else:
return False
-def parse_test_case(lines: List[str]) -> Optional[TestCase]:
+def parse_test_case(lines: Input) -> Optional[TestCase]:
test_case = TestCase()
save_non_diagnostic(lines, test_case)
while parse_diagnostic(lines, test_case):
@@ -165,24 +206,24 @@ def parse_test_case(lines: List[str]) -> Optional[TestCase]:
SUBTEST_HEADER = re.compile(r'^[\s]+# Subtest: (.*)$')
-def parse_subtest_header(lines: List[str]) -> Optional[str]:
+def parse_subtest_header(lines: Input) -> Optional[str]:
consume_non_diagnostic(lines)
if not lines:
return None
- match = SUBTEST_HEADER.match(lines[0])
+ match = SUBTEST_HEADER.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return match.group(1)
else:
return None
SUBTEST_PLAN = re.compile(r'[\s]+[0-9]+\.\.([0-9]+)')
-def parse_subtest_plan(lines: List[str]) -> Optional[int]:
+def parse_subtest_plan(lines: Input) -> Optional[int]:
consume_non_diagnostic(lines)
- match = SUBTEST_PLAN.match(lines[0])
+ match = SUBTEST_PLAN.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return int(match.group(1))
else:
return None
@@ -199,17 +240,17 @@ def max_status(left: TestStatus, right: TestStatus) -> TestStatus:
else:
return TestStatus.SUCCESS
-def parse_ok_not_ok_test_suite(lines: List[str],
+def parse_ok_not_ok_test_suite(lines: Input,
test_suite: TestSuite,
expected_suite_index: int) -> bool:
consume_non_diagnostic(lines)
if not lines:
test_suite.status = TestStatus.TEST_CRASHED
return False
- line = lines[0]
+ line = lines.peek()
match = OK_NOT_OK_MODULE.match(line)
if match:
- lines.pop(0)
+ lines.pop()
if match.group(1) == 'ok':
test_suite.status = TestStatus.SUCCESS
else:
@@ -231,7 +272,7 @@ def bubble_up_test_case_errors(test_suite: TestSuite) -> TestStatus:
max_test_case_status = bubble_up_errors(x.status for x in test_suite.cases)
return max_status(max_test_case_status, test_suite.status)
-def parse_test_suite(lines: List[str], expected_suite_index: int) -> Optional[TestSuite]:
+def parse_test_suite(lines: Input, expected_suite_index: int) -> Optional[TestSuite]:
if not lines:
return None
consume_non_diagnostic(lines)
@@ -257,26 +298,26 @@ def parse_test_suite(lines: List[str], expected_suite_index: int) -> Optional[Te
print_with_timestamp(red('[ERROR] ') + 'ran out of lines before end token')
return test_suite
else:
- print('failed to parse end of suite' + lines[0])
+ print(f'failed to parse end of suite "{name}", at line {lines.line_number()}: {lines.peek()}')
return None
TAP_HEADER = re.compile(r'^TAP version 14$')
-def parse_tap_header(lines: List[str]) -> bool:
+def parse_tap_header(lines: Input) -> bool:
consume_non_diagnostic(lines)
- if TAP_HEADER.match(lines[0]):
- lines.pop(0)
+ if TAP_HEADER.match(lines.peek()):
+ lines.pop()
return True
else:
return False
TEST_PLAN = re.compile(r'[0-9]+\.\.([0-9]+)')
-def parse_test_plan(lines: List[str]) -> Optional[int]:
+def parse_test_plan(lines: Input) -> Optional[int]:
consume_non_diagnostic(lines)
- match = TEST_PLAN.match(lines[0])
+ match = TEST_PLAN.match(lines.peek())
if match:
- lines.pop(0)
+ lines.pop()
return int(match.group(1))
else:
return None
@@ -284,7 +325,7 @@ def parse_test_plan(lines: List[str]) -> Optional[int]:
def bubble_up_suite_errors(test_suites: Iterable[TestSuite]) -> TestStatus:
return bubble_up_errors(x.status for x in test_suites)
-def parse_test_result(lines: List[str]) -> TestResult:
+def parse_test_result(lines: Input) -> TestResult:
consume_non_diagnostic(lines)
if not lines or not parse_tap_header(lines):
return TestResult(TestStatus.NO_TESTS, [], lines)
@@ -338,11 +379,12 @@ def print_and_count_results(test_result: TestResult) -> Tuple[int, int, int]:
print_with_timestamp('')
return total_tests, failed_tests, crashed_tests
-def parse_run_tests(kernel_output) -> TestResult:
+def parse_run_tests(kernel_output: Iterable[str]) -> TestResult:
total_tests = 0
failed_tests = 0
crashed_tests = 0
- test_result = parse_test_result(list(isolate_kunit_output(kernel_output)))
+ lines = get_input(kernel_output)
+ test_result = parse_test_result(lines)
if test_result.status == TestStatus.NO_TESTS:
print(red('[ERROR] ') + yellow('no tests run!'))
elif test_result.status == TestStatus.FAILURE_TO_PARSE_TESTS:
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 2e809dd956a7..e82678a25bef 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -11,6 +11,7 @@ from unittest import mock
import tempfile, shutil # Handling test_tmpdir
+import itertools
import json
import signal
import os
@@ -92,17 +93,18 @@ class KconfigTest(unittest.TestCase):
class KUnitParserTest(unittest.TestCase):
- def assertContains(self, needle, haystack):
- for line in haystack:
+ def assertContains(self, needle: str, haystack: kunit_parser.Input):
+ # Clone the iterator so we can print the contents on failure.
+ copy, backup = itertools.tee(haystack)
+ for line in copy:
if needle in line:
return
- raise AssertionError('"' +
- str(needle) + '" not found in "' + str(haystack) + '"!')
+ raise AssertionError(f'"{needle}" not found in {list(backup)}!')
def test_output_isolated_correctly(self):
log_path = test_data_path('test_output_isolated_correctly.log')
with open(log_path) as file:
- result = kunit_parser.isolate_kunit_output(file.readlines())
+ result = kunit_parser.get_input(file.readlines())
self.assertContains('TAP version 14', result)
self.assertContains(' # Subtest: example', result)
self.assertContains(' 1..2', result)
@@ -113,7 +115,7 @@ class KUnitParserTest(unittest.TestCase):
def test_output_with_prefix_isolated_correctly(self):
log_path = test_data_path('test_pound_sign.log')
with open(log_path) as file:
- result = kunit_parser.isolate_kunit_output(file.readlines())
+ result = kunit_parser.get_input(file.readlines())
self.assertContains('TAP version 14', result)
self.assertContains(' # Subtest: kunit-resource-test', result)
self.assertContains(' 1..5', result)
@@ -159,7 +161,7 @@ class KUnitParserTest(unittest.TestCase):
empty_log = test_data_path('test_is_test_passed-no_tests_run.log')
with open(empty_log) as file:
result = kunit_parser.parse_run_tests(
- kunit_parser.isolate_kunit_output(file.readlines()))
+ kunit_parser.get_input(file.readlines()))
self.assertEqual(0, len(result.suites))
self.assertEqual(
kunit_parser.TestStatus.NO_TESTS,
@@ -170,7 +172,7 @@ class KUnitParserTest(unittest.TestCase):
print_mock = mock.patch('builtins.print').start()
with open(crash_log) as file:
result = kunit_parser.parse_run_tests(
- kunit_parser.isolate_kunit_output(file.readlines()))
+ kunit_parser.get_input(file.readlines()))
print_mock.assert_any_call(StrContains('no tests run!'))
print_mock.stop()
file.close()
base-commit: c3d0e3fd41b7f0f5d5d5b6022ab7e813f04ea727
--
2.31.1.818.g46aad6cb9e-goog
The newline is expected to come from the caller but got missed for this
test.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/sve-probe-vls.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/sve-probe-vls.c b/tools/testing/selftests/arm64/fp/sve-probe-vls.c
index b29cbc642c57..76e138525d55 100644
--- a/tools/testing/selftests/arm64/fp/sve-probe-vls.c
+++ b/tools/testing/selftests/arm64/fp/sve-probe-vls.c
@@ -25,7 +25,7 @@ int main(int argc, char **argv)
ksft_set_plan(2);
if (!(getauxval(AT_HWCAP) & HWCAP_SVE))
- ksft_exit_skip("SVE not available");
+ ksft_exit_skip("SVE not available\n");
/*
* Enumerate up to SVE_VQ_MAX vector lengths
--
2.20.1
There are several test cases still using exit 0 when they need to be
skipped. Use kselftest framework skip code instead so it can help us
to distinguish the proper return status.
Criterion to filter out what should be fixed in selftests directory:
grep -r "exit 0" -B1 | grep -i skip
This change might cause some false-positives if people are running
these test scripts directly and only checking their return codes,
which will change from 0 to 4. However I think the impact should be
small as most of our scripts here are already using this skip code.
And there will be no such issue if running them with the kselftest
framework.
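To illustrate why the distinct return status matters to a caller, here is a
small sketch of a runner that maps exit codes to results. The value 4 for
skip is the one this patch uses; 0 and 1 are the kselftest pass and fail
codes. The functions classify() and run_and_classify() are hypothetical and
exist only for this example.

```python
import subprocess
import sys

# Exit codes used by the kselftest framework. The runner below is a
# hypothetical illustration, not part of the selftests.
KSFT_PASS = 0
KSFT_FAIL = 1
KSFT_SKIP = 4


def classify(returncode: int) -> str:
    """Translate a test's exit status into a result label."""
    if returncode == KSFT_PASS:
        return "PASS"
    if returncode == KSFT_SKIP:
        return "SKIP"
    return "FAIL"


def run_and_classify(argv: list) -> str:
    """Run a command and classify it by its exit status."""
    return classify(subprocess.run(argv).returncode)


if __name__ == "__main__":
    # A child that exits with 4 stands in for a test script that skips itself.
    skipping_test = [sys.executable, "-c", "import sys; sys.exit(4)"]
    print(run_and_classify(skipping_test))
```

With "exit 0", a skipped script would be indistinguishable from a passing
one in such a runner; a caller that treats any non-zero status as failure
is the false-positive case described above.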
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/bpf/test_bpftool_build.sh | 5 ++++-
tools/testing/selftests/bpf/test_xdp_meta.sh | 5 ++++-
tools/testing/selftests/bpf/test_xdp_vlan.sh | 7 +++++--
tools/testing/selftests/net/fcnal-test.sh | 5 ++++-
tools/testing/selftests/net/fib_rule_tests.sh | 7 +++++--
tools/testing/selftests/net/forwarding/lib.sh | 5 ++++-
tools/testing/selftests/net/forwarding/router_mpath_nh.sh | 5 ++++-
tools/testing/selftests/net/forwarding/router_mpath_nh_res.sh | 5 ++++-
tools/testing/selftests/net/run_afpackettests | 5 ++++-
tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh | 9 ++++++---
tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh | 9 ++++++---
tools/testing/selftests/net/unicast_extensions.sh | 5 ++++-
tools/testing/selftests/net/vrf_strict_mode_test.sh | 9 ++++++---
tools/testing/selftests/ptp/phc.sh | 7 +++++--
tools/testing/selftests/vm/charge_reserved_hugetlb.sh | 5 ++++-
tools/testing/selftests/vm/hugetlb_reparenting_test.sh | 5 ++++-
16 files changed, 73 insertions(+), 25 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_bpftool_build.sh b/tools/testing/selftests/bpf/test_bpftool_build.sh
index ac349a5..b6fab1e 100755
--- a/tools/testing/selftests/bpf/test_bpftool_build.sh
+++ b/tools/testing/selftests/bpf/test_bpftool_build.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
case $1 in
-h|--help)
echo -e "$0 [-j <n>]"
@@ -22,7 +25,7 @@ KDIR_ROOT_DIR=$(realpath $PWD/$SCRIPT_REL_DIR/../../../../)
cd $KDIR_ROOT_DIR
if [ ! -e tools/bpf/bpftool/Makefile ]; then
echo -e "skip: bpftool files not found!\n"
- exit 0
+ exit $ksft_skip
fi
ERROR=0
diff --git a/tools/testing/selftests/bpf/test_xdp_meta.sh b/tools/testing/selftests/bpf/test_xdp_meta.sh
index 637fcf4..fd3f218 100755
--- a/tools/testing/selftests/bpf/test_xdp_meta.sh
+++ b/tools/testing/selftests/bpf/test_xdp_meta.sh
@@ -1,5 +1,8 @@
#!/bin/sh
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
cleanup()
{
if [ "$?" = "0" ]; then
@@ -17,7 +20,7 @@ cleanup()
ip link set dev lo xdp off 2>/dev/null > /dev/null
if [ $? -ne 0 ];then
echo "selftests: [SKIP] Could not run test without the ip xdp support"
- exit 0
+ exit $ksft_skip
fi
set -e
diff --git a/tools/testing/selftests/bpf/test_xdp_vlan.sh b/tools/testing/selftests/bpf/test_xdp_vlan.sh
index bb8b0da..1aa7404 100755
--- a/tools/testing/selftests/bpf/test_xdp_vlan.sh
+++ b/tools/testing/selftests/bpf/test_xdp_vlan.sh
@@ -2,6 +2,9 @@
# SPDX-License-Identifier: GPL-2.0
# Author: Jesper Dangaard Brouer <hawk(a)kernel.org>
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
# Allow wrapper scripts to name test
if [ -z "$TESTNAME" ]; then
TESTNAME=xdp_vlan
@@ -94,7 +97,7 @@ while true; do
-h | --help )
usage;
echo "selftests: $TESTNAME [SKIP] usage help info requested"
- exit 0
+ exit $ksft_skip
;;
* )
shift
@@ -117,7 +120,7 @@ fi
ip link set dev lo xdpgeneric off 2>/dev/null > /dev/null
if [ $? -ne 0 ]; then
echo "selftests: $TESTNAME [SKIP] need ip xdp support"
- exit 0
+ exit $ksft_skip
fi
# Interactive mode likely require us to cleanup netns
diff --git a/tools/testing/selftests/net/fcnal-test.sh b/tools/testing/selftests/net/fcnal-test.sh
index a8ad928..9074e25 100755
--- a/tools/testing/selftests/net/fcnal-test.sh
+++ b/tools/testing/selftests/net/fcnal-test.sh
@@ -37,6 +37,9 @@
#
# server / client nomenclature relative to ns-A
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
VERBOSE=0
NSA_DEV=eth1
@@ -3946,7 +3949,7 @@ fi
which nettest >/dev/null
if [ $? -ne 0 ]; then
echo "'nettest' command not found; skipping tests"
- exit 0
+ exit $ksft_skip
fi
declare -i nfail=0
diff --git a/tools/testing/selftests/net/fib_rule_tests.sh b/tools/testing/selftests/net/fib_rule_tests.sh
index a93e6b6..43ea840 100755
--- a/tools/testing/selftests/net/fib_rule_tests.sh
+++ b/tools/testing/selftests/net/fib_rule_tests.sh
@@ -3,6 +3,9 @@
# This test is for checking IPv4 and IPv6 FIB rules API
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
ret=0
PAUSE_ON_FAIL=${PAUSE_ON_FAIL:=no}
@@ -238,12 +241,12 @@ run_fibrule_tests()
if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [ ! -x "$(command -v ip)" ]; then
echo "SKIP: Could not run test without ip tool"
- exit 0
+ exit $ksft_skip
fi
# start clean
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index 42e28c9..eed9f08 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -4,6 +4,9 @@
##############################################################################
# Defines
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
# Can be overridden by the configuration file.
PING=${PING:=ping}
PING6=${PING6:=ping6}
@@ -121,7 +124,7 @@ check_ethtool_lanes_support()
if [[ "$(id -u)" -ne 0 ]]; then
echo "SKIP: need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [[ "$CHECK_TC" = "yes" ]]; then
diff --git a/tools/testing/selftests/net/forwarding/router_mpath_nh.sh b/tools/testing/selftests/net/forwarding/router_mpath_nh.sh
index 76efb1f..bb7dc6d 100755
--- a/tools/testing/selftests/net/forwarding/router_mpath_nh.sh
+++ b/tools/testing/selftests/net/forwarding/router_mpath_nh.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
ALL_TESTS="
ping_ipv4
ping_ipv6
@@ -411,7 +414,7 @@ ping_ipv6()
ip nexthop ls >/dev/null 2>&1
if [ $? -ne 0 ]; then
echo "Nexthop objects not supported; skipping tests"
- exit 0
+ exit $ksft_skip
fi
trap cleanup EXIT
diff --git a/tools/testing/selftests/net/forwarding/router_mpath_nh_res.sh b/tools/testing/selftests/net/forwarding/router_mpath_nh_res.sh
index 4898dd4..e7bb976 100755
--- a/tools/testing/selftests/net/forwarding/router_mpath_nh_res.sh
+++ b/tools/testing/selftests/net/forwarding/router_mpath_nh_res.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
ALL_TESTS="
ping_ipv4
ping_ipv6
@@ -386,7 +389,7 @@ ping_ipv6()
ip nexthop ls >/dev/null 2>&1
if [ $? -ne 0 ]; then
echo "Nexthop objects not supported; skipping tests"
- exit 0
+ exit $ksft_skip
fi
trap cleanup EXIT
diff --git a/tools/testing/selftests/net/run_afpackettests b/tools/testing/selftests/net/run_afpackettests
index 8b42e8b..a59cb6a 100755
--- a/tools/testing/selftests/net/run_afpackettests
+++ b/tools/testing/selftests/net/run_afpackettests
@@ -1,9 +1,12 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
if [ $(id -u) != 0 ]; then
echo $msg must be run as root >&2
- exit 0
+ exit $ksft_skip
fi
ret=0
diff --git a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
index ad7a9fc..1003119 100755
--- a/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt4_l3vpn_test.sh
@@ -163,6 +163,9 @@
# +---------------------------------------------------+
#
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
readonly LOCALSID_TABLE_ID=90
readonly IPv6_RT_NETWORK=fd00
readonly IPv4_HS_NETWORK=10.0.0
@@ -464,18 +467,18 @@ host_vpn_isolation_tests()
if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [ ! -x "$(command -v ip)" ]; then
echo "SKIP: Could not run test without ip tool"
- exit 0
+ exit $ksft_skip
fi
modprobe vrf &>/dev/null
if [ ! -e /proc/sys/net/vrf/strict_mode ]; then
echo "SKIP: vrf sysctl does not exist"
- exit 0
+ exit $ksft_skip
fi
cleanup &>/dev/null
diff --git a/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
index 68708f5..b9b06ef 100755
--- a/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt6_l3vpn_test.sh
@@ -164,6 +164,9 @@
# +---------------------------------------------------+
#
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
readonly LOCALSID_TABLE_ID=90
readonly IPv6_RT_NETWORK=fd00
readonly IPv6_HS_NETWORK=cafe
@@ -472,18 +475,18 @@ host_vpn_isolation_tests()
if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [ ! -x "$(command -v ip)" ]; then
echo "SKIP: Could not run test without ip tool"
- exit 0
+ exit $ksft_skip
fi
modprobe vrf &>/dev/null
if [ ! -e /proc/sys/net/vrf/strict_mode ]; then
echo "SKIP: vrf sysctl does not exist"
- exit 0
+ exit $ksft_skip
fi
cleanup &>/dev/null
diff --git a/tools/testing/selftests/net/unicast_extensions.sh b/tools/testing/selftests/net/unicast_extensions.sh
index dbf0421..728e4d5 100755
--- a/tools/testing/selftests/net/unicast_extensions.sh
+++ b/tools/testing/selftests/net/unicast_extensions.sh
@@ -28,12 +28,15 @@
# These tests provide an easy way to flip the expected result of any
# of these behaviors for testing kernel patches that change them.
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
# nettest can be run from PATH or from same directory as this selftest
if ! which nettest >/dev/null; then
PATH=$PWD:$PATH
if ! which nettest >/dev/null; then
echo "'nettest' command not found; skipping tests"
- exit 0
+ exit $ksft_skip
fi
fi
diff --git a/tools/testing/selftests/net/vrf_strict_mode_test.sh b/tools/testing/selftests/net/vrf_strict_mode_test.sh
index 18b982d..865d53c 100755
--- a/tools/testing/selftests/net/vrf_strict_mode_test.sh
+++ b/tools/testing/selftests/net/vrf_strict_mode_test.sh
@@ -3,6 +3,9 @@
# This test is designed for testing the new VRF strict_mode functionality.
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
ret=0
# identifies the "init" network namespace which is often called root network
@@ -371,18 +374,18 @@ vrf_strict_mode_check_support()
if [ "$(id -u)" -ne 0 ];then
echo "SKIP: Need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [ ! -x "$(command -v ip)" ]; then
echo "SKIP: Could not run test without ip tool"
- exit 0
+ exit $ksft_skip
fi
modprobe vrf &>/dev/null
if [ ! -e /proc/sys/net/vrf/strict_mode ]; then
echo "SKIP: vrf sysctl does not exist"
- exit 0
+ exit $ksft_skip
fi
cleanup &> /dev/null
diff --git a/tools/testing/selftests/ptp/phc.sh b/tools/testing/selftests/ptp/phc.sh
index ac6e5a6..ca3c844c 100755
--- a/tools/testing/selftests/ptp/phc.sh
+++ b/tools/testing/selftests/ptp/phc.sh
@@ -1,6 +1,9 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
ALL_TESTS="
settime
adjtime
@@ -13,12 +16,12 @@ DEV=$1
if [[ "$(id -u)" -ne 0 ]]; then
echo "SKIP: need root privileges"
- exit 0
+ exit $ksft_skip
fi
if [[ "$DEV" == "" ]]; then
echo "SKIP: PTP device not provided"
- exit 0
+ exit $ksft_skip
fi
require_command()
diff --git a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
index 18d3368..fe8fcfb 100644
--- a/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/vm/charge_reserved_hugetlb.sh
@@ -1,11 +1,14 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
set -e
if [[ $(id -u) -ne 0 ]]; then
echo "This test must be run as root. Skipping..."
- exit 0
+ exit $ksft_skip
fi
fault_limit_file=limit_in_bytes
diff --git a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
index d11d1fe..4a9a3af 100644
--- a/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/vm/hugetlb_reparenting_test.sh
@@ -1,11 +1,14 @@
#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
set -e
if [[ $(id -u) -ne 0 ]]; then
echo "This test must be run as root. Skipping..."
- exit 0
+ exit $ksft_skip
fi
usage_file=usage_in_bytes
--
2.7.4
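The change repeated across the scripts in the diff above follows one pattern: define `ksft_skip=4` near the top and exit with it when a precondition is not met. A minimal sketch of that pattern is shown here; `skip_test` is a hypothetical helper used only for illustration and is not part of the patch.

```shell
#!/bin/sh
# Kselftest framework requirement - SKIP code is 4.
ksft_skip=4

# Hypothetical helper (not in the patch): report a reason and exit with
# the SKIP code instead of 0, so that the skip is not counted as a pass.
skip_test() {
	echo "SKIP: $1"
	exit "$ksft_skip"
}

# Run the helper in a subshell so that this demonstration itself keeps
# running, then show the status a test runner would observe.
rc=0
( skip_test "example precondition not met" ) || rc=$?
echo "exit status: $rc"
```

With `exit 0`, the same run would be indistinguishable from a test that actually passed.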
This patch has been written to support page-ins using userfaultfd's
SIGBUS feature. When a userfaultfd is created with UFFD_FEATURE_SIGBUS,
`handle_userfault` will return VM_FAULT_SIGBUS instead of putting the
calling thread to sleep. Normal (non-guest) threads that access memory
that has been registered with a UFFD_FEATURE_SIGBUS userfaultfd receive
a SIGBUS.
When a vCPU gets an EPT page fault in a userfaultfd-registered region,
KVM calls into `handle_userfault` to resolve the page fault. With
UFFD_FEATURE_SIGBUS, VM_FAULT_SIGBUS is returned, but a SIGBUS is never
delivered to the userspace thread. This patch propagates the
VM_FAULT_SIGBUS error up to KVM, where we then send the signal.
Upon receiving a VM_FAULT_SIGBUS, the KVM_RUN ioctl will exit to
userspace. This functionality already exists. This allows a hypervisor
to do page-ins with UFFD_FEATURE_SIGBUS:
1. Set up a SIGBUS handler that saves the faulting address (to a
thread-local variable).
2. Enter the guest.
3. Immediately after KVM_RUN returns, check if the address has been set.
4. If an address has been set, we exited due to a page fault that we can
now handle.
5. Userspace can do anything it wants to make the memory available,
using MODE_NOWAKE for the UFFDIO memory installation ioctls.
6. Re-enter the guest. If the memory still isn't ready, this process
will repeat.
This style of demand paging is significantly faster than the standard
poll/read/wake mechanism userfaultfd uses and completely bypasses the
userfaultfd waitq. For a single vCPU, page-in throughput increases by
about 3-4x.
Signed-off-by: James Houghton <jthoughton(a)google.com>
Suggested-by: Jue Wang <juew(a)google.com>
---
include/linux/hugetlb.h | 2 +-
include/linux/mm.h | 3 ++-
mm/gup.c | 57 +++++++++++++++++++++++++++--------------
mm/hugetlb.c | 5 +++-
virt/kvm/kvm_main.c | 30 +++++++++++++++++++++-
5 files changed, 74 insertions(+), 23 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b92f25ccef58..a777fb254df0 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -119,7 +119,7 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_ar
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, unsigned long *, long, unsigned int,
- int *);
+ int *, int *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 322ec61d0da7..1dcd1ac81992 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1824,7 +1824,8 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages, int *locked);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
- struct page **pages, unsigned int gup_flags);
+ struct page **pages, unsigned int gup_flags,
+ int *fault_error);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
diff --git a/mm/gup.c b/mm/gup.c
index 0697134b6a12..ab55a67aef78 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -881,7 +881,8 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
* is, *@locked will be set to 0 and -EBUSY returned.
*/
static int faultin_page(struct vm_area_struct *vma,
- unsigned long address, unsigned int *flags, int *locked)
+ unsigned long address, unsigned int *flags, int *locked,
+ int *fault_error)
{
unsigned int fault_flags = 0;
vm_fault_t ret;
@@ -906,6 +907,8 @@ static int faultin_page(struct vm_area_struct *vma,
}
ret = handle_mm_fault(vma, address, fault_flags, NULL);
+ if (fault_error)
+ *fault_error = ret;
if (ret & VM_FAULT_ERROR) {
int err = vm_fault_to_errno(ret, *flags);
@@ -996,6 +999,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* @vmas: array of pointers to vmas corresponding to each page.
* Or NULL if the caller does not require them.
* @locked: whether we're still with the mmap_lock held
+ * @fault_error: VM fault error from handle_mm_fault. NULL if the caller
+ * does not require this error.
*
* Returns either number of pages pinned (which may be less than the
* number requested), or an error. Details about the return value:
@@ -1040,6 +1045,13 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* when it's been released. Otherwise, it must be held for either
* reading or writing and will not be released.
*
+ * If @fault_error != NULL, __get_user_pages will return the VM fault error
+ * from handle_mm_fault() in this argument in the event of a VM fault error.
+ * On success (ret == nr_pages) fault_error is zero.
+ * On failure (ret != nr_pages) fault_error may still be 0 if the error did
+ * not originate from handle_mm_fault().
+ *
+ *
* In most cases, get_user_pages or get_user_pages_fast should be used
* instead of __get_user_pages. __get_user_pages should be used only if
* you need some special @gup_flags.
@@ -1047,7 +1059,8 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ struct vm_area_struct **vmas, int *locked,
+ int *fault_error)
{
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -1097,7 +1110,7 @@ static long __get_user_pages(struct mm_struct *mm,
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
- gup_flags, locked);
+ gup_flags, locked, fault_error);
if (locked && *locked == 0) {
/*
* We've got a VM_FAULT_RETRY
@@ -1124,7 +1137,8 @@ static long __get_user_pages(struct mm_struct *mm,
page = follow_page_mask(vma, start, foll_flags, &ctx);
if (!page) {
- ret = faultin_page(vma, start, &foll_flags, locked);
+ ret = faultin_page(vma, start, &foll_flags, locked,
+ fault_error);
switch (ret) {
case 0:
goto retry;
@@ -1280,7 +1294,8 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
struct page **pages,
struct vm_area_struct **vmas,
int *locked,
- unsigned int flags)
+ unsigned int flags,
+ int *fault_error)
{
long ret, pages_done;
bool lock_dropped;
@@ -1311,7 +1326,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
lock_dropped = false;
for (;;) {
ret = __get_user_pages(mm, start, nr_pages, flags, pages,
- vmas, locked);
+ vmas, locked, fault_error);
if (!locked)
/* VM_FAULT_RETRY couldn't trigger, bypass */
return ret;
@@ -1371,7 +1386,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
*locked = 1;
ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
- pages, NULL, locked);
+ pages, NULL, locked, fault_error);
if (!*locked) {
/* Continue to retry until we succeeded */
BUG_ON(ret != 0);
@@ -1458,7 +1473,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* not result in a stack expansion that recurses back here.
*/
return __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked);
+ NULL, NULL, locked, NULL);
}
/*
@@ -1524,7 +1539,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
struct vm_area_struct **vmas, int *locked,
- unsigned int foll_flags)
+ unsigned int foll_flags, int *fault_error)
{
struct vm_area_struct *vma;
unsigned long vm_flags;
@@ -1590,7 +1605,8 @@ struct page *get_dump_page(unsigned long addr)
if (mmap_read_lock_killable(mm))
return NULL;
ret = __get_user_pages_locked(mm, addr, 1, &page, NULL, &locked,
- FOLL_FORCE | FOLL_DUMP | FOLL_GET);
+ FOLL_FORCE | FOLL_DUMP | FOLL_GET,
+ NULL);
if (locked)
mmap_read_unlock(mm);
@@ -1704,11 +1720,11 @@ static long __gup_longterm_locked(struct mm_struct *mm,
if (!(gup_flags & FOLL_LONGTERM))
return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
- NULL, gup_flags);
+ NULL, gup_flags, NULL);
flags = memalloc_pin_save();
do {
rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
- NULL, gup_flags);
+ NULL, gup_flags, NULL);
if (rc <= 0)
break;
rc = check_and_migrate_movable_pages(rc, pages, gup_flags);
@@ -1764,7 +1780,8 @@ static long __get_user_pages_remote(struct mm_struct *mm,
return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
locked,
- gup_flags | FOLL_TOUCH | FOLL_REMOTE);
+ gup_flags | FOLL_TOUCH | FOLL_REMOTE,
+ NULL);
}
/**
@@ -1941,7 +1958,7 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages,
return __get_user_pages_locked(current->mm, start, nr_pages,
pages, NULL, locked,
- gup_flags | FOLL_TOUCH);
+ gup_flags | FOLL_TOUCH, NULL);
}
EXPORT_SYMBOL(get_user_pages_locked);
@@ -1961,7 +1978,8 @@ EXPORT_SYMBOL(get_user_pages_locked);
* (e.g. FOLL_FORCE) are not required.
*/
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
- struct page **pages, unsigned int gup_flags)
+ struct page **pages, unsigned int gup_flags,
+ int *fault_error)
{
struct mm_struct *mm = current->mm;
int locked = 1;
@@ -1978,7 +1996,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
mmap_read_lock(mm);
ret = __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
- &locked, gup_flags | FOLL_TOUCH);
+ &locked, gup_flags | FOLL_TOUCH,
+ fault_error);
if (locked)
mmap_read_unlock(mm);
return ret;
@@ -2550,7 +2569,7 @@ static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
mmap_read_unlock(current->mm);
} else {
ret = get_user_pages_unlocked(start, nr_pages,
- pages, gup_flags);
+ pages, gup_flags, NULL);
}
return ret;
@@ -2880,7 +2899,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
return -EINVAL;
gup_flags |= FOLL_PIN;
- return get_user_pages_unlocked(start, nr_pages, pages, gup_flags);
+ return get_user_pages_unlocked(start, nr_pages, pages, gup_flags, NULL);
}
EXPORT_SYMBOL(pin_user_pages_unlocked);
@@ -2909,6 +2928,6 @@ long pin_user_pages_locked(unsigned long start, unsigned long nr_pages,
gup_flags |= FOLL_PIN;
return __get_user_pages_locked(current->mm, start, nr_pages,
pages, NULL, locked,
- gup_flags | FOLL_TOUCH);
+ gup_flags | FOLL_TOUCH, NULL);
}
EXPORT_SYMBOL(pin_user_pages_locked);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3db405dea3dc..889ac33d57d5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5017,7 +5017,8 @@ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page **pages, struct vm_area_struct **vmas,
unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *locked)
+ long i, unsigned int flags, int *locked,
+ int *fault_error)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
@@ -5103,6 +5104,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
ret = hugetlb_fault(mm, vma, vaddr, fault_flags);
if (ret & VM_FAULT_ERROR) {
+ if (fault_error)
+ *fault_error = ret;
err = vm_fault_to_errno(ret, flags);
remainder = 0;
break;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..0a20d926ae32 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2004,6 +2004,30 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault,
return false;
}
+static void kvm_send_vm_fault_signal(int fault_error, int errno,
+ unsigned long address,
+ struct task_struct *tsk)
+{
+ kernel_siginfo_t info;
+
+ clear_siginfo(&info);
+
+ if (fault_error == VM_FAULT_SIGBUS)
+ info.si_signo = SIGBUS;
+ else if (fault_error == VM_FAULT_SIGSEGV)
+ info.si_signo = SIGSEGV;
+ else
+ // Other fault errors should not result in a signal.
+ return;
+
+ info.si_errno = errno;
+ info.si_code = BUS_ADRERR;
+ info.si_addr = (void __user *)address;
+ info.si_addr_lsb = PAGE_SHIFT;
+
+ send_sig_info(info.si_signo, &info, tsk);
+}
+
/*
* The slow path to get the pfn of the specified host virtual address,
* 1 indicates success, -errno is returned if error is detected.
@@ -2014,6 +2038,7 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
unsigned int flags = FOLL_HWPOISON;
struct page *page;
int npages = 0;
+ int fault_error;
might_sleep();
@@ -2025,7 +2050,10 @@ static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault,
if (async)
flags |= FOLL_NOWAIT;
- npages = get_user_pages_unlocked(addr, 1, &page, flags);
+ npages = get_user_pages_unlocked(addr, 1, &page, flags, &fault_error);
+ if (fault_error & VM_FAULT_ERROR)
+ kvm_send_vm_fault_signal(fault_error, npages, addr, current);
+
if (npages != 1)
return npages;
--
2.31.1.751.gd2f1c929bd-goog
Attacks against vulnerable userspace applications that aim to break ASLR
or bypass canaries traditionally use some level of brute force with the
help of the fork system call. This is possible because the memory contents
of a process created with fork are the same as those of its parent (the
process that called fork). So, the attacker can probe the memory an
unlimited number of times to find the correct memory values or the correct
memory addresses without worrying about crashing the application.
Based on the above scenario it would be nice to have this detected and
mitigated, and this is the goal of this patch series. Specifically, the
following attacks are expected to be detected:
1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
desirable memory layout is obtained (e.g. Stack Clash).
2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
until a desirable memory layout is obtained (e.g. what CTFs do for simple
network service).
3.- Launching processes without exec() (e.g. Android Zygote) and exposing
state to attack a sibling.
4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly until
the previously shared memory layout of all the other children is
exposed (e.g. kind of related to HeartBleed).
In each case, a privilege boundary has been crossed:
Case 1: setuid/setgid process
Case 2: network to local
Case 3: privilege changes
Case 4: network to local
So, what will really be detected are fork/exec brute force attacks that
cross any of the boundaries listed above.
The implementation details and comparison against other existing
implementations can be found in the "Documentation" patch.
It is important to mention that this version changes the method used to
track the information related to application crashes. Prior to this
version, a pointer per process (in the task_struct structure) held a
reference to the shared statistical data. In other words, these stats
were shared by all the processes in the fork hierarchy. But this has an
important drawback: a brute force attack that happens through the execve
system call loses the fault information, since these statistics are freed
when the fork hierarchy disappears. So, the solution adopted in the v6
version was to use an upper fork hierarchy to track the info for this
attack type. But, as Valdis Kletnieks pointed out during this discussion
[1], this method can be easily bypassed using a double exec (well, this was
the method used in the kselftest to avoid the detection ;) ). So, in this
version, the extended attributes feature of the executable files is used to
track all the statistical data (info related to application crashes). The
xattr is also used to mark an executable as "not allowed" when an attack
is detected. The execve system call then relies on this flag to prevent
further executions of the file.
[1] https://lore.kernel.org/kernelnewbies/20210330173459.GA3163@ubuntu/
Moreover, I think this solves another problem pointed out by Andi Kleen
during the v5 review [2] related to the possibility that a supervisor
respawns processes killed by the Brute LSM. He suggested adding some way so
a supervisor can know that a process has been killed by Brute and then
decide to respawn or not. So, now, the supervisor can read the brute xattr
of one executable and know if it is blocked by Brute and why (using the
statistical data).
[2] https://lore.kernel.org/kernel-hardening/878s78dnrm.fsf@linux.intel.com/
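As a rough sketch of how a supervisor could look at that xattr before deciding whether to respawn, the helper below dumps it with getfattr. The attribute name `security.brute` is an assumption (XATTR_SECURITY_PREFIX followed by a guessed value for XATTR_BRUTE_SUFFIX), `brute_xattr_dump` is a hypothetical name, and the layout of the value is described only in the "Documentation" patch, so it is printed as hex without interpretation.

```shell
#!/bin/sh
# ASSUMPTION: the real name is XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX;
# "security.brute" is a guess, check include/uapi/linux/xattr.h in the
# patched tree. Override with BRUTE_XATTR if it differs.
BRUTE_XATTR="${BRUTE_XATTR:-security.brute}"

# Hypothetical helper: print the brute xattr of an executable as hex.
# Returns non-zero if getfattr is not installed or the attribute is not
# set on the file (i.e. no crash statistics have been recorded for it).
brute_xattr_dump() {
	command -v getfattr >/dev/null 2>&1 || return 2
	getfattr -n "$BRUTE_XATTR" -e hex "$1" 2>/dev/null
}
```

A supervisor script could call `brute_xattr_dump /path/to/daemon` and treat a non-zero status as "nothing recorded"; deciding whether a recorded value means "blocked" requires the format from the documentation patch.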
With all this in mind, I will now explain the different patches:
The 1/7 patch defines a new LSM hook to get the fatal signal of a task.
This will be useful during the attack detection phase.
The 2/7 patch defines a new LSM and the necessary sysctl attributes to
fine-tune the attack detection.
The 3/7 patch detects a fork/exec brute force attack and narrows the
possible cases by taking into account the privilege boundary crossing.
The 4/7 patch mitigates a brute force attack.
The 5/7 patch adds self-tests to validate the Brute LSM expectations.
The 6/7 patch adds the documentation to explain this implementation.
The 7/7 patch updates the maintainers file.
This patch series is a task of the KSPP [3] and can also be accessed from my
github tree [4] in the "brute_v7" branch.
[3] https://github.com/KSPP/linux/issues/39
[4] https://github.com/johwood/linux/
When I ran the "checkpatch" script I got the following errors, but I think
they are false positives as I follow the same coding style as for the other
extended attribute suffixes.
----------------------------------------------------------------------------
../patches/brute_v7/v7-0003-security-brute-Detect-a-brute-force-attack.patch
----------------------------------------------------------------------------
ERROR: Macros with complex values should be enclosed in parentheses
89: FILE: include/uapi/linux/xattr.h:80:
+#define XATTR_NAME_BRUTE XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX
-----------------------------------------------------------------------------
../patches/brute_v7/v7-0005-selftests-brute-Add-tests-for-the-Brute-LSM.patch
-----------------------------------------------------------------------------
ERROR: Macros with complex values should be enclosed in parentheses
100: FILE: tools/testing/selftests/brute/rmxattr.c:18:
+#define XATTR_NAME_BRUTE XATTR_SECURITY_PREFIX XATTR_BRUTE_SUFFIX
When I ran the "kernel-doc" script with the following parameters:
./scripts/kernel-doc --none -v security/brute/brute.c
I got the following warning:
security/brute/brute.c:65: warning: contents before sections
But I don't understand why it is complaining. Could it be a false positive?
The previous versions can be found in:
RFC
https://lore.kernel.org/kernel-hardening/20200910202107.3799376-1-keescook@…
Version 2
https://lore.kernel.org/kernel-hardening/20201025134540.3770-1-john.wood@gm…
Version 3
https://lore.kernel.org/lkml/20210221154919.68050-1-john.wood@gmx.com/
Version 4
https://lore.kernel.org/lkml/20210227150956.6022-1-john.wood@gmx.com/
Version 5
https://lore.kernel.org/kernel-hardening/20210227153013.6747-1-john.wood@gm…
Version 6
https://lore.kernel.org/kernel-hardening/20210307113031.11671-1-john.wood@g…
Changelog RFC -> v2
-------------------
- Rename this feature with a more suitable name (Jann Horn, Kees Cook).
- Convert the code to an LSM (Kees Cook).
- Add locking to avoid data races (Jann Horn).
- Add a new LSM hook to get the fatal signal of a task (Jann Horn, Kees
Cook).
- Add the last crashes timestamps list to avoid false positives in the
attack detection (Jann Horn).
- Use "period" instead of "rate" (Jann Horn).
- Other minor changes suggested (Jann Horn, Kees Cook).
Changelog v2 -> v3
------------------
- Compute the application crash period on an on-going basis (Kees Cook).
- Detect a brute force attack through the execve system call (Kees Cook).
- Detect a slow brute force attack (Randy Dunlap).
- Fine tune the detection taking into account privilege boundary crossing
(Kees Cook).
- Take into account only fatal signals delivered by the kernel (Kees
Cook).
- Remove the sysctl attributes to fine tune the detection (Kees Cook).
- Remove the prctls to allow per process enabling/disabling (Kees Cook).
- Improve the documentation (Kees Cook).
- Fix some typos in the documentation (Randy Dunlap).
- Add self-test to validate the expectations (Kees Cook).
Changelog v3 -> v4
------------------
- Fix all the warnings shown by the tool "scripts/kernel-doc" (Randy
Dunlap).
Changelog v4 -> v5
------------------
- Fix some typos (Randy Dunlap).
Changelog v5 -> v6
------------------
- Fix a reported deadlock (kernel test robot).
- Add high level details to the documentation (Andi Kleen).
Changelog v6 -> v7
------------------
- Add the "Reviewed-by:" tag to the first patch.
- Rearrange the brute LSM between lockdown and yama (Kees Cook).
- Split subdir and obj in security/Makefile (Kees Cook).
- Reduce the number of header files included (Kees Cook).
- Print the pid when an attack is detected (Kees Cook).
- Use the socket_accept LSM hook instead of socket_sock_rcv_skb hook to
avoid running a hook on every incoming network packet (Kees Cook).
- Update the documentation and fix it to render it properly (Jonathan
Corbet).
- Correctly handle an exec brute force attack, avoiding the bypass (Valdis
Kletnieks).
- Other minor changes and cleanups.
Any constructive comments are welcome.
Thanks in advance.
John Wood (7):
security: Add LSM hook at the point where a task gets a fatal signal
security/brute: Define a LSM and add sysctl attributes
security/brute: Detect a brute force attack
security/brute: Mitigate a brute force attack
selftests/brute: Add tests for the Brute LSM
Documentation: Add documentation for the Brute LSM
MAINTAINERS: Add a new entry for the Brute LSM
Documentation/admin-guide/LSM/Brute.rst | 334 +++++++++++
Documentation/admin-guide/LSM/index.rst | 1 +
MAINTAINERS | 7 +
include/linux/lsm_hook_defs.h | 1 +
include/linux/lsm_hooks.h | 4 +
include/linux/security.h | 4 +
include/uapi/linux/xattr.h | 3 +
kernel/signal.c | 1 +
security/Kconfig | 11 +-
security/Makefile | 2 +
security/brute/Kconfig | 15 +
security/brute/Makefile | 2 +
security/brute/brute.c | 716 +++++++++++++++++++++++
security/security.c | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/brute/.gitignore | 2 +
tools/testing/selftests/brute/Makefile | 5 +
tools/testing/selftests/brute/config | 1 +
tools/testing/selftests/brute/rmxattr.c | 34 ++
tools/testing/selftests/brute/test.c | 507 ++++++++++++++++
tools/testing/selftests/brute/test.sh | 256 ++++++++
21 files changed, 1907 insertions(+), 5 deletions(-)
create mode 100644 Documentation/admin-guide/LSM/Brute.rst
create mode 100644 security/brute/Kconfig
create mode 100644 security/brute/Makefile
create mode 100644 security/brute/brute.c
create mode 100644 tools/testing/selftests/brute/.gitignore
create mode 100644 tools/testing/selftests/brute/Makefile
create mode 100644 tools/testing/selftests/brute/config
create mode 100644 tools/testing/selftests/brute/rmxattr.c
create mode 100644 tools/testing/selftests/brute/test.c
create mode 100755 tools/testing/selftests/brute/test.sh
--
2.25.1
TL;DR: Add support to kunit_tool to dispatch tests via QEMU. Also add
support to immediately shut down a kernel after running KUnit tests.
Background
----------
KUnit has supported running on all architectures for quite some time;
however, kunit_tool - the script commonly used to invoke KUnit tests -
has only fully supported running KUnit on UML. Its functionality has been
broken up for some time to separate the configure, build, run, and parse
phases, making it possible to use parts of it on other architectures to a
small extent. Nevertheless, kunit_tool has not supported running tests
on other architectures.
What this patchset does
-----------------------
This patchset introduces first class support to kunit_tool for KUnit to
be run on many popular architectures via QEMU. It does this by adding
two new flags: `--arch` and `--cross_compile`.
`--arch` allows an architecture to be specified by the name the
architecture is given in `arch/`. It uses the specified architecture to
select the minimal set of Kconfigs and QEMU configs needed for the
architecture to run in QEMU and provide a console from which KTAP
results can be scraped.
`--cross_compile` allows a toolchain prefix to be passed to make,
similar to how `CROSS_COMPILE` is used.
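To illustrate how the two flags combine, here is a small dry-run sketch that only assembles and prints the kunit_tool command line; actually running it needs a kernel tree with this patchset applied. `kunit_qemu_cmd` is a hypothetical helper, and the toolchain prefix `aarch64-linux-gnu-` is just an example value, not something the patchset mandates.

```shell
#!/bin/sh
# Hypothetical dry-run helper: build the kunit_tool invocation for a given
# architecture (named as in arch/) and an optional toolchain prefix, and
# print it instead of executing it.
kunit_qemu_cmd() {
	arch=$1
	cross=${2:-}
	cmd="./tools/testing/kunit/kunit.py run --arch=$arch"
	if [ -n "$cross" ]; then
		# Same role as CROSS_COMPILE on the make command line.
		cmd="$cmd --cross_compile=$cross"
	fi
	echo "$cmd"
}

# Native build: no toolchain prefix needed.
kunit_qemu_cmd x86_64
# Cross build: example prefix, adjust to the installed toolchain.
kunit_qemu_cmd arm64 aarch64-linux-gnu-
```

Keeping the prefix optional mirrors the description above: `--arch` alone selects the Kconfigs and QEMU configuration, while `--cross_compile` is only needed when the host toolchain cannot target that architecture.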
Additionally, this patchset revives the previously considered "kunit:
tool: add support for QEMU"[1] patches. The motivation for the new
kernel command line flag, `kunit_shutdown`, is to better support
running KUnit tests inside of QEMU. For most popular architectures, QEMU
can be made to terminate when the Linux kernel that is being run is
rebooted, halted, or powered off. As Kees pointed out in a previous
discussion[2], it is possible to make a kernel initrd that can reboot
the kernel immediately, but doing this for every architecture would likely
be infeasible. Instead, just having an option for the kernel to shut down
when it is done with testing seems a lot simpler, especially since it is
an option which would only be available in testing configurations of the
kernel anyway.
Changes since last revision
---------------------------
Mostly fixed lots of minor issues suggested/pointed out by David and
Daniel. Also reworked how qemu_configs are loaded: Now each config is in
its own Python file and is loaded dynamically. Given the number of
improvements that address the biggest concerns I had in the last RFC, I
decided to promote this to a normal patch set.
What discussion remains for this patchset?
------------------------------------------
I am still hoping to see some discussion regarding the kunit_shutdown
patch: I want to make sure with the added context of QEMU running under
kunit_tool that this is now a reasonable approach. Nevertheless, I am
pretty happy with this patchset as is, and I did not get any negative
feedback on the previous revision, so I think we can probably just move
forward as is.
Brendan Higgins (3):
Documentation: Add kunit_shutdown to kernel-parameters.txt
kunit: tool: add support for QEMU
Documentation: kunit: document support for QEMU in kunit_tool
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 8 +
Documentation/dev-tools/kunit/usage.rst | 37 +++-
lib/kunit/executor.c | 20 ++
tools/testing/kunit/kunit.py | 57 +++++-
tools/testing/kunit/kunit_config.py | 7 +-
tools/testing/kunit/kunit_kernel.py | 172 +++++++++++++++---
tools/testing/kunit/kunit_parser.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 18 +-
tools/testing/kunit/qemu_config.py | 17 ++
tools/testing/kunit/qemu_configs/alpha.py | 10 +
tools/testing/kunit/qemu_configs/arm.py | 13 ++
tools/testing/kunit/qemu_configs/arm64.py | 12 ++
tools/testing/kunit/qemu_configs/i386.py | 10 +
tools/testing/kunit/qemu_configs/powerpc.py | 12 ++
tools/testing/kunit/qemu_configs/riscv.py | 31 ++++
tools/testing/kunit/qemu_configs/s390.py | 14 ++
tools/testing/kunit/qemu_configs/sparc.py | 10 +
tools/testing/kunit/qemu_configs/x86_64.py | 10 +
18 files changed, 411 insertions(+), 49 deletions(-)
create mode 100644 tools/testing/kunit/qemu_config.py
create mode 100644 tools/testing/kunit/qemu_configs/alpha.py
create mode 100644 tools/testing/kunit/qemu_configs/arm.py
create mode 100644 tools/testing/kunit/qemu_configs/arm64.py
create mode 100644 tools/testing/kunit/qemu_configs/i386.py
create mode 100644 tools/testing/kunit/qemu_configs/powerpc.py
create mode 100644 tools/testing/kunit/qemu_configs/riscv.py
create mode 100644 tools/testing/kunit/qemu_configs/s390.py
create mode 100644 tools/testing/kunit/qemu_configs/sparc.py
create mode 100644 tools/testing/kunit/qemu_configs/x86_64.py
base-commit: 38182162b50aa4e970e5997df0a0c4288147a153
--
2.31.1.607.g51e8a6a459-goog
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, every statistic is treated as having the following
attributes:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative or instantaneous
* unit: none (for a simple counter), nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
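As a rough illustration of how a telemetry reader might use the attributes
listed above, here is a minimal Python sketch. All names in it (StatDesc,
value_for_interval, the example statistic names) are hypothetical and do not
reflect the binary layout or the identifiers defined by the patches. It only
shows the difference between cumulative and instantaneous statistics when
the data is pulled periodically.

```python
from dataclasses import dataclass
from enum import Enum


class StatType(Enum):
    CUMULATIVE = "cumulative"
    INSTANTANEOUS = "instantaneous"


class StatUnit(Enum):
    NONE = "none"            # simple counter
    NANOSECOND = "ns"
    MICROSECOND = "us"
    MILLISECOND = "ms"
    SECOND = "s"
    BYTE = "B"
    KIBYTE = "KiB"
    MIBYTE = "MiB"
    GIBYTE = "GiB"
    CYCLES = "cycles"


@dataclass(frozen=True)
class StatDesc:
    """Hypothetical descriptor carrying the four attributes from the list."""
    name: str
    arch_specific: bool      # architecture dependent or common
    per_vcpu: bool           # VCPU statistic (True) or VM statistic (False)
    type: StatType
    unit: StatUnit


def value_for_interval(desc, previous, current):
    """Return the value a periodic reader would report for one interval.

    A cumulative statistic only grows, so the interesting number is the
    difference between two reads. An instantaneous statistic is reported
    as read.
    """
    if desc.type is StatType.CUMULATIVE:
        return current - previous
    return current
```

Because the reads are lock-free, two statistics sampled in the same pass may
reflect slightly different points in time, so a reader of this kind should
not assume that values from one pass are mutually consistent.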
---
* v4 -> v5
- Rebase to kvm/queue, commit a4345a7cecfb ("Merge tag
'kvmarm-fixes-5.13-1'")
- Change maximum stats name length to 48
- Replace VM_STATS_COMMON/VCPU_STATS_COMMON macros with stats
descriptor definition macros.
- Fixed some errors/warnings reported by checkpatch.pl
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
[4] https://lore.kernel.org/kvm/20210429203740.1935629-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 171 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 38 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 64 ++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 64 ++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 59 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 129 +++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 67 +++-
include/linux/kvm_host.h | 136 ++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 50 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 379 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 12 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1396 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: a4345a7cecfb91ae78cd43d26b0c6a956420761a
--
2.31.1.751.gd2f1c929bd-goog
When there is no devlink device, the following command will return:
$ devlink -j dev show
{dev:{}}
This will cause IndexError when trying to access the first element
in dev of this json dataset. Use the kselftest framework skip code
to skip this test in this case.
Example output with this change:
# selftests: net: devlink_port_split.py
# no devlink device was found, test skipped
ok 7 selftests: net: devlink_port_split.py # SKIP
Link: https://bugs.launchpad.net/bugs/1928889
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/devlink_port_split.py | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/devlink_port_split.py b/tools/testing/selftests/net/devlink_port_split.py
index 834066d..2b5d6ff 100755
--- a/tools/testing/selftests/net/devlink_port_split.py
+++ b/tools/testing/selftests/net/devlink_port_split.py
@@ -18,6 +18,8 @@ import sys
#
+# Kselftest framework requirement - SKIP code is 4
+KSFT_SKIP=4
Port = collections.namedtuple('Port', 'bus_info name')
@@ -239,7 +241,11 @@ def main(cmdline=None):
assert stderr == ""
devs = json.loads(stdout)['dev']
- dev = list(devs.keys())[0]
+ if devs:
+ dev = list(devs.keys())[0]
+ else:
+ print("no devlink device was found, test skipped")
+ sys.exit(KSFT_SKIP)
cmd = "devlink dev show %s" % dev
stdout, stderr = run_command(cmd)
--
2.7.4
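The skip logic from the devlink_port_split.py patch above can be tried out
in isolation. The following is a minimal sketch, not the selftest itself:
the helper names first_devlink_dev() and main() and the sample device name
are made up for illustration, and the JSON text is passed in directly
instead of being read from the devlink command.

```python
import json
import sys

# Kselftest framework requirement - SKIP code is 4 (taken from the patch)
KSFT_SKIP = 4


def first_devlink_dev(devlink_json):
    """Return the first device name in `devlink -j dev show` output.

    Returns None when the "dev" object is empty, which is the case that
    previously raised IndexError.
    """
    devs = json.loads(devlink_json)["dev"]
    if not devs:
        return None
    return list(devs.keys())[0]


def main(devlink_json):
    """Exit with the kselftest skip code when no device is present."""
    dev = first_devlink_dev(devlink_json)
    if dev is None:
        print("no devlink device was found, test skipped")
        sys.exit(KSFT_SKIP)
    print("using devlink device %s" % dev)
```

Calling main('{"dev": {}}') prints the skip message and exits with status 4,
which the kselftest runner reports as SKIP rather than as a failure.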
Building the nci test suite produces a binary, nci_dev, that git then
tries to track. Add a .gitignore file to tell git to ignore this binary.
Signed-off-by: David Matlack <dmatlack(a)google.com>
---
tools/testing/selftests/nci/.gitignore | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/nci/.gitignore
diff --git a/tools/testing/selftests/nci/.gitignore b/tools/testing/selftests/nci/.gitignore
new file mode 100644
index 000000000000..448eeb4590fc
--- /dev/null
+++ b/tools/testing/selftests/nci/.gitignore
@@ -0,0 +1 @@
+/nci_dev
--
2.31.1.751.gd2f1c929bd-goog
> -----Original Message-----
> From: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
> Sent: Thursday, May 20, 2021 1:50 PM
> To: linux-kernel(a)vger.kernel.org; linux-kselftest(a)vger.kernel.org; netdev(a)vger.kernel.org
> Cc: po-hsu.lin(a)canonical.com; shuah(a)kernel.org; kuba(a)kernel.org; davem(a)davemloft.net; skhan(a)linuxfoundation.org
> Subject: [PATCH] selftests: net: devlink_port_split.py: skip the test if no devlink device
>
> When there is no devlink device, the following command will return:
> $ devlink -j dev show
> {dev:{}}
>
> This will cause IndexError when trying to access the first element in dev of this json dataset. Use the kselftest framework skip code to
> skip this test in this case.
>
> Example output with this change:
> # selftests: net: devlink_port_split.py
> # no devlink device was found, test skipped
> ok 7 selftests: net: devlink_port_split.py # SKIP
>
> Link: https://bugs.launchpad.net/bugs/1928889
> Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
Reviewed-by: Danielle Ratson <danieller(a)nvidia.com>
If a signed number field starts with a '-' the field width must be > 1,
or unlimited, to allow at least one digit after the '-'.
This patch adds a check for this. If a signed field starts with '-'
and field_width == 1 the scanf will quit.
It is ok for a signed number field to have a field width of 1 if it
starts with a digit. In that case the single digit can be converted.
Signed-off-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
lib/vsprintf.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index 41ddc353ebb8..f78651e9b030 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -3466,8 +3466,12 @@ int vsscanf(const char *buf, const char *fmt, va_list args)
str = skip_spaces(str);
digit = *str;
- if (is_sign && digit == '-')
+ if (is_sign && digit == '-') {
+ if (field_width == 1)
+ break;
+
digit = *(str + 1);
+ }
if (!digit
|| (base == 16 && !isxdigit(digit))
--
2.20.1
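The rule added by the vsscanf patch above can be modelled in a few lines of
Python. This is a simplified sketch of the check only, for base 10, and is
not the kernel code: the function name is hypothetical, and using -1 to mean
"no field width given" is a convention chosen for this sketch.

```python
def signed_field_convertible(text, field_width=-1):
    """Model whether a signed decimal field can be converted.

    field_width == -1 stands for an unlimited width in this sketch.
    """
    text = text.lstrip()
    if not text:
        return False

    digit = text[0]
    if digit == "-":
        # A width of 1 is used up by the sign, leaving no room for a digit.
        if field_width == 1:
            return False
        digit = text[1] if len(text) > 1 else ""

    # The character after an optional sign must be a decimal digit.
    return digit != "" and digit in "0123456789"
```

With this model, "-5" is rejected for a width of 1 but accepted for a width
of 2 or for an unlimited width, while "7" is accepted for a width of 1.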
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
@Andrew, this is based on v5.13-rc1, I can rebase whatever way you prefer.
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
It's designed to provide the following protections:
* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Secretmem makes "simple"
ROP insufficient to perform exfiltration, which increases the required
complexity of the attack. Along with other protections like the kernel
stack size limit and address space layout randomization, which make finding
gadgets really hard, the absence of any in-kernel primitive for accessing
secret memory means the one gadget ROP attack can't work. Since the only
way to access secret memory is to reconstruct the missing mapping entry,
the attacker has to recover the physical page and insert a PTE pointing to
it in the kernel and then retrieve the contents. That takes at least three
gadgets which is a level of difficulty beyond most standard attacks.
* Prevent cross-process secret userspace memory exposures. Once the secret
memory is allocated, the user can't accidentally pass it into the kernel to
be transmitted somewhere. The secretmem pages cannot be accessed via the
direct map and they are disallowed in GUP.
* Harden against exploited kernel flaws. In order to access secretmem, a
kernel-side attack would need to either walk the page tables and create new
ones, or spawn a new privileged userspace process to perform secrets
exfiltration using ptrace.
In the future the secret mappings may be used as a means to protect guest memory
in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.
In addition, there is also a long term goal to improve management of the
direct map.
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…
v19:
* block /dev/mem mmap access, per David
* disallow mmap/mprotect with PROT_EXEC, per Kees
* simplify return in page_is_secretmem(), per Matthew
* use unsigned int for syscall flags, per Yury
v18: https://lore.kernel.org/lkml/20210303162209.8609-1-rppt@kernel.org
* rebase on v5.12-rc1
* merge kfence fix into the original patch
* massage commit message of the patch introducing the memfd_secret syscall
v17: https://lore.kernel.org/lkml/20210208084920.2884-1-rppt@kernel.org
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default
v16: https://lore.kernel.org/lkml/20210121122723.3446-1-rppt@kernel.org
* Fix memory leak introduced in v15
* Clean the data left from previous page user before handing the page to
the userspace
v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
* Add riscv/Kconfig update to disable set_memory operations for nommu
builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
(patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems
Older history:
v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…
Mike Rapoport (8):
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow set_direct_map_*_noflush() for multiple pages
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 23 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
drivers/char/mem.c | 4 +
include/linux/secretmem.h | 54 ++++
include/linux/set_memory.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/gup.c | 12 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 254 +++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 17 ++
38 files changed, 744 insertions(+), 46 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.28.0
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
@Andrew, this is based on v5.13-rc1, I can rebase whatever way you prefer.
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
It's designed to provide the following protections:
* Enhanced protection (in conjunction with all the other in-kernel
attack prevention systems) against ROP attacks. Secretmem makes "simple"
ROP insufficient to perform exfiltration, which increases the required
complexity of the attack. Along with other protections like the kernel
stack size limit and address space layout randomization, which make finding
gadgets really hard, the absence of any in-kernel primitive for accessing
secret memory means the one gadget ROP attack can't work. Since the only
way to access secret memory is to reconstruct the missing mapping entry,
the attacker has to recover the physical page and insert a PTE pointing to
it in the kernel and then retrieve the contents. That takes at least three
gadgets which is a level of difficulty beyond most standard attacks.
* Prevent cross-process secret userspace memory exposures. Once the secret
memory is allocated, the user can't accidentally pass it into the kernel to
be transmitted somewhere. The secretmem pages cannot be accessed via the
direct map and they are disallowed in GUP.
* Harden against exploited kernel flaws. In order to access secretmem, a
kernel-side attack would need to either walk the page tables and create new
ones, or spawn a new privileged userspace process to perform secrets
exfiltration using ptrace.
In the future the secret mappings may be used as a means to protect guest memory
in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things: the first is act as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory meaning any secret
keys get automatically protected this way and the other thing it does is
expose the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
Removing of the pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.
In addition, there is also a long term goal to improve management of the
direct map.
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…
v20:
* Drop the patch that enables multi-page updates to the direct map, per David
* Drop the changes to /dev/mem, they anyway have no effect when CONFIG_STRICT_DEVMEM=y
* Add Acked-by and Reviewed-by tags
v19: https://lore.kernel.org/lkml/20210513184734.29317-1-rppt@kernel.org
* block /dev/mem mmap access, per David
* disallow mmap/mprotect with PROT_EXEC, per Kees
* simplify return in page_is_secretmem(), per Matthew
* use unsigned int for syscall flags, per Yury
v18: https://lore.kernel.org/lkml/20210303162209.8609-1-rppt@kernel.org
* rebase on v5.12-rc1
* merge kfence fix into the original patch
* massage commit message of the patch introducing the memfd_secret syscall
v17: https://lore.kernel.org/lkml/20210208084920.2884-1-rppt@kernel.org
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default
v16: https://lore.kernel.org/lkml/20210121122723.3446-1-rppt@kernel.org
* Fix memory leak introduced in v15
* Clean the data left from previous page user before handing the page to
the userspace
Older history:
v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…
Mike Rapoport (7):
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow querying whether set_direct_map_*() is actually
enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 13 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
include/linux/secretmem.h | 54 ++++
include/linux/set_memory.h | 12 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 7 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 5 +
mm/Makefile | 1 +
mm/gup.c | 12 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 254 +++++++++++++++++++
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 17 ++
31 files changed, 716 insertions(+), 24 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
base-commit: 6efb943b8616ec53a5e444193dccf1af9ad627b5
--
2.28.0
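For readers who want to try the interface described in the cover letter
above, the sequence is: call memfd_secret() to get a file descriptor, size
it with ftruncate(), then mmap() it. The following Python sketch goes
through those steps via ctypes. It rests on several assumptions that are
not stated in the cover letter: the syscall number 447 (check the unistd
headers of the kernel you are running), a flags value of 0, and a kernel
that has secretmem built in and enabled. The function name is hypothetical.
When the syscall is unavailable the sketch simply returns False.

```python
import ctypes
import mmap
import os
import sys

# Assumption: syscall number of memfd_secret on the running architecture.
NR_MEMFD_SECRET = 447


def secret_roundtrip(payload=b"key material"):
    """Store `payload` in a secret mapping and read it back.

    Returns True if the round trip worked, False if secret memory is not
    available here (old kernel, feature disabled, non-Linux system, ...).
    The payload must fit in one page.
    """
    if not sys.platform.startswith("linux"):
        return False

    libc = ctypes.CDLL(None, use_errno=True)
    # memfd_secret(flags), here with flags == 0
    fd = libc.syscall(ctypes.c_long(NR_MEMFD_SECRET), ctypes.c_uint(0))
    if fd < 0:
        return False

    try:
        # Size the file to one page before mapping it.
        os.ftruncate(fd, mmap.PAGESIZE)
        with mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            m[:len(payload)] = payload
            return m[:len(payload)] == payload
    except (OSError, ValueError):
        return False
    finally:
        os.close(fd)
```

The mapping is created with PROT_READ | PROT_WRITE only, in line with the
v19 change that disallows mmap/mprotect with PROT_EXEC.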
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v4->v5:
- Picked up {Reviewed,Acked}-by's.
- Fix cleanup in error path in shmem_mcopy_atomic_pte(). [Hugh, Peter]
- Mention switching to lru_cache_add() in the commit message of 9/10. [Hugh]
- Split + reorder commits, so now we 1) implement the faulting path, 2)
implement the CONTINUE ioctl, and 3) advertise the feature. Squash the
documentation update into step (3). [Hugh, Peter]
- Reorder install_pte() cleanup to come before selftest changes. [Hugh]
v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
for CONTINUE. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available since at this point it's
fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: advertise shmem minor fault support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 110 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 360 insertions(+), 250 deletions(-)
--
2.31.1.498.g6c1eba8ee3d-goog
Make the default .kunitconfig (specified in
arch/um/configs/kunit_defconfig) specify CONFIG_KUNIT_ALL_TESTS by
default. KUNIT_ALL_TESTS runs all tests which have satisfied
dependencies in the current .config (which would be the architecture
defconfig).
Currently, the default .kunitconfig enables only the example tests and
KUnit's own tests. While this does provide a good example of what a
.kunitconfig for running a few individual tests should look like, it
does mean that kunit_tool runs a pretty paltry collection of tests by
default.
A default run of ./tools/testing/kunit/kunit.py run now runs 70 tests
instead of 14.
Signed-off-by: David Gow <davidgow(a)google.com>
---
arch/um/configs/kunit_defconfig | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/um/configs/kunit_defconfig b/arch/um/configs/kunit_defconfig
index 9235b7d42d38..becf3432a375 100644
--- a/arch/um/configs/kunit_defconfig
+++ b/arch/um/configs/kunit_defconfig
@@ -1,3 +1,2 @@
CONFIG_KUNIT=y
-CONFIG_KUNIT_TEST=y
-CONFIG_KUNIT_EXAMPLE_TEST=y
+CONFIG_KUNIT_ALL_TESTS=y
--
2.31.1.751.gd2f1c929bd-goog
Base
====
These patches are based upon Andrew Morton's v5.13-rc1-mmots-2021-05-10-22-15
tag. This is because this series depends on:
- UFFD minor fault support for hugetlbfs (in v5.13-rc1) [1]
- UFFD minor fault support for shmem (in Andrew's tree) [2]
[1] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
[2] https://lore.kernel.org/patchwork/cover/1420967/
Overview
========
Minor fault handling is a new userfaultfd feature whose goal is generally to
improve performance. In particular, it is intended for use with demand paging.
There are more details in the cover letters for this new feature (linked above),
but at a high level the idea is that we think of these three phases of live
migration of a VM:
1. Precopy, where we copy "some" pages from the source to the target, while the
VM is still running on the source machine.
2. Blackout, where execution stops on the source, and begins on the target.
3. Postcopy, where the VM is running on the target, some pages are already up
to date, and others are not (because they weren't copied, or were modified
after being copied).
During postcopy, the first time the guest touches memory, we intercept a minor
fault. Userspace checks whether or not the page is already up to date. If
needed, we copy the final version of the page from the source machine. This
could be done with RDMA for example, to do it truly in place / with no copying.
At this point, all that's left is to set up PTEs for the guest: so we issue
UFFDIO_CONTINUE. No copying or page allocation needed.
Because of this use case, it's useful to exercise this as part of the demand
paging test. It lets us ensure the use case works correctly end-to-end, and also
gives us an in-tree way to profile the end-to-end flow for future performance
improvements.
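As a rough illustration of the last step of the postcopy flow described
above (this is not code from the series), the sketch below issues
UFFDIO_CONTINUE for a range once userspace has decided the page contents
are up to date. The helper name continue_minor_fault() and its
negative-errno return convention are made up for this example; struct
uffdio_continue and UFFDIO_CONTINUE are assumed to come from
<linux/userfaultfd.h> on a kernel that has minor fault support.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: ask the kernel to install PTEs for the pages
 * already present in the page cache for [start, start + len).
 * Returns 0 on success or a negative errno value on failure.
 */
static int continue_minor_fault(int uffd, uint64_t start, uint64_t len)
{
	struct uffdio_continue cont;

	assert(len > 0);

	memset(&cont, 0, sizeof(cont));
	cont.range.start = start;
	cont.range.len = len;
	cont.mode = 0;	/* no special behaviour requested */

	if (ioctl(uffd, UFFDIO_CONTINUE, &cont) == -1)
		return -errno;

	return 0;
}
```

A fault-handling thread would typically call this with the page-aligned
fault address taken from the userfaultfd message, after making sure the
page contents are final.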
Axel Rasmussen (5):
KVM: selftests: allow different backing memory types for demand paging
KVM: selftests: add shmem backing source type
KVM: selftests: create alias mappings when using shared memory
KVM: selftests: allow using UFFD minor faults for demand paging
KVM: selftests: add shared hugetlbfs backing source type
.../selftests/kvm/demand_paging_test.c | 146 +++++++++++++-----
.../testing/selftests/kvm/include/kvm_util.h | 1 +
.../testing/selftests/kvm/include/test_util.h | 11 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 79 +++++++++-
.../selftests/kvm/lib/kvm_util_internal.h | 2 +
tools/testing/selftests/kvm/lib/test_util.c | 46 ++++--
6 files changed, 222 insertions(+), 63 deletions(-)
--
2.31.1.607.g51e8a6a459-goog
One of the mmap tests will map a single page, then try to extend the
mapping by use of mremap, which should fail. Right after that, it unmaps
the extended area, which may end up unmapping other valid mapped areas,
thus causing a segfault.
Only unmap the area that is expected to be mapped.
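The failure mode can be reproduced outside of the ringbuf test. The
sketch below is an illustration only and uses anonymous memory rather
than a BPF ring buffer: it maps two adjacent pages, treats the second
one as the "other valid mapped area", fails to grow the first page in
place, and then unmaps exactly one page. The function name and the
return codes are made up for this example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 0 if every step behaved as expected, a negative step id otherwise. */
static int unmap_only_what_is_mapped(void)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *base, *neighbour;
	void *grown;

	/* One call guarantees that the two pages are adjacent. */
	base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return -1;
	neighbour = base + page;
	neighbour[0] = 42;

	/* Growing in place must fail: the next page is already in use. */
	grown = mremap(base, page, 3 * page, 0);
	if (grown != MAP_FAILED)
		return -2;

	/* The failed mremap() left the mapping at one page, so unmap one page. */
	if (munmap(base, page) != 0)
		return -3;

	/* The neighbouring page must have survived the unmap. */
	if (neighbour[0] != 42)
		return -4;

	return munmap(neighbour, page) == 0 ? 0 : -5;
}
```

Unmapping 3 * page from base instead would also have removed the
neighbouring page, and the later read of neighbour[0] would fault.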
Fixes: b2fb299c9aa4 ("selftests/bpf: test ringbuf mmap read-only and read-write restrictions")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
---
tools/testing/selftests/bpf/prog_tests/ringbuf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/ringbuf.c b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
index 197e30b83298..f9a8ae331963 100644
--- a/tools/testing/selftests/bpf/prog_tests/ringbuf.c
+++ b/tools/testing/selftests/bpf/prog_tests/ringbuf.c
@@ -146,7 +146,7 @@ void test_ringbuf(void)
ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_WRITE), "write_protect");
ASSERT_ERR(mprotect(mmap_ptr, page_size, PROT_EXEC), "exec_protect");
ASSERT_ERR_PTR(mremap(mmap_ptr, 0, 3 * page_size, MREMAP_MAYMOVE), "ro_remap");
- ASSERT_OK(munmap(mmap_ptr, 3 * page_size), "unmap_ro");
+ ASSERT_OK(munmap(mmap_ptr, page_size), "unmap_ro");
/* only trigger BPF program for current process */
skel->bss->pid = getpid();
--
2.30.2
From: Colin Ian King <colin.king(a)canonical.com>
There is a spelling mistake in a message. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/sched/cs_prctl_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/sched/cs_prctl_test.c b/tools/testing/selftests/sched/cs_prctl_test.c
index 63fe6521c56d..cf9ca10b876c 100644
--- a/tools/testing/selftests/sched/cs_prctl_test.c
+++ b/tools/testing/selftests/sched/cs_prctl_test.c
@@ -262,7 +262,7 @@ int main(int argc, char *argv[])
if (setpgid(0, 0) != 0)
handle_error("process group");
- printf("\n## Create a thread/process/process group hiearchy\n");
+ printf("\n## Create a thread/process/process group hierarchy\n");
create_processes(num_processes, num_threads, procs);
disp_processes(num_processes, procs);
validate(get_cs_cookie(0) == 0);
--
2.30.2
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is described by the following
attributes:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
---
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock" between
install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 171 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 42 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 67 +++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 68 +++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 63 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 133 ++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 71 +++-
include/linux/kvm_host.h | 132 ++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 50 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 370 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1405 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: edf408f5257ba39e63781b820528e1ce1ec0f543
--
2.31.1.498.g6c1eba8ee3d-goog
Changelog RFC v4 --> PATCH v5:
1. Added a CPU online check prior to parsing the CPU topology to avoid
parsing topologies for CPUs unavailable for the latency test
2. Added comment describing the selftest in cpuidle.sh
As I have changed how cpuidle.sh works, I am dropping the
"Reviewed-by" from Doug Smythies for the second patch, while retaining
it for the first patch.
RFC v4: https://lkml.org/lkml/2021/4/12/99
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations from advertised latency and residency
values.
The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latencies
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the latencies achieved must be viewed relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer is to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
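For readers who want to poke at these files without the bash driver, a
minimal C sketch of reading one of the result knobs could look like the
following. It assumes that a result file such as ipi_latency_ns holds a
single unsigned decimal number; the helper name read_knob_u64() is made
up for this example and is not part of the patchset.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one unsigned decimal value from a file such
 * as /sys/kernel/debug/latency_test/ipi_latency_ns.
 * Returns 0 on success or a negative errno value on failure.
 */
static int read_knob_u64(const char *path, unsigned long long *value)
{
	FILE *f;
	int ret = 0;

	assert(path && value);

	f = fopen(path, "r");
	if (!f)
		return -errno;

	/* Anything that does not start with a number is treated as invalid. */
	if (fscanf(f, "%llu", value) != 1)
		ret = -EINVAL;

	fclose(f);
	return ret;
}
```

Reading debugfs normally requires root and a mounted debugfs, so a
caller should expect -EACCES or -ENOENT on systems where the module is
not loaded.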
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
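Since the observed numbers only make sense relative to the baseline,
here is a small sketch of that subtraction, using the IPI figures from
the sample output above. The array and function names are invented for
the illustration; the selftest itself does not necessarily compute or
print these deltas.

```c
#include <assert.h>
#include <stddef.h>

/* Observed average IPI latencies (ns) for State0..State6, from the sample output. */
static const long ipi_observed_ns[] = {
	3265, 3507, 3739, 3807, 17070, 1038174, 1068784,
};

/* Baseline average IPI latency (ns) on a fully loaded CPU, from the sample output. */
static const long ipi_baseline_ns = 3114;

/* Store observed - baseline for each of the n states in delta_ns. */
static void latency_over_baseline(const long *observed_ns, size_t n,
				  long baseline_ns, long *delta_ns)
{
	size_t i;

	assert(observed_ns && delta_ns);

	for (i = 0; i < n; i++)
		delta_ns[i] = observed_ns[i] - baseline_ns;
}
```

For the sample data this attributes 151 ns to State0 and 13956 ns to
State4, while State5 and State6 cost roughly a millisecond over the
baseline.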
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer are armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results.
2. Even on a completely idle system, there may be book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to be weeded out easily.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simplify the kernel
complexity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a less accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wake up. That doesn't make this metric useless for Intel systems,
it just means that it is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
As a solution to this problem, a hardware based latency analyzer has
been devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266
https://intel.github.io/wult/
Pratik R. Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 414 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 591 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
If the xfrm_policy.sh script gets terminated for any reason, the netns
namespace files created by the test will be left behind.
In this case a second attempt to run this test will fail with:
# Cannot create namespace file "/run/netns/ns1": File exists
Move the netns cleanup code into an exit trap so that we can ensure
these files will be removed in the end.
Changes in V2:
- Update commit message and patch title.
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/xfrm_policy.sh | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/xfrm_policy.sh b/tools/testing/selftests/net/xfrm_policy.sh
index bdf450e..bb4632b 100755
--- a/tools/testing/selftests/net/xfrm_policy.sh
+++ b/tools/testing/selftests/net/xfrm_policy.sh
@@ -28,6 +28,11 @@ KEY_AES=0x0123456789abcdef0123456789012345
SPI1=0x1
SPI2=0x2
+cleanup() {
+ for i in 1 2 3 4;do ip netns del ns$i 2>/dev/null ;done
+}
+trap cleanup EXIT
+
do_esp_policy() {
local ns=$1
local me=$2
@@ -481,6 +486,4 @@ check_hthresh_repeat "policies with repeated htresh change"
check_random_order ns3 "policies inserted in random order"
-for i in 1 2 3 4;do ip netns del ns$i;done
-
exit $ret
--
2.7.4
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is described by the following
attributes:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative, instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, clock cycles
Since no lock/synchronization is used, the consistency between all
the statistics data is not guaranteed. That means not all statistics
data are read out at the exact same time, since the statistics data
are still being updated by KVM subsystems while they are read out.
---
* v3 -> v4
- Rebase to kvm/queue, commit 9f242010c3b4 ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Use C-style comments in the whole patch
- Fix wrong count for x86 VCPU stats descriptors
- Fix KVM stats data size counting and validity check in selftest
* v2 -> v3
- Rebase to kvm/queue, commit edf408f5257b ("KVM: avoid "deadlock"
between install_new_memslots and MMU notifier")
- Resolve some nitpicks about format
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
[2] https://lore.kernel.org/kvm/20210415151741.1607806-1-jingzhangos@google.com
[3] https://lore.kernel.org/kvm/20210423181727.596466-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 171 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 42 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 67 ++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 68 +++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 63 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 133 +++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 71 +++-
include/linux/kvm_host.h | 132 +++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 50 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 380 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1415 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: 9f242010c3b46e63bc62f08fff42cef992d3801b
--
2.31.1.527.g47e6f16901-goog
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
@Andrew, this is based on v5.12-rc1, I can rebase whatever way you prefer.
This is an implementation of "secret" mappings backed by a file descriptor.
The file descriptor backing secret memory mappings is created using a
dedicated memfd_secret system call. The desired protection mode for the
memory is configured using the flags parameter of the system call. The mmap()
of the file descriptor created with memfd_secret() will create a "secret"
memory mapping. The pages in that mapping will be marked as not present in
the direct map and will be present only in the page table of the owning mm.
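To make the create/size/map sequence above concrete, here is a hedged
userspace sketch. It is not taken from this series or its selftest: the
function name secret_page_roundtrip() is invented, flags are passed as
0, and the code assumes the installed kernel headers define
__NR_memfd_secret. When they do not, or when the running kernel has
secretmem disabled, the function simply reports an error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical example: create a secret memory fd, size it to one page,
 * map it, and write/read a few bytes through the mapping.
 * Returns 0 on success or a negative errno value on failure.
 */
static int secret_page_roundtrip(void)
{
#ifdef __NR_memfd_secret
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *p;
	int fd, ret = 0;

	assert(len > 0);

	/* There may be no libc wrapper, so go through syscall(2). */
	fd = (int)syscall(__NR_memfd_secret, 0);
	if (fd < 0)
		return -errno;

	if (ftruncate(fd, (off_t)len) != 0) {
		ret = -errno;
		goto out;
	}

	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		ret = -errno;
		goto out;
	}

	/* The owning process can use the pages like ordinary memory. */
	memcpy(p, "key", 4);
	if (memcmp(p, "key", 4) != 0)
		ret = -EIO;

	munmap(p, len);
out:
	close(fd);
	return ret;
#else
	/* Headers without the syscall number: report it as unsupported. */
	return -ENOSYS;
#endif
}
```

A caller should be prepared for -ENOSYS in particular, since the
feature is described below as opt-in and disabled by default.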
Although normally Linux userspace mappings are protected from other users,
such secret mappings are useful for environments where a hostile tenant is
trying to trick the kernel into giving them access to other tenants'
mappings.
Additionally, in the future the secret mappings may be used as a means to
protect guest memory in a virtual machine host.
For demonstration of secret memory usage we've created a userspace library
https://git.kernel.org/pub/scm/linux/kernel/git/jejb/secret-memory-preloade…
that does two things. First, it acts as a preloader for openssl to
redirect all the OPENSSL_malloc calls to secret memory, meaning any secret
keys get automatically protected this way. Second, it
exposes the API to the user who needs it. We anticipate that a lot of the
use cases would be like the openssl one: many toolkits that deal with
secret keys already have special handling for the memory to try to give
them greater protection, so this would simply be pluggable into the
toolkits without any need for user application modification.
Hiding secret memory mappings behind an anonymous file allows usage of
the page cache for tracking pages allocated for the "secret" mappings as
well as using address_space_operations for e.g. page migration callbacks.
The anonymous file may be also used implicitly, like hugetlb files, to
implement mmap(MAP_SECRET) and use the secret memory areas with "native" mm
ABIs in the future.
Removing pages from the direct map may cause its fragmentation on
architectures that use large pages to map the physical memory, which affects
the system performance. However, the original Kconfig text for
CONFIG_DIRECT_GBPAGES said that gigabyte pages in the direct map "... can
improve the kernel's performance a tiny bit ..." (commit 00d1c5e05736
("x86: add gbpages switches")) and the recent report [1] showed that "...
although 1G mappings are a good default choice, there is no compelling
evidence that it must be the only choice". Hence, it is sufficient to have
secretmem disabled by default with the ability of a system administrator to
enable it at boot time.
In addition, there is also a long term goal to improve management of the
direct map.
[1] https://lore.kernel.org/linux-mm/213b4567-46ce-f116-9cdf-bbd0c884eb3c@linux…
v18:
* rebase on v5.12-rc1
* merge kfence fix into the original patch
* massage commit message of the patch introducing the memfd_secret syscall
v17: https://lore.kernel.org/lkml/20210208084920.2884-1-rppt@kernel.org
* Remove pool of large pages backing secretmem allocations, per Michal Hocko
* Add secretmem pages to unevictable LRU, per Michal Hocko
* Use GFP_HIGHUSER as secretmem mapping mask, per Michal Hocko
* Make secretmem an opt-in feature that is disabled by default
v16: https://lore.kernel.org/lkml/20210121122723.3446-1-rppt@kernel.org
* Fix memory leak introduced in v15
* Clean the data left from previous page user before handing the page to
the userspace
v15: https://lore.kernel.org/lkml/20210120180612.1058-1-rppt@kernel.org
* Add riscv/Kconfig update to disable set_memory operations for nommu
builds (patch 3)
* Update the code around add_to_page_cache() per Matthew's comments
(patches 6,7)
* Add fixups for build/checkpatch errors discovered by CI systems
v14: https://lore.kernel.org/lkml/20201203062949.5484-1-rppt@kernel.org
* Finally s/mod_node_page_state/mod_lruvec_page_state/
v13: https://lore.kernel.org/lkml/20201201074559.27742-1-rppt@kernel.org
* Added Reviewed-by, thanks Catalin and David
* s/mod_node_page_state/mod_lruvec_page_state/ as Shakeel suggested
Older history:
v12: https://lore.kernel.org/lkml/20201125092208.12544-1-rppt@kernel.org
v11: https://lore.kernel.org/lkml/20201124092556.12009-1-rppt@kernel.org
v10: https://lore.kernel.org/lkml/20201123095432.5860-1-rppt@kernel.org
v9: https://lore.kernel.org/lkml/20201117162932.13649-1-rppt@kernel.org
v8: https://lore.kernel.org/lkml/20201110151444.20662-1-rppt@kernel.org
v7: https://lore.kernel.org/lkml/20201026083752.13267-1-rppt@kernel.org
v6: https://lore.kernel.org/lkml/20200924132904.1391-1-rppt@kernel.org
v5: https://lore.kernel.org/lkml/20200916073539.3552-1-rppt@kernel.org
v4: https://lore.kernel.org/lkml/20200818141554.13945-1-rppt@kernel.org
v3: https://lore.kernel.org/lkml/20200804095035.18778-1-rppt@kernel.org
v2: https://lore.kernel.org/lkml/20200727162935.31714-1-rppt@kernel.org
v1: https://lore.kernel.org/lkml/20200720092435.17469-1-rppt@kernel.org
rfc-v2: https://lore.kernel.org/lkml/20200706172051.19465-1-rppt@kernel.org/
rfc-v1: https://lore.kernel.org/lkml/20200130162340.GA14232@rapoport-lnx/
rfc-v0: https://lore.kernel.org/lkml/1572171452-7958-1-git-send-email-rppt@kernel.o…
Mike Rapoport (9):
mm: add definition of PMD_PAGE_ORDER
mmap: make mlock_future_check() global
riscv/Kconfig: make direct map manipulation options depend on MMU
set_memory: allow set_direct_map_*_noflush() for multiple pages
set_memory: allow querying whether set_direct_map_*() is actually enabled
mm: introduce memfd_secret system call to create "secret" memory areas
PM: hibernate: disable when there are active secretmem users
arch, mm: wire up memfd_secret system call where relevant
secretmem: test: add basic selftest for memfd_secret(2)
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/cacheflush.h | 6 -
arch/arm64/include/asm/kfence.h | 2 +-
arch/arm64/include/asm/set_memory.h | 17 ++
arch/arm64/include/uapi/asm/unistd.h | 1 +
arch/arm64/kernel/machine_kexec.c | 1 +
arch/arm64/mm/mmu.c | 6 +-
arch/arm64/mm/pageattr.c | 23 +-
arch/riscv/Kconfig | 4 +-
arch/riscv/include/asm/set_memory.h | 4 +-
arch/riscv/include/asm/unistd.h | 1 +
arch/riscv/mm/pageattr.c | 8 +-
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/x86/include/asm/set_memory.h | 4 +-
arch/x86/mm/pat/set_memory.c | 8 +-
fs/dax.c | 11 +-
include/linux/pgtable.h | 3 +
include/linux/secretmem.h | 30 +++
include/linux/set_memory.h | 16 +-
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 6 +-
include/uapi/linux/magic.h | 1 +
kernel/power/hibernate.c | 5 +-
kernel/power/snapshot.c | 4 +-
kernel/sys_ni.c | 2 +
mm/Kconfig | 3 +
mm/Makefile | 1 +
mm/gup.c | 10 +
mm/internal.h | 3 +
mm/mlock.c | 3 +-
mm/mmap.c | 5 +-
mm/secretmem.c | 261 +++++++++++++++++++
mm/vmalloc.c | 5 +-
scripts/checksyscalls.sh | 4 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 3 +-
tools/testing/selftests/vm/memfd_secret.c | 296 ++++++++++++++++++++++
tools/testing/selftests/vm/run_vmtests.sh | 17 ++
39 files changed, 726 insertions(+), 53 deletions(-)
create mode 100644 arch/arm64/include/asm/set_memory.h
create mode 100644 include/linux/secretmem.h
create mode 100644 mm/secretmem.c
create mode 100644 tools/testing/selftests/vm/memfd_secret.c
--
2.28.0
Explicitly include stddef.h when building the BTI tests so that we have
a definition of NULL. With at least some toolchains this is not done
implicitly by anything else:
test.c: In function ‘start’:
test.c:214:25: error: ‘NULL’ undeclared (first use in this function)
214 | sigaction(SIGILL, &sa, NULL);
| ^~~~
test.c:20:1: note: ‘NULL’ is defined in header ‘<stddef.h>’; did you forget to ‘#include <stddef.h>’?
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/bti/test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/arm64/bti/test.c b/tools/testing/selftests/arm64/bti/test.c
index 656b04976ccc..67b77ab83c20 100644
--- a/tools/testing/selftests/arm64/bti/test.c
+++ b/tools/testing/selftests/arm64/bti/test.c
@@ -6,6 +6,7 @@
#include "system.h"
+#include <stddef.h>
#include <linux/errno.h>
#include <linux/auxvec.h>
#include <linux/signal.h>
--
2.20.1
The result of an expression consisting of a single relational operator is
already of the bool type and does not need to be evaluated explicitly.
No functional change.
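For illustration, the two forms can be compared side by side in a small
standalone sketch (the function names are made up and this is not the
selftest code). In C the == operator yields an int that is either 0 or
1, and that value converts to bool unchanged, so the conditional
operator adds nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Old form: the conditional operator re-derives a value that is already 0 or 1. */
static bool exit_ok_ternary(int exit_status)
{
	return exit_status == EXIT_SUCCESS ? true : false;
}

/* New form: return the comparison result directly. */
static bool exit_ok(int exit_status)
{
	return exit_status == EXIT_SUCCESS;
}

/* Both forms agree for a successful and a failing exit status. */
static void check_forms_agree(void)
{
	assert(exit_ok(EXIT_SUCCESS) == exit_ok_ternary(EXIT_SUCCESS));
	assert(exit_ok(EXIT_FAILURE) == exit_ok_ternary(EXIT_FAILURE));
}
```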
Signed-off-by: Zhen Lei <thunder.leizhen(a)huawei.com>
---
tools/testing/selftests/mount/unprivileged-remount-test.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mount/unprivileged-remount-test.c b/tools/testing/selftests/mount/unprivileged-remount-test.c
index 584dc6bc3b06679..d2917054fe3ae56 100644
--- a/tools/testing/selftests/mount/unprivileged-remount-test.c
+++ b/tools/testing/selftests/mount/unprivileged-remount-test.c
@@ -204,7 +204,7 @@ bool test_unpriv_remount(const char *fstype, const char *mount_options,
if (!WIFEXITED(status)) {
die("child did not terminate cleanly\n");
}
- return WEXITSTATUS(status) == EXIT_SUCCESS ? true : false;
+ return WEXITSTATUS(status) == EXIT_SUCCESS;
}
create_and_enter_userns();
@@ -282,7 +282,7 @@ static bool test_priv_mount_unpriv_remount(void)
if (!WIFEXITED(status)) {
die("child did not terminate cleanly\n");
}
- return WEXITSTATUS(status) == EXIT_SUCCESS ? true : false;
+ return WEXITSTATUS(status) == EXIT_SUCCESS;
}
orig_mnt_flags = read_mnt_flags(orig_path);
--
2.26.0.106.g9fadedd
The use of typecheck() in KUNIT_EXPECT_EQ() and friends is causing more
problems than I think it's worth. Things like enums need to have their
values explicitly cast, and literals all need to be very precisely typed
for the code to compile.
While typechecking does have its uses, the additional overhead of having
lots of needless casts -- combined with the awkward error messages which
don't mention which types are involved -- makes tests less readable and
more difficult to write.
By removing the typecheck() call, the two arguments still need to be of
compatible types, but don't need to be of exactly the same type, which
seems a less confusing and more useful compromise.
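To show what "compatible but not identical" means in practice, here is
a userspace sketch of a comparison macro shaped like the one in the
diff below, minus the type check. SKETCH_EQ and the helpers around it
are hypothetical stand-ins and not the KUnit macros; the sketch relies
on the GNU C typeof and statement-expression extensions, as the kernel
does.

```c
#include <assert.h>
#include <stdbool.h>

enum colour { RED, GREEN };

/*
 * Hypothetical stand-in for the comparison part of an EXPECT_EQ-style
 * macro: each argument is evaluated once into a local of its own type,
 * then the two locals are compared with the usual C conversions.
 */
#define SKETCH_EQ(left, right) ({			\
	__typeof__(left) left_ = (left);		\
	__typeof__(right) right_ = (right);		\
	left_ == right_;				\
})

/* An enum compared against a plain int: no cast needed. */
static bool enum_matches_int(enum colour c, int raw)
{
	return SKETCH_EQ(c, raw);
}

/* A long compared against an int literal: no "2L" suffix needed. */
static bool long_matches_literal(long v)
{
	return SKETCH_EQ(v, 2);
}
```

With a strict same-type check in place, both helpers would need a cast
or a suffixed literal before the comparison is accepted.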
Signed-off-by: David Gow <davidgow(a)google.com>
---
I appreciate that this is probably a bit controversial (and, indeed, I
was a bit hesitant about sending it out myself), but after sitting on it
for a few days, I still think this is probably an improvement overall.
The second patch does fix what I think is an actual bug, though, so even
if this isn't determined to be a good idea, it (or some equivalent)
should probably go through.
Cheers,
-- David
include/kunit/test.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..4c56ffcb7403 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -775,7 +775,6 @@ void kunit_do_assertion(struct kunit *test,
do { \
typeof(left) __left = (left); \
typeof(right) __right = (right); \
- ((void)__typecheck(__left, __right)); \
\
KUNIT_ASSERTION(test, \
__left op __right, \
--
2.31.1.607.g51e8a6a459-goog
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an updated version of page_is_secretmem() changes.
This is based on v5.12-rc7-mmots-2021-04-15-16-28.
@Andrew, please let me know if you'd like me to rebase it differently or
resend the entire set.
v3:
* add missing put_compound_head() if we are to return NULL from
gup_page_range(), thanks David.
* add unlikely() to test for page_is_secretmem.
v2:
* move the check for secretmem page in gup_pte_range after we get a
reference to the page, per Matthew.
Mike Rapoport (2):
secretmem/gup: don't check if page is secretmem without reference
secretmem: optimize page_is_secretmem()
include/linux/secretmem.h | 26 +++++++++++++++++++++++++-
mm/gup.c | 6 +++---
mm/secretmem.c | 12 +-----------
3 files changed, 29 insertions(+), 15 deletions(-)
--
2.28.0
Mike Rapoport (2):
secretmem/gup: don't check if page is secretmem without reference
secretmem: optimize page_is_secretmem()
include/linux/secretmem.h | 26 +++++++++++++++++++++++++-
mm/gup.c | 8 +++++---
mm/secretmem.c | 12 +-----------
3 files changed, 31 insertions(+), 15 deletions(-)
--
2.28.0
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc8-mmots-2021-04-21-23-08", with the following applied first:
1. Peter's selftest cleanup series:
https://lore.kernel.org/patchwork/cover/1412450/
2. My patch to fix a pre-existing BUG_ON in an edge case:
https://lore.kernel.org/patchwork/patch/1419758/
Changelog
=========
v5->v6:
- Picked up {Reviewed,Acked}-by's.
- Rebased onto v5.12-rc8-mmots-2021-04-21-23-08.
- Put mistakenly removed delete_from_page_cache() back in the error path in
shmem_mfill_atomic_pte(). [Hugh]
- Keep shmem_mfill_atomic_pte() naming, instead of shmem_mcopy_... Likewise,
rename our new helper to mfill_atomic_install_pte(). [Hugh]
- Return directly instead of "goto out" in shmem_mfill_atomic_pte(), saving a
couple of lines. [Peter]
v4->v5:
- Picked up {Reviewed,Acked}-by's.
- Fix cleanup in error path in shmem_mcopy_atomic_pte(). [Hugh, Peter]
- Mention switching to lru_cache_add() in the commit message of 9/10. [Hugh]
- Split + reorder commits, so now we 1) implement the faulting path, 2)
implement the CONTINUE ioctl, and 3) advertise the feature. Squash the
documentation update into step (3). [Hugh, Peter]
- Reorder install_pte() cleanup to come before selftest changes. [Hugh]
v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commit 5 advertises that the feature is now available since at this point it's
fully implemented.
- Commit 6 is a final cleanup, modifying an existing code path to re-use a new
helper we've introduced.
- Commits 7, 8, 9, 10 update the userfaultfd selftest to exercise the feature.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: advertise shmem minor fault support
userfaultfd/shmem: modify shmem_mfill_atomic_pte to use install_pte()
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 2 +-
include/linux/shmem_fs.h | 19 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 120 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 364 insertions(+), 256 deletions(-)
--
2.31.1.527.g47e6f16901-goog
Hi,
This patch series introduces the futex2 syscalls.
* What happened to the current futex()?
For some years now, developers have been trying to add new features to
futex, but maintainers have been reluctant to accept them, given that the
multiplexed interface is full of legacy features and makes big changes
tricky. Some problems that people have tried to address with patchsets are:
NUMA-awareness[0], smaller sized futexes[1], wait on multiple futexes[2].
NUMA, for instance, just doesn't fit the current API in a reasonable
way. Considering that, it's not possible to merge new features into the
current futex.
** The NUMA problem
In the current implementation, all futex kernel-side infrastructure is
stored on a single node. Given that, all futex() calls issued by
processors that aren't located on that node incur a memory access
penalty.
** The 32bit sized futex problem
Embedded systems or anything with memory constraints would benefit from
using smaller sizes for the futex userspace integer. Also, a mutex
implementation can be done using just three values, so 8 bits is enough
for various scenarios.
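To illustrate the "three values" point, here is a minimal user-space sketch
of the state machine such a mutex could use, stored in a single byte. All
names (mutex8_t, mutex8_trylock, ...) are hypothetical and not part of this
series, and the sketch never calls into the kernel: where a real lock would
sleep or wake waiters on the byte, it only reports that this is needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The three states a simple mutex needs (hypothetical names). */
enum { UNLOCKED = 0, LOCKED = 1, CONTENDED = 2 };

/* Hypothetical lock word: one byte is enough to hold three states. */
typedef struct {
	_Atomic uint8_t state;
} mutex8_t;

/* Fast path: take the lock only if it is currently free. */
static bool mutex8_trylock(mutex8_t *m)
{
	uint8_t expected = UNLOCKED;

	return atomic_compare_exchange_strong(&m->state, &expected, LOCKED);
}

/*
 * Slow path: announce that a waiter exists. Returns true if the lock
 * happened to be free, in which case the caller now owns it. On false,
 * a real implementation would sleep on the byte and retry.
 */
static bool mutex8_mark_contended(mutex8_t *m)
{
	return atomic_exchange(&m->state, CONTENDED) == UNLOCKED;
}

/* Release the lock. Returns true if a waiter needs to be woken up. */
static bool mutex8_unlock(mutex8_t *m)
{
	uint8_t prev = atomic_exchange(&m->state, UNLOCKED);

	assert(prev != UNLOCKED); /* unlocking a free mutex is a caller bug */
	return prev == CONTENDED;
}
```

Only the values 0, 1 and 2 are ever stored, so nothing in this scheme
depends on the lock word being 32 bits wide.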
** The wait on multiple problem
The use case lies in the Wine implementation of the Windows NT interface
WaitForMultipleObjects. This Windows API function allows a thread to sleep
waiting on the first of a set of event sources (mutexes, timers, signals,
console input, etc) to signal. Considering this is a primitive
synchronization operation for Windows applications, being able to quickly
signal events on the producer side and quickly go to sleep on the
consumer side is essential for good performance of applications running
over Wine.
[0] https://lore.kernel.org/lkml/20160505204230.932454245@linutronix.de/
[1] https://lore.kernel.org/lkml/20191221155659.3159-2-malteskarupke@web.de/
[2] https://lore.kernel.org/lkml/20200213214525.183689-1-andrealmeid@collabora.…
* The solution
As proposed by Peter Zijlstra and Florian Weimer[3], a new interface
is required to solve this, which must be designed with those features in
mind. futex2() is that interface. As opposed to the current multiplexed
interface, the new one should have one syscall per operation. This will
keep the API maintainable if it gets extended, and will help
users with type checking of arguments.
In particular, the new interface is extended to support the ability to
wait on any of a list of futexes at a time, which could be seen as a
vectored extension of the FUTEX_WAIT semantics.
[3] https://lore.kernel.org/lkml/20200303120050.GC2596@hirez.programming.kicks-…
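As a rough illustration of what "a vectored extension of the FUTEX_WAIT
semantics" means, the sketch below models only the value check that happens
before sleeping: FUTEX_WAIT compares one word against an expected value,
while a vectored wait does so for every entry in a list. The struct layout
and function name are hypothetical and do not reflect the actual uAPI
proposed in the patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one futex word plus the value the caller expects. */
struct wait_entry {
	_Atomic uint32_t *uaddr;
	uint32_t expected;
};

/*
 * Return the index of the first entry whose word no longer holds the
 * expected value, or -1 if every word still matches. In the -1 case a
 * real implementation would put the caller to sleep on all entries.
 */
static int first_changed(const struct wait_entry *v, size_t n)
{
	assert(v != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (atomic_load(v[i].uaddr) != v[i].expected)
			return (int)i;
	return -1;
}
```

With n == 1 this reduces to the familiar single-word check.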
* The interface
The new interface can be seen in detail in the following patches, but
this is a high level summary of what the interface can do:
- Supports wake/wait semantics, as in futex()
- Supports requeue operations, similar to FUTEX_CMP_REQUEUE, but with
individual flags for each address
- Supports waiting for a vector of futexes, using a new syscall named
futex_waitv()
- Supports variable sized futexes (8bits, 16bits, 32bits and 64bits)
- Supports NUMA-awareness operations, where the user can specify on
which memory node they would like to operate
* Implementation
The internal implementation follows a similar design to the original futex.
Given that we want to replicate the same external behavior of current
futex, this should be somewhat expected. For some functions, like the
init and the code to get a shared key, I literally copied code and
comments from kernel/futex.c. I decided to do so instead of exposing the
original function as a public function since in that way we can freely
modify our implementation if required, without any impact on old futex.
Also, the comments precisely describe the details and corner cases of
the implementation.
Each patch contains a brief description of implementation, but patch 6
"docs: locking: futex2: Add documentation" adds a more complete document
about it.
* The patchset
This patchset can also be found in my git tree:
https://gitlab.collabora.com/tonyk/linux/-/tree/futex2-dev
- Patch 1: Implements wait/wake, and the basic foundations of futex2
- Patches 2-4: Implement the remaining features (shared, waitv, requeue).
- Patch 5: Adds the x86_x32 ABI handling. I kept it in a separate
patch since I'm not sure if x86_x32 is still a thing, or if it should
return -ENOSYS.
- Patch 6: Adds a documentation file which details the interface and
the internal implementation.
- Patches 7-12: Selftests for all operations along with perf
support for futex2.
- Patch 13: While working on porting glibc for futex2, I found out
that there's a futex_wake() call at the user thread exit path, if
that thread was created with clone(..., CLONE_CHILD_CLEARTID, ...). In
order to make pthreads work with futex2, it was required to add
this patch. Note that this is more a proof-of-concept of what we
will need to do in future, rather than part of the interface and
shouldn't be merged as it is.
* Testing:
This patchset provides selftests for each operation and their flags.
Along with that, the following work was done:
** Stability
To stress the interface in "real world scenarios":
- glibc[4]: nptl's low level locking was modified to use futex2 API
(except for robust and PI things). All relevant nptl/ tests passed.
- Wine[5]: Proton/Wine was modified in order to use futex2() for the
emulation of Windows NT sync mechanisms based on futex, called "fsync".
Triple-A games with huge CPU loads and tons of parallel jobs worked
as expected when compared with the previous FUTEX_WAIT_MULTIPLE
implementation at futex(). Some games issue 42k futex2() calls
per second.
- Full GNU/Linux distro: I installed the modified glibc in my host
machine, so all pthread programs would use futex2(). After tweaking
systemd[6] to allow futex2() calls at seccomp, everything worked as
expected (web browsers do some syscall sandboxing and need some
configuration as well).
- perf: The perf benchmarks tests can also be used to stress the
interface, and they can be found in this patchset.
** Performance
- For comparing futex() and futex2() performance, I used the artificial
benchmarks implemented at perf (wake, wake-parallel, hash and
requeue). The setup was 200 runs for each test and using 8, 80, 800,
8000 for the number of threads. Note that for this test, I'm not using
patch 13 ("kernel: Enable waitpid() for futex2"), for reasons explained
in "The patchset" section.
- For the first three, I measured an average of 4% gain in
performance. This is not a big step, but it shows that the new
interface is at least comparable in performance with the current one.
- For requeue, I measured an average of 21% decrease in performance
compared to the original futex implementation. This is expected given
the new design with individual flags. The performance trade-offs are
explained at patch 4 ("futex2: Implement requeue operation").
[4] https://gitlab.collabora.com/tonyk/glibc/-/tree/futex2
[5] https://gitlab.collabora.com/tonyk/wine/-/tree/proton_5.13
[6] https://gitlab.collabora.com/tonyk/systemd
* FAQ
** "Where's the code for NUMA and FUTEX_8/16/64?"
The current code is already complex enough to take some time for
review, so I believe it's better to split that work out to a future
iteration of this patchset. Besides that, this RFC is the core part of the
infrastructure, and the following features will not pose big design
changes to it; the work will be more about wiring up the flags and
modifying some functions.
** "Where's the PI/robust stuff?"
As said by Peter Zijlstra at [3], all those new features are related to
the "simple" futex interface, that doesn't use PI or robust. Do we want
to have this complexity at futex2() and if so, should it be part of
this patchset or can it be future work?
Thanks,
André
* Changelog
Changes from v2:
- API now supports 64bit futexes, in addition to 8, 16 and 32.
- This API change will break the glibc[4] and Proton[5] ports for now.
- Refactored futex2_wait and futex2_waitv selftests
v2: https://lore.kernel.org/lkml/20210304004219.134051-1-andrealmeid@collabora.…
Changes from v1:
- Unified futex_set_timer_and_wait and __futex_wait code
- Dropped _carefull from linked list function calls
- Fixed typos on docs patch
- uAPI flags are now added as features are introduced, instead of all flags
in patch 1
- Removed struct futex_single_waiter in favor of an anon struct
v1: https://lore.kernel.org/lkml/20210215152404.250281-1-andrealmeid@collabora.…
André Almeida (13):
futex2: Implement wait and wake functions
futex2: Add support for shared futexes
futex2: Implement vectorized wait
futex2: Implement requeue operation
futex2: Add compatibility entry point for x86_x32 ABI
docs: locking: futex2: Add documentation
selftests: futex2: Add wake/wait test
selftests: futex2: Add timeout test
selftests: futex2: Add wouldblock test
selftests: futex2: Add waitv test
selftests: futex2: Add requeue test
perf bench: Add futex2 benchmark tests
kernel: Enable waitpid() for futex2
Documentation/locking/futex2.rst | 198 +++
Documentation/locking/index.rst | 1 +
MAINTAINERS | 2 +-
arch/arm/tools/syscall.tbl | 4 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 8 +
arch/x86/entry/syscalls/syscall_32.tbl | 4 +
arch/x86/entry/syscalls/syscall_64.tbl | 4 +
fs/inode.c | 1 +
include/linux/compat.h | 26 +
include/linux/fs.h | 1 +
include/linux/syscalls.h | 17 +
include/uapi/asm-generic/unistd.h | 14 +-
include/uapi/linux/futex.h | 31 +
init/Kconfig | 7 +
kernel/Makefile | 1 +
kernel/fork.c | 2 +
kernel/futex2.c | 1252 +++++++++++++++++
kernel/sys_ni.c | 9 +
tools/arch/x86/include/asm/unistd_64.h | 12 +
tools/include/uapi/asm-generic/unistd.h | 11 +-
.../arch/x86/entry/syscalls/syscall_64.tbl | 4 +
tools/perf/bench/bench.h | 4 +
tools/perf/bench/futex-hash.c | 24 +-
tools/perf/bench/futex-requeue.c | 57 +-
tools/perf/bench/futex-wake-parallel.c | 41 +-
tools/perf/bench/futex-wake.c | 37 +-
tools/perf/bench/futex.h | 47 +
tools/perf/builtin-bench.c | 18 +-
.../selftests/futex/functional/.gitignore | 3 +
.../selftests/futex/functional/Makefile | 6 +-
.../futex/functional/futex2_requeue.c | 164 +++
.../selftests/futex/functional/futex2_wait.c | 195 +++
.../selftests/futex/functional/futex2_waitv.c | 154 ++
.../futex/functional/futex_wait_timeout.c | 58 +-
.../futex/functional/futex_wait_wouldblock.c | 33 +-
.../testing/selftests/futex/functional/run.sh | 6 +
.../selftests/futex/include/futex2test.h | 112 ++
38 files changed, 2518 insertions(+), 52 deletions(-)
create mode 100644 Documentation/locking/futex2.rst
create mode 100644 kernel/futex2.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_requeue.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_wait.c
create mode 100644 tools/testing/selftests/futex/functional/futex2_waitv.c
create mode 100644 tools/testing/selftests/futex/include/futex2test.h
--
2.31.1
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 50a93f5f13d6..d8fa6c72b7ca 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
define RUN_TESTS
@for TEST in $(TEST_PROGS); do \
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index c9be64dc681d..cd3034602ea5 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
OUTPUT := $(shell pwd)
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 0ef203ec59fd..a5d40653a921 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
OUTPUT := $(shell pwd)
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index 3ed0134a764d..67386aa3f31d 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit 1233898ab758cbcf5f6fea10b8dd16a0b2c24fab ]
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that, it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
the IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 3 ++-
.../selftests/net/forwarding/mirror_lib.sh | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
index 6f3a70df63bc..e00435753008 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
@@ -120,12 +120,13 @@ __mirror_gre_test()
sleep 5
for ((i = 0; i < count; ++i)); do
+ local sip=$(mirror_gre_ipv6_addr 1 $i)::1
local dip=$(mirror_gre_ipv6_addr 1 $i)::2
local htun=h3-gt6-$i
local message
icmp6_capture_install $htun
- mirror_test v$h1 "" $dip $htun 100 10
+ mirror_test v$h1 $sip $dip $htun 100 10
icmp6_capture_uninstall $htun
done
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
index 13db1cb50e57..6406cd76a19d 100644
--- a/tools/testing/selftests/net/forwarding/mirror_lib.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -20,6 +20,13 @@ mirror_uninstall()
tc filter del dev $swp1 $direction pref 1000
}
+is_ipv6()
+{
+ local addr=$1; shift
+
+ [[ -z ${addr//[0-9a-fA-F:]/} ]]
+}
+
mirror_test()
{
local vrf_name=$1; shift
@@ -29,9 +36,17 @@ mirror_test()
local pref=$1; shift
local expect=$1; shift
+ if is_ipv6 $dip; then
+ local proto=-6
+ local type="icmp6 type=128" # Echo request.
+ else
+ local proto=
+ local type="icmp echoreq"
+ fi
+
local t0=$(tc_rule_stats_get $dev $pref)
- $MZ $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
- -c 10 -d 100msec -t icmp type=8
+ $MZ $proto $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
+ -c 10 -d 100msec -t $type
sleep 0.5
local t1=$(tc_rule_stats_get $dev $pref)
local delta=$((t1 - t0))
--
2.30.2
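For readers less familiar with bash parameter expansion, the is_ipv6()
helper added above strips every hex digit and colon from the address and
treats the address as IPv6 if nothing is left. The same check can be
sketched in C as follows. The function name is hypothetical, and this is a
restatement of the heuristic for illustration, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * True if the string consists solely of hex digits and colons. Like the
 * shell helper, this is a heuristic and not an address validator: the
 * empty string and a bare number such as "1234" are accepted too.
 */
static bool looks_like_ipv6(const char *addr)
{
	assert(addr != NULL);

	return strspn(addr, "0123456789abcdefABCDEF:") == strlen(addr);
}
```

A dotted IPv4 address fails the check because of the '.' characters, which
is all mirror_test() needs in order to choose between the two ICMP schemes.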
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit dda7f4fa55839baeb72ae040aeaf9ccf89d3e416 ]
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Jiri Pirko <jiri(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..33ddd01689be 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -507,8 +507,8 @@ do_red_test()
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((0 <= pct && pct <= 5))
- check_err $? "backlog $backlog / $limit expected <= 5% distance"
+ ((0 <= pct && pct <= 10))
+ check_err $? "backlog $backlog / $limit expected <= 10% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.30.2
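To make the effect of the tolerance change concrete, the check in
do_red_test() can be restated in C as below. This is only a sketch of the
arithmetic: the function name and the sample numbers in the note that
follows are made up for illustration and are not measurements from the
test.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Integer percentage distance between limit and backlog, compared against
 * a tolerance. Division truncates toward zero, as in shell arithmetic.
 */
static bool backlog_within_tolerance(long limit, long backlog, long max_pct)
{
	assert(limit > 0);

	long pct = 100 * (limit - backlog) / limit;

	return 0 <= pct && pct <= max_pct;
}
```

With a hypothetical limit of 1000000 and an observed backlog of 930000, the
distance is 7%, which the old 5% bound rejects and the new 10% bound
accepts.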
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..9a41d8bb9ff1 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Russell Currey <ruscur(a)russell.cc>
[ Upstream commit 3a72c94ebfb1f171eba0715998010678a09ec796 ]
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Signed-off-by: Russell Currey <ruscur(a)russell.cc>
Acked-by: Daniel Axtens <dja(a)axtens.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210223070227.2916871-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
tools/testing/selftests/powerpc/security/rfi_flush.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
entry_flush = entry_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
#define CACHELINE_SIZE 128
+#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
void syscall_loop(char *p, unsigned long iterations,
unsigned long zero_size);
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
rfi_flush = rfi_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
--
2.30.2
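As a worked example of the PERF_L1D_READ_MISS_CONFIG encoding introduced in
flush_utils.h above: generic cache events pack the cache id into bits 0-7,
the operation into bits 8-15 and the result into bits 16-23 of the config
value. The sketch below fills in a perf_event_attr accordingly. The helper
name is hypothetical; the selftests use their own perf_event_open_counter()
wrapper, which is not reproduced here.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Same definition as in the patch: cache id | (op << 8) | (result << 16). */
#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
				   (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
				   (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))

/* Hypothetical helper: describe "L1D read misses" as a generic cache event. */
static void init_l1d_read_miss_attr(struct perf_event_attr *attr)
{
	assert(attr != NULL);

	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HW_CACHE;
	attr->config = PERF_L1D_READ_MISS_CONFIG;
}
```

Since L1D and OP_READ are both 0 and RESULT_MISS is 1 in the uAPI enums,
the config evaluates to 0x10000 on every platform, leaving it to the
kernel's PMU driver to map that to the right raw event.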
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit 1233898ab758cbcf5f6fea10b8dd16a0b2c24fab ]
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 3 ++-
.../selftests/net/forwarding/mirror_lib.sh | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
index 6f3a70df63bc..e00435753008 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
@@ -120,12 +120,13 @@ __mirror_gre_test()
sleep 5
for ((i = 0; i < count; ++i)); do
+ local sip=$(mirror_gre_ipv6_addr 1 $i)::1
local dip=$(mirror_gre_ipv6_addr 1 $i)::2
local htun=h3-gt6-$i
local message
icmp6_capture_install $htun
- mirror_test v$h1 "" $dip $htun 100 10
+ mirror_test v$h1 $sip $dip $htun 100 10
icmp6_capture_uninstall $htun
done
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
index 13db1cb50e57..6406cd76a19d 100644
--- a/tools/testing/selftests/net/forwarding/mirror_lib.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -20,6 +20,13 @@ mirror_uninstall()
tc filter del dev $swp1 $direction pref 1000
}
+is_ipv6()
+{
+ local addr=$1; shift
+
+ [[ -z ${addr//[0-9a-fA-F:]/} ]]
+}
+
mirror_test()
{
local vrf_name=$1; shift
@@ -29,9 +36,17 @@ mirror_test()
local pref=$1; shift
local expect=$1; shift
+ if is_ipv6 $dip; then
+ local proto=-6
+ local type="icmp6 type=128" # Echo request.
+ else
+ local proto=
+ local type="icmp echoreq"
+ fi
+
local t0=$(tc_rule_stats_get $dev $pref)
- $MZ $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
- -c 10 -d 100msec -t icmp type=8
+ $MZ $proto $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
+ -c 10 -d 100msec -t $type
sleep 0.5
local t1=$(tc_rule_stats_get $dev $pref)
local delta=$((t1 - t0))
--
2.30.2
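As an illustration of the is_ipv6() check added to mirror_lib.sh above, here
is a minimal sketch of the same idea: an address is treated as IPv6 when it
consists only of hex digits and colons. The patch uses a bash pattern
substitution; the version below is an assumed POSIX sh restatement using
case. pick_mz_args() is a hypothetical helper that only mimics how
mirror_test() chooses its mausezahn arguments, and the sample addresses are
illustrative values, not taken from the test.

```shell
# Sketch only: a POSIX sh equivalent of the bash check
#   [[ -z ${addr//[0-9a-fA-F:]/} ]]
# It succeeds when the argument has no character outside [0-9a-fA-F:].
is_ipv6()
{
	case $1 in
	*[!0-9a-fA-F:]*) return 1 ;;	# a dot or any other foreign character
	*) return 0 ;;
	esac
}

# Hypothetical helper: print the protocol flag and packet type that
# mirror_test() would select for the given destination address.
pick_mz_args()
{
	if is_ipv6 "$1"; then
		echo "-6 -t icmp6 type=128"	# ICMPv6 echo request
	else
		echo "-t icmp echoreq"
	fi
}

pick_mz_args 2001:db8::2	# illustrative IPv6 address
pick_mz_args 192.0.2.2		# illustrative IPv4 address
```

Note that the check is purely lexical: a string such as "1234" with no colon
would also be classified as IPv6, which presumably does not matter for the
addresses these tests generate.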
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit dda7f4fa55839baeb72ae040aeaf9ccf89d3e416 ]
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Jiri Pirko <jiri(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..33ddd01689be 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -507,8 +507,8 @@ do_red_test()
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((0 <= pct && pct <= 5))
- check_err $? "backlog $backlog / $limit expected <= 5% distance"
+ ((0 <= pct && pct <= 10))
+ check_err $? "backlog $backlog / $limit expected <= 10% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.30.2
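To see what the wider tolerance changes in practice, the sketch below repeats
the integer arithmetic from do_red_test() in a small standalone function.
backlog_within() is a hypothetical helper, not part of sch_red_core.sh, and
the limit and backlog values are made-up numbers chosen so that the distance
works out to 7 %.

```shell
# Hypothetical helper: succeeds when the backlog lies between 0 and $3
# percent below the limit, using the same integer math as do_red_test().
backlog_within()
{
	limit=$1
	backlog=$2
	tolerance=$3

	diff=$((limit - backlog))
	pct=$((100 * diff / limit))
	[ "$pct" -ge 0 ] && [ "$pct" -le "$tolerance" ]
}

# Made-up numbers: limit 1000000, measured backlog 930000 (7 % distance).
if backlog_within 1000000 930000 5; then echo "5%: ok"; else echo "5%: fail"; fi
if backlog_within 1000000 930000 10; then echo "10%: ok"; else echo "10%: fail"; fi
```

With these numbers the old 5 % bound fails and the new 10 % bound passes,
which is the kind of borderline run the commit message describes as noise.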
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following commands
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixes the lib.mk issue, which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..9a41d8bb9ff1 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Russell Currey <ruscur(a)russell.cc>
[ Upstream commit 3a72c94ebfb1f171eba0715998010678a09ec796 ]
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Signed-off-by: Russell Currey <ruscur(a)russell.cc>
Acked-by: Daniel Axtens <dja(a)axtens.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210223070227.2916871-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
tools/testing/selftests/powerpc/security/rfi_flush.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
entry_flush = entry_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
#define CACHELINE_SIZE 128
+#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
void syscall_loop(char *p, unsigned long iterations,
unsigned long zero_size);
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
rfi_flush = rfi_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
--
2.30.2
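To make the PERF_L1D_READ_MISS_CONFIG encoding concrete, the sketch below
reproduces the macro's arithmetic in shell. The three enumerator values
(L1D = 0, OP_READ = 0, RESULT_MISS = 1) are assumed from
include/uapi/linux/perf_event.h and should be checked against the headers
actually in use.

```shell
# Assumed enumerator values from include/uapi/linux/perf_event.h.
PERF_COUNT_HW_CACHE_L1D=0
PERF_COUNT_HW_CACHE_OP_READ=0
PERF_COUNT_HW_CACHE_RESULT_MISS=1

# Same layout as the macro in flush_utils.h:
# cache id in bits 0-7, operation in bits 8-15, result in bits 16-23.
config=$(( PERF_COUNT_HW_CACHE_L1D |
	   (PERF_COUNT_HW_CACHE_OP_READ << 8) |
	   (PERF_COUNT_HW_CACHE_RESULT_MISS << 16) ))

printf 'PERF_L1D_READ_MISS_CONFIG = 0x%x\n' "$config"
```

Under those assumptions the value is 0x10000. Unlike the raw 0x400f0 event
code, it is the same on every platform whose PMU driver implements the
generic cache events.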
TL;DR: Add support to kunit_tool to dispatch tests via QEMU. Also add
support to immediately shutdown a kernel after running KUnit tests.
Background
----------
KUnit has supported running on all architectures for quite some time;
however, kunit_tool - the script commonly used to invoke KUnit tests -
has only fully supported running KUnit on UML. Its functionality has been
broken up for some time to separate the configure, build, run, and parse
phases, making it possible to use it in part on other architectures to a
small extent. Nevertheless, kunit_tool has not supported running tests
on other architectures.
What this patchset does
-----------------------
This patchset introduces first class support to kunit_tool for KUnit to
be run on many popular architectures via QEMU. It does this by adding
two new flags: `--arch` and `--cross_compile`.
`--arch` allows an architecture to be specified by the name the
architecture is given in `arch/`. It uses the specified architecture to
select a minimal set of Kconfigs and QEMU configs needed for the
architecture to run in QEMU and provide a console from which KTAP
results can be scraped.
`--cross_compile` allows a toolchain prefix to be passed to make,
similar to how `CROSS_COMPILE` is used.
Additionally, this patchset revives the previously considered "kunit:
tool: add support for QEMU"[1] patches. The motivation for the new
kernel command line flag, `kunit_shutdown`, is to better support
running KUnit tests inside of QEMU. For most popular architectures, QEMU
can be made to terminate when the Linux kernel that is being run is
rebooted, halted, or powered off. As Kees pointed out in a previous
discussion[2], it is possible to make a kernel initrd that can reboot
the kernel immediately, but doing this for every architecture would likely
be infeasible. Instead, just having an option for the kernel to shut down
when it is done with testing seems a lot simpler, especially since it is
an option which would only be available in testing configurations of the
kernel anyway.
What discussion remains for this patchset?
------------------------------------------
The first and most obvious thing is settling the debate about
`kunit_shutdown`. If I recall correctly, Kees suggested that it might be
better to just add a new initrd; however, as I mentioned above, now that
we want to support many new architectures, it may be substantially easier
to support this option. So I am hoping that with this new use case, the
argument for `kunit_shutdown` will be more compelling.
The second and likely harder issue is figuring out the best way to
configure and provide configs for running KUnit tests via QEMU. I
provide a pretty primitive way in this patchset which is not super
flexible; for example, for our PPC support we have it set to build big
endian, and POWER8 - we currently don't support a way to change that.
Nevertheless, having sensible defaults is handy too, so we will probably
want to have some support for overriding defaults, while still being
able to have defaults.
[1] http://patches.linaro.org/patch/208336/
[2] https://lkml.org/lkml/2020/6/26/988
Brendan Higgins (3):
Documentation: Add kunit_shutdown to kernel-parameters.txt
kunit: tool: add support for QEMU
Documentation: kunit: document support for QEMU in kunit_tool
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 8 +
Documentation/dev-tools/kunit/usage.rst | 37 +++-
lib/kunit/executor.c | 20 ++
tools/testing/kunit/kunit.py | 33 ++-
tools/testing/kunit/kunit_config.py | 2 +-
tools/testing/kunit/kunit_kernel.py | 209 +++++++++++++++---
tools/testing/kunit/kunit_parser.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 15 +-
8 files changed, 278 insertions(+), 48 deletions(-)
base-commit: 7af08140979a6e7e12b78c93b8625c8d25b084e2
--
2.31.1.498.g6c1eba8ee3d-goog
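For orientation, a run using the two new flags might look like the sketch
below. Only `--arch` and `--cross_compile` come from the cover letter; the
architecture name arm64 and the toolchain prefix aarch64-linux-gnu- are
assumed example values, and the guard around the call is there only so that
the snippet does nothing harmful outside a kernel tree.

```shell
# Assumed example values: "arm64" is a directory name under arch/, and
# "aarch64-linux-gnu-" is a commonly used cross-toolchain prefix.
ARCH_NAME=arm64
TOOLCHAIN_PREFIX=aarch64-linux-gnu-
KUNIT_PY=./tools/testing/kunit/kunit.py

if [ -x "$KUNIT_PY" ]; then
	# Configure, build, boot under QEMU, and parse the results.
	"$KUNIT_PY" run --arch="$ARCH_NAME" --cross_compile="$TOOLCHAIN_PREFIX"
else
	echo "kunit.py not found; run this from the top of a kernel tree" >&2
fi
```

The prefix plays the same role as `CROSS_COMPILE` on a plain make command
line, so it needs to match the toolchain installed on the build host.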
The KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in kvm_vcpu_ioctl_get_cpuid2() suggests that even on the
error path it tries to return the number of entries to the caller. However,
the kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl()
ignores any output from this function if it sees an error return code.
The code makes it clear that the function was designed to accept a
too-small number of entries and return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check the function's return code and skip the copy back to
userspace if an error is found.
With those lines removed, the ioctl caller sees both the number of entries
and the correct error.
In the selftests, the relevant function vcpu_get_cpuid() has also been
modified to use the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v4:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
- Fixed error code sign.
- Corrected version message
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kvm_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call can also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
Clang's integrated assembler does not allow symbols with non-absolute
values to be reassigned. Modify the interrupt entry loop macro to be
compatible with IAS by using a label and an offset.
Cc: Jian Cai <caij2003(a)gmail.com>
Signed-off-by: Bill Wendling <morbo(a)google.com>
References: https://lore.kernel.org/lkml/20200714233024.1789985-1-caij2003@gmail.com/
---
tools/testing/selftests/kvm/lib/x86_64/handlers.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/handlers.S b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
index aaf7bc7d2ce1..3f9181e9a0a7 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/handlers.S
+++ b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
@@ -54,9 +54,9 @@ idt_handlers:
.align 8
/* Fetch current address and append it to idt_handlers. */
- current_handler = .
+0 :
.pushsection .rodata
-.quad current_handler
+ .quad 0b
.popsection
.if ! \has_error
--
2.29.2.576.ga3fc446d84-goog
I'm just starting my learning curve on SGX, so I don't know if I've missed
some setup for the SGX device entries. After looking at arch/x86/kernel/cpu/sgx/driver.c
I see that there is no mode value for either sgx_dev_enclave or sgx_dev_provision.
With this patch I can get the SGX self test to complete:
sudo ./test_sgx
Warning: no execute permissions on device file /dev/sgx_enclave
0x0000000000000000 0x0000000000002000 0x03
0x0000000000002000 0x0000000000001000 0x05
0x0000000000003000 0x0000000000003000 0x03
SUCCESS
Is the warning even necessary?
Tim
Functionally, this just means that the test output will be slightly
changed and it'll now depend on CONFIG_KUNIT=y/m.
It'll still run at boot time and can still be built as a loadable
module.
There was a pre-existing patch to convert this test that I found later,
here [1]. Compared to [1], this patch doesn't rename files and uses
KUnit features more heavily (i.e. does more than converting pr_err()
calls to KUNIT_FAIL()).
What this conversion gives us:
* a shorter test thanks to KUnit's macros
* a way to run this a bit more easily via kunit.py (and
CONFIG_KUNIT_ALL_TESTS=y) [2]
* a structured way of reporting pass/fail
* uses kunit-managed allocations to avoid the risk of memory leaks
* more descriptive error messages:
* i.e. it prints out which fields are invalid, what the expected
values are, etc.
What this conversion does not do:
* change the name of the file (and thus the name of the module)
* change the name of the config option
Leaving these as-is for now to minimize the impact to people wanting to
run this test. IMO, that concern trumps following KUnit's style guide
for both names, at least for now.
[1] https://lore.kernel.org/linux-kselftest/20201015014616.309000-1-vitor@massa…
[2] Can be run via
$ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
CONFIG_KUNIT=y
CONFIG_TEST_LIST_SORT=y
EOF
[16:55:56] Configuring KUnit Kernel ...
[16:55:56] Building KUnit Kernel ...
[16:56:29] Starting KUnit Kernel ...
[16:56:32] ============================================================
[16:56:32] ======== [PASSED] list_sort ========
[16:56:32] [PASSED] list_sort_test
[16:56:32] ============================================================
[16:56:32] Testing complete. 1 tests run. 0 failed. 0 crashed.
[16:56:32] Elapsed time: 35.668s total, 0.001s configuring, 32.725s building, 0.000s running
Note: the build time shown is for a build right after a `make mrproper`.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
lib/Kconfig.debug | 5 +-
lib/test_list_sort.c | 128 +++++++++++++++++--------------------------
2 files changed, 54 insertions(+), 79 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 417c3d3e521b..09a0cc8a55cc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1999,8 +1999,9 @@ config LKDTM
Documentation/fault-injection/provoke-crashes.rst
config TEST_LIST_SORT
- tristate "Linked list sorting test"
- depends on DEBUG_KERNEL || m
+ tristate "Linked list sorting test" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS
help
Enable this to turn on 'list_sort()' function test. This test is
executed only once during system boot (so affects only boot time),
diff --git a/lib/test_list_sort.c b/lib/test_list_sort.c
index 1f017d3b610e..ccfd98dbf57c 100644
--- a/lib/test_list_sort.c
+++ b/lib/test_list_sort.c
@@ -1,5 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
-#define pr_fmt(fmt) "list_sort_test: " fmt
+#include <kunit/test.h>
#include <linux/kernel.h>
#include <linux/list_sort.h>
@@ -23,67 +23,52 @@ struct debug_el {
struct list_head list;
unsigned int poison2;
int value;
- unsigned serial;
+ unsigned int serial;
};
-/* Array, containing pointers to all elements in the test list */
-static struct debug_el **elts __initdata;
-
-static int __init check(struct debug_el *ela, struct debug_el *elb)
+static void check(struct kunit *test, struct debug_el *ela, struct debug_el *elb)
{
- if (ela->serial >= TEST_LIST_LEN) {
- pr_err("error: incorrect serial %d\n", ela->serial);
- return -EINVAL;
- }
- if (elb->serial >= TEST_LIST_LEN) {
- pr_err("error: incorrect serial %d\n", elb->serial);
- return -EINVAL;
- }
- if (elts[ela->serial] != ela || elts[elb->serial] != elb) {
- pr_err("error: phantom element\n");
- return -EINVAL;
- }
- if (ela->poison1 != TEST_POISON1 || ela->poison2 != TEST_POISON2) {
- pr_err("error: bad poison: %#x/%#x\n",
- ela->poison1, ela->poison2);
- return -EINVAL;
- }
- if (elb->poison1 != TEST_POISON1 || elb->poison2 != TEST_POISON2) {
- pr_err("error: bad poison: %#x/%#x\n",
- elb->poison1, elb->poison2);
- return -EINVAL;
- }
- return 0;
+ struct debug_el **elts = test->priv;
+
+ KUNIT_EXPECT_LT_MSG(test, ela->serial, (unsigned int)TEST_LIST_LEN, "incorrect serial");
+ KUNIT_EXPECT_LT_MSG(test, elb->serial, (unsigned int)TEST_LIST_LEN, "incorrect serial");
+
+ KUNIT_EXPECT_PTR_EQ_MSG(test, elts[ela->serial], ela, "phantom element");
+ KUNIT_EXPECT_PTR_EQ_MSG(test, elts[elb->serial], elb, "phantom element");
+
+ KUNIT_EXPECT_EQ_MSG(test, ela->poison1, TEST_POISON1, "bad poison");
+ KUNIT_EXPECT_EQ_MSG(test, ela->poison2, TEST_POISON2, "bad poison");
+
+ KUNIT_EXPECT_EQ_MSG(test, elb->poison1, TEST_POISON1, "bad poison");
+ KUNIT_EXPECT_EQ_MSG(test, elb->poison2, TEST_POISON2, "bad poison");
}
-static int __init cmp(void *priv, struct list_head *a, struct list_head *b)
+/* `priv` is the test pointer so check() can fail the test if the list is invalid. */
+static int cmp(void *priv, struct list_head *a, struct list_head *b)
{
struct debug_el *ela, *elb;
ela = container_of(a, struct debug_el, list);
elb = container_of(b, struct debug_el, list);
- check(ela, elb);
+ check(priv, ela, elb);
return ela->value - elb->value;
}
-static int __init list_sort_test(void)
+static void list_sort_test(struct kunit *test)
{
- int i, count = 1, err = -ENOMEM;
- struct debug_el *el;
+ int i, count = 1;
+ struct debug_el *el, **elts;
struct list_head *cur;
LIST_HEAD(head);
- pr_debug("start testing list_sort()\n");
-
- elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
- if (!elts)
- return err;
+ elts = kunit_kcalloc(test, TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, elts);
+ test->priv = elts;
for (i = 0; i < TEST_LIST_LEN; i++) {
- el = kmalloc(sizeof(*el), GFP_KERNEL);
- if (!el)
- goto exit;
+ el = kunit_kmalloc(test, sizeof(*el), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, el);
/* force some equivalencies */
el->value = prandom_u32() % (TEST_LIST_LEN / 3);
@@ -94,55 +79,44 @@ static int __init list_sort_test(void)
list_add_tail(&el->list, &head);
}
- list_sort(NULL, &head, cmp);
+ list_sort(test, &head, cmp);
- err = -EINVAL;
for (cur = head.next; cur->next != &head; cur = cur->next) {
struct debug_el *el1;
int cmp_result;
- if (cur->next->prev != cur) {
- pr_err("error: list is corrupted\n");
- goto exit;
- }
+ KUNIT_ASSERT_PTR_EQ_MSG(test, cur->next->prev, cur,
+ "list is corrupted");
- cmp_result = cmp(NULL, cur, cur->next);
- if (cmp_result > 0) {
- pr_err("error: list is not sorted\n");
- goto exit;
- }
+ cmp_result = cmp(test, cur, cur->next);
+ KUNIT_ASSERT_LE_MSG(test, cmp_result, 0, "list is not sorted");
el = container_of(cur, struct debug_el, list);
el1 = container_of(cur->next, struct debug_el, list);
- if (cmp_result == 0 && el->serial >= el1->serial) {
- pr_err("error: order of equivalent elements not "
- "preserved\n");
- goto exit;
+ if (cmp_result == 0) {
+ KUNIT_ASSERT_LE_MSG(test, el->serial, el1->serial,
+ "order of equivalent elements not preserved");
}
- if (check(el, el1)) {
- pr_err("error: element check failed\n");
- goto exit;
- }
+ check(test, el, el1);
count++;
}
- if (head.prev != cur) {
- pr_err("error: list is corrupted\n");
- goto exit;
- }
+ KUNIT_EXPECT_PTR_EQ_MSG(test, head.prev, cur, "list is corrupted");
+ KUNIT_EXPECT_EQ_MSG(test, count, TEST_LIST_LEN,
+ "list length changed after sorting!");
+}
- if (count != TEST_LIST_LEN) {
- pr_err("error: bad list length %d", count);
- goto exit;
- }
+static struct kunit_case list_sort_cases[] = {
+ KUNIT_CASE(list_sort_test),
+ {}
+};
+
+static struct kunit_suite list_sort_suite = {
+ .name = "list_sort",
+ .test_cases = list_sort_cases,
+};
+
+kunit_test_suites(&list_sort_suite);
- err = 0;
-exit:
- for (i = 0; i < TEST_LIST_LEN; i++)
- kfree(elts[i]);
- kfree(elts);
- return err;
-}
-module_init(list_sort_test);
MODULE_LICENSE("GPL");
--
2.31.1.498.g6c1eba8ee3d-goog
Add in:
* kunit_kmalloc_array() and wire up kunit_kmalloc() to be a special
case of it.
* kunit_kcalloc() for symmetry with kunit_kzalloc()
This should make using KUnit more natural by making it more similar to the
existing *alloc() APIs.
And while we shouldn't necessarily be writing unit tests where overflow
should be a concern, it can't hurt to be safe.
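To illustrate the overflow concern, here is a minimal userspace sketch of the
same layering: an array allocator that refuses sizes whose product would
overflow, with the single-object and zeroing variants built on top of it. The
names `malloc_array()`, `zalloc_array()` and `malloc_one()` are hypothetical
and only mirror the shape of the KUnit helpers; this is not the kernel
implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of kmalloc_array(): NULL if n * size would overflow. */
void *malloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

/* Hypothetical analogue of kcalloc(): same check, then zero the memory. */
void *zalloc_array(size_t n, size_t size)
{
	void *p = malloc_array(n, size);

	if (p)
		memset(p, 0, n * size);
	return p;
}

/* The single-object allocator is just the n == 1 special case. */
void *malloc_one(size_t size)
{
	return malloc_array(1, size);
}
```

A plain `malloc(n * size)` would silently wrap around for a huge `n` and
return a buffer that is too small; the array form fails instead.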
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
v1 -> v2: s/kzalloc/kcalloc in doc comment.
---
include/kunit/test.h | 36 ++++++++++++++++++++++++++++++++----
lib/kunit/test.c | 22 ++++++++++++----------
2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..e8ecb69dd567 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -577,16 +577,30 @@ static inline int kunit_destroy_named_resource(struct kunit *test,
void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/**
- * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * kunit_kmalloc_array() - Like kmalloc_array() except the allocation is *test managed*.
* @test: The test context object.
+ * @n: number of elements.
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * Just like `kmalloc(...)`, except the allocation is managed by the test case
+ * Just like `kmalloc_array(...)`, except the allocation is managed by the test case
* and is automatically cleaned up after the test case concludes. See &struct
* kunit_resource for more information.
*/
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp);
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t flags);
+
+/**
+ * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * @test: The test context object.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kmalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+{
+ return kunit_kmalloc_array(test, 1, size, gfp);
+}
/**
* kunit_kfree() - Like kfree except for allocations managed by KUnit.
@@ -601,13 +615,27 @@ void kunit_kfree(struct kunit *test, const void *ptr);
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * See kzalloc() and kunit_kmalloc() for more information.
+ * See kzalloc() and kunit_kmalloc_array() for more information.
*/
static inline void *kunit_kzalloc(struct kunit *test, size_t size, gfp_t gfp)
{
return kunit_kmalloc(test, size, gfp | __GFP_ZERO);
}
+/**
+ * kunit_kcalloc() - Just like kunit_kmalloc_array(), but zeroes the allocation.
+ * @test: The test context object.
+ * @n: number of elements.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kcalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp_t flags)
+{
+ return kunit_kmalloc_array(test, n, size, flags | __GFP_ZERO);
+}
+
void kunit_cleanup(struct kunit *test);
void kunit_log_append(char *log, const char *fmt, ...);
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 2f6cc0123232..41fa46b14c3b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -573,41 +573,43 @@ int kunit_destroy_resource(struct kunit *test, kunit_resource_match_t match,
}
EXPORT_SYMBOL_GPL(kunit_destroy_resource);
-struct kunit_kmalloc_params {
+struct kunit_kmalloc_array_params {
+ size_t n;
size_t size;
gfp_t gfp;
};
-static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
+static int kunit_kmalloc_array_init(struct kunit_resource *res, void *context)
{
- struct kunit_kmalloc_params *params = context;
+ struct kunit_kmalloc_array_params *params = context;
- res->data = kmalloc(params->size, params->gfp);
+ res->data = kmalloc_array(params->n, params->size, params->gfp);
if (!res->data)
return -ENOMEM;
return 0;
}
-static void kunit_kmalloc_free(struct kunit_resource *res)
+static void kunit_kmalloc_array_free(struct kunit_resource *res)
{
kfree(res->data);
}
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
- struct kunit_kmalloc_params params = {
+ struct kunit_kmalloc_array_params params = {
.size = size,
+ .n = n,
.gfp = gfp
};
return kunit_alloc_resource(test,
- kunit_kmalloc_init,
- kunit_kmalloc_free,
+ kunit_kmalloc_array_init,
+ kunit_kmalloc_array_free,
gfp,
&params);
}
-EXPORT_SYMBOL_GPL(kunit_kmalloc);
+EXPORT_SYMBOL_GPL(kunit_kmalloc_array);
void kunit_kfree(struct kunit *test, const void *ptr)
{
base-commit: cda689f8708b6bef0b921c3a17fcdecbe959a079
--
2.31.1.527.g47e6f16901-goog
The readahead size used to be 2MB, thus it's reasonable to set the file
size as 4MB when checking check_file_mmap().
However since commit c2e4cd57cfa1 ("block: lift setting the readahead
size into the block layer"), readahead size could be as large as twice
the io_opt, and thus the hardcoded file size no longer works.
check_file_mmap() may report "Read-ahead pages reached the end of the
file" when the readahead size actually exceeds the file size in this
case.
To fix this issue, read the exact readahead window size via the BLKRAGET
ioctl. Since we now have the readahead window size, make the check more
fine-grained. It is worth noting that this fine-grained check may
break if the kernel's sync readahead algorithm changes. This may be
acceptable since the algorithm that computes the readahead range should be
quite stable, and we could tune the test case accordingly if the algorithm
indeed changes.
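The window arithmetic used by the new check can be sketched on its own as
follows. `struct ra_window` and `ra_window_pages()` are hypothetical names for
illustration; the assumption that the faulting page is the midpoint of the
window is the one stated in the patch, not a guaranteed kernel interface.

```c
/* Page indices of the expected readahead window, as a half-open range. */
struct ra_window {
	long start;	/* first page expected in memory */
	long end;	/* one past the last page expected in memory */
};

/*
 * ra_sectors: value reported by the BLKRAGET ioctl (512-byte sectors).
 * page_size:  system page size in bytes.
 * The file is sized at twice the readahead window and the page in the
 * middle of the file is the one that is touched.
 */
struct ra_window ra_window_pages(long ra_sectors, long page_size)
{
	long ra_size = ra_sectors * 512;
	long file_size = ra_size * 2;
	long ra_pages = ra_size / page_size;
	struct ra_window w;

	w.start = file_size / 2 / page_size - ra_pages / 2;
	w.end = w.start + ra_pages;
	return w;
}
```

For example, a 128 KiB readahead setting (256 sectors) with 4 KiB pages gives
a 64-page file and an expected window of pages 16 to 47.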
Reported-by: James Wang <jnwang(a)linux.alibaba.com>
Acked-by: Ricardo Cañuelo <ricardo.canuelo(a)collabora.com>
Signed-off-by: Jeffle Xu <jefflexu(a)linux.alibaba.com>
---
changes since v3:
- make the check more fine-grained since we have the exact readahead
window size now, as suggested by Ricardo Cañuelo
changes since v2:
- add 'Reported-by'
changes since v1:
- add the test name "mincore" in the subject line
- add the error message in commit message
- rename @filesize to @file_size to keep a more consistent naming
convention
---
.../selftests/mincore/mincore_selftest.c | 96 +++++++++++++------
1 file changed, 68 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index 5a1e85ff5d32..369b35af4b4f 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -15,6 +15,11 @@
#include <string.h>
#include <fcntl.h>
#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/sysmacros.h>
+#include <sys/mount.h>
#include "../kselftest.h"
#include "../kselftest_harness.h"
@@ -193,12 +198,44 @@ TEST(check_file_mmap)
int retval;
int page_size;
int fd;
- int i;
+ int i, start, end;
int ra_pages = 0;
+ long ra_size, file_size;
+ struct stat stats;
+ dev_t devt;
+ unsigned int major, minor;
+ char devpath[32];
+
+ retval = stat(".", &stats);
+ ASSERT_EQ(0, retval) {
+ TH_LOG("Can't stat pwd: %s", strerror(errno));
+ }
+
+ devt = stats.st_dev;
+ major = major(devt);
+ minor = minor(devt);
+ snprintf(devpath, sizeof(devpath), "/dev/block/%u:%u", major, minor);
+
+ fd = open(devpath, O_RDONLY);
+ ASSERT_NE(-1, fd) {
+ TH_LOG("Can't open underlying disk %s", strerror(errno));
+ }
+
+ retval = ioctl(fd, BLKRAGET, &ra_size);
+ ASSERT_EQ(0, retval) {
+ TH_LOG("Error ioctl with the underlying disk: %s", strerror(errno));
+ }
+
+ /*
+ * BLKRAGET ioctl returns the readahead size in sectors (512 bytes).
+ * Make file_size large enough to contain the readahead window.
+ */
+ ra_size *= 512;
+ file_size = ra_size * 2;
page_size = sysconf(_SC_PAGESIZE);
- vec_size = FILE_SIZE / page_size;
- if (FILE_SIZE % page_size)
+ vec_size = file_size / page_size;
+ if (file_size % page_size)
vec_size++;
vec = calloc(vec_size, sizeof(unsigned char));
@@ -213,7 +250,7 @@ TEST(check_file_mmap)
strerror(errno));
}
errno = 0;
- retval = fallocate(fd, 0, 0, FILE_SIZE);
+ retval = fallocate(fd, 0, 0, file_size);
ASSERT_EQ(0, retval) {
TH_LOG("Error allocating space for the temporary file: %s",
strerror(errno));
@@ -223,12 +260,12 @@ TEST(check_file_mmap)
* Map the whole file, the pages shouldn't be fetched yet.
*/
errno = 0;
- addr = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
+ addr = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
ASSERT_NE(MAP_FAILED, addr) {
TH_LOG("mmap error: %s", strerror(errno));
}
- retval = mincore(addr, FILE_SIZE, vec);
+ retval = mincore(addr, file_size, vec);
ASSERT_EQ(0, retval);
for (i = 0; i < vec_size; i++) {
ASSERT_EQ(0, vec[i]) {
@@ -240,38 +277,41 @@ TEST(check_file_mmap)
* Touch a page in the middle of the mapping. We expect the next
* few pages (the readahead window) to be populated too.
*/
- addr[FILE_SIZE / 2] = 1;
- retval = mincore(addr, FILE_SIZE, vec);
+ addr[file_size / 2] = 1;
+ retval = mincore(addr, file_size, vec);
ASSERT_EQ(0, retval);
- ASSERT_EQ(1, vec[FILE_SIZE / 2 / page_size]) {
- TH_LOG("Page not found in memory after use");
- }
- i = FILE_SIZE / 2 / page_size + 1;
- while (i < vec_size && vec[i]) {
- ra_pages++;
- i++;
- }
- EXPECT_GT(ra_pages, 0) {
- TH_LOG("No read-ahead pages found in memory");
- }
+ /*
+ * Readahead window is [start, end). So far the sync readahead
+ * algorithm takes the page that triggers the page fault as the
+ * midpoint.
+ */
+ ra_pages = ra_size / page_size;
+ start = file_size / 2 / page_size - ra_pages / 2;
+ end = start + ra_pages;
- EXPECT_LT(i, vec_size) {
- TH_LOG("Read-ahead pages reached the end of the file");
+ /*
+ * Check there's no hole in the readahead window.
+ */
+ for (i = start; i < end; i++) {
+ ASSERT_EQ(1, vec[i]) {
+ TH_LOG("Hole found in read-ahead window");
+ }
}
+
/*
- * End of the readahead window. The rest of the pages shouldn't
- * be in memory.
+ * Check there's no page beyond the readahead window.
*/
- if (i < vec_size) {
- while (i < vec_size && !vec[i])
- i++;
- EXPECT_EQ(vec_size, i) {
+ for (i = 0; i < vec_size; i++) {
+ if (i == start)
+ i = end;
+
+ EXPECT_EQ(0, vec[i]) {
TH_LOG("Unexpected page in memory beyond readahead window");
}
}
- munmap(addr, FILE_SIZE);
+ munmap(addr, file_size);
close(fd);
free(vec);
}
--
2.27.0
It is documented in Documentation/admin-guide/hw-vuln/spectre.rst that
disabling indirect branch speculation for a user-space process creates
more overhead and causes it to run slower. The performance hit varies by
CPU, but on the AMD A4-9120C and A6-9220C CPUs, a simple ping-pong using
pipes between two processes runs ~10x slower when disabling IB
speculation.
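For readers unfamiliar with this kind of microbenchmark, a pipe ping-pong
between two processes can be sketched roughly as below. This is not the
benchmark from patch 2: `pingpong()` is a hypothetical helper, it does no
timing and does not touch any speculation controls. It only shows the
round-trip pattern that forces a process switch on every iteration when both
processes are pinned to one CPU.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return the number of completed round trips, or -1 on setup failure. */
long pingpong(long iters)
{
	int to_child[2], to_parent[2];
	char byte = 'x';
	long done;
	pid_t pid;

	if (pipe(to_child) < 0)
		return -1;
	if (pipe(to_parent) < 0) {
		close(to_child[0]);
		close(to_child[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		close(to_child[0]);
		close(to_child[1]);
		close(to_parent[0]);
		close(to_parent[1]);
		return -1;
	}

	if (pid == 0) {
		/* Child: echo every byte back until the parent closes its end. */
		close(to_child[1]);
		close(to_parent[0]);
		while (read(to_child[0], &byte, 1) == 1)
			if (write(to_parent[1], &byte, 1) != 1)
				break;
		_exit(0);
	}

	/* Parent: send one byte and wait for the echo, iters times. */
	close(to_child[0]);
	close(to_parent[1]);
	for (done = 0; done < iters; done++) {
		if (write(to_child[1], &byte, 1) != 1)
			break;
		if (read(to_parent[0], &byte, 1) != 1)
			break;
	}

	close(to_child[1]);	/* EOF makes the child leave its loop */
	close(to_parent[0]);
	waitpid(pid, NULL, 0);
	return done;
}
```

Timing the loop and dividing by the iteration count would give a per-iteration
figure comparable in spirit to the us/iter numbers quoted in this message.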
Patch 2, included in this RFC but not intended for commit, is a simple
program that demonstrates this issue. Running on an A4-9120C without IB
speculation disabled, each process ping-pong takes ~7us:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1936300, iter/sec: 135383, us/iter: 7
But when IB speculation is disabled, that number increases
significantly:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 16384, t: 1500518, iter/sec: 10918, us/iter: 91
Although this test is a worst-case scenario, we can also consider a real
situation: an audio server (e.g. pulse). If we imagine a low-latency
capture, with 10ms packets and a concurrent task on the same CPU (e.g.
video encoding, for a video call), the audio server will preempt the
CPU at a rate of 100 Hz. At 91us overhead per preemption (switching to
and from the audio process), that's 0.9% overhead for one process doing
preemption. In real-world testing (on an A4-9120C), I've seen 9% of CPU
used by IBPB when doing a 2-person video call.
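The 0.9% estimate is simple arithmetic: preemptions per second times cost per
preemption, expressed as a share of one second of CPU time. A small sketch,
with `preempt_overhead_percent()` as a hypothetical helper name:

```c
/*
 * Share of one CPU, in percent, spent on per-preemption overhead.
 * rate_hz: preemptions per second.
 * cost_us: overhead per preemption in microseconds.
 */
double preempt_overhead_percent(double rate_hz, double cost_us)
{
	return rate_hz * cost_us / 1e6 * 100.0;
}
```

With the figures above, 100 preemptions per second at 91us each come to 9.1ms
per second, or about 0.91% of the CPU.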
With this patch, the number of IBPBs issued can be reduced to the
minimum necessary, only when there's a potential attacker->victim
process switch.
Running on the same A4-9120C device, this patch reduces the performance
hit of IBPB by ~half, as expected:
localhost ~ # taskset 1 /usr/local/bin/test ds
...
iters: 32768, t: 1824043, iter/sec: 17964, us/iter: 55
It should be noted that CPUs from multiple vendors experience a performance
hit due to IBPB. I also tested an Intel i3-8130U, which sees a noticeable
(~2x) increase in process switch time due to IBPB.
IB spec enabled:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1210821us, iter/sec: 216501, us/iter: 4
IB spec disabled:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 131072, t: 1257583us, iter/sec: 104225, us/iter: 9
Open questions:
- There is a significant number of task flags, which now also reaches the
limit of a 'long' on 32-bit systems. Should the 'mode' flags be
stored somewhere else?
- Having x86-specific flags in linux/sched.h feels wrong. However, this
is the mechanism for doing atomic flag updates. Is there an alternate
approach?
Open tasks:
- Documentation
- Naming
Changes in v2:
- Make flag per-process using prctl().
Anand K Mistry (2):
x86/speculation: Allow per-process control of when to issue IBPB
selftests: Benchmark for the cost of disabling IB speculation
arch/x86/include/asm/thread_info.h | 4 +
arch/x86/kernel/cpu/bugs.c | 56 +++++++++
arch/x86/kernel/process.c | 10 ++
arch/x86/mm/tlb.c | 51 ++++++--
include/linux/sched.h | 10 ++
include/uapi/linux/prctl.h | 5 +
.../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
7 files changed, 236 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c
--
2.31.1.498.g6c1eba8ee3d-goog
Hi, a friend and I were chasing bug 205219 [1] listed in Bugzilla.
We ran into something a little bit different when trying to reproduce
the buggy behavior. In our attempt, compilation failed with a message from
make asking us to clean the source tree. We couldn't run kunit_tool
after compiling the kernel for x86, as described by Ted in the
discussion pointed out by the bug report.
Steps to reproduce:
0) Run kunit_tool
$ ./tools/testing/kunit/kunit.py run
Works fine with a clean tree.
1) Compile the kernel for some architecture (we did it for x86_64).
2) Run kunit_tool again
$ ./tools/testing/kunit/kunit.py run
Fails with a message from make asking us to clean the source tree.
Removing the clean source tree check from the top-level Makefile gives
us a similar error to what was described in the bug report. We see that
after running `git clean -fdx` kunit_tool runs nicely again. However,
this is not a real solution since some kernel binaries are erased by git.
We also had a look at Masahiro Yamada's commit messages but
couldn't quite grasp why the check for the tree to be clean was added.
We could invest more time in this issue but don't actually know how to
proceed. We'd be glad to receive any comments about it. We could also try
something else if this is too hard an issue for beginners.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=205219
Best Regards,
Marcelo
The KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in the function kvm_vcpu_ioctl_get_cpuid2 suggests that even on
the error path it tries to return the number of entries to the caller. But
the kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees an error return code.
The code makes it clear that it was designed to accept a too-small
number of entries and return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check the function's return code and skip the copy-back if an
error is found. Without those lines, the ioctl caller will see both the
number of entries and the correct error.
In selftests, the relevant function vcpu_get_cpuid has also been modified to
use the number of cpuid entries returned along with errno E2BIG.
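The two-call pattern this enables can be sketched in userspace as below. To
keep the sketch runnable without /dev/kvm, the ioctl is replaced by a mock:
`struct mock_cpuid`, `mock_get_cpuid2()`, `get_all_entries()` and
MOCK_KERNEL_NENT are all hypothetical stand-ins, and the entry type is reduced
to a plain integer. Only the E2BIG/nent behaviour described above is modelled.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MOCK_KERNEL_NENT 3	/* hypothetical: entries the "kernel" holds */

/* Simplified stand-in for the real cpuid structure. */
struct mock_cpuid {
	uint32_t nent;
	uint32_t entries[];	/* stand-in for the real entry type */
};

/*
 * Mock of the ioctl behaviour described in this patch: if nent is too
 * small, report the real count in nent and fail with E2BIG without
 * writing any entries.
 */
static int mock_get_cpuid2(struct mock_cpuid *c)
{
	if (c->nent < MOCK_KERNEL_NENT) {
		c->nent = MOCK_KERNEL_NENT;
		errno = E2BIG;
		return -1;
	}
	c->nent = MOCK_KERNEL_NENT;
	for (uint32_t i = 0; i < MOCK_KERNEL_NENT; i++)
		c->entries[i] = 0x100 + i;
	return 0;
}

/* Probe with nent = 0, then allocate and fetch. The caller frees the result. */
struct mock_cpuid *get_all_entries(void)
{
	struct mock_cpuid *c, *full;

	c = calloc(1, sizeof(*c));	/* nent = 0: room for no entries */
	if (!c)
		return NULL;

	/* First call: expected to fail with E2BIG and fill in nent. */
	if (mock_get_cpuid2(c) != -1 || errno != E2BIG) {
		free(c);
		return NULL;
	}

	full = realloc(c, sizeof(*c) + c->nent * sizeof(c->entries[0]));
	if (!full) {
		free(c);
		return NULL;
	}

	/* Second call: the buffer is now large enough. */
	if (mock_get_cpuid2(full) != 0) {
		free(full);
		return NULL;
	}
	return full;
}
```

Before this change the first call could not report the count, which is why
the selftest had to retry with nent = 1, 2, 3, ... until the ioctl succeeded.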
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
- Fixed error code sign.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kvm_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call can also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
The KVM_GET_CPUID2 ioctl is not very well documented, but the way it is
implemented in kvm_vcpu_ioctl_get_cpuid2() suggests that even on the
error path it tries to return the number of entries to the caller.
However, the vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl()
ignores any output from this function if it sees an error return code.
The code makes it clear that the function was designed to accept a
too-small number of entries and return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check the function's return code and skip the copy-out on error.
Without those lines, the ioctl caller will see both the number of entries
and the correct error.
In the selftests, the relevant function vcpu_get_cpuid() has also been
modified to use the number of cpuid entries returned along with errno E2BIG.
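The two-call pattern described above can be sketched in userspace as
follows. This is only an illustration, not the selftest code: the structs
are simplified stand-ins for struct kvm_cpuid_entry2 and struct kvm_cpuid2,
demo_get_cpuid2() is a hypothetical mock that imitates the ioctl (0 on
success, -1 with errno set to E2BIG and nent updated when the buffer is
too small), and the table values are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for struct kvm_cpuid_entry2 and
 * struct kvm_cpuid2; the real layouts live in <linux/kvm.h>. */
struct demo_entry {
	uint32_t function;
	uint32_t eax;
};

struct demo_cpuid {
	uint32_t nent;
	struct demo_entry entries[];
};

/* Entries held by the mock "vcpu" (made-up values). */
static const struct demo_entry demo_table[] = {
	{ 0x0, 0xd }, { 0x1, 0x1234 }, { 0x7, 0x0 },
};
#define DEMO_NENT ((uint32_t)(sizeof(demo_table) / sizeof(demo_table[0])))

/* Mock of ioctl(fd, KVM_GET_CPUID2, c): returns 0, or -1 with errno set. */
static int demo_get_cpuid2(struct demo_cpuid *c)
{
	assert(c != NULL);
	if (c->nent < DEMO_NENT) {
		c->nent = DEMO_NENT;	/* report required count, copy nothing */
		errno = E2BIG;
		return -1;
	}
	for (uint32_t i = 0; i < DEMO_NENT; i++)
		c->entries[i] = demo_table[i];
	c->nent = DEMO_NENT;
	return 0;
}

/* Two-call sequence: probe with nent = 0, then retry with enough room. */
static struct demo_cpuid *demo_fetch_cpuid(void)
{
	struct demo_cpuid *c = calloc(1, sizeof(*c));
	struct demo_cpuid *full;
	size_t bytes;

	if (!c)
		return NULL;
	if (demo_get_cpuid2(c) == 0)
		return c;		/* the mock had no entries at all */
	if (errno != E2BIG) {
		free(c);
		return NULL;
	}
	/* c->nent now holds the count reported alongside E2BIG. */
	bytes = sizeof(*c) + (size_t)c->nent * sizeof(c->entries[0]);
	full = realloc(c, bytes);
	if (!full) {
		free(c);
		return NULL;
	}
	if (demo_get_cpuid2(full) != 0) {
		free(full);
		return NULL;
	}
	return full;
}
```

A caller invokes demo_fetch_cpuid() once and frees the result; with the
real ioctl, the same shape applies with ioctl(vcpu_fd, KVM_GET_CPUID2, ...)
in place of the mock.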
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kvm_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call can also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..fa9bb6b751c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
The KVM_GET_CPUID2 ioctl is not very well documented, but the way it is
implemented in kvm_vcpu_ioctl_get_cpuid2() suggests that even on the
error path it tries to return the number of entries to the caller.
However, the vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl()
ignores any output from this function if it sees an error return code.
The code makes it clear that the function was designed to accept a
too-small number of entries and return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check the function's return code and skip the copy-out on error.
Without those lines, the ioctl caller will see both the number of entries
and the correct error.
In the selftests, the relevant function vcpu_get_cpuid() has also been
modified to use the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
arch/x86/kvm/x86.c | 10 +++++-----
.../selftests/kvm/lib/x86_64/processor.c | 20 +++++++++++--------
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..df8a3e44e722 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
- goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
Hi Linus,
Please pull the following KUnit update for Linux 5.13-rc1.
This KUnit update for Linux 5.13-rc1 consists of several fixes and
a new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-5.13-rc1
for you to fetch changes up to de2fcb3e62013738f22bbb42cbd757d9a242574e:
Documentation: kunit: add tips for using current->kunit_test (2021-04-07 16:40:37 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-5.13-rc1
This KUnit update for Linux 5.13-rc1 consists of several fixes and
a new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
----------------------------------------------------------------
Daniel Latypov (4):
kunit: make KUNIT_EXPECT_STREQ() quote values, don't print literals
kunit: tool: make --kunitconfig accept dirs, add lib/kunit fragment
kunit: fix -Wunused-function warning for __kunit_fail_current_test
Documentation: kunit: add tips for using current->kunit_test
Lucas Stankus (1):
kunit: Match parenthesis alignment to improve code readability
Uriel Guajardo (1):
kunit: support failure from dynamic analysis tools
Documentation/dev-tools/kunit/tips.rst | 78 +++++++++++++++++++++++++++++++++-
include/kunit/test-bug.h | 29 +++++++++++++
lib/kunit/.kunitconfig | 3 ++
lib/kunit/assert.c | 61 ++++++++++++++++++--------
lib/kunit/test.c | 39 +++++++++++++++--
tools/testing/kunit/kunit.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 2 +
tools/testing/kunit/kunit_tool_test.py | 6 +++
8 files changed, 198 insertions(+), 24 deletions(-)
create mode 100644 include/kunit/test-bug.h
create mode 100644 lib/kunit/.kunitconfig
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 5.13-rc1.
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-next-5.13-rc1
for you to fetch changes up to e75074781f1735c1976bc551e29ccf2ba9a4b17f:
selftests/resctrl: Change a few printed messages (2021-04-07 16:37:49 -0600)
----------------------------------------------------------------
linux-kselftest-next-5.13-rc1
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
----------------------------------------------------------------
Antonio Terceiro (1):
Documentation: kselftest: fix path to test module files
Colin Ian King (1):
selftests/timers: Fix spelling mistake "clocksourc" -> "clocksource"
Fenghua Yu (20):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl FS is supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is not supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
selftests/resctrl: Change a few printed messages
Ilya Leoshkevich (1):
selftests: fix prepending $(OUTPUT) to $(TEST_PROGS)
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
Documentation/dev-tools/kselftest.rst | 4 +-
tools/testing/selftests/lib.mk | 3 +-
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 ++++++-
tools/testing/selftests/resctrl/cat_test.c | 57 +++----
.../selftests/resctrl/{cqm_test.c => cmt_test.c} | 75 +++-------
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 +++---
tools/testing/selftests/resctrl/mbm_test.c | 42 +++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
tools/testing/selftests/resctrl/resctrl_tests.c | 163 ++++++++++++++-------
tools/testing/selftests/resctrl/resctrl_val.c | 95 +++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++++-------
.../testing/selftests/timers/clocksource-switch.c | 2 +-
17 files changed, 413 insertions(+), 300 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
----------------------------------------------------------------
This patchset introduces batched operations for the per-cpu variant of
the array map.
Also updates the batch ops test for arrays.
v4 -> v5:
- Revert removal of percpu macros
v3 -> v4:
- Prefer 'calloc()' over 'malloc()' on batch ops tests
- Add missing static keyword in a couple of test functions
- 'offset' to 'cpu_offset' as suggested by Martin
v2 -> v3:
- Remove percpu macros as suggested by Andrii
- Update tests that used the per cpu macros
v1 -> v2:
- Amended a more descriptive commit message
Pedro Tammela (2):
bpf: add batched ops support for percpu array
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
.../bpf/map_tests/array_map_batch_ops.c | 104 +++++++++++++-----
2 files changed, 77 insertions(+), 29 deletions(-)
--
2.25.1
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
- Commit 10 is a small documentation update to reflect the new changes.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
userfaultfd: update documentation to mention shmem minor faults
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 115 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 364 insertions(+), 251 deletions(-)
--
2.31.1.368.gbe11c130af-goog
This small series expands the futex timeout selftests by checking that all
operations that allow timeouts work as expected. When some version of
Thomas' series "futex: Bugfixes and FUTEX_LOCK_PI2"[0] gets merged, I'll
add the new rules to the timeout test. This test should be used to check
for regressions when modifying the timeout path or changing the
interface.
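For readers unfamiliar with what such a timeout check looks like, here is a
minimal sketch. It is not taken from the selftest: futex_wait_rel() is a
hypothetical helper name, and it covers only FUTEX_WAIT with a relative
timeout. It uses the raw futex(2) system call, which reports ETIMEDOUT when
nobody wakes the waiter in time and EAGAIN when the futex word does not hold
the expected value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: block on *uaddr while it equals "expected", for at
 * most timeout_ns nanoseconds (relative). Returns 0 if woken, or -errno.
 */
static int futex_wait_rel(uint32_t *uaddr, uint32_t expected, long timeout_ns)
{
	struct timespec ts;

	assert(uaddr != NULL && timeout_ns >= 0);
	ts.tv_sec = timeout_ns / 1000000000L;
	ts.tv_nsec = timeout_ns % 1000000000L;

	if (syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
		    expected, &ts, NULL, 0) == -1)
		return -errno;
	return 0;
}
```

With no other thread issuing a wake, calling futex_wait_rel(&word, word, ...)
is expected to come back with -ETIMEDOUT, which is the kind of result a
timeout regression test asserts on.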
Additionally, fix a bug in the Makefile that can be found when compiling
selftests with new operations, like the one defined at [0] or from the
futex2 patchset.
[0] https://lore.kernel.org/lkml/20210422194417.866740847@linutronix.de/
André Almeida (2):
selftests: futex: Correctly include headers dirs
selftests: futex: Expand timeout test
.../selftests/futex/functional/Makefile | 3 +-
.../futex/functional/futex_wait_timeout.c | 126 +++++++++++++++---
2 files changed, 112 insertions(+), 17 deletions(-)
--
2.31.1
Changelog v3-->v4
Based on review comments by Doug Smythies:
1. Parse the thread_siblings_list for CPU topology information to
correctly identify the cores the test should run on in
default (quick) mode.
2. An IPI sent from a CPU to itself always shows a lower latency,
which biases the average. Hence, the latency of same-CPU IPIs is
no longer added to the average. It is still displayed in the
detailed logs.
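The averaging rule in item 2 can be illustrated with a short sketch. The
series itself does this in its shell driver; the C below is only an
illustration, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One measurement: which CPU sent the IPI, which received it, how long. */
struct ipi_sample {
	int src_cpu;
	int dest_cpu;
	uint64_t latency_ns;
};

/*
 * Average latency over samples whose source and destination CPUs differ.
 * Same-CPU samples are skipped so they do not pull the average down.
 * Returns 0 when no cross-CPU sample exists.
 */
static uint64_t avg_cross_cpu_latency(const struct ipi_sample *s, size_t n)
{
	uint64_t sum = 0;
	size_t used = 0;

	assert(s != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (s[i].src_cpu == s[i].dest_cpu)
			continue;	/* still logged elsewhere, not averaged */
		sum += s[i].latency_ns;
		used++;
	}
	return used ? sum / used : 0;
}
```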
RFC v3: https://lkml.org/lkml/2021/4/4/31
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations from the advertised latency and residency
values.
The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latencies
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the latencies achieved must be viewed relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/:
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer is to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
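Since the observed figures are meant to be read relative to the baseline,
a small sketch of that arithmetic may help. The helper names are
hypothetical and not part of the series; the inputs used in the note below
are taken from the sample output above.

```c
#include <assert.h>
#include <stdint.h>

/* Extra latency over the busy-CPU baseline, clamped at zero. */
static uint64_t overhead_ns(uint64_t observed_ns, uint64_t baseline_ns)
{
	return observed_ns > baseline_ns ? observed_ns - baseline_ns : 0;
}

/* Observed latency as an integer percentage of the baseline. */
static uint64_t percent_of_baseline(uint64_t observed_ns, uint64_t baseline_ns)
{
	assert(baseline_ns != 0);
	return observed_ns * 100 / baseline_ns;
}
```

For the IPI test, State4 (17070 ns) against the baseline (3114 ns) gives an
overhead of 13956 ns, while State0 (3265 ns) is only 151 ns above it.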
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the timer are armed. It only invokes sleep and
hopes that the core is idle once the IPI/timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results.
2. Even on a completely idle system, there may be book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to be weeded out easily.
3. A userspace-only selftest variant was also sent out as an RFC, based on
suggestions on the previous patchset to simplify the kernel
complexity. However, the userspace-only approach had more noise in
the latency measurement due to userspace-kernel interactions,
which led to run-to-run variance and a less accurate test.
Another downside of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel systems, the timer-based latencies don't exactly give
a measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wake it up. That doesn't make this metric useless for Intel systems;
it just means that it is measuring IPI/timer response latency rather
than idle wakeup latency.
(Source: https://lkml.org/lkml/2020/9/2/610)
As a solution to this problem, a hardware-based latency analyzer has
been devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266
https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 402 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 579 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
We found that with the latest mainline kernel (5.12.0-051200rc8) on
some KVM instances / bare-metal systems, the following tests take
longer than the kselftest framework default timeout (45 seconds) to
run and thus get terminated with a TIMEOUT error:
* xfrm_policy.sh - took about 2m20s
* pmtu.sh - took about 3m5s
* udpgso_bench.sh - took about 60s
Bump the timeout setting to 5 minutes to give them a chance to
finish.
https://bugs.launchpad.net/bugs/1856010
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/Makefile | 2 ++
tools/testing/selftests/net/settings | 1 +
2 files changed, 3 insertions(+)
create mode 100644 tools/testing/selftests/net/settings
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 25f198b..2be4670 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -37,6 +37,8 @@ TEST_GEN_FILES += ipsec
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
+TEST_FILES := settings
+
KSFT_KHDR_INSTALL := 1
include ../lib.mk
diff --git a/tools/testing/selftests/net/settings b/tools/testing/selftests/net/settings
new file mode 100644
index 0000000..694d707
--- /dev/null
+++ b/tools/testing/selftests/net/settings
@@ -0,0 +1 @@
+timeout=300
--
2.7.4
Add in:
* kunit_kmalloc_array() and wire up kunit_kmalloc() to be a special
case of it.
* kunit_kcalloc() for symmetry with kunit_kzalloc()
This should make using KUnit more natural by making it more similar to
the existing *alloc() APIs.
And while we shouldn't necessarily be writing unit tests where overflow
should be a concern, it can't hurt to be safe.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
include/kunit/test.h | 36 ++++++++++++++++++++++++++++++++----
lib/kunit/test.c | 22 ++++++++++++----------
2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..7fa0de4af977 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -577,16 +577,30 @@ static inline int kunit_destroy_named_resource(struct kunit *test,
void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/**
- * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * kunit_kmalloc_array() - Like kmalloc_array() except the allocation is *test managed*.
* @test: The test context object.
+ * @n: number of elements.
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * Just like `kmalloc(...)`, except the allocation is managed by the test case
+ * Just like `kmalloc_array(...)`, except the allocation is managed by the test case
* and is automatically cleaned up after the test case concludes. See &struct
* kunit_resource for more information.
*/
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp);
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t flags);
+
+/**
+ * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * @test: The test context object.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kmalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+{
+ return kunit_kmalloc_array(test, 1, size, gfp);
+}
/**
* kunit_kfree() - Like kfree except for allocations managed by KUnit.
@@ -601,13 +615,27 @@ void kunit_kfree(struct kunit *test, const void *ptr);
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * See kzalloc() and kunit_kmalloc() for more information.
+ * See kzalloc() and kunit_kmalloc_array() for more information.
*/
static inline void *kunit_kzalloc(struct kunit *test, size_t size, gfp_t gfp)
{
return kunit_kmalloc(test, size, gfp | __GFP_ZERO);
}
+/**
+ * kunit_kcalloc() - Just like kunit_kmalloc_array(), but zeroes the allocation.
+ * @test: The test context object.
+ * @n: number of elements.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kcalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp_t flags)
+{
+ return kunit_kmalloc_array(test, n, size, flags | __GFP_ZERO);
+}
+
void kunit_cleanup(struct kunit *test);
void kunit_log_append(char *log, const char *fmt, ...);
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index ec9494e914ef..052fccf69eef 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -540,41 +540,43 @@ int kunit_destroy_resource(struct kunit *test, kunit_resource_match_t match,
}
EXPORT_SYMBOL_GPL(kunit_destroy_resource);
-struct kunit_kmalloc_params {
+struct kunit_kmalloc_array_params {
+ size_t n;
size_t size;
gfp_t gfp;
};
-static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
+static int kunit_kmalloc_array_init(struct kunit_resource *res, void *context)
{
- struct kunit_kmalloc_params *params = context;
+ struct kunit_kmalloc_array_params *params = context;
- res->data = kmalloc(params->size, params->gfp);
+ res->data = kmalloc_array(params->n, params->size, params->gfp);
if (!res->data)
return -ENOMEM;
return 0;
}
-static void kunit_kmalloc_free(struct kunit_resource *res)
+static void kunit_kmalloc_array_free(struct kunit_resource *res)
{
kfree(res->data);
}
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
- struct kunit_kmalloc_params params = {
+ struct kunit_kmalloc_array_params params = {
.size = size,
+ .n = n,
.gfp = gfp
};
return kunit_alloc_resource(test,
- kunit_kmalloc_init,
- kunit_kmalloc_free,
+ kunit_kmalloc_array_init,
+ kunit_kmalloc_array_free,
gfp,
&params);
}
-EXPORT_SYMBOL_GPL(kunit_kmalloc);
+EXPORT_SYMBOL_GPL(kunit_kmalloc_array);
void kunit_kfree(struct kunit *test, const void *ptr)
{
base-commit: 16fc44d6387e260f4932e9248b985837324705d8
--
2.31.1.498.g6c1eba8ee3d-goog
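The overflow remark in the KUnit allocation patch above refers to the array-style allocators refusing a request whose n * size product does not fit in size_t, instead of silently returning a buffer that is too small. The following is a minimal userspace sketch of that idea, not the kernel implementation: checked_alloc_array() and checked_alloc() are hypothetical names, and malloc() stands in for the kernel allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace analogue of an overflow-checked array allocation.
 * Returns NULL instead of allocating a short buffer when n * size would
 * not fit in size_t.
 */
void *checked_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;	/* n * size would overflow */
	return malloc(n * size);
}

/* Single-object allocation expressed as the n == 1 special case. */
void *checked_alloc(size_t size)
{
	return checked_alloc_array(1, size);
}
```

Expressing the single-object allocator as the one-element case of the array allocator mirrors how the patch wires kunit_kmalloc() to kunit_kmalloc_array(), so only one code path needs the check.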
The kernel now has a number of testing and debugging tools, and we've
seen a bit of confusion about what the differences between them are.
Add basic documentation outlining the testing tools, when to use each,
and how they interact.
This is a pretty quick overview rather than the idealised "kernel
testing guide" that'd probably be optimal, but given the number of times
questions like "When do you use KUnit and when do you use Kselftest?"
are being asked, it seemed worth at least having something. Hopefully
this can form the basis for more detailed documentation later.
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
---
Thanks again. Assuming no-one has any objections, I think this is good
to go.
-- David
Changes since v2:
https://lore.kernel.org/linux-kselftest/20210414081428.337494-1-davidgow@go…
- A few typo fixes (Thanks Daniel)
- Reworded description of dynamic analysis tools.
- Updated dev-tools index page to not use ':doc:' syntax, but to provide
a path instead.
- Added Marco and Daniel's Reviewed-by tags.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210410070529.4113432-1-davidgow@g…
- Note KUnit's speed and that one should provide selftests for syscalls
- Mention lockdep as a Dynamic Analysis Tool
- Refer to "Dynamic Analysis Tools" instead of "Sanitizers"
- A number of minor formatting tweaks and rewordings for clarity
Documentation/dev-tools/index.rst | 4 +
Documentation/dev-tools/testing-overview.rst | 117 +++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 Documentation/dev-tools/testing-overview.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 1b1cf4f5c9d9..929d916ffd4c 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -7,6 +7,9 @@ be used to work on the kernel. For now, the documents have been pulled
together without any significant effort to integrate them into a coherent
whole; patches welcome!
+A brief overview of testing-specific tools can be found in
+Documentation/dev-tools/testing-overview.rst
+
.. class:: toc-title
Table of contents
@@ -14,6 +17,7 @@ whole; patches welcome!
.. toctree::
:maxdepth: 2
+ testing-overview
coccinelle
sparse
kcov
diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
new file mode 100644
index 000000000000..b5b46709969c
--- /dev/null
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Kernel Testing Guide
+====================
+
+
+There are a number of different tools for testing the Linux kernel, so knowing
+when to use each of them can be a challenge. This document provides a rough
+overview of their differences, and how they fit together.
+
+
+Writing and Running Tests
+=========================
+
+The bulk of kernel tests are written using either the kselftest or KUnit
+frameworks. These both provide infrastructure to help make running tests and
+groups of tests easier, as well as providing helpers to aid in writing new
+tests.
+
+If you're looking to verify the behaviour of the Kernel — particularly specific
+parts of the kernel — then you'll want to use KUnit or kselftest.
+
+
+The Difference Between KUnit and kselftest
+------------------------------------------
+
+KUnit (Documentation/dev-tools/kunit/index.rst) is an entirely in-kernel system
+for "white box" testing: because test code is part of the kernel, it can access
+internal structures and functions which aren't exposed to userspace.
+
+KUnit tests therefore are best written against small, self-contained parts
+of the kernel, which can be tested in isolation. This aligns well with the
+concept of 'unit' testing.
+
+For example, a KUnit test might test an individual kernel function (or even a
+single codepath through a function, such as an error handling case), rather
+than a feature as a whole.
+
+This also makes KUnit tests very fast to build and run, allowing them to be
+run frequently as part of the development process.
+
+There is a KUnit test style guide which may give further pointers in
+Documentation/dev-tools/kunit/style.rst
+
+
+kselftest (Documentation/dev-tools/kselftest.rst), on the other hand, is
+largely implemented in userspace, and tests are normal userspace scripts or
+programs.
+
+This makes it easier to write more complicated tests, or tests which need to
+manipulate the overall system state more (e.g., spawning processes, etc.).
+However, it's not possible to call kernel functions directly from kselftest.
+This means that only kernel functionality which is exposed to userspace somehow
+(e.g. by a syscall, device, filesystem, etc.) can be tested with kselftest. To
+work around this, some tests include a companion kernel module which exposes
+more information or functionality. If a test runs mostly or entirely within the
+kernel, however, KUnit may be the more appropriate tool.
+
+kselftest is therefore suited well to tests of whole features, as these will
+expose an interface to userspace, which can be tested, but not implementation
+details. This aligns well with 'system' or 'end-to-end' testing.
+
+For example, all new system calls should be accompanied by kselftest tests.
+
+Code Coverage Tools
+===================
+
+The Linux Kernel supports two different code coverage measurement tools. These
+can be used to verify that a test is executing particular functions or lines
+of code. This is useful for determining how much of the kernel is being tested,
+and for finding corner-cases which are not covered by the appropriate test.
+
+:doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel
+to get global or per-module coverage. Unlike KCOV, it does not record per-task
+coverage. Coverage data can be read from debugfs, and interpreted using the
+usual gcov tooling.
+
+:doc:`kcov` is a feature which can be built in to the kernel to allow
+capturing coverage on a per-task level. It's therefore useful for fuzzing and
+other situations where information about code executed during, for example, a
+single syscall is useful.
+
+
+Dynamic Analysis Tools
+======================
+
+The kernel also supports a number of dynamic analysis tools, which attempt to
+detect classes of issues when they occur in a running kernel. These typically
+each look for a different class of bugs, such as invalid memory accesses,
+concurrency issues such as data races, or other undefined behaviour like
+integer overflows.
+
+Some of these tools are listed below:
+
+* kmemleak detects possible memory leaks. See
+ Documentation/dev-tools/kmemleak.rst
+* KASAN detects invalid memory accesses such as out-of-bounds and
+ use-after-free errors. See Documentation/dev-tools/kasan.rst
+* UBSAN detects behaviour that is undefined by the C standard, like integer
+ overflows. See Documentation/dev-tools/ubsan.rst
+* KCSAN detects data races. See Documentation/dev-tools/kcsan.rst
+* KFENCE is a low-overhead detector of memory issues, which is much faster than
+ KASAN and can be used in production. See Documentation/dev-tools/kfence.rst
+* lockdep is a locking correctness validator. See
+ Documentation/locking/lockdep-design.rst
+* There are several other pieces of debug instrumentation in the kernel, many
+ of which can be found in lib/Kconfig.debug
+
+These tools tend to test the kernel as a whole, and do not "pass" like
+kselftest or KUnit tests. They can be combined with KUnit or kselftest by
+running tests on a kernel with these tools enabled: you can then be sure
+that none of these errors are occurring during the test.
+
+Some of these tools integrate with KUnit or kselftest and will
+automatically fail tests if an issue is detected.
+
--
2.31.1.295.g9ea45b61b8-goog
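The testing overview above describes kselftest tests as ordinary userspace programs that can only exercise what the kernel exposes to userspace, such as a syscall. As a rough illustration, the sketch below checks one such interface and prints TAP-style result lines. It is an assumption-laden standalone example: real tests in the tree would normally use the helpers from kselftest.h, whereas the function names and the hand-written output lines here are purely illustrative.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: the libc getpid() wrapper and the raw system call
 * must agree. Only the userspace-visible interface is exercised.
 * Returns 0 on pass, 1 on failure, and prints one TAP-style result line.
 */
int check_getpid_matches_syscall(void)
{
	pid_t via_libc = getpid();
	long via_syscall = syscall(SYS_getpid);
	int pass = (via_syscall == (long)via_libc);

	printf("%s 1 getpid_matches_raw_syscall\n", pass ? "ok" : "not ok");
	return pass ? 0 : 1;
}

/* Runs every check and returns a process exit status (0 means all passed). */
int run_checks(void)
{
	printf("TAP version 13\n1..1\n");
	return check_getpid_matches_syscall();
}
```

A standalone test program would simply return run_checks() from main(). Nothing in this sketch can reach a kernel-internal function, which is the limitation that makes KUnit the better fit for testing individual kernel functions.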
From: Colin Ian King <colin.king(a)canonical.com>
There are a few function prototypes that are missing a void parameter,
fix this by adding it in.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/vm/mremap_dontunmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/vm/mremap_dontunmap.c b/tools/testing/selftests/vm/mremap_dontunmap.c
index f01dc4a85b0b..78baaf0e85d9 100644
--- a/tools/testing/selftests/vm/mremap_dontunmap.c
+++ b/tools/testing/selftests/vm/mremap_dontunmap.c
@@ -42,7 +42,7 @@ static void dump_maps(void)
// Try a simple operation for to "test" for kernel support this prevents
// reporting tests as failed when it's run on an older kernel.
-static int kernel_support_for_mremap_dontunmap()
+static int kernel_support_for_mremap_dontunmap(void)
{
int ret = 0;
unsigned long num_pages = 1;
@@ -95,7 +95,7 @@ static int check_region_contains_byte(void *addr, unsigned long size, char byte)
// this test validates that MREMAP_DONTUNMAP moves the pagetables while leaving
// the source mapping mapped.
-static void mremap_dontunmap_simple()
+static void mremap_dontunmap_simple(void)
{
unsigned long num_pages = 5;
@@ -128,7 +128,7 @@ static void mremap_dontunmap_simple()
}
// This test validates that MREMAP_DONTUNMAP on a shared mapping works as expected.
-static void mremap_dontunmap_simple_shmem()
+static void mremap_dontunmap_simple_shmem(void)
{
unsigned long num_pages = 5;
@@ -181,7 +181,7 @@ static void mremap_dontunmap_simple_shmem()
// This test validates MREMAP_DONTUNMAP will move page tables to a specific
// destination using MREMAP_FIXED, also while validating that the source
// remains intact.
-static void mremap_dontunmap_simple_fixed()
+static void mremap_dontunmap_simple_fixed(void)
{
unsigned long num_pages = 5;
@@ -226,7 +226,7 @@ static void mremap_dontunmap_simple_fixed()
// This test validates that we can MREMAP_DONTUNMAP for a portion of an
// existing mapping.
-static void mremap_dontunmap_partial_mapping()
+static void mremap_dontunmap_partial_mapping(void)
{
/*
* source mapping:
--
2.30.2
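For context on the mremap_dontunmap patch above: before C23, empty parentheses on a C function do not give it a prototype, so the compiler is not required to diagnose a call that passes arguments, while (void) states explicitly that the function takes none. A minimal sketch of the two forms, using hypothetical function names:

```c
/* Prototype form: the compiler knows this function takes no arguments. */
int answer_prototyped(void)
{
	return 42;
}

/*
 * Old-style form: before C23, "()" does not provide a prototype, so a
 * mistaken call such as answer_unchecked(1, 2) need not be diagnosed.
 * GCC's -Wstrict-prototypes flags functions written this way.
 */
int answer_unchecked()
{
	return 42;
}
```

Both functions behave identically when called correctly. The difference is only in how much checking the compiler can do at call sites.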
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes
userfaultfd: update documentation to mention shmem minor faults
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 114 +++-----
mm/userfaultfd.c | 183 +++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 371 insertions(+), 251 deletions(-)
--
2.31.1.368.gbe11c130af-goog
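The cover letter above relies on the same shmem pages being mapped at more than one address (the selftest patches use memfd_create and create alias mappings, and the quoted use case shared-maps the heap "at another location"). The sketch below shows only that aliasing piece under stated assumptions: it does not register anything with userfaultfd, and the function name and the "alias-demo" label are hypothetical.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map the same shmem-backed memfd twice and check that a write through
 * one mapping is visible through the other. Returns 0 on success, -1 on
 * any failure. The name "alias-demo" is only a debugging label.
 */
int alias_mapping_roundtrip(void)
{
	const char msg[] = "hello";
	long page = sysconf(_SC_PAGESIZE);
	char *a, *b;
	int fd, ret = -1;

	if (page <= 0)
		return -1;
	fd = memfd_create("alias-demo", MFD_CLOEXEC);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page) < 0)
		goto out_close;

	a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (a == MAP_FAILED)
		goto out_close;
	b = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (b == MAP_FAILED)
		goto out_unmap_a;

	/* Write through the first view, then read through the alias. */
	memcpy(a, msg, sizeof(msg));
	if (a != b && memcmp(b, msg, sizeof(msg)) == 0)
		ret = 0;

	munmap(b, page);
out_unmap_a:
	munmap(a, page);
out_close:
	close(fd);
	return ret;
}
```

Because both mappings are MAP_SHARED views of one file, they are backed by the same page cache pages. That property is what lets one mapping be populated while accesses through the other are resolved later.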
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied *first*:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to look up the existing page
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, unnecessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (9):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 15 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 112 +++------
mm/userfaultfd.c | 183 ++++++++++-----
tools/testing/selftests/vm/userfaultfd.c | 280 +++++++++++++++--------
10 files changed, 377 insertions(+), 244 deletions(-)
--
2.31.1.295.g9ea45b61b8-goog
From: Mike Rapoport <rppt(a)linux.ibm.com>
Hi,
This is an updated version of page_is_secretmem() changes.
This is based on v5.12-rc7-mmots-2021-04-15-16-28.
@Andrew, please let me know if you'd like me to rebase it differently or
resend the entire set.
v2:
* move the check for secretmem page in gup_pte_range after we get a
reference to the page, per Matthew.
Mike Rapoport (2):
secretmem/gup: don't check if page is secretmem without reference
secretmem: optimize page_is_secretmem()
include/linux/secretmem.h | 26 +++++++++++++++++++++++++-
mm/gup.c | 6 +++---
mm/secretmem.c | 12 +-----------
3 files changed, 29 insertions(+), 15 deletions(-)
--
2.28.0
From: Mike Rapoport <rppt(a)linux.ibm.com>
Kernel test robot reported -4.2% regression of will-it-scale.per_thread_ops
due to commit "mm: introduce memfd_secret system call to create "secret"
memory areas".
The perf profile of the test indicated that the regression is caused by
page_is_secretmem() called from gup_pte_range() (inlined by gup_pgd_range):
27.76 +2.5 30.23 perf-profile.children.cycles-pp.gup_pgd_range
0.00 +3.2 3.19 ± 2% perf-profile.children.cycles-pp.page_mapping
0.00 +3.7 3.66 ± 2% perf-profile.children.cycles-pp.page_is_secretmem
Further analysis showed that the slowdown happens because neither
page_is_secretmem() nor page_mapping() is inline and, moreover, the
multiple page flag checks in page_mapping() involve calling
compound_head() several times for the same page.
Make page_is_secretmem() inline and replace page_mapping() with page flag
checks that do not imply page-to-head conversion.
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
---
@Andrew,
The patch is vs v5.12-rc7-mmots-2021-04-15-16-28, I'd appreciate if it would
be added as a fixup to the memfd_secret series.
include/linux/secretmem.h | 26 +++++++++++++++++++++++++-
mm/secretmem.c | 12 +-----------
2 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 907a6734059c..b842b38cbeb1 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -4,8 +4,32 @@
#ifdef CONFIG_SECRETMEM
+extern const struct address_space_operations secretmem_aops;
+
+static inline bool page_is_secretmem(struct page *page)
+{
+ struct address_space *mapping;
+
+ /*
+ * Using page_mapping() is quite slow because of the actual call
+ * instruction and repeated compound_head(page) inside the
+ * page_mapping() function.
+ * We know that secretmem pages are not compound and LRU so we can
+ * save a couple of cycles here.
+ */
+ if (PageCompound(page) || !PageLRU(page))
+ return false;
+
+ mapping = (struct address_space *)
+ ((unsigned long)page->mapping & ~PAGE_MAPPING_FLAGS);
+
+ if (mapping != page->mapping)
+ return false;
+
+ return page->mapping->a_ops == &secretmem_aops;
+}
+
bool vma_is_secretmem(struct vm_area_struct *vma);
-bool page_is_secretmem(struct page *page);
bool secretmem_active(void);
#else
diff --git a/mm/secretmem.c b/mm/secretmem.c
index 3b1ba3991964..0bcd15e1b549 100644
--- a/mm/secretmem.c
+++ b/mm/secretmem.c
@@ -151,22 +151,12 @@ static void secretmem_freepage(struct page *page)
clear_highpage(page);
}
-static const struct address_space_operations secretmem_aops = {
+const struct address_space_operations secretmem_aops = {
.freepage = secretmem_freepage,
.migratepage = secretmem_migratepage,
.isolate_page = secretmem_isolate_page,
};
-bool page_is_secretmem(struct page *page)
-{
- struct address_space *mapping = page_mapping(page);
-
- if (!mapping)
- return false;
-
- return mapping->a_ops == &secretmem_aops;
-}
-
static struct vfsmount *secretmem_mnt;
static struct file *secretmem_file_create(unsigned long flags)
--
2.28.0
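The inlined page_is_secretmem() in the patch above masks the low flag bits out of page->mapping and bails out if the masked value differs from the stored one, so a tagged pointer is never dereferenced. The sketch below reproduces that pattern in userspace under stated assumptions: the struct names, TAG_MASK and word_uses_ops() are hypothetical stand-ins, and the extra NULL check is an addition for the standalone example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the low two pointer bits are used as tags. */
#define TAG_MASK ((uintptr_t)0x3)

struct ops { int unused; };
struct space { const struct ops *ops; };

/*
 * True only when the stored word is an untagged pointer to a space whose
 * ops table is 'wanted'. A tagged word is rejected before dereferencing,
 * mirroring the "mapping != page->mapping" test in the patch.
 */
bool word_uses_ops(uintptr_t word, const struct ops *wanted)
{
	uintptr_t untagged = word & ~TAG_MASK;

	if (word == 0 || untagged != word)
		return false;
	return ((const struct space *)untagged)->ops == wanted;
}
```

The comparison against a single known ops table replaces a general-purpose lookup, which is the same trade the patch makes by avoiding page_mapping().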
This patchset introduces batched operations for the per-cpu variant of
the array map.
It also removes the percpu macros from 'bpf_util.h'. This change was
suggested by Andrii in an earlier iteration of this patchset.
The tests were updated to reflect all the new changes.
v3 -> v4:
- Prefer 'calloc()' over 'malloc()' on batch ops tests
- Add missing static keyword in a couple of test functions
- Rename 'offset' to 'cpu_offset' as suggested by Martin
v2 -> v3:
- Remove percpu macros as suggested by Andrii
- Update tests that used the per cpu macros
v1 -> v2:
- Amended with a more descriptive commit message
Pedro Tammela (3):
bpf: add batched ops support for percpu array
bpf: selftests: remove percpu macros from bpf_util.h
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
tools/testing/selftests/bpf/bpf_util.h | 7 --
.../bpf/map_tests/array_map_batch_ops.c | 104 +++++++++++++-----
.../bpf/map_tests/htab_map_batch_ops.c | 87 +++++++--------
.../selftests/bpf/prog_tests/map_init.c | 9 +-
tools/testing/selftests/bpf/test_maps.c | 84 ++++++++------
6 files changed, 173 insertions(+), 120 deletions(-)
--
2.25.1
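With the per-cpu macros removed from bpf_util.h, tests for the series above have to size and index per-CPU value buffers by hand. The sketch below shows one way to do that arithmetic. It assumes the usual convention that each CPU's value occupies a slot rounded up to 8 bytes and that a batch buffer stores all CPU slots of one entry contiguously. The helper names are hypothetical and no libbpf calls are made.

```c
#include <stddef.h>
#include <stdlib.h>

/* Round a per-CPU value size up to the assumed 8-byte slot size. */
size_t percpu_slot_size(size_t value_size)
{
	return (value_size + 7) & ~(size_t)7;
}

/* Byte offset of the slot for 'cpu' within entry 'entry' of a batch buffer. */
size_t percpu_slot_offset(size_t entry, size_t cpu, size_t nr_cpus,
			  size_t value_size)
{
	return (entry * nr_cpus + cpu) * percpu_slot_size(value_size);
}

/* Zeroed buffer large enough for 'entries' values across 'nr_cpus' CPUs. */
void *percpu_batch_alloc(size_t entries, size_t nr_cpus, size_t value_size)
{
	return calloc(entries * nr_cpus, percpu_slot_size(value_size));
}
```

Using calloc() here matches the v4 changelog's preference for zero-initialised buffers, so any slot the kernel does not fill reads back as zero.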
This patchset provides a file descriptor for every VM and VCPU to read
KVM statistics data in binary format.
It is meant to provide a lightweight, flexible, scalable and efficient
lock-free solution for user space telemetry applications to pull the
statistics data periodically for large scale systems. The pulling
frequency could be as high as a few times per second.
In this patchset, each statistic is described by the following
attributes:
* architecture dependent or common
* VM statistics data or VCPU statistics data
* type: cumulative or instantaneous
* unit: none for simple counter, nanosecond, microsecond,
millisecond, second, Byte, KiByte, MiByte, GiByte, or clock cycles
Since no lock/synchronization is used, consistency between all
the statistics data is not guaranteed. That is, the statistics are
not all read out at exactly the same time, because they are still
being updated by KVM subsystems while they are read out.
---
* v1 -> v2
- Use ARRAY_SIZE to count the number of stats descriptors
- Fix missing `size` field initialization in macro STATS_DESC
[1] https://lore.kernel.org/kvm/20210402224359.2297157-1-jingzhangos@google.com
---
Jing Zhang (4):
KVM: stats: Separate common stats from architecture specific ones
KVM: stats: Add fd-based API to read binary stats data
KVM: stats: Add documentation for statistics data binary interface
KVM: selftests: Add selftest for KVM statistics data binary interface
Documentation/virt/kvm/api.rst | 169 ++++++++
arch/arm64/include/asm/kvm_host.h | 9 +-
arch/arm64/kvm/guest.c | 42 +-
arch/mips/include/asm/kvm_host.h | 9 +-
arch/mips/kvm/mips.c | 67 +++-
arch/powerpc/include/asm/kvm_host.h | 9 +-
arch/powerpc/kvm/book3s.c | 68 +++-
arch/powerpc/kvm/book3s_hv.c | 12 +-
arch/powerpc/kvm/book3s_pr.c | 2 +-
arch/powerpc/kvm/book3s_pr_papr.c | 2 +-
arch/powerpc/kvm/booke.c | 63 ++-
arch/s390/include/asm/kvm_host.h | 9 +-
arch/s390/kvm/kvm-s390.c | 133 ++++++-
arch/x86/include/asm/kvm_host.h | 9 +-
arch/x86/kvm/x86.c | 71 +++-
include/linux/kvm_host.h | 132 ++++++-
include/linux/kvm_types.h | 12 +
include/uapi/linux/kvm.h | 48 +++
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../testing/selftests/kvm/include/kvm_util.h | 3 +
.../selftests/kvm/kvm_bin_form_stats.c | 370 ++++++++++++++++++
tools/testing/selftests/kvm/lib/kvm_util.c | 11 +
virt/kvm/kvm_main.c | 237 ++++++++++-
24 files changed, 1401 insertions(+), 90 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_bin_form_stats.c
base-commit: f96be2deac9bca3ef5a2b0b66b71fcef8bad586d
--
2.31.1.295.g9ea45b61b8-goog
Hi,
This v6 series mainly consists of two parts.
Rebased on kvm queue branch: https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=queue
In the first part, all the known hugetlb backing src types specified
with different hugepage sizes are listed, so that we can specify use
of hugetlb source of the exact granularity that we want, instead of
the system default ones. And as all the known hugetlb page sizes are
listed, it's appropriate for all architectures. Besides, a helper that
can get granularity of different backing src types (anonymous/thp/hugetlb)
is added, so that we can use the accurate backing src granularity for
kinds of alignment or guest memory accessing of vcpus.
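As a rough illustration of how a specific hugetlb page size can be requested instead of the system default, the sketch below shows the log2-size encoding used in the mmap() flags. The constants mirror MAP_HUGE_SHIFT (26) and MAP_HUGE_MASK (0x3f) from the Linux uapi headers; the function names hugetlb_flag and hugetlb_size are made up for this example and are not the selftest helpers.

```shell
#!/bin/sh
# Sketch only: hugetlb page size encoding in mmap() flags.
# MAP_HUGE_SHIFT/MAP_HUGE_MASK mirror the Linux uapi values;
# hugetlb_flag and hugetlb_size are hypothetical names.
MAP_HUGE_SHIFT=26
MAP_HUGE_MASK=$(( 0x3f ))

# hugetlb_flag LOG2_SIZE: print the flag bits selecting 2^LOG2_SIZE-byte pages
hugetlb_flag() {
    echo $(( $1 << MAP_HUGE_SHIFT ))
}

# hugetlb_size FLAGS: print the page size in bytes encoded in FLAGS
# (an encoded value of 0 means "system default size" and is not handled here)
hugetlb_size() {
    echo $(( 1 << (($1 >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK) ))
}

# Example: 2MB pages are encoded as log2(2MB) = 21
flag_2mb=$(hugetlb_flag 21)
echo "2MB flag bits: $flag_2mb, decoded size: $(hugetlb_size "$flag_2mb")"
```

Decoding the size back out of the flag bits is essentially what a granularity helper needs to do for the hugetlb backing src types.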
In the second part, a new test is added:
This test is added to serve as a performance tester and a bug reproducer
for kvm page table code (GPA->HPA mappings), and it gives guidance to
people trying to improve kvm. The following explains exactly what we can
do with this test.
The function guest_code() can cover the conditions where a single vcpu or
multiple vcpus access guest pages within the same memory region, in three
VM stages (before dirty logging, during dirty logging, after dirty logging).
Besides, the backing src memory type (ANONYMOUS/THP/HUGETLB) of the tested
memory region can be specified by users, which means normal page mappings
or block mappings can be chosen by users to be created in the test.
If ANONYMOUS memory is specified, kvm will create normal page mappings
for the tested memory region before dirty logging, and update attributes
of the page mappings from RO to RW during dirty logging. If THP/HUGETLB
memory is specified, kvm will create block mappings for the tested memory
region before dirty logging, and split the block mappings into normal page
mappings during dirty logging, and coalesce the page mappings back into
block mappings after dirty logging is stopped.
So in summary, as a performance tester, this test can present the
performance of kvm creating/updating normal page mappings, or the
performance of kvm creating/splitting/recovering block mappings,
through execution time.
When we need to coalesce the page mappings back to block mappings after
dirty logging is stopped, we first have to invalidate *all* the TLB
entries for the page mappings right before installation of the block entry,
because a TLB conflict abort error could occur if we can't invalidate the
TLB entries fully. We have hit this TLB conflict twice on the aarch64 software
implementation and fixed it. As this test can simulate the process of a VM
with block mappings going from dirty logging enabled to dirty logging
stopped, it can also reproduce this TLB conflict abort caused by inadequate
TLB invalidation when coalescing tables.
Links about the TLB conflict abort:
https://lore.kernel.org/lkml/20201201201034.116760-3-wangyanan55@huawei.com/
---
Change logs:
v5->v6:
- Address Andrew Jones's comments for v5 series
- Add Andrew Jones's R-b tags in some patches
- Rebased on newest kvm/queue tree
- v5: https://lore.kernel.org/lkml/20210323135231.24948-1-wangyanan55@huawei.com/
v4->v5:
- Use synchronization(sem_wait) for time measurement
- Add a new patch about TEST_ASSERT(patch 4)
- Address Andrew Jones's comments for v4 series
- Add Andrew Jones's R-b tags in some patches
- v4: https://lore.kernel.org/lkml/20210302125751.19080-1-wangyanan55@huawei.com/
v3->v4:
- Add a helper to get system default hugetlb page size
- Add tags of Reviewed-by of Ben in the patches
- v3: https://lore.kernel.org/lkml/20210301065916.11484-1-wangyanan55@huawei.com/
v2->v3:
- Add tags of Suggested-by, Reviewed-by in the patches
- Add a generic macro to get hugetlb page sizes
- Some changes for suggestions about v2 series
- v2: https://lore.kernel.org/lkml/20210225055940.18748-1-wangyanan55@huawei.com/
v1->v2:
- Add a patch to sync header files
- Add helpers to get granularity of different backing src types
- Some changes for suggestions about v1 series
- v1: https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
---
Yanan Wang (10):
tools headers: sync headers of asm-generic/hugetlb_encode.h
mm/hugetlb: Add a macro to get HUGETLB page sizes for mmap
KVM: selftests: Use flag CLOCK_MONOTONIC_RAW for timing
KVM: selftests: Print the errno besides error-string in TEST_ASSERT
KVM: selftests: Make a generic helper to get vm guest mode strings
KVM: selftests: Add a helper to get system configured THP page size
KVM: selftests: Add a helper to get system default hugetlb page size
KVM: selftests: List all hugetlb src types specified with page sizes
KVM: selftests: Adapt vm_userspace_mem_region_add to new helpers
KVM: selftests: Add a test for kvm page table code
include/uapi/linux/mman.h | 2 +
tools/include/asm-generic/hugetlb_encode.h | 3 +
tools/include/uapi/linux/mman.h | 2 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/demand_paging_test.c | 8 +-
.../selftests/kvm/dirty_log_perf_test.c | 14 +-
.../testing/selftests/kvm/include/kvm_util.h | 4 +-
.../testing/selftests/kvm/include/test_util.h | 21 +-
.../selftests/kvm/kvm_page_table_test.c | 506 ++++++++++++++++++
tools/testing/selftests/kvm/lib/assert.c | 4 +-
tools/testing/selftests/kvm/lib/kvm_util.c | 59 +-
tools/testing/selftests/kvm/lib/test_util.c | 163 +++++-
tools/testing/selftests/kvm/steal_time.c | 4 +-
14 files changed, 733 insertions(+), 61 deletions(-)
create mode 100644 tools/testing/selftests/kvm/kvm_page_table_test.c
--
2.23.0
Since commit d9f4ff50d2aa ("kbuild: spilt cc-option and friends to
scripts/Makefile.compiler"), some kselftests fail to build.
The tools/ directory opted out of Kbuild, and went in a different
direction. They copy any kind of files to the tools/ directory
in order to do whatever they want in their world.
tools/build/Build.include mimics scripts/Kbuild.include, but some
tool Makefiles included the Kbuild one to import a feature that is
missing in tools/build/Build.include:
- Commit ec04aa3ae87b ("tools/thermal: tmon: use "-fstack-protector"
only if supported") included scripts/Kbuild.include from
tools/thermal/tmon/Makefile to import the cc-option macro.
- Commit c2390f16fc5b ("selftests: kvm: fix for compilers that do
not support -no-pie") included scripts/Kbuild.include from
tools/testing/selftests/kvm/Makefile to import the try-run macro.
- Commit 9cae4ace80ef ("selftests/bpf: do not ignore clang
failures") included scripts/Kbuild.include from
tools/testing/selftests/bpf/Makefile to import the .DELETE_ON_ERROR
target.
- Commit 0695f8bca93e ("selftests/powerpc: Handle Makefile for
unrecognized option") included scripts/Kbuild.include from
tools/testing/selftests/powerpc/pmu/ebb/Makefile to import the
try-run macro.
Copy what they need into tools/build/Build.include, and make them
include it instead of scripts/Kbuild.include.
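For readers unfamiliar with these macros, the probe-and-fallback idea behind try-run and cc-option can be sketched in plain shell. This is only an approximation of the Makefile macros added below, not a replacement for them: try_run and cc_option are illustrative function names, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Sketch only: shell approximation of the try-run / cc-option idea.
# try_run and cc_option are hypothetical names, not kernel helpers.

# try_run CMD OK FALLBACK
# Runs CMD (which may reference "$TMP" as a scratch file) and prints OK
# if it succeeds, FALLBACK otherwise. The scratch directory is removed.
try_run() {
    tmpdir=$(mktemp -d) || return 1
    TMP="$tmpdir/tmp"
    if (eval "$1") >/dev/null 2>&1; then
        echo "$2"
    else
        echo "$3"
    fi
    rm -rf "$tmpdir"
}

# cc_option FLAG [FALLBACK_FLAG]
# Prints FLAG if the compiler accepts it, FALLBACK_FLAG otherwise.
cc_option() {
    try_run "${CC:-cc} -Werror $1 -c -x c /dev/null -o \"\$TMP\"" "$1" "$2"
}

# Example: the probe command decides which string is printed
try_run true option-ok otherwise
try_run false option-ok otherwise
```

A Makefile would use the result the same way the cc-option usage comment shows, for instance appending the printed flag to its compiler flags.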
Link: https://lore.kernel.org/lkml/86dadf33-70f7-a5ac-cb8c-64966d2f45a1@linux.ibm…
Fixes: d9f4ff50d2aa ("kbuild: spilt cc-option and friends to scripts/Makefile.compiler")
Reported-by: Janosch Frank <frankja(a)linux.ibm.com>
Reported-by: Christian Borntraeger <borntraeger(a)de.ibm.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
---
Changes in v2:
- copy macros to tools/build/Build.include
tools/build/Build.include | 24 +++++++++++++++++++
tools/testing/selftests/bpf/Makefile | 2 +-
tools/testing/selftests/kvm/Makefile | 2 +-
.../selftests/powerpc/pmu/ebb/Makefile | 2 +-
tools/thermal/tmon/Makefile | 2 +-
5 files changed, 28 insertions(+), 4 deletions(-)
diff --git a/tools/build/Build.include b/tools/build/Build.include
index 585486e40995..2cf3b1bde86e 100644
--- a/tools/build/Build.include
+++ b/tools/build/Build.include
@@ -100,3 +100,27 @@ cxx_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(CXXFLAGS) -D"BUILD_STR(s)=\#s" $(CXX
## HOSTCC C flags
host_c_flags = -Wp,-MD,$(depfile) -Wp,-MT,$@ $(KBUILD_HOSTCFLAGS) -D"BUILD_STR(s)=\#s" $(HOSTCFLAGS_$(basetarget).o) $(HOSTCFLAGS_$(obj))
+
+# output directory for tests below
+TMPOUT = .tmp_$$$$
+
+# try-run
+# Usage: option = $(call try-run, $(CC)...-o "$$TMP",option-ok,otherwise)
+# Exit code chooses option. "$$TMP" serves as a temporary file and is
+# automatically cleaned up.
+try-run = $(shell set -e; \
+ TMP=$(TMPOUT)/tmp; \
+ mkdir -p $(TMPOUT); \
+ trap "rm -rf $(TMPOUT)" EXIT; \
+ if ($(1)) >/dev/null 2>&1; \
+ then echo "$(2)"; \
+ else echo "$(3)"; \
+ fi)
+
+# cc-option
+# Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
+cc-option = $(call try-run, \
+ $(CC) -Werror $(1) -c -x c /dev/null -o "$$TMP",$(1),$(2))
+
+# delete partially updated (i.e. corrupted) files on error
+.DELETE_ON_ERROR:
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 044bfdcf5b74..17a5cdf48d37 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-include ../../../../scripts/Kbuild.include
+include ../../../build/Build.include
include ../../../scripts/Makefile.arch
include ../../../scripts/Makefile.include
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index a6d61f451f88..5ef141f265bd 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0-only
-include ../../../../scripts/Kbuild.include
+include ../../../build/Build.include
all:
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/Makefile b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
index af3df79d8163..c5ecb4634094 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/Makefile
+++ b/tools/testing/selftests/powerpc/pmu/ebb/Makefile
@@ -1,5 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
-include ../../../../../../scripts/Kbuild.include
+include ../../../../../build/Build.include
noarg:
$(MAKE) -C ../../
diff --git a/tools/thermal/tmon/Makefile b/tools/thermal/tmon/Makefile
index 59e417ec3e13..9db867df7679 100644
--- a/tools/thermal/tmon/Makefile
+++ b/tools/thermal/tmon/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
# We need this for the "cc-option" macro.
-include ../../../scripts/Kbuild.include
+include ../../build/Build.include
VERSION = 1.0
--
2.27.0
The kunit_tool documentation page was pretty minimal, and a bit
outdated. Update it and flesh it out a bit.
In particular,
- Mention that .kunitconfig is now in the build directory
- Describe the use of --kunitconfig to specify a different config
fragment
- Mention the split functionality (i.e., commands other than 'run')
- Describe --raw_output and kunit.py parse
- Mention the globbing support
- Provide a quick overview of other options, including --build_dir and
--alltests
Note that this does overlap a little with the new running_tips page. I
don't think it's a problem having both: this page is supposed to be a
bit more of a reference, rather than a list of useful tips, so the fact
that they both describe the same features isn't a problem.
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
---
Adopted the changes from Daniel.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210416034036.797727-1-davidgow@go…
- Mention that the default build directory is '.kunit' when discussing
'.kunitconfig' files.
- Reword the discussion of 'CONFIG_KUNIT_ALL_TESTS' under '--alltests'
Documentation/dev-tools/kunit/kunit-tool.rst | 140 +++++++++++++++++--
1 file changed, 132 insertions(+), 8 deletions(-)
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 29ae2fee8123..4247b7420e3b 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -22,14 +22,19 @@ not require any virtualization support: it is just a regular program.
What is a .kunitconfig?
=======================
-It's just a defconfig that kunit_tool looks for in the base directory.
-kunit_tool uses it to generate a .config as you might expect. In addition, it
-verifies that the generated .config contains the CONFIG options in the
-.kunitconfig; the reason it does this is so that it is easy to be sure that a
-CONFIG that enables a test actually ends up in the .config.
+It's just a defconfig that kunit_tool looks for in the build directory
+(``.kunit`` by default). kunit_tool uses it to generate a .config as you might
+expect. In addition, it verifies that the generated .config contains the CONFIG
+options in the .kunitconfig; the reason it does this is so that it is easy to
+be sure that a CONFIG that enables a test actually ends up in the .config.
-How do I use kunit_tool?
-========================
+It's also possible to pass a separate .kunitconfig fragment to kunit_tool,
+which is useful if you have several different groups of tests you wish
+to run independently, or if you want to use pre-defined test configs for
+certain subsystems.
+
+Getting Started with kunit_tool
+===============================
If a kunitconfig is present at the root directory, all you have to do is:
@@ -48,10 +53,129 @@ However, you most likely want to use it with the following options:
.. note::
This command will work even without a .kunitconfig file: if no
- .kunitconfig is present, a default one will be used instead.
+ .kunitconfig is present, a default one will be used instead.
+
+If you wish to use a different .kunitconfig file (such as one provided for
+testing a particular subsystem), you can pass it as an option.
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig
For a list of all the flags supported by kunit_tool, you can run:
.. code-block:: bash
./tools/testing/kunit/kunit.py run --help
+
+Configuring, Building, and Running Tests
+========================================
+
+It's also possible to run just parts of the KUnit build process independently,
+which is useful if you want to make manual changes to part of the process.
+
+A .config can be generated from a .kunitconfig by using the ``config`` argument
+when running kunit_tool:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py config
+
+Similarly, if you just want to build a KUnit kernel from the current .config,
+you can use the ``build`` argument:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py build
+
+And, if you already have a built UML kernel with built-in KUnit tests, you can
+run the kernel and display the test results with the ``exec`` argument:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py exec
+
+The ``run`` command which is discussed above is equivalent to running all three
+of these in sequence.
+
+All of these commands accept a number of optional command-line arguments. The
+``--help`` flag will give a complete list of these, or keep reading this page
+for a guide to some of the more useful ones.
+
+Parsing Test Results
+====================
+
+KUnit tests output their results in TAP (Test Anything Protocol) format.
+kunit_tool will, when running tests, parse this output and print a summary
+which is much more pleasant to read. If you wish to look at the raw test
+results in TAP format, you can pass the ``--raw_output`` argument.
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run --raw_output
+
+.. note::
+ The raw output from test runs may contain other, non-KUnit kernel log
+ lines.
+
+If you have KUnit results in their raw TAP format, you can parse them and print
+the human-readable summary with the ``parse`` command for kunit_tool. This
+accepts a filename for an argument, or will read from standard input.
+
+.. code-block:: bash
+
+ # Reading from a file
+ ./tools/testing/kunit/kunit.py parse /var/log/dmesg
+ # Reading from stdin
+ dmesg | ./tools/testing/kunit/kunit.py parse
+
+This is very useful if you wish to run tests in a configuration not supported
+by kunit_tool (such as on real hardware, or an unsupported architecture).
+
+Filtering Tests
+===============
+
+It's possible to run only a subset of the tests built into a kernel by passing
+a filter to the ``exec`` or ``run`` commands. For example, if you only wanted
+to run KUnit resource tests, you could use:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run 'kunit-resource*'
+
+This uses the standard glob format for wildcards.
+
+Other Useful Options
+====================
+
+kunit_tool has a number of other command-line arguments which can be useful
+when adapting it to fit your environment or needs.
+
+Some of the more useful ones are:
+
+``--help``
+ Lists all of the available options. Note that different commands
+ (``config``, ``build``, ``run``, etc) will have different supported
+ options. Place ``--help`` before the command to list common options,
+ and after the command for options specific to that command.
+
+``--build_dir``
+ Specifies the build directory that kunit_tool will use. This is where
+ the .kunitconfig file is located, as well as where the .config and
+ compiled kernel will be placed. Defaults to ``.kunit``.
+
+``--make_options``
+ Specifies additional options to pass to ``make`` when compiling a
+ kernel (with the ``build`` or ``run`` commands). For example, to enable
+ compiler warnings, you can pass ``--make_options W=1``.
+
+``--alltests``
+ Builds a UML kernel with all config options enabled using ``make
+ allyesconfig``. This allows you to run as many tests as is possible,
+ but is very slow and prone to breakage as new options are added or
+ modified. In most cases, enabling all tests which have satisfied
+ dependencies by adding ``CONFIG_KUNIT_ALL_TESTS=1`` to your
+ .kunitconfig is preferable.
+
+There are several other options (and new ones are often added), so do check
+``--help`` if you're looking for something not mentioned here.
--
2.31.1.368.gbe11c130af-goog
The kunit_tool documentation page was pretty minimal, and a bit
outdated. Update it and flesh it out a bit.
In particular,
- Mention that .kunitconfig is now in the build directory
- Describe the use of --kunitconfig to specify a different config
fragment
- Mention the split functionality (i.e., commands other than 'run')
- Describe --raw_output and kunit.py parse
- Mention the globbing support
- Provide a quick overview of other options, including --build_dir and
--alltests
Note that this does overlap a little with the new running_tips page. I
don't think it's a problem having both: this page is supposed to be a
bit more of a reference, rather than a list of useful tips, so the fact
that they both describe the same features isn't a problem.
Signed-off-by: David Gow <davidgow(a)google.com>
---
Documentation/dev-tools/kunit/kunit-tool.rst | 132 ++++++++++++++++++-
1 file changed, 128 insertions(+), 4 deletions(-)
diff --git a/Documentation/dev-tools/kunit/kunit-tool.rst b/Documentation/dev-tools/kunit/kunit-tool.rst
index 29ae2fee8123..0b45affcd65c 100644
--- a/Documentation/dev-tools/kunit/kunit-tool.rst
+++ b/Documentation/dev-tools/kunit/kunit-tool.rst
@@ -22,14 +22,19 @@ not require any virtualization support: it is just a regular program.
What is a .kunitconfig?
=======================
-It's just a defconfig that kunit_tool looks for in the base directory.
+It's just a defconfig that kunit_tool looks for in the build directory.
kunit_tool uses it to generate a .config as you might expect. In addition, it
verifies that the generated .config contains the CONFIG options in the
.kunitconfig; the reason it does this is so that it is easy to be sure that a
CONFIG that enables a test actually ends up in the .config.
-How do I use kunit_tool?
-========================
+It's also possible to pass a separate .kunitconfig fragment to kunit_tool,
+which is useful if you have several different groups of tests you wish
+to run independently, or if you want to use pre-defined test configs for
+certain subsystems.
+
+Getting Started with kunit_tool
+===============================
If a kunitconfig is present at the root directory, all you have to do is:
@@ -48,10 +53,129 @@ However, you most likely want to use it with the following options:
.. note::
This command will work even without a .kunitconfig file: if no
- .kunitconfig is present, a default one will be used instead.
+ .kunitconfig is present, a default one will be used instead.
+
+If you wish to use a different .kunitconfig file (such as one provided for
+testing a particular subsystem), you can pass it as an option.
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig
For a list of all the flags supported by kunit_tool, you can run:
.. code-block:: bash
./tools/testing/kunit/kunit.py run --help
+
+Configuring, Building, and Running Tests
+========================================
+
+It's also possible to run just parts of the KUnit build process independently,
+which is useful if you want to make manual changes to part of the process.
+
+A .config can be generated from a .kunitconfig by using the ``config`` argument
+when running kunit_tool:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py config
+
+Similarly, if you just want to build a KUnit kernel from the current .config,
+you can use the ``build`` argument:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py build
+
+And, if you already have a built UML kernel with built-in KUnit tests, you can
+run the kernel and display the test results with the ``exec`` argument:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py exec
+
+The ``run`` command which is discussed above is equivalent to running all three
+of these in sequence.
+
+All of these commands accept a number of optional command-line arguments. The
+``--help`` flag will give a complete list of these, or keep reading this page
+for a guide to some of the more useful ones.
+
+Parsing Test Results
+====================
+
+KUnit tests output their results in TAP (Test Anything Protocol) format.
+kunit_tool will, when running tests, parse this output and print a summary
+which is much more pleasant to read. If you wish to look at the raw test
+results in TAP format, you can pass the ``--raw_output`` argument.
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run --raw_output
+
+.. note::
+ The raw output from test runs may contain other, non-KUnit kernel log
+ lines.
+
+If you have KUnit results in their raw TAP format, you can parse them and print
+the human-readable summary with the ``parse`` command for kunit_tool. This
+accepts a filename for an argument, or will read from standard input.
+
+.. code-block:: bash
+
+ # Reading from a file
+ ./tools/testing/kunit/kunit.py parse /var/log/dmesg
+ # Reading from stdin
+ dmesg | ./tools/testing/kunit/kunit.py parse
+
+This is very useful if you wish to run tests in a configuration not supported
+by kunit_tool (such as on real hardware, or an unsupported architecture).
+
+Filtering Tests
+===============
+
+It's possible to run only a subset of the tests built into a kernel by passing
+a filter to the ``exec`` or ``run`` commands. For example, if you only wanted
+to run KUnit resource tests, you could use:
+
+.. code-block:: bash
+
+ ./tools/testing/kunit/kunit.py run 'kunit-resource*'
+
+This uses the standard glob format for wildcards.
+
+Other Useful Options
+====================
+
+kunit_tool has a number of other command-line arguments which can be useful
+when adapting it to fit your environment or needs.
+
+Some of the more useful ones are:
+
+``--help``
+ Lists all of the available options. Note that different commands
+ (``config``, ``build``, ``run``, etc) will have different supported
+ options. Place ``--help`` before the command to list common options,
+ and after the command for options specific to that command.
+
+``--build_dir``
+ Specifies the build directory that kunit_tool will use. This is where
+ the .kunitconfig file is located, as well as where the .config and
+ compiled kernel will be placed. Defaults to ``.kunit``.
+
+``--make_options``
+ Specifies additional options to pass to ``make`` when compiling a
+ kernel (with the ``build`` or ``run`` commands). For example, to enable
+ compiler warnings, you can pass ``--make_options W=1``.
+
+``--alltests``
+ Builds a UML kernel with all config options enabled using
+ ``make allyesconfig``. This allows you to run as many tests as is
+ possible, but is very slow and prone to breakage as new options are
+ added or modified. Most people should add ``CONFIG_KUNIT_ALL_TESTS=1``
+ to their .kunitconfig instead if they wish to run "all tests".
+
+
+There are several other options (and new ones are often added), so do check
+``--help`` if you're looking for something not mentioned here.
--
2.31.1.368.gbe11c130af-goog
From: Ira Weiny <ira.weiny(a)intel.com>
Introduce a new page protection mechanism for supervisor pages, Protection Key
Supervisor (PKS).
Generally PKS enables protections on 'domains' of supervisor pages to limit
supervisor mode access to pages beyond the normal paging protections. PKS
works in a similar fashion to user space pkeys, PKU. As with PKU, supervisor
pkeys are checked in addition to normal paging protections, and access or writes
can be disabled via an MSR update without TLB flushes when permissions change.
Also like PKU, a page mapping is assigned to a domain by setting pkey bits in
the page table entry for that mapping.
Access is controlled through a PKRS register which is updated via WRMSR/RDMSR.
XSAVE is not supported for the PKRS MSR. Therefore the implementation
saves/restores the MSR across context switches and during exceptions. Nested
exceptions are supported by each exception getting a new PKS state.
For consistent behavior with current paging protections, pkey 0 is reserved and
configured to allow full access via the pkey mechanism, thus preserving the
default paging protections on mappings with the default pkey value of 0.
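As a rough illustration of the per-key permission bits, the sketch below manipulates a PKRS-style register value. It assumes PKRS uses the same layout as PKRU, with two bits per key: access-disable at bit 2N and write-disable at bit 2N+1. The function names pkrs_update and pkrs_perms are hypothetical and are not part of this series.

```shell
#!/bin/sh
# Sketch only: per-key permission bits in a PKRS-style register value.
# Assumed layout (as for PKRU): key N uses bit 2N (access-disable)
# and bit 2N+1 (write-disable). Function names are hypothetical.
PKEY_AD=1   # access disable
PKEY_WD=2   # write disable

# pkrs_update CURRENT KEY PERMS: print the new register value in hex
pkrs_update() {
    shift_by=$(( $2 * 2 ))
    printf '0x%08x\n' $(( ($1 & ~(3 << shift_by)) | ($3 << shift_by) ))
}

# pkrs_perms VALUE KEY: print the 2-bit permission field for KEY
pkrs_perms() {
    echo $(( ($1 >> ($2 * 2)) & 3 ))
}

# Example: disable all access through pkey 1, leaving pkey 0 untouched
val=$(pkrs_update 0 1 "$PKEY_AD")
echo "register: $val, key 0 perms: $(pkrs_perms "$val" 0), key 1 perms: $(pkrs_perms "$val" 1)"
```

With both bits of key 0 left clear, the pkey check allows full access for that key, so mappings carrying the default pkey value of 0 are governed by the normal paging protections alone.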
Other keys (1-15) are allocated by an allocator which prepares us for key
contention from day one. Kernel users should be prepared for the allocator to
fail either because of key exhaustion or due to PKS not being supported on the
CPU instance.
The following are key attributes of PKS.
1) Fast switching of permissions
1a) Prevents access without page table manipulations
1b) No TLB flushes required
2) Works on a per thread basis
PKS is available with 4 and 5 level paging. Like PKRU it consumes 4 bits from
the PTE to store the pkey within the entry.
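To make the 4-bit pkey field concrete, here is a small sketch that stores and extracts a pkey in a 64-bit page table entry value. It assumes the x86-64 layout in which the protection key occupies PTE bits 59-62, and it relies on the shell providing 64-bit arithmetic. The names pte_pkey and pte_set_pkey are hypothetical.

```shell
#!/bin/sh
# Sketch only: 4-bit pkey field inside a 64-bit PTE value.
# Assumes the x86-64 layout (pkey in PTE bits 59-62) and 64-bit
# shell arithmetic. Function names are hypothetical.
PTE_PKEY_SHIFT=59
PTE_PKEY_MASK=15

# pte_pkey PTE: print the pkey stored in PTE
pte_pkey() {
    echo $(( ($1 >> PTE_PKEY_SHIFT) & PTE_PKEY_MASK ))
}

# pte_set_pkey PTE KEY: print PTE with its pkey field replaced by KEY
pte_set_pkey() {
    echo $(( ($1 & ~(PTE_PKEY_MASK << PTE_PKEY_SHIFT)) | ($2 << PTE_PKEY_SHIFT) ))
}

# Example: tag an entry with pkey 5 and read it back
pte=$(pte_set_pkey 4099 5)
echo "pkey: $(pte_pkey "$pte")"
```

An entry whose pkey field was never set reads back as key 0, which matches the default-key behaviour described above.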
All code to support PKS is configured via ARCH_ENABLE_SUPERVISOR_PKEYS which
is designed to be turned on only when a user of PKS is configured in the kernel.
Those users must depend on ARCH_HAS_SUPERVISOR_PKEYS to properly work with
other architectures which do not yet support PKS.
Originally this series was submitted as part of a large patch set which
converted the kmap call sites.[1]
Many follow-on discussions revealed a few problems, the first of which was
that some callers leak a kmap mapping across threads rather than containing it
to a critical section. Attempts were made to see if these 'global kmaps' could
be supported.[2] However, supporting global kmaps had many problems. Work is
being done in parallel on converting as many kmap calls as possible to the new
kmap_local_page().[3]
Changes from V5 [6]
From Dave Hansen
Remove 'we' from comments
Changes from V4 [5]
From kernel test robot <lkp(a)intel.com>
Fix i386 build: pks_init_task not found
Move MSR_IA32_PKRS and INIT_PKRS_VALUE into patch 5 where they are
first 'used'. (Technically nothing is 'used' until the final
test patch. But review wise this is much cleaner.)
From Sean Christopherson
Add documentation details on what happens if the pkey is violated
Change cpu_feature_enabled to be in WARN_ON check
Clean up commit message of patch 6
[1] https://lore.kernel.org/lkml/20201009195033.3208459-1-ira.weiny@intel.com/
[2] https://lore.kernel.org/lkml/87mtycqcjf.fsf@nanos.tec.linutronix.de/
[3] https://lore.kernel.org/lkml/20210128061503.1496847-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210210062221.3023586-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210205170030.856723-1-ira.weiny@intel.com/
    https://lore.kernel.org/lkml/20210217024826.3466046-1-ira.weiny@intel.com/
[4] https://lore.kernel.org/lkml/20201106232908.364581-1-ira.weiny@intel.com/
[5] https://lore.kernel.org/lkml/20210322053020.2287058-1-ira.weiny@intel.com/
[6] https://lore.kernel.org/lkml/20210331191405.341999-1-ira.weiny@intel.com/
Fenghua Yu (1):
x86/pks: Add PKS kernel API
Ira Weiny (9):
x86/pkeys: Create pkeys_common.h
x86/fpu: Refactor arch_set_user_pkey_access() for PKS support
x86/pks: Add additional PKEY helper macros
x86/pks: Add PKS defines and Kconfig options
x86/pks: Add PKS setup code
x86/fault: Adjust WARN_ON for PKey fault
x86/pks: Preserve the PKRS MSR on context switch
x86/entry: Preserve PKRS MSR across exceptions
x86/pks: Add PKS test code
Documentation/core-api/protection-keys.rst | 112 +++-
arch/x86/Kconfig | 1 +
arch/x86/entry/calling.h | 26 +
arch/x86/entry/common.c | 57 ++
arch/x86/entry/entry_64.S | 22 +-
arch/x86/entry/entry_64_compat.S | 6 +-
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/disabled-features.h | 8 +-
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/pgtable.h | 15 +-
arch/x86/include/asm/pgtable_types.h | 12 +
arch/x86/include/asm/pkeys.h | 4 +
arch/x86/include/asm/pkeys_common.h | 34 +
arch/x86/include/asm/pks.h | 54 ++
arch/x86/include/asm/processor-flags.h | 2 +
arch/x86/include/asm/processor.h | 47 +-
arch/x86/include/uapi/asm/processor-flags.h | 2 +
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/fpu/xstate.c | 22 +-
arch/x86/kernel/head_64.S | 7 +-
arch/x86/kernel/process.c | 3 +
arch/x86/kernel/process_64.c | 2 +
arch/x86/mm/fault.c | 30 +-
arch/x86/mm/pkeys.c | 218 +++++-
include/linux/pgtable.h | 4 +
include/linux/pkeys.h | 34 +
kernel/entry/common.c | 14 +-
lib/Kconfig.debug | 11 +
lib/Makefile | 3 +
lib/pks/Makefile | 3 +
lib/pks/pks_test.c | 694 ++++++++++++++++++++
mm/Kconfig | 5 +
tools/testing/selftests/x86/Makefile | 3 +-
tools/testing/selftests/x86/test_pks.c | 149 +++++
34 files changed, 1527 insertions(+), 81 deletions(-)
create mode 100644 arch/x86/include/asm/pkeys_common.h
create mode 100644 arch/x86/include/asm/pks.h
create mode 100644 lib/pks/Makefile
create mode 100644 lib/pks/pks_test.c
create mode 100644 tools/testing/selftests/x86/test_pks.c
--
2.28.0.rc0.12.gb6a658bd00c9