From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit 1233898ab758cbcf5f6fea10b8dd16a0b2c24fab ]
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 3 ++-
.../selftests/net/forwarding/mirror_lib.sh | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
index 6f3a70df63bc..e00435753008 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
@@ -120,12 +120,13 @@ __mirror_gre_test()
sleep 5
for ((i = 0; i < count; ++i)); do
+ local sip=$(mirror_gre_ipv6_addr 1 $i)::1
local dip=$(mirror_gre_ipv6_addr 1 $i)::2
local htun=h3-gt6-$i
local message
icmp6_capture_install $htun
- mirror_test v$h1 "" $dip $htun 100 10
+ mirror_test v$h1 $sip $dip $htun 100 10
icmp6_capture_uninstall $htun
done
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
index 13db1cb50e57..6406cd76a19d 100644
--- a/tools/testing/selftests/net/forwarding/mirror_lib.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -20,6 +20,13 @@ mirror_uninstall()
tc filter del dev $swp1 $direction pref 1000
}
+is_ipv6()
+{
+ local addr=$1; shift
+
+ [[ -z ${addr//[0-9a-fA-F:]/} ]]
+}
+
mirror_test()
{
local vrf_name=$1; shift
@@ -29,9 +36,17 @@ mirror_test()
local pref=$1; shift
local expect=$1; shift
+ if is_ipv6 $dip; then
+ local proto=-6
+ local type="icmp6 type=128" # Echo request.
+ else
+ local proto=
+ local type="icmp echoreq"
+ fi
+
local t0=$(tc_rule_stats_get $dev $pref)
- $MZ $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
- -c 10 -d 100msec -t icmp type=8
+ $MZ $proto $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
+ -c 10 -d 100msec -t $type
sleep 0.5
local t1=$(tc_rule_stats_get $dev $pref)
local delta=$((t1 - t0))
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit dda7f4fa55839baeb72ae040aeaf9ccf89d3e416 ]
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Jiri Pirko <jiri(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..33ddd01689be 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -507,8 +507,8 @@ do_red_test()
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((0 <= pct && pct <= 5))
- check_err $? "backlog $backlog / $limit expected <= 5% distance"
+ ((0 <= pct && pct <= 10))
+ check_err $? "backlog $backlog / $limit expected <= 10% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed lib.mk issue which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..9a41d8bb9ff1 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Russell Currey <ruscur(a)russell.cc>
[ Upstream commit 3a72c94ebfb1f171eba0715998010678a09ec796 ]
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Signed-off-by: Russell Currey <ruscur(a)russell.cc>
Acked-by: Daniel Axtens <dja(a)axtens.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210223070227.2916871-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
tools/testing/selftests/powerpc/security/rfi_flush.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
entry_flush = entry_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
#define CACHELINE_SIZE 128
+#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
void syscall_loop(char *p, unsigned long iterations,
unsigned long zero_size);
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
rfi_flush = rfi_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit 1233898ab758cbcf5f6fea10b8dd16a0b2c24fab ]
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 3 ++-
.../selftests/net/forwarding/mirror_lib.sh | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
index 6f3a70df63bc..e00435753008 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
@@ -120,12 +120,13 @@ __mirror_gre_test()
sleep 5
for ((i = 0; i < count; ++i)); do
+ local sip=$(mirror_gre_ipv6_addr 1 $i)::1
local dip=$(mirror_gre_ipv6_addr 1 $i)::2
local htun=h3-gt6-$i
local message
icmp6_capture_install $htun
- mirror_test v$h1 "" $dip $htun 100 10
+ mirror_test v$h1 $sip $dip $htun 100 10
icmp6_capture_uninstall $htun
done
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
index 13db1cb50e57..6406cd76a19d 100644
--- a/tools/testing/selftests/net/forwarding/mirror_lib.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -20,6 +20,13 @@ mirror_uninstall()
tc filter del dev $swp1 $direction pref 1000
}
+is_ipv6()
+{
+ local addr=$1; shift
+
+ [[ -z ${addr//[0-9a-fA-F:]/} ]]
+}
+
mirror_test()
{
local vrf_name=$1; shift
@@ -29,9 +36,17 @@ mirror_test()
local pref=$1; shift
local expect=$1; shift
+ if is_ipv6 $dip; then
+ local proto=-6
+ local type="icmp6 type=128" # Echo request.
+ else
+ local proto=
+ local type="icmp echoreq"
+ fi
+
local t0=$(tc_rule_stats_get $dev $pref)
- $MZ $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
- -c 10 -d 100msec -t icmp type=8
+ $MZ $proto $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
+ -c 10 -d 100msec -t $type
sleep 0.5
local t1=$(tc_rule_stats_get $dev $pref)
local delta=$((t1 - t0))
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit dda7f4fa55839baeb72ae040aeaf9ccf89d3e416 ]
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Jiri Pirko <jiri(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..33ddd01689be 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -507,8 +507,8 @@ do_red_test()
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((0 <= pct && pct <= 5))
- check_err $? "backlog $backlog / $limit expected <= 5% distance"
+ ((0 <= pct && pct <= 10))
+ check_err $? "backlog $backlog / $limit expected <= 10% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed lib.mk issue which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..9a41d8bb9ff1 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Russell Currey <ruscur(a)russell.cc>
[ Upstream commit 3a72c94ebfb1f171eba0715998010678a09ec796 ]
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Signed-off-by: Russell Currey <ruscur(a)russell.cc>
Acked-by: Daniel Axtens <dja(a)axtens.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210223070227.2916871-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
tools/testing/selftests/powerpc/security/rfi_flush.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
entry_flush = entry_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
#define CACHELINE_SIZE 128
+#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
void syscall_loop(char *p, unsigned long iterations,
unsigned long zero_size);
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
rfi_flush = rfi_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit 1233898ab758cbcf5f6fea10b8dd16a0b2c24fab ]
The mirror_gre_scale test creates as many ERSPAN sessions as the underlying
chip supports, and tests that they all work. In order to determine that it
issues a stream of ICMP packets and checks if they are mirrored as
expected.
However, the mausezahn invocation missed the -6 flag to identify the use of
IPv6 protocol, and was sending ICMP messages over IPv6, as opposed to
ICMP6. It also didn't pass an explicit source IP address, which apparently
worked at some point in the past, but does not anymore.
To fix these issues, extend the function mirror_test() in mirror_lib by
detecting the IPv6 protocol addresses, and using a different ICMP scheme.
Fix __mirror_gre_test() in the selftest itself to pass a source IP address.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../drivers/net/mlxsw/mirror_gre_scale.sh | 3 ++-
.../selftests/net/forwarding/mirror_lib.sh | 19 +++++++++++++++++--
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
index 6f3a70df63bc..e00435753008 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/mirror_gre_scale.sh
@@ -120,12 +120,13 @@ __mirror_gre_test()
sleep 5
for ((i = 0; i < count; ++i)); do
+ local sip=$(mirror_gre_ipv6_addr 1 $i)::1
local dip=$(mirror_gre_ipv6_addr 1 $i)::2
local htun=h3-gt6-$i
local message
icmp6_capture_install $htun
- mirror_test v$h1 "" $dip $htun 100 10
+ mirror_test v$h1 $sip $dip $htun 100 10
icmp6_capture_uninstall $htun
done
}
diff --git a/tools/testing/selftests/net/forwarding/mirror_lib.sh b/tools/testing/selftests/net/forwarding/mirror_lib.sh
index 13db1cb50e57..6406cd76a19d 100644
--- a/tools/testing/selftests/net/forwarding/mirror_lib.sh
+++ b/tools/testing/selftests/net/forwarding/mirror_lib.sh
@@ -20,6 +20,13 @@ mirror_uninstall()
tc filter del dev $swp1 $direction pref 1000
}
+is_ipv6()
+{
+ local addr=$1; shift
+
+ [[ -z ${addr//[0-9a-fA-F:]/} ]]
+}
+
mirror_test()
{
local vrf_name=$1; shift
@@ -29,9 +36,17 @@ mirror_test()
local pref=$1; shift
local expect=$1; shift
+ if is_ipv6 $dip; then
+ local proto=-6
+ local type="icmp6 type=128" # Echo request.
+ else
+ local proto=
+ local type="icmp echoreq"
+ fi
+
local t0=$(tc_rule_stats_get $dev $pref)
- $MZ $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
- -c 10 -d 100msec -t icmp type=8
+ $MZ $proto $vrf_name ${sip:+-A $sip} -B $dip -a own -b bc -q \
+ -c 10 -d 100msec -t $type
sleep 0.5
local t1=$(tc_rule_stats_get $dev $pref)
local delta=$((t1 - t0))
--
2.30.2
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit dda7f4fa55839baeb72ae040aeaf9ccf89d3e416 ]
The intention behind this test is to make sure that qdisc limit is
correctly projected to the HW. However, first, due to rounding in the
qdisc, and then in the driver, the number cannot actually be accurate. And
second, the approach to testing this is to oversubscribe the port with
traffic generated on the same switch. The actual backlog size therefore
fluctuates.
In practice, this test proved to be noisier than the rest, and spuriously
fails every now and then. Increase the tolerance to 10 % to avoid these
issues.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Acked-by: Jiri Pirko <jiri(a)nvidia.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
index b0cb1aaffdda..33ddd01689be 100644
--- a/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/sch_red_core.sh
@@ -507,8 +507,8 @@ do_red_test()
check_err $? "backlog $backlog / $limit Got $pct% marked packets, expected == 0."
local diff=$((limit - backlog))
pct=$((100 * diff / limit))
- ((0 <= pct && pct <= 5))
- check_err $? "backlog $backlog / $limit expected <= 5% distance"
+ ((0 <= pct && pct <= 10))
+ check_err $? "backlog $backlog / $limit expected <= 10% distance"
log_test "TC $((vlan - 10)): RED backlog > limit"
stop_traffic
--
2.30.2
From: Yonghong Song <yhs(a)fb.com>
[ Upstream commit 26e6dd1072763cd5696b75994c03982dde952ad9 ]
selftests/bpf/Makefile includes lib.mk. With the following command
make -j60 LLVM=1 LLVM_IAS=1 <=== compile kernel
make -j60 -C tools/testing/selftests/bpf LLVM=1 LLVM_IAS=1 V=1
some files are still compiled with gcc. This patch
fixed lib.mk issue which sets CC to gcc in all cases.
Signed-off-by: Yonghong Song <yhs(a)fb.com>
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20210413153413.3027426-1-yhs@fb.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/lib.mk | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index a5ce26d548e4..9a41d8bb9ff1 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -1,6 +1,10 @@
# This mimics the top-level Makefile. We do it explicitly here so that this
# Makefile can operate with or without the kbuild infrastructure.
+ifneq ($(LLVM),)
+CC := clang
+else
CC := $(CROSS_COMPILE)gcc
+endif
ifeq (0,$(MAKELEVEL))
ifeq ($(OUTPUT),)
--
2.30.2
From: Russell Currey <ruscur(a)russell.cc>
[ Upstream commit 3a72c94ebfb1f171eba0715998010678a09ec796 ]
The rfi_flush and entry_flush selftests work by using the PM_LD_MISS_L1
perf event to count L1D misses. The value of this event has changed
over time:
- Power7 uses 0x400f0
- Power8 and Power9 use both 0x400f0 and 0x3e054
- Power10 uses only 0x3e054
Rather than relying on raw values, configure perf to count L1D read
misses in the most explicit way available.
This fixes the selftests to work on systems without 0x400f0 as
PM_LD_MISS_L1, and should change no behaviour for systems that the tests
already worked on.
The only potential downside is that referring to a specific perf event
requires PMU support implemented in the kernel for that platform.
Signed-off-by: Russell Currey <ruscur(a)russell.cc>
Acked-by: Daniel Axtens <dja(a)axtens.net>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210223070227.2916871-1-ruscur@russell.cc
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/powerpc/security/entry_flush.c | 2 +-
tools/testing/selftests/powerpc/security/flush_utils.h | 4 ++++
tools/testing/selftests/powerpc/security/rfi_flush.c | 2 +-
3 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/powerpc/security/entry_flush.c b/tools/testing/selftests/powerpc/security/entry_flush.c
index 78cf914fa321..68ce377b205e 100644
--- a/tools/testing/selftests/powerpc/security/entry_flush.c
+++ b/tools/testing/selftests/powerpc/security/entry_flush.c
@@ -53,7 +53,7 @@ int entry_flush_test(void)
entry_flush = entry_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
diff --git a/tools/testing/selftests/powerpc/security/flush_utils.h b/tools/testing/selftests/powerpc/security/flush_utils.h
index 07a5eb301466..7a3d60292916 100644
--- a/tools/testing/selftests/powerpc/security/flush_utils.h
+++ b/tools/testing/selftests/powerpc/security/flush_utils.h
@@ -9,6 +9,10 @@
#define CACHELINE_SIZE 128
+#define PERF_L1D_READ_MISS_CONFIG ((PERF_COUNT_HW_CACHE_L1D) | \
+ (PERF_COUNT_HW_CACHE_OP_READ << 8) | \
+ (PERF_COUNT_HW_CACHE_RESULT_MISS << 16))
+
void syscall_loop(char *p, unsigned long iterations,
unsigned long zero_size);
diff --git a/tools/testing/selftests/powerpc/security/rfi_flush.c b/tools/testing/selftests/powerpc/security/rfi_flush.c
index 7565fd786640..f73484a6470f 100644
--- a/tools/testing/selftests/powerpc/security/rfi_flush.c
+++ b/tools/testing/selftests/powerpc/security/rfi_flush.c
@@ -54,7 +54,7 @@ int rfi_flush_test(void)
rfi_flush = rfi_flush_orig;
- fd = perf_event_open_counter(PERF_TYPE_RAW, /* L1d miss */ 0x400f0, -1);
+ fd = perf_event_open_counter(PERF_TYPE_HW_CACHE, PERF_L1D_READ_MISS_CONFIG, -1);
FAIL_IF(fd < 0);
p = (char *)memalign(zero_size, CACHELINE_SIZE);
--
2.30.2
TL;DR: Add support to kunit_tool to dispatch tests via QEMU. Also add
support to immediately shutdown a kernel after running KUnit tests.
Background
----------
KUnit has supported running on all architectures for quite some time;
however, kunit_tool - the script commonly used to invoke KUnit tests -
has only fully supported KUnit run on UML. Its functionality has been
broken up for some time to separate the configure, build, run, and parse
phases making it possible to be used in part on other architectures to a
small extent. Nevertheless, kunit_tool has not supported running tests
on other architectures.
What this patchset does
-----------------------
This patchset introduces first class support to kunit_tool for KUnit to
be run on many popular architectures via QEMU. It does this by adding
two new flags: `--arch` and `--cross_compile`.
`--arch` allows an architecture to be specified by the name the
architecture is given in `arch/`. It uses the specified architecture to
select a minimal amount of Kconfigs and QEMU configs needed for the
architecture to run in QEMU and provide a console from which KTAP
results can be scraped.
`--cross_compile` allows a toolchain prefix to be specified to make
similar to how `CROSS_COMPILE` is used.
Additionally, this patchset revives the previously considered "kunit:
tool: add support for QEMU"[1] patchs. The motivation for this new
kernel command line flags, `kunit_shutdown`, is to better support
running KUnit tests inside of QEMU. For most popular architectures, QEMU
can be made to terminate when the Linux kernel that is being run is
reboted, halted, or powered off. As Kees pointed out in a previous
discussion[2], it is possible to make a kernel initrd that can reboot
the kernel immediately, doing this for every architecture would likely
be infeasible. Instead, just having an option for the kernel to shutdown
when it is done with testing seems a lot simpler, especially since it is
an option which would only available in testing configurations of the
kernel anyway.
What discussion remains for this patchset?
------------------------------------------
The first most obvious thing is settling the debate about
`kunit_shutdown`. If I recall correctly, Kees suggested that it might be
better to just add a new initrd; however, as I mentioned above, now to
support many new architectures, it may be substantially easier to
support this option. So I am hoping with this new usecase, the argument
for `kunit_shutdown` will be more compelling.
The second and likely harder issue is figuring out the best way to
configure and provide configs for running KUnit tests via QEMU. I
provide a pretty primitive way in this patchset which is not super
flexible; for example, for our PPC support we have it set to build big
endian, and POWER8 - we currently don't support a way to change that.
Nevertheless, having sensible defaults is handy too, so we will probably
want to have some support for overriding defaults, while still being
able to have defaults.
[1] http://patches.linaro.org/patch/208336/
[2] https://lkml.org/lkml/2020/6/26/988
Brendan Higgins (3):
Documentation: Add kunit_shutdown to kernel-parameters.txt
kunit: tool: add support for QEMU
Documentation: kunit: document support for QEMU in kunit_tool
David Gow (1):
kunit: Add 'kunit_shutdown' option
.../admin-guide/kernel-parameters.txt | 8 +
Documentation/dev-tools/kunit/usage.rst | 37 +++-
lib/kunit/executor.c | 20 ++
tools/testing/kunit/kunit.py | 33 ++-
tools/testing/kunit/kunit_config.py | 2 +-
tools/testing/kunit/kunit_kernel.py | 209 +++++++++++++++---
tools/testing/kunit/kunit_parser.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 15 +-
8 files changed, 278 insertions(+), 48 deletions(-)
base-commit: 7af08140979a6e7e12b78c93b8625c8d25b084e2
--
2.31.1.498.g6c1eba8ee3d-goog
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v4:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
- Fixed error code sign.
- Corrected version message
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
Clang's integrated assembler does not allow symbols with non-absolute
values to be reassigned. Modify the interrupt entry loop macro to be
compatible with IAS by using a label and an offset.
Cc: Jian Cai <caij2003(a)gmail.com>
Signed-off-by: Bill Wendling <morbo(a)google.com>
References: https://lore.kernel.org/lkml/20200714233024.1789985-1-caij2003@gmail.com/
---
tools/testing/selftests/kvm/lib/x86_64/handlers.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/handlers.S b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
index aaf7bc7d2ce1..3f9181e9a0a7 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/handlers.S
+++ b/tools/testing/selftests/kvm/lib/x86_64/handlers.S
@@ -54,9 +54,9 @@ idt_handlers:
.align 8
/* Fetch current address and append it to idt_handlers. */
- current_handler = .
+0 :
.pushsection .rodata
-.quad current_handler
+ .quad 0b
.popsection
.if ! \has_error
--
2.29.2.576.ga3fc446d84-goog
I'm just starting my learning curve on SGX, so I don't know if I've missed
some setup for the SGX device entries. After looking at arch/x86/kernel/cpu/sgx/driver.c
I see that there is no mode value for either sgx_dev_enclave or sgx_dev_provision.
With this patch I can get the SGX self test to complete:
sudo ./test_sgx
Warning: no execute permissions on device file /dev/sgx_enclave
0x0000000000000000 0x0000000000002000 0x03
0x0000000000002000 0x0000000000001000 0x05
0x0000000000003000 0x0000000000003000 0x03
SUCCESS
Is the warning even necessary ?
Tim
Functionally, this just means that the test output will be slightly
changed and it'll now depend on CONFIG_KUNIT=y/m.
It'll still run at boot time and can still be built as a loadable
module.
There was a pre-existing patch to convert this test that I found later,
here [1]. Compared to [1], this patch doesn't rename files and uses
KUnit features more heavily (i.e. does more than converting pr_err()
calls to KUNIT_FAIL()).
What this conversion gives us:
* a shorter test thanks to KUnit's macros
* a way to run this a bit more easily via kunit.py (and
CONFIG_KUNIT_ALL_TESTS=y) [2]
* a structured way of reporting pass/fail
* uses kunit-managed allocations to avoid the risk of memory leaks
* more descriptive error messages:
* i.e. it prints out which fields are invalid, what the expected
values are, etc.
What this conversion does not do:
* change the name of the file (and thus the name of the module)
* change the name of the config option
Leaving these as-is for now to minimize the impact to people wanting to
run this test. IMO, that concern trumps following KUnit's style guide
for both names, at least for now.
[1] https://lore.kernel.org/linux-kselftest/20201015014616.309000-1-vitor@massa…
[2] Can be run via
$ ./tools/testing/kunit/kunit.py run --kunitconfig /dev/stdin <<EOF
CONFIG_KUNIT=y
CONFIG_TEST_LIST_SORT=y
EOF
[16:55:56] Configuring KUnit Kernel ...
[16:55:56] Building KUnit Kernel ...
[16:56:29] Starting KUnit Kernel ...
[16:56:32] ============================================================
[16:56:32] ======== [PASSED] list_sort ========
[16:56:32] [PASSED] list_sort_test
[16:56:32] ============================================================
[16:56:32] Testing complete. 1 tests run. 0 failed. 0 crashed.
[16:56:32] Elapsed time: 35.668s total, 0.001s configuring, 32.725s building, 0.000s running
Note: the build time is as after a `make mrproper`.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
lib/Kconfig.debug | 5 +-
lib/test_list_sort.c | 128 +++++++++++++++++--------------------------
2 files changed, 54 insertions(+), 79 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 417c3d3e521b..09a0cc8a55cc 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1999,8 +1999,9 @@ config LKDTM
Documentation/fault-injection/provoke-crashes.rst
config TEST_LIST_SORT
- tristate "Linked list sorting test"
- depends on DEBUG_KERNEL || m
+ tristate "Linked list sorting test" if !KUNIT_ALL_TESTS
+ depends on KUNIT
+ default KUNIT_ALL_TESTS
help
Enable this to turn on 'list_sort()' function test. This test is
executed only once during system boot (so affects only boot time),
diff --git a/lib/test_list_sort.c b/lib/test_list_sort.c
index 1f017d3b610e..ccfd98dbf57c 100644
--- a/lib/test_list_sort.c
+++ b/lib/test_list_sort.c
@@ -1,5 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
-#define pr_fmt(fmt) "list_sort_test: " fmt
+#include <kunit/test.h>
#include <linux/kernel.h>
#include <linux/list_sort.h>
@@ -23,67 +23,52 @@ struct debug_el {
struct list_head list;
unsigned int poison2;
int value;
- unsigned serial;
+ unsigned int serial;
};
-/* Array, containing pointers to all elements in the test list */
-static struct debug_el **elts __initdata;
-
-static int __init check(struct debug_el *ela, struct debug_el *elb)
+static void check(struct kunit *test, struct debug_el *ela, struct debug_el *elb)
{
- if (ela->serial >= TEST_LIST_LEN) {
- pr_err("error: incorrect serial %d\n", ela->serial);
- return -EINVAL;
- }
- if (elb->serial >= TEST_LIST_LEN) {
- pr_err("error: incorrect serial %d\n", elb->serial);
- return -EINVAL;
- }
- if (elts[ela->serial] != ela || elts[elb->serial] != elb) {
- pr_err("error: phantom element\n");
- return -EINVAL;
- }
- if (ela->poison1 != TEST_POISON1 || ela->poison2 != TEST_POISON2) {
- pr_err("error: bad poison: %#x/%#x\n",
- ela->poison1, ela->poison2);
- return -EINVAL;
- }
- if (elb->poison1 != TEST_POISON1 || elb->poison2 != TEST_POISON2) {
- pr_err("error: bad poison: %#x/%#x\n",
- elb->poison1, elb->poison2);
- return -EINVAL;
- }
- return 0;
+ struct debug_el **elts = test->priv;
+
+ KUNIT_EXPECT_LT_MSG(test, ela->serial, (unsigned int)TEST_LIST_LEN, "incorrect serial");
+ KUNIT_EXPECT_LT_MSG(test, elb->serial, (unsigned int)TEST_LIST_LEN, "incorrect serial");
+
+ KUNIT_EXPECT_PTR_EQ_MSG(test, elts[ela->serial], ela, "phantom element");
+ KUNIT_EXPECT_PTR_EQ_MSG(test, elts[elb->serial], elb, "phantom element");
+
+ KUNIT_EXPECT_EQ_MSG(test, ela->poison1, TEST_POISON1, "bad poison");
+ KUNIT_EXPECT_EQ_MSG(test, ela->poison2, TEST_POISON2, "bad poison");
+
+ KUNIT_EXPECT_EQ_MSG(test, elb->poison1, TEST_POISON1, "bad poison");
+ KUNIT_EXPECT_EQ_MSG(test, elb->poison2, TEST_POISON2, "bad poison");
}
-static int __init cmp(void *priv, struct list_head *a, struct list_head *b)
+/* `priv` is the test pointer so check() can fail the test if the list is invalid. */
+static int cmp(void *priv, struct list_head *a, struct list_head *b)
{
struct debug_el *ela, *elb;
ela = container_of(a, struct debug_el, list);
elb = container_of(b, struct debug_el, list);
- check(ela, elb);
+ check(priv, ela, elb);
return ela->value - elb->value;
}
-static int __init list_sort_test(void)
+static void list_sort_test(struct kunit *test)
{
- int i, count = 1, err = -ENOMEM;
- struct debug_el *el;
+ int i, count = 1;
+ struct debug_el *el, **elts;
struct list_head *cur;
LIST_HEAD(head);
- pr_debug("start testing list_sort()\n");
-
- elts = kcalloc(TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
- if (!elts)
- return err;
+ elts = kunit_kcalloc(test, TEST_LIST_LEN, sizeof(*elts), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, elts);
+ test->priv = elts;
for (i = 0; i < TEST_LIST_LEN; i++) {
- el = kmalloc(sizeof(*el), GFP_KERNEL);
- if (!el)
- goto exit;
+ el = kunit_kmalloc(test, sizeof(*el), GFP_KERNEL);
+ KUNIT_ASSERT_NOT_ERR_OR_NULL(test, el);
/* force some equivalencies */
el->value = prandom_u32() % (TEST_LIST_LEN / 3);
@@ -94,55 +79,44 @@ static int __init list_sort_test(void)
list_add_tail(&el->list, &head);
}
- list_sort(NULL, &head, cmp);
+ list_sort(test, &head, cmp);
- err = -EINVAL;
for (cur = head.next; cur->next != &head; cur = cur->next) {
struct debug_el *el1;
int cmp_result;
- if (cur->next->prev != cur) {
- pr_err("error: list is corrupted\n");
- goto exit;
- }
+ KUNIT_ASSERT_PTR_EQ_MSG(test, cur->next->prev, cur,
+ "list is corrupted");
- cmp_result = cmp(NULL, cur, cur->next);
- if (cmp_result > 0) {
- pr_err("error: list is not sorted\n");
- goto exit;
- }
+ cmp_result = cmp(test, cur, cur->next);
+ KUNIT_ASSERT_LE_MSG(test, cmp_result, 0, "list is not sorted");
el = container_of(cur, struct debug_el, list);
el1 = container_of(cur->next, struct debug_el, list);
- if (cmp_result == 0 && el->serial >= el1->serial) {
- pr_err("error: order of equivalent elements not "
- "preserved\n");
- goto exit;
+ if (cmp_result == 0) {
+ KUNIT_ASSERT_LE_MSG(test, el->serial, el1->serial,
+ "order of equivalent elements not preserved");
}
- if (check(el, el1)) {
- pr_err("error: element check failed\n");
- goto exit;
- }
+ check(test, el, el1);
count++;
}
- if (head.prev != cur) {
- pr_err("error: list is corrupted\n");
- goto exit;
- }
+ KUNIT_EXPECT_PTR_EQ_MSG(test, head.prev, cur, "list is corrupted");
+ KUNIT_EXPECT_EQ_MSG(test, count, TEST_LIST_LEN,
+ "list length changed after sorting!");
+}
- if (count != TEST_LIST_LEN) {
- pr_err("error: bad list length %d", count);
- goto exit;
- }
+static struct kunit_case list_sort_cases[] = {
+ KUNIT_CASE(list_sort_test),
+ {}
+};
+
+static struct kunit_suite list_sort_suite = {
+ .name = "list_sort",
+ .test_cases = list_sort_cases,
+};
+
+kunit_test_suites(&list_sort_suite);
- err = 0;
-exit:
- for (i = 0; i < TEST_LIST_LEN; i++)
- kfree(elts[i]);
- kfree(elts);
- return err;
-}
-module_init(list_sort_test);
MODULE_LICENSE("GPL");
--
2.31.1.498.g6c1eba8ee3d-goog
Add in:
* kunit_kmalloc_array() and wire up kunit_kmalloc() to be a special
case of it.
* kunit_kcalloc() for symmetry with kunit_kzalloc()
This should using KUnit more natural by making it more similar to the
existing *alloc() APIs.
And while we shouldn't necessarily be writing unit tests where overflow
should be a concern, it can't hurt to be safe.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Reviewed-by: Brendan Higgins <brendanhiggins(a)google.com>
---
v1 -> v2: s/kzalloc/kcalloc in doc comment.
---
include/kunit/test.h | 36 ++++++++++++++++++++++++++++++++----
lib/kunit/test.c | 22 ++++++++++++----------
2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..e8ecb69dd567 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -577,16 +577,30 @@ static inline int kunit_destroy_named_resource(struct kunit *test,
void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/**
- * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * kunit_kmalloc_array() - Like kmalloc_array() except the allocation is *test managed*.
* @test: The test context object.
+ * @n: number of elements.
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * Just like `kmalloc(...)`, except the allocation is managed by the test case
+ * Just like `kmalloc_array(...)`, except the allocation is managed by the test case
* and is automatically cleaned up after the test case concludes. See &struct
* kunit_resource for more information.
*/
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp);
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t flags);
+
+/**
+ * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * @test: The test context object.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kmalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+{
+ return kunit_kmalloc_array(test, 1, size, gfp);
+}
/**
* kunit_kfree() - Like kfree except for allocations managed by KUnit.
@@ -601,13 +615,27 @@ void kunit_kfree(struct kunit *test, const void *ptr);
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * See kzalloc() and kunit_kmalloc() for more information.
+ * See kzalloc() and kunit_kmalloc_array() for more information.
*/
static inline void *kunit_kzalloc(struct kunit *test, size_t size, gfp_t gfp)
{
return kunit_kmalloc(test, size, gfp | __GFP_ZERO);
}
+/**
+ * kunit_kcalloc() - Just like kunit_kmalloc_array(), but zeroes the allocation.
+ * @test: The test context object.
+ * @n: number of elements.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kcalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp_t flags)
+{
+ return kunit_kmalloc_array(test, n, size, flags | __GFP_ZERO);
+}
+
void kunit_cleanup(struct kunit *test);
void kunit_log_append(char *log, const char *fmt, ...);
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 2f6cc0123232..41fa46b14c3b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -573,41 +573,43 @@ int kunit_destroy_resource(struct kunit *test, kunit_resource_match_t match,
}
EXPORT_SYMBOL_GPL(kunit_destroy_resource);
-struct kunit_kmalloc_params {
+struct kunit_kmalloc_array_params {
+ size_t n;
size_t size;
gfp_t gfp;
};
-static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
+static int kunit_kmalloc_array_init(struct kunit_resource *res, void *context)
{
- struct kunit_kmalloc_params *params = context;
+ struct kunit_kmalloc_array_params *params = context;
- res->data = kmalloc(params->size, params->gfp);
+ res->data = kmalloc_array(params->n, params->size, params->gfp);
if (!res->data)
return -ENOMEM;
return 0;
}
-static void kunit_kmalloc_free(struct kunit_resource *res)
+static void kunit_kmalloc_array_free(struct kunit_resource *res)
{
kfree(res->data);
}
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
- struct kunit_kmalloc_params params = {
+ struct kunit_kmalloc_array_params params = {
.size = size,
+ .n = n,
.gfp = gfp
};
return kunit_alloc_resource(test,
- kunit_kmalloc_init,
- kunit_kmalloc_free,
+ kunit_kmalloc_array_init,
+ kunit_kmalloc_array_free,
gfp,
¶ms);
}
-EXPORT_SYMBOL_GPL(kunit_kmalloc);
+EXPORT_SYMBOL_GPL(kunit_kmalloc_array);
void kunit_kfree(struct kunit *test, const void *ptr)
{
base-commit: cda689f8708b6bef0b921c3a17fcdecbe959a079
--
2.31.1.527.g47e6f16901-goog
The readahead size used to be 2MB, thus it's reasonable to set the file
size as 4MB when checking check_file_mmap().
However since commit c2e4cd57cfa1 ("block: lift setting the readahead
size into the block layer"), readahead size could be as large as twice
the io_opt, and thus the hardcoded file size no longer works.
check_file_mmap() may report "Read-ahead pages reached the end of the
file" when the readahead size actually exceeds the file size in this
case.
To fix this issue, read the exact readahead window size via BLKRAGET
ioctl. Since now we have the readahead window size, take a more
fine-grained check. It is worth noting that this fine-grained check may
be broken as the sync readahead algorithm of kernel changes. It may be
acceptable since the algorithm of readahead ranging should be quite
stable, and we could tune the test case accorddingly if the algorithm
indeed changes.
Reported-by: James Wang <jnwang(a)linux.alibaba.com>
Acked-by: Ricardo Cañuelo <ricardo.canuelo(a)collabora.com>
Signed-off-by: Jeffle Xu <jefflexu(a)linux.alibaba.com>
---
changes since v3:
- make the check more fine-grained since we have the exact readahead
window size now, as suggested by Ricardo Cañuelo
chnages since v2:
- add 'Reported-by'
chnages since v1:
- add the test name "mincore" in the subject line
- add the error message in commit message
- rename @filesize to @file_size to keep a more consistent naming
convention
---
.../selftests/mincore/mincore_selftest.c | 96 +++++++++++++------
1 file changed, 68 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c
index 5a1e85ff5d32..369b35af4b4f 100644
--- a/tools/testing/selftests/mincore/mincore_selftest.c
+++ b/tools/testing/selftests/mincore/mincore_selftest.c
@@ -15,6 +15,11 @@
#include <string.h>
#include <fcntl.h>
#include <string.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <sys/ioctl.h>
+#include <sys/sysmacros.h>
+#include <sys/mount.h>
#include "../kselftest.h"
#include "../kselftest_harness.h"
@@ -193,12 +198,44 @@ TEST(check_file_mmap)
int retval;
int page_size;
int fd;
- int i;
+ int i, start, end;
int ra_pages = 0;
+ long ra_size, file_size;
+ struct stat stats;
+ dev_t devt;
+ unsigned int major, minor;
+ char devpath[32];
+
+ retval = stat(".", &stats);
+ ASSERT_EQ(0, retval) {
+ TH_LOG("Can't stat pwd: %s", strerror(errno));
+ }
+
+ devt = stats.st_dev;
+ major = major(devt);
+ minor = minor(devt);
+ snprintf(devpath, sizeof(devpath), "/dev/block/%u:%u", major, minor);
+
+ fd = open(devpath, O_RDONLY);
+ ASSERT_NE(-1, fd) {
+ TH_LOG("Can't open underlying disk %s", strerror(errno));
+ }
+
+ retval = ioctl(fd, BLKRAGET, &ra_size);
+ ASSERT_EQ(0, retval) {
+ TH_LOG("Error ioctl with the underlying disk: %s", strerror(errno));
+ }
+
+ /*
+ * BLKRAGET ioctl returns the readahead size in sectors (512 bytes).
+ * Make file_size large enough to contain the readahead window.
+ */
+ ra_size *= 512;
+ file_size = ra_size * 2;
page_size = sysconf(_SC_PAGESIZE);
- vec_size = FILE_SIZE / page_size;
- if (FILE_SIZE % page_size)
+ vec_size = file_size / page_size;
+ if (file_size % page_size)
vec_size++;
vec = calloc(vec_size, sizeof(unsigned char));
@@ -213,7 +250,7 @@ TEST(check_file_mmap)
strerror(errno));
}
errno = 0;
- retval = fallocate(fd, 0, 0, FILE_SIZE);
+ retval = fallocate(fd, 0, 0, file_size);
ASSERT_EQ(0, retval) {
TH_LOG("Error allocating space for the temporary file: %s",
strerror(errno));
@@ -223,12 +260,12 @@ TEST(check_file_mmap)
* Map the whole file, the pages shouldn't be fetched yet.
*/
errno = 0;
- addr = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
+ addr = mmap(NULL, file_size, PROT_READ | PROT_WRITE,
MAP_SHARED, fd, 0);
ASSERT_NE(MAP_FAILED, addr) {
TH_LOG("mmap error: %s", strerror(errno));
}
- retval = mincore(addr, FILE_SIZE, vec);
+ retval = mincore(addr, file_size, vec);
ASSERT_EQ(0, retval);
for (i = 0; i < vec_size; i++) {
ASSERT_EQ(0, vec[i]) {
@@ -240,38 +277,41 @@ TEST(check_file_mmap)
* Touch a page in the middle of the mapping. We expect the next
* few pages (the readahead window) to be populated too.
*/
- addr[FILE_SIZE / 2] = 1;
- retval = mincore(addr, FILE_SIZE, vec);
+ addr[file_size / 2] = 1;
+ retval = mincore(addr, file_size, vec);
ASSERT_EQ(0, retval);
- ASSERT_EQ(1, vec[FILE_SIZE / 2 / page_size]) {
- TH_LOG("Page not found in memory after use");
- }
- i = FILE_SIZE / 2 / page_size + 1;
- while (i < vec_size && vec[i]) {
- ra_pages++;
- i++;
- }
- EXPECT_GT(ra_pages, 0) {
- TH_LOG("No read-ahead pages found in memory");
- }
+ /*
+ * Readahead window is [start, end). So far the sync readahead
+ * algorithm takes the page that triggers the page fault as the
+ * midpoint.
+ */
+ ra_pages = ra_size / page_size;
+ start = file_size / 2 / page_size - ra_pages / 2;
+ end = start + ra_pages;
- EXPECT_LT(i, vec_size) {
- TH_LOG("Read-ahead pages reached the end of the file");
+ /*
+ * Check there's no hole in the readahead window.
+ */
+ for (i = start; i < end; i++) {
+ ASSERT_EQ(1, vec[i]) {
+ TH_LOG("Hole found in read-ahead window");
+ }
}
+
/*
- * End of the readahead window. The rest of the pages shouldn't
- * be in memory.
+ * Check there's no page beyond the readahead window.
*/
- if (i < vec_size) {
- while (i < vec_size && !vec[i])
- i++;
- EXPECT_EQ(vec_size, i) {
+ for (i = 0; i < vec_size; i++) {
+ if (i == start)
+ i = end;
+
+ EXPECT_EQ(0, vec[i]) {
TH_LOG("Unexpected page in memory beyond readahead window");
}
}
- munmap(addr, FILE_SIZE);
+ munmap(addr, file_size);
close(fd);
free(vec);
}
--
2.27.0
It is documented in Documentation/admin-guide/hw-vuln/spectre.rst, that
disabling indirect branch speculation for a user-space process creates
more overhead and cause it to run slower. The performance hit varies by
CPU, but on the AMD A4-9120C and A6-9220C CPUs, a simple ping-pong using
pipes between two processes runs ~10x slower when disabling IB
speculation.
Patch 2, included in this RFC but not intended for commit, is a simple
program that demonstrates this issue. Running on a A4-9120C without IB
speculation disabled, each process ping-pong takes ~7us:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1936300, iter/sec: 135383, us/iter: 7
But when IB speculation is disabled, that number increases
significantly:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 16384, t: 1500518, iter/sec: 10918, us/iter: 91
Although this test is a worst-case scenario, we can also consider a real
situation: an audio server (i.e. pulse). If we imagine a low-latency
capture, with 10ms packets and a concurrent task on the same CPU (i.e.
video encoding, for a video call), the audio server will preempt the
CPU at a rate of 100HZ. At 91us overhead per preemption (switching to
and from the audio process), that's 0.9% overhead for one process doing
preemption. In real-world testing (on a A4-9120C), I've seen 9% of CPU
used by IBPB when doing a 2-person video call.
With this patch, the number of IBPBs issued can be reduced to the
minimum necessary, only when there's a potential attacker->victim
process switch.
Running on the same A4-9120C device, this patch reduces the performance
hit of IBPB by ~half, as expected:
localhost ~ # taskset 1 /usr/local/bin/test ds
...
iters: 32768, t: 1824043, iter/sec: 17964, us/iter: 55
It should be noted, CPUs from multiple vendors experience a performance
hit due to IBPB. I also tested a Intel i3-8130U which sees a noticable
(~2x) increase in process switch time due to IBPB.
IB spec enabled:
localhost ~ # taskset 1 /usr/local/bin/test
...
iters: 262144, t: 1210821us, iter/sec: 216501, us/iter: 4
IB spec disabled:
localhost ~ # taskset 1 /usr/local/bin/test d
...
iters: 131072, t: 1257583us, iter/sec: 104225, us/iter: 9
Open questions:
- There are a significant number of task flags, which also now reaches the
limit of the 'long' on 32-bit systems. Should the 'mode' flags be
stored somewhere else?
- Having x86-specific flags in linux/sched.h feels wrong. However, this
is the mechanism for doing atomic flag updates. Is there an alternate
approach?
Open tasks:
- Documentation
- Naming
Changes in v2:
- Make flag per-process using prctl().
Anand K Mistry (2):
x86/speculation: Allow per-process control of when to issue IBPB
selftests: Benchmark for the cost of disabling IB speculation
arch/x86/include/asm/thread_info.h | 4 +
arch/x86/kernel/cpu/bugs.c | 56 +++++++++
arch/x86/kernel/process.c | 10 ++
arch/x86/mm/tlb.c | 51 ++++++--
include/linux/sched.h | 10 ++
include/uapi/linux/prctl.h | 5 +
.../testing/selftests/ib_spec/ib_spec_bench.c | 109 ++++++++++++++++++
7 files changed, 236 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/ib_spec/ib_spec_bench.c
--
2.31.1.498.g6c1eba8ee3d-goog
Hi, a friend and I were chasing bug 205219 [1] listed in Bugzilla.
We step into something a little bit different when trying to reproduce
the buggy behavior. In our try, compilation failed with a message form
make asking us to clean the source tree. We couldn't run kunit_tool
after compiling the kernel for x86, as described by Ted in the
discussion pointed out by the bug report.
Steps to reproduce:
0) Run kunit_tool
$ ./tools/testing/kunit/kunit.py run
Works fine with a clean tree.
1) Compile the kernel for some architecture (we did it for x86_64).
2) Run kunit_tool again
$ ./tools/testing/kunit/kunit.py run
Fails with a message form make asking us to clean the source tree.
Removing the clean source tree check from the top-level Makefile gives
us a similar error to what was described in the bug report. We see that
after running `git clean -fdx` kunit_tool runs nicely again. However,
this is not a real solution since some kernel binaries are erased by git.
We also had a look into the commit messages of Masahiro Yamada but
couldn't quite grasp why the check for the tree to be clean was added.
We could invest more time in this issue but actually don't know how to
proceed. We'd be glad to receive any comment about it. We could also try
something else if it's a too hard issue for beginners.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=205219
Best Regards,
Marcelo
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
- Fixed error code sign.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..3f941b1f4e78 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != -E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
v2:
- Added description to documentation of KVM_GET_CPUID2.
- Copy back nent only if E2BIG is returned.
Documentation/virt/kvm/api.rst | 81 ++++++++++++-------
arch/x86/kvm/x86.c | 11 ++-
.../selftests/kvm/lib/x86_64/processor.c | 20 +++--
3 files changed, 73 insertions(+), 39 deletions(-)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 245d80581f15..c7cfe4b9614e 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -711,7 +711,34 @@ resulting CPUID configuration through KVM_GET_CPUID2 in case.
};
-4.21 KVM_SET_SIGNAL_MASK
+4.21 KVM_GET_CPUID2
+------------------
+
+:Capability: basic
+:Architectures: x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_cpuid (in/out)
+:Returns: 0 on success, -1 on error
+
+Returns a full list of cpuid entries that are supported by this vcpu and were
+previously set by KVM_SET_CPUID/KVM_SET_CPUID2.
+
+The userspace must specify the number of cpuid entries it is ready to accept
+from the kernel in the 'nent' field of 'struct kmv_cpuid'.
+
+The kernel will try to return all the cpuid entries it has in the response.
+If the userspace nent value is too small for the full response, the kernel will
+set the error code to -E2BIG, set the same 'nent' field to the actual number of
+cpuid_entries and return without writing back any entries to the userspace.
+The userspace can thus implement a two-call sequence, where the first call is
+made with nent set to 0 to read the number of entries from the kernel and
+use this response to allocate enough memory for a full response for the second
+call.
+
+The call cal also return with error code -EFAULT in case of other errors.
+
+
+4.22 KVM_SET_SIGNAL_MASK
------------------------
:Capability: basic
@@ -737,7 +764,7 @@ signal mask.
};
-4.22 KVM_GET_FPU
+4.23 KVM_GET_FPU
----------------
:Capability: basic
@@ -766,7 +793,7 @@ Reads the floating point state from the vcpu.
};
-4.23 KVM_SET_FPU
+4.24 KVM_SET_FPU
----------------
:Capability: basic
@@ -795,7 +822,7 @@ Writes the floating point state to the vcpu.
};
-4.24 KVM_CREATE_IRQCHIP
+4.25 KVM_CREATE_IRQCHIP
-----------------------
:Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390)
@@ -817,7 +844,7 @@ Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
before KVM_CREATE_IRQCHIP can be used.
-4.25 KVM_IRQ_LINE
+4.26 KVM_IRQ_LINE
-----------------
:Capability: KVM_CAP_IRQCHIP
@@ -886,7 +913,7 @@ be used for a userspace interrupt controller.
};
-4.26 KVM_GET_IRQCHIP
+4.27 KVM_GET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -911,7 +938,7 @@ KVM_CREATE_IRQCHIP into a buffer provided by the caller.
};
-4.27 KVM_SET_IRQCHIP
+4.28 KVM_SET_IRQCHIP
--------------------
:Capability: KVM_CAP_IRQCHIP
@@ -936,7 +963,7 @@ KVM_CREATE_IRQCHIP from a buffer provided by the caller.
};
-4.28 KVM_XEN_HVM_CONFIG
+4.29 KVM_XEN_HVM_CONFIG
-----------------------
:Capability: KVM_CAP_XEN_HVM
@@ -972,7 +999,7 @@ fields must be zero.
No other flags are currently valid in the struct kvm_xen_hvm_config.
-4.29 KVM_GET_CLOCK
+4.30 KVM_GET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1005,7 +1032,7 @@ TSC is not stable.
};
-4.30 KVM_SET_CLOCK
+4.31 KVM_SET_CLOCK
------------------
:Capability: KVM_CAP_ADJUST_CLOCK
@@ -1027,7 +1054,7 @@ such as migration.
};
-4.31 KVM_GET_VCPU_EVENTS
+4.32 KVM_GET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1146,7 +1173,7 @@ directly to the virtual CPU).
__u32 reserved[12];
};
-4.32 KVM_SET_VCPU_EVENTS
+4.33 KVM_SET_VCPU_EVENTS
------------------------
:Capability: KVM_CAP_VCPU_EVENTS
@@ -1209,7 +1236,7 @@ exceptions by manipulating individual registers using the KVM_SET_ONE_REG API.
See KVM_GET_VCPU_EVENTS for the data structure.
-4.33 KVM_GET_DEBUGREGS
+4.34 KVM_GET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1231,7 +1258,7 @@ Reads debug registers from the vcpu.
};
-4.34 KVM_SET_DEBUGREGS
+4.35 KVM_SET_DEBUGREGS
----------------------
:Capability: KVM_CAP_DEBUGREGS
@@ -1246,7 +1273,7 @@ See KVM_GET_DEBUGREGS for the data structure. The flags field is unused
yet and must be cleared on entry.
-4.35 KVM_SET_USER_MEMORY_REGION
+4.36 KVM_SET_USER_MEMORY_REGION
-------------------------------
:Capability: KVM_CAP_USER_MEMORY
@@ -1315,7 +1342,7 @@ The KVM_SET_MEMORY_REGION does not allow fine grained control over memory
allocation and is deprecated.
-4.36 KVM_SET_TSS_ADDR
+4.37 KVM_SET_TSS_ADDR
---------------------
:Capability: KVM_CAP_SET_TSS_ADDR
@@ -1335,7 +1362,7 @@ because of a quirk in the virtualization implementation (see the internals
documentation when it pops into existence).
-4.37 KVM_ENABLE_CAP
+4.38 KVM_ENABLE_CAP
-------------------
:Capability: KVM_CAP_ENABLE_CAP
@@ -1390,7 +1417,7 @@ function properly, this is the place to put them.
The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
for vm-wide capabilities.
-4.38 KVM_GET_MP_STATE
+4.39 KVM_GET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1438,7 +1465,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not.
-4.39 KVM_SET_MP_STATE
+4.40 KVM_SET_MP_STATE
---------------------
:Capability: KVM_CAP_MP_STATE
@@ -1460,7 +1487,7 @@ For arm/arm64:
The only states that are valid are KVM_MP_STATE_STOPPED and
KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not.
-4.40 KVM_SET_IDENTITY_MAP_ADDR
+4.41 KVM_SET_IDENTITY_MAP_ADDR
------------------------------
:Capability: KVM_CAP_SET_IDENTITY_MAP_ADDR
@@ -1484,7 +1511,7 @@ documentation when it pops into existence).
Fails if any VCPU has already been created.
-4.41 KVM_SET_BOOT_CPU_ID
+4.42 KVM_SET_BOOT_CPU_ID
------------------------
:Capability: KVM_CAP_SET_BOOT_CPU_ID
@@ -1499,7 +1526,7 @@ is vcpu 0. This ioctl has to be called before vcpu creation,
otherwise it will return EBUSY error.
-4.42 KVM_GET_XSAVE
+4.43 KVM_GET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1518,7 +1545,7 @@ otherwise it will return EBUSY error.
This ioctl would copy current vcpu's xsave struct to the userspace.
-4.43 KVM_SET_XSAVE
+4.44 KVM_SET_XSAVE
------------------
:Capability: KVM_CAP_XSAVE
@@ -1537,7 +1564,7 @@ This ioctl would copy current vcpu's xsave struct to the userspace.
This ioctl would copy userspace's xsave struct to the kernel.
-4.44 KVM_GET_XCRS
+4.45 KVM_GET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1564,7 +1591,7 @@ This ioctl would copy userspace's xsave struct to the kernel.
This ioctl would copy current vcpu's xcrs to the userspace.
-4.45 KVM_SET_XCRS
+4.46 KVM_SET_XCRS
-----------------
:Capability: KVM_CAP_XCRS
@@ -1591,7 +1618,7 @@ This ioctl would copy current vcpu's xcrs to the userspace.
This ioctl would set vcpu's xcr to the value userspace specified.
-4.46 KVM_GET_SUPPORTED_CPUID
+4.47 KVM_GET_SUPPORTED_CPUID
----------------------------
:Capability: KVM_CAP_EXT_CPUID
@@ -1676,7 +1703,7 @@ if that returns true and you use KVM_CREATE_IRQCHIP, or if you emulate the
feature in userspace, then you can enable the feature for KVM_SET_CPUID2.
-4.47 KVM_PPC_GET_PVINFO
+4.48 KVM_PPC_GET_PVINFO
-----------------------
:Capability: KVM_CAP_PPC_GET_PVINFO
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..fa9bb6b751c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,17 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
+
+ if (r && r != E2BIG)
goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
KVM_GET_CPUID2 kvm ioctl is not very well documented, but the way it is
implemented in function kvm_vcpu_ioctl_get_cpuid2 suggests that even at
error path it will try to return number of entries to the caller. But
The dispatcher kvm vcpu ioctl dispatcher code in kvm_arch_vcpu_ioctl
ignores any output from this function if it sees the error return code.
It's very explicit by the code that it was designed to receive some
small number of entries to return E2BIG along with the corrected number.
This lost logic in the dispatcher code has been restored by removing the
lines that check for function return code and skip if error is found.
Without it, the ioctl caller will see both the number of entries and the
correct error.
In selftests relevant function vcpu_get_cpuid has also been modified to
utilize the number of cpuid entries returned along with errno E2BIG.
Signed-off-by: Valeriy Vdovin <valeriy.vdovin(a)virtuozzo.com>
---
arch/x86/kvm/x86.c | 10 +++++-----
.../selftests/kvm/lib/x86_64/processor.c | 20 +++++++++++--------
2 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index efc7a82ab140..df8a3e44e722 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4773,14 +4773,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
r = -EFAULT;
if (copy_from_user(&cpuid, cpuid_arg, sizeof(cpuid)))
goto out;
+
r = kvm_vcpu_ioctl_get_cpuid2(vcpu, &cpuid,
cpuid_arg->entries);
- if (r)
- goto out;
- r = -EFAULT;
- if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid)))
+
+ if (copy_to_user(cpuid_arg, &cpuid, sizeof(cpuid))) {
+ r = -EFAULT;
goto out;
- r = 0;
+ }
break;
}
case KVM_GET_MSRS: {
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index a8906e60a108..a412b39ad791 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -727,17 +727,21 @@ struct kvm_cpuid2 *vcpu_get_cpuid(struct kvm_vm *vm, uint32_t vcpuid)
cpuid = allocate_kvm_cpuid2();
max_ent = cpuid->nent;
+ cpuid->nent = 0;
- for (cpuid->nent = 1; cpuid->nent <= max_ent; cpuid->nent++) {
- rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
- if (!rc)
- break;
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
+ TEST_ASSERT(rc == -1 && errno == E2BIG,
+ "KVM_GET_CPUID2 should return E2BIG: %d %d",
+ rc, errno);
- TEST_ASSERT(rc == -1 && errno == E2BIG,
- "KVM_GET_CPUID2 should either succeed or give E2BIG: %d %d",
- rc, errno);
- }
+ TEST_ASSERT(cpuid->nent,
+ "KVM_GET_CPUID2 failed to set cpuid->nent with E2BIG");
+
+ TEST_ASSERT(cpuid->nent < max_ent,
+ "KVM_GET_CPUID2 has %d entries, expected maximum: %d",
+ cpuid->nent, max_ent);
+ rc = ioctl(vcpu->fd, KVM_GET_CPUID2, cpuid);
TEST_ASSERT(rc == 0, "KVM_GET_CPUID2 failed, rc: %i errno: %i",
rc, errno);
--
2.17.1
Hi Linus,
Please pull the following KUnit update for Linux 5.13-rc1.
This KUnit update for Linux 5.13-rc1 consists of several fixes and
new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-kunit-5.13-rc1
for you to fetch changes up to de2fcb3e62013738f22bbb42cbd757d9a242574e:
Documentation: kunit: add tips for using current->kunit_test
(2021-04-07 16:40:37 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-5.13-rc1
This KUnit update for Linux 5.13-rc1 consists of several fixes and
new feature to support failure from dynamic analysis tools such as
UBSAN and fake ops for testing.
- a fake ops struct for testing a "free" function to complain if it
was called with an invalid argument, or caught a double-free. Most
return void and have no normal means of signalling failure
(e.g. super_operations, iommu_ops, etc.).
----------------------------------------------------------------
Daniel Latypov (4):
kunit: make KUNIT_EXPECT_STREQ() quote values, don't print literals
kunit: tool: make --kunitconfig accept dirs, add lib/kunit fragment
kunit: fix -Wunused-function warning for __kunit_fail_current_test
Documentation: kunit: add tips for using current->kunit_test
Lucas Stankus (1):
kunit: Match parenthesis alignment to improve code readability
Uriel Guajardo (1):
kunit: support failure from dynamic analysis tools
Documentation/dev-tools/kunit/tips.rst | 78
+++++++++++++++++++++++++++++++++-
include/kunit/test-bug.h | 29 +++++++++++++
lib/kunit/.kunitconfig | 3 ++
lib/kunit/assert.c | 61 ++++++++++++++++++--------
lib/kunit/test.c | 39 +++++++++++++++--
tools/testing/kunit/kunit.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 2 +
tools/testing/kunit/kunit_tool_test.py | 6 +++
8 files changed, 198 insertions(+), 24 deletions(-)
create mode 100644 include/kunit/test-bug.h
create mode 100644 lib/kunit/.kunitconfig
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 5.13-rc1.
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit a38fd8748464831584a19438cbb3082b5a2dab15:
Linux 5.12-rc2 (2021-03-05 17:33:41 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-next-5.13-rc1
for you to fetch changes up to e75074781f1735c1976bc551e29ccf2ba9a4b17f:
selftests/resctrl: Change a few printed messages (2021-04-07 16:37:49
-0600)
----------------------------------------------------------------
linux-kselftest-next-5.13-rc1
This Kselftest update for Linux 5.13-rc1 consists of:
- fixes and updates to resctrl test from Fenghua Yu and Reinette Chatre
- fixes to Kselftest documentation, framework
- minor spelling correction in timers test
----------------------------------------------------------------
Antonio Terceiro (1):
Documentation: kselftest: fix path to test module files
Colin Ian King (1):
selftests/timers: Fix spelling mistake "clocksourc" -> "clocksource"
Fenghua Yu (20):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Rename CQM test as CMT test
selftests/resctrl: Call kselftest APIs to log test results
selftests/resctrl: Share show_cache_info() by CAT and CMT tests
selftests/resctrl: Add config dependencies
selftests/resctrl: Check for resctrl mount point only if resctrl
FS is supported
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix MBA/MBM results reporting format
selftests/resctrl: Don't hard code value of "no_of_bits" variable
selftests/resctrl: Modularize resctrl test suite main() function
selftests/resctrl: Skip the test if requested resctrl feature is
not supported
selftests/resctrl: Fix unmount resctrl FS
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
selftests/resctrl: Create .gitignore to include resctrl_tests
selftests/resctrl: Change a few printed messages
Ilya Leoshkevich (1):
selftests: fix prepending $(OUTPUT) to $(TEST_PROGS)
Reinette Chatre (2):
selftests/resctrl: Ensure sibling CPU is not same as original CPU
selftests/resctrl: Fix a printed message
Documentation/dev-tools/kselftest.rst | 4 +-
tools/testing/selftests/lib.mk | 3 +-
tools/testing/selftests/resctrl/.gitignore | 2 +
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/README | 4 +-
tools/testing/selftests/resctrl/cache.c | 52 ++++++-
tools/testing/selftests/resctrl/cat_test.c | 57 +++----
.../selftests/resctrl/{cqm_test.c => cmt_test.c} | 75 +++-------
tools/testing/selftests/resctrl/config | 2 +
tools/testing/selftests/resctrl/fill_buf.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 43 +++---
tools/testing/selftests/resctrl/mbm_test.c | 42 +++---
tools/testing/selftests/resctrl/resctrl.h | 29 +++-
tools/testing/selftests/resctrl/resctrl_tests.c | 163
++++++++++++++-------
tools/testing/selftests/resctrl/resctrl_val.c | 95 +++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 134 ++++++++++-------
.../testing/selftests/timers/clocksource-switch.c | 2 +-
17 files changed, 413 insertions(+), 300 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/.gitignore
rename tools/testing/selftests/resctrl/{cqm_test.c => cmt_test.c} (56%)
create mode 100644 tools/testing/selftests/resctrl/config
----------------------------------------------------------------
This patchset introduces batched operations for the per-cpu variant of
the array map.
Also updates the batch ops test for arrays.
v4 -> v5:
- Revert removal of percpu macros
v3 -> v4:
- Prefer 'calloc()' over 'malloc()' on batch ops tests
- Add missing static keyword in a couple of test functions
- 'offset' to 'cpu_offset' as suggested by Martin
v2 -> v3:
- Remove percpu macros as suggested by Andrii
- Update tests that used the per cpu macros
v1 -> v2:
- Amended a more descriptive commit message
Pedro Tammela (2):
bpf: add batched ops support for percpu array
bpf: selftests: update array map tests for per-cpu batched ops
kernel/bpf/arraymap.c | 2 +
.../bpf/map_tests/array_map_batch_ops.c | 104 +++++++++++++-----
2 files changed, 77 insertions(+), 29 deletions(-)
--
2.25.1
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v3->v4:
- Fix handling of the shmem private mcopy case. Previously, I had (incorrectly)
assumed that !vma_is_anonymous() was equivalent to "the page will be in the
page cache". But, in this case we have an optimization where we allocate a new
*anonymous* page. So, use a new "bool page_in_cache" instead, which checks if
page->mapping is set. Correct several places with this new check. [Hugh]
- Fix calling mm_counter() before page_add_..._rmap(). [Hugh]
- When modifying shmem_mcopy_atomic_pte() to use the new install_pte() helper,
just use lru_cache_add_inactive_or_unevictable(), no need to branch and maybe
use lru_cache_add(). [Hugh]
- De-pluralize mcopy_atomic_install_pte(s). [Hugh]
- Make "writable" a bool, and initialize consistently. [Hugh]
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
- Commit 10 is a small documentation update to reflect the new changes.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_pte()
userfaultfd: update documentation to mention shmem minor faults
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 115 +++-----
mm/userfaultfd.c | 175 ++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 364 insertions(+), 251 deletions(-)
--
2.31.1.368.gbe11c130af-goog
This small series expands futex timeout selftests by checking if all
operations that allows timeouts works as expected. When some version of
Thomas' series "futex: Bugfixes and FUTEX_LOCK_PI2"[0] get merged, I'll
add the new rules to the timeout test. This test should be used to check
for regressions when modifying the timeout path or changing the
interface.
Additionally, fix a bug in the Makefile that can be found when compiling
selftests with new operations, like the one defined at [0] or from the
futex2 patchset.
[0] https://lore.kernel.org/lkml/20210422194417.866740847@linutronix.de/
André Almeida (2):
selftests: futex: Correctly include headers dirs
selftests: futex: Expand timeout test
.../selftests/futex/functional/Makefile | 3 +-
.../futex/functional/futex_wait_timeout.c | 126 +++++++++++++++---
2 files changed, 112 insertions(+), 17 deletions(-)
--
2.31.1
Changelog v3-->v4
Based on review comments by Doug Smythies,
1. Parsing the thread_siblings_list for CPU topology information to
correctly identify the cores the test should run on in
default(quick) mode.
2. The source CPU to source CPU interaction in the IPI test will always
result in a lower latency and cause a bias in the average, hence
avoid adding the latency to be averaged for same cpu IPIs. The
latency will still be displayed in the detailed logs.
RFC v3: https://lkml.org/lkml/2021/4/4/31
---
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.
The patchset measures latencies for two kinds of events. IPIs and Timers
As this is a software-only mechanism, there will additional latencies of
the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU
and the latencies achieved must be in view relative to that.
To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.
The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,
IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - State0: 3265
# Observed Average IPI latency(ns) - State1: 3507
# Observed Average IPI latency(ns) - State2: 3739
# Observed Average IPI latency(ns) - State3: 3807
# Observed Average IPI latency(ns) - State4: 17070
# Observed Average IPI latency(ns) - State5: 1038174
# Observed Average IPI latency(ns) - State6: 1068784
#
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - State0: 1640
# Observed Average timeout diff(ns) - State1: 1764
# Observed Average timeout diff(ns) - State2: 1715
# Observed Average timeout diff(ns) - State3: 1845
# Observed Average timeout diff(ns) - State4: 16581
# Observed Average timeout diff(ns) - State5: 939977
# Observed Average timeout diff(ns) - State6: 1073024
Things to keep in mind:
1. This kernel module + bash driver does not guarantee idleness on a
core when the IPI and the Timer is armed. It only invokes sleep and
hopes that the core is idle once the IPI/Timer is invoked onto it.
Hence this program must be run on a completely idle system for best
results
2. Even on a completely idle system, there maybe book-keeping tasks or
jitter tasks that can run on the core we want idle. This can create
outliers in the latency measurement. Thankfully, these outliers
should be large enough to easily weed them out.
3. A userspace only selftest variant was also sent out as RFC based on
suggestions over the previous patchset to simply the kernel
complexeity. However, a userspace only approach had more noise in
the latency measurement due to userspace-kernel interactions
which led to run to run variance and a lesser accurate test.
Another downside of the nature of a userspace program is that it
takes orders of magnitude longer to complete a full system test
compared to the kernel framework.
RFC patch: https://lkml.org/lkml/2020/9/2/356
4. For Intel Systems, the Timer based latencies don't exactly give out
the measure of idle latencies. This is because of a hardware
optimization mechanism that pre-arms a CPU when a timer is set to
wakeup. That doesn't make this metric useless for Intel systems,
it just means that is measuring IPI/Timer responding latency rather
than idle wakeup latencies.
(Source: https://lkml.org/lkml/2020/9/2/610)
For solution to this problem, a hardware based latency analyzer is
devised by Artem Bityutskiy from Intel.
https://youtu.be/Opk92aQyvt0?t=8266https://intel.github.io/wult/
Pratik Rajesh Sampat (2):
cpuidle: Extract IPI based and timer based wakeup latency from idle
states
selftest/cpuidle: Add support for cpuidle latency measurement
drivers/cpuidle/Makefile | 1 +
drivers/cpuidle/test-cpuidle_latency.c | 157 ++++++++
lib/Kconfig.debug | 10 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/cpuidle/Makefile | 6 +
tools/testing/selftests/cpuidle/cpuidle.sh | 402 +++++++++++++++++++++
tools/testing/selftests/cpuidle/settings | 2 +
7 files changed, 579 insertions(+)
create mode 100644 drivers/cpuidle/test-cpuidle_latency.c
create mode 100644 tools/testing/selftests/cpuidle/Makefile
create mode 100755 tools/testing/selftests/cpuidle/cpuidle.sh
create mode 100644 tools/testing/selftests/cpuidle/settings
--
2.17.1
We found that with the latest mainline kernel (5.12.0-051200rc8) on
some KVM instances / bare-metal systems, the following tests will take
longer than the kselftest framework default timeout (45 seconds) to
run and thus got terminated with TIMEOUT error:
* xfrm_policy.sh - took about 2m20s
* pmtu.sh - took about 3m5s
* udpgso_bench.sh - took about 60s
Bump the timeout setting to 5 minutes to allow them have a chance to
finish.
https://bugs.launchpad.net/bugs/1856010
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/Makefile | 2 ++
tools/testing/selftests/net/settings | 1 +
2 files changed, 3 insertions(+)
create mode 100644 tools/testing/selftests/net/settings
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 25f198b..2be4670 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -37,6 +37,8 @@ TEST_GEN_FILES += ipsec
TEST_GEN_PROGS = reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
TEST_GEN_PROGS += reuseport_dualstack reuseaddr_conflict tls
+TEST_FILES := settings
+
KSFT_KHDR_INSTALL := 1
include ../lib.mk
diff --git a/tools/testing/selftests/net/settings b/tools/testing/selftests/net/settings
new file mode 100644
index 0000000..694d707
--- /dev/null
+++ b/tools/testing/selftests/net/settings
@@ -0,0 +1 @@
+timeout=300
--
2.7.4
Add in:
* kunit_kmalloc_array() and wire up kunit_kmalloc() to be a special
case of it.
* kunit_kcalloc() for symmetry with kunit_kzalloc()
This should using KUnit more natural by making it more similar to the
existing *alloc() APIs.
And while we shouldn't necessarily be writing unit tests where overflow
should be a concern, it can't hurt to be safe.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
include/kunit/test.h | 36 ++++++++++++++++++++++++++++++++----
lib/kunit/test.c | 22 ++++++++++++----------
2 files changed, 44 insertions(+), 14 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 49601c4b98b8..7fa0de4af977 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -577,16 +577,30 @@ static inline int kunit_destroy_named_resource(struct kunit *test,
void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/**
- * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * kunit_kmalloc_array() - Like kmalloc_array() except the allocation is *test managed*.
* @test: The test context object.
+ * @n: number of elements.
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * Just like `kmalloc(...)`, except the allocation is managed by the test case
+ * Just like `kmalloc_array(...)`, except the allocation is managed by the test case
* and is automatically cleaned up after the test case concludes. See &struct
* kunit_resource for more information.
*/
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp);
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t flags);
+
+/**
+ * kunit_kmalloc() - Like kmalloc() except the allocation is *test managed*.
+ * @test: The test context object.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kmalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+{
+ return kunit_kmalloc_array(test, 1, size, gfp);
+}
/**
* kunit_kfree() - Like kfree except for allocations managed by KUnit.
@@ -601,13 +615,27 @@ void kunit_kfree(struct kunit *test, const void *ptr);
* @size: The size in bytes of the desired memory.
* @gfp: flags passed to underlying kmalloc().
*
- * See kzalloc() and kunit_kmalloc() for more information.
+ * See kzalloc() and kunit_kmalloc_array() for more information.
*/
static inline void *kunit_kzalloc(struct kunit *test, size_t size, gfp_t gfp)
{
return kunit_kmalloc(test, size, gfp | __GFP_ZERO);
}
+/**
+ * kunit_kzalloc() - Just like kunit_kmalloc_array(), but zeroes the allocation.
+ * @test: The test context object.
+ * @n: number of elements.
+ * @size: The size in bytes of the desired memory.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kcalloc() and kunit_kmalloc_array() for more information.
+ */
+static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp_t flags)
+{
+ return kunit_kmalloc_array(test, n, size, flags | __GFP_ZERO);
+}
+
void kunit_cleanup(struct kunit *test);
void kunit_log_append(char *log, const char *fmt, ...);
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index ec9494e914ef..052fccf69eef 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -540,41 +540,43 @@ int kunit_destroy_resource(struct kunit *test, kunit_resource_match_t match,
}
EXPORT_SYMBOL_GPL(kunit_destroy_resource);
-struct kunit_kmalloc_params {
+struct kunit_kmalloc_array_params {
+ size_t n;
size_t size;
gfp_t gfp;
};
-static int kunit_kmalloc_init(struct kunit_resource *res, void *context)
+static int kunit_kmalloc_array_init(struct kunit_resource *res, void *context)
{
- struct kunit_kmalloc_params *params = context;
+ struct kunit_kmalloc_array_params *params = context;
- res->data = kmalloc(params->size, params->gfp);
+ res->data = kmalloc_array(params->n, params->size, params->gfp);
if (!res->data)
return -ENOMEM;
return 0;
}
-static void kunit_kmalloc_free(struct kunit_resource *res)
+static void kunit_kmalloc_array_free(struct kunit_resource *res)
{
kfree(res->data);
}
-void *kunit_kmalloc(struct kunit *test, size_t size, gfp_t gfp)
+void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
- struct kunit_kmalloc_params params = {
+ struct kunit_kmalloc_array_params params = {
.size = size,
+ .n = n,
.gfp = gfp
};
return kunit_alloc_resource(test,
- kunit_kmalloc_init,
- kunit_kmalloc_free,
+ kunit_kmalloc_array_init,
+ kunit_kmalloc_array_free,
gfp,
¶ms);
}
-EXPORT_SYMBOL_GPL(kunit_kmalloc);
+EXPORT_SYMBOL_GPL(kunit_kmalloc_array);
void kunit_kfree(struct kunit *test, const void *ptr)
{
base-commit: 16fc44d6387e260f4932e9248b985837324705d8
--
2.31.1.498.g6c1eba8ee3d-goog
The kernel now has a number of testing and debugging tools, and we've
seen a bit of confusion about what the differences between them are.
Add a basic documentation outlining the testing tools, when to use each,
and how they interact.
This is a pretty quick overview rather than the idealised "kernel
testing guide" that'd probably be optimal, but given the number of times
questions like "When do you use KUnit and when do you use Kselftest?"
are being asked, it seemed worth at least having something. Hopefully
this can form the basis for more detailed documentation later.
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Marco Elver <elver(a)google.com>
Reviewed-by: Daniel Latypov <dlatypov(a)google.com>
---
Thanks again. Assuming no-one has any objections, I think this is good
to go.
-- David
Changes since v2:
https://lore.kernel.org/linux-kselftest/20210414081428.337494-1-davidgow@go…
- A few typo fixes (Thanks Daniel)
- Reworded description of dynamic analysis tools.
- Updated dev-tools index page to not use ':doc:' syntax, but to provide
a path instead.
- Added Marco and Daniel's Reviewed-by tags.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20210410070529.4113432-1-davidgow@g…
- Note KUnit's speed and that one should provide selftests for syscalls
- Mention lockdep as a Dynamic Analysis Tool
- Refer to "Dynamic Analysis Tools" instead of "Sanitizers"
- A number of minor formatting tweaks and rewordings for clarity
Documentation/dev-tools/index.rst | 4 +
Documentation/dev-tools/testing-overview.rst | 117 +++++++++++++++++++
2 files changed, 121 insertions(+)
create mode 100644 Documentation/dev-tools/testing-overview.rst
diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index 1b1cf4f5c9d9..929d916ffd4c 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -7,6 +7,9 @@ be used to work on the kernel. For now, the documents have been pulled
together without any significant effort to integrate them into a coherent
whole; patches welcome!
+A brief overview of testing-specific tools can be found in
+Documentation/dev-tools/testing-overview.rst
+
.. class:: toc-title
Table of contents
@@ -14,6 +17,7 @@ whole; patches welcome!
.. toctree::
:maxdepth: 2
+ testing-overview
coccinelle
sparse
kcov
diff --git a/Documentation/dev-tools/testing-overview.rst b/Documentation/dev-tools/testing-overview.rst
new file mode 100644
index 000000000000..b5b46709969c
--- /dev/null
+++ b/Documentation/dev-tools/testing-overview.rst
@@ -0,0 +1,117 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+====================
+Kernel Testing Guide
+====================
+
+
+There are a number of different tools for testing the Linux kernel, so knowing
+when to use each of them can be a challenge. This document provides a rough
+overview of their differences, and how they fit together.
+
+
+Writing and Running Tests
+=========================
+
+The bulk of kernel tests are written using either the kselftest or KUnit
+frameworks. These both provide infrastructure to help make running tests and
+groups of tests easier, as well as providing helpers to aid in writing new
+tests.
+
+If you're looking to verify the behaviour of the Kernel — particularly specific
+parts of the kernel — then you'll want to use KUnit or kselftest.
+
+
+The Difference Between KUnit and kselftest
+------------------------------------------
+
+KUnit (Documentation/dev-tools/kunit/index.rst) is an entirely in-kernel system
+for "white box" testing: because test code is part of the kernel, it can access
+internal structures and functions which aren't exposed to userspace.
+
+KUnit tests therefore are best written against small, self-contained parts
+of the kernel, which can be tested in isolation. This aligns well with the
+concept of 'unit' testing.
+
+For example, a KUnit test might test an individual kernel function (or even a
+single codepath through a function, such as an error handling case), rather
+than a feature as a whole.
+
+This also makes KUnit tests very fast to build and run, allowing them to be
+run frequently as part of the development process.
+
+There is a KUnit test style guide which may give further pointers in
+Documentation/dev-tools/kunit/style.rst
+
+
+kselftest (Documentation/dev-tools/kselftest.rst), on the other hand, is
+largely implemented in userspace, and tests are normal userspace scripts or
+programs.
+
+This makes it easier to write more complicated tests, or tests which need to
+manipulate the overall system state more (e.g., spawning processes, etc.).
+However, it's not possible to call kernel functions directly from kselftest.
+This means that only kernel functionality which is exposed to userspace somehow
+(e.g. by a syscall, device, filesystem, etc.) can be tested with kselftest. To
+work around this, some tests include a companion kernel module which exposes
+more information or functionality. If a test runs mostly or entirely within the
+kernel, however, KUnit may be the more appropriate tool.
+
+kselftest is therefore suited well to tests of whole features, as these will
+expose an interface to userspace, which can be tested, but not implementation
+details. This aligns well with 'system' or 'end-to-end' testing.
+
+For example, all new system calls should be accompanied by kselftest tests.
+
+Code Coverage Tools
+===================
+
+The Linux Kernel supports two different code coverage measurement tools. These
+can be used to verify that a test is executing particular functions or lines
+of code. This is useful for determining how much of the kernel is being tested,
+and for finding corner-cases which are not covered by the appropriate test.
+
+:doc:`gcov` is GCC's coverage testing tool, which can be used with the kernel
+to get global or per-module coverage. Unlike KCOV, it does not record per-task
+coverage. Coverage data can be read from debugfs, and interpreted using the
+usual gcov tooling.
+
+:doc:`kcov` is a feature which can be built in to the kernel to allow
+capturing coverage on a per-task level. It's therefore useful for fuzzing and
+other situations where information about code executed during, for example, a
+single syscall is useful.
+
+
+Dynamic Analysis Tools
+======================
+
+The kernel also supports a number of dynamic analysis tools, which attempt to
+detect classes of issues when they occur in a running kernel. These typically
+each look for a different class of bugs, such as invalid memory accesses,
+concurrency issues such as data races, or other undefined behaviour like
+integer overflows.
+
+Some of these tools are listed below:
+
+* kmemleak detects possible memory leaks. See
+ Documentation/dev-tools/kmemleak.rst
+* KASAN detects invalid memory accesses such as out-of-bounds and
+ use-after-free errors. See Documentation/dev-tools/kasan.rst
+* UBSAN detects behaviour that is undefined by the C standard, like integer
+ overflows. See Documentation/dev-tools/ubsan.rst
+* KCSAN detects data races. See Documentation/dev-tools/kcsan.rst
+* KFENCE is a low-overhead detector of memory issues, which is much faster than
+ KASAN and can be used in production. See Documentation/dev-tools/kfence.rst
+* lockdep is a locking correctness validator. See
+ Documentation/locking/lockdep-design.rst
+* There are several other pieces of debug instrumentation in the kernel, many
+ of which can be found in lib/Kconfig.debug
+
+These tools tend to test the kernel as a whole, and do not "pass" like
+kselftest or KUnit tests. They can be combined with KUnit or kselftest by
+running tests on a kernel with these tools enabled: you can then be sure
+that none of these errors are occurring during the test.
+
+Some of these tools integrate with KUnit or kselftest and will
+automatically fail tests if an issue is detected.
+
--
2.31.1.295.g9ea45b61b8-goog
From: Colin Ian King <colin.king(a)canonical.com>
There are a few function prototypes that are missing a void parameter,
fix this by adding it in.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/vm/mremap_dontunmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/vm/mremap_dontunmap.c b/tools/testing/selftests/vm/mremap_dontunmap.c
index f01dc4a85b0b..78baaf0e85d9 100644
--- a/tools/testing/selftests/vm/mremap_dontunmap.c
+++ b/tools/testing/selftests/vm/mremap_dontunmap.c
@@ -42,7 +42,7 @@ static void dump_maps(void)
// Try a simple operation for to "test" for kernel support this prevents
// reporting tests as failed when it's run on an older kernel.
-static int kernel_support_for_mremap_dontunmap()
+static int kernel_support_for_mremap_dontunmap(void)
{
int ret = 0;
unsigned long num_pages = 1;
@@ -95,7 +95,7 @@ static int check_region_contains_byte(void *addr, unsigned long size, char byte)
// this test validates that MREMAP_DONTUNMAP moves the pagetables while leaving
// the source mapping mapped.
-static void mremap_dontunmap_simple()
+static void mremap_dontunmap_simple(void)
{
unsigned long num_pages = 5;
@@ -128,7 +128,7 @@ static void mremap_dontunmap_simple()
}
// This test validates that MREMAP_DONTUNMAP on a shared mapping works as expected.
-static void mremap_dontunmap_simple_shmem()
+static void mremap_dontunmap_simple_shmem(void)
{
unsigned long num_pages = 5;
@@ -181,7 +181,7 @@ static void mremap_dontunmap_simple_shmem()
// This test validates MREMAP_DONTUNMAP will move page tables to a specific
// destination using MREMAP_FIXED, also while validating that the source
// remains intact.
-static void mremap_dontunmap_simple_fixed()
+static void mremap_dontunmap_simple_fixed(void)
{
unsigned long num_pages = 5;
@@ -226,7 +226,7 @@ static void mremap_dontunmap_simple_fixed()
// This test validates that we can MREMAP_DONTUNMAP for a portion of an
// existing mapping.
-static void mremap_dontunmap_partial_mapping()
+static void mremap_dontunmap_partial_mapping(void)
{
/*
* source mapping:
--
2.30.2
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied first:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v2->v3:
- Picked up {Reviewed,Acked}-by's.
- Reorder commits: introduce CONTINUE before MINOR registration. [Hugh, Peter]
- Don't try to {unlock,put}_page an xarray value in shmem_getpage_gfp. [Hugh]
- Move enum mcopy_atomic_mode forward declare out of CONFIG_HUGETLB_PAGE. [Hugh]
- Keep mistakenly removed UFFD_USER_MODE_ONLY in selftest. [Peter]
- Cleanup context management in self test (make clear implicit, remove unneeded
return values now that we have err()). [Peter]
- Correct dst_pte argument to dst_pmd in shmem_mcopy_atomic_pte macro. [Hugh]
- Mention the new shmem support feature in documentation. [Hugh]
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (10):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes
userfaultfd: update documentation to mention shmem minor faults
Documentation/admin-guide/mm/userfaultfd.rst | 3 +-
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 17 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 114 +++-----
mm/userfaultfd.c | 183 +++++++++----
tools/testing/selftests/vm/userfaultfd.c | 274 ++++++++++++-------
11 files changed, 371 insertions(+), 251 deletions(-)
--
2.31.1.368.gbe11c130af-goog
Base
====
This series is based on (and therefore should apply cleanly to) the tag
"v5.12-rc7-mmots-2021-04-11-20-49", additionally with Peter's selftest cleanup
series applied *first*:
https://lore.kernel.org/patchwork/cover/1412450/
Changelog
=========
v1->v2:
- Pick up Reviewed-by's.
- Don't swapin page when a minor fault occurs. Notice that it needs to be
swapped in, and just immediately fire the minor fault. Let a future CONTINUE
deal with swapping in the page. [Peter]
- Clarify comment about i_size checks in mm/userfaultfd.c. [Peter]
- Only forward declare once (out of #ifdef) in hugetlb.h. [Peter]
Changes since [2]:
- Squash the fixes ([2]) in with the original series ([1]). This makes reviewing
easier, as we no longer have to sift through deltas undoing what we had done
before. [Hugh, Peter]
- Modify shmem_mcopy_atomic_pte() to use the new mcopy_atomic_install_ptes()
helper, reducing code duplication. [Hugh]
- Properly trigger handle_userfault() in the shmem_swapin_page() case. [Hugh]
- Use shmem_getpage() instead of find_lock_page() to lookup the existing page in
for continue. This properly deals with swapped-out pages. [Hugh]
- Unconditionally pte_mkdirty() for anon memory (as before). [Peter]
- Don't include userfaultfd_k.h in either hugetlb.h or shmem_fs.h. [Hugh]
- Add comment for UFFD_FEATURE_MINOR_SHMEM (to match _HUGETLBFS). [Hugh]
- Fix some small cleanup issues (parens, reworded conditionals, reduced plumbing
of some parameters, simplify labels/gotos, ...). [Hugh, Peter]
Overview
========
See the series which added minor faults for hugetlbfs [3] for a detailed
overview of minor fault handling in general. This series adds the same support
for shmem-backed areas.
This series is structured as follows:
- Commits 1 and 2 are cleanups.
- Commits 3 and 4 implement the new feature (minor fault handling for shmem).
- Commits 5, 6, 7, 8 update the userfaultfd selftest to exercise the feature.
- Commit 9 is one final cleanup, modifying an existing code path to re-use a new
helper we've introduced. We rely on the selftest to show that this change
doesn't break anything.
Use Case
========
In some cases it is useful to have VM memory backed by tmpfs instead of
hugetlbfs. So, this feature will be used to support the same VM live migration
use case described in my original series.
Additionally, Android folks (Lokesh Gidra <lokeshgidra(a)google.com>) hope to
optimize the Android Runtime garbage collector using this feature:
"The plan is to use userfaultfd for concurrently compacting the heap. With
this feature, the heap can be shared-mapped at another location where the
GC-thread(s) could continue the compaction operation without the need to
invoke userfault ioctl(UFFDIO_COPY) each time. OTOH, if and when Java threads
get faults on the heap, UFFDIO_CONTINUE can be used to resume execution.
Furthermore, this feature enables updating references in the 'non-moving'
portion of the heap efficiently. Without this feature, uneccessary page
copying (ioctl(UFFDIO_COPY)) would be required."
[1] https://lore.kernel.org/patchwork/cover/1388144/
[2] https://lore.kernel.org/patchwork/patch/1408161/
[3] https://lore.kernel.org/linux-fsdevel/20210301222728.176417-1-axelrasmussen…
Axel Rasmussen (9):
userfaultfd/hugetlbfs: avoid including userfaultfd_k.h in hugetlb.h
userfaultfd/shmem: combine shmem_{mcopy_atomic,mfill_zeropage}_pte
userfaultfd/shmem: support minor fault registration for shmem
userfaultfd/shmem: support UFFDIO_CONTINUE for shmem
userfaultfd/selftests: use memfd_create for shmem test type
userfaultfd/selftests: create alias mappings in the shmem test
userfaultfd/selftests: reinitialize test context in each test
userfaultfd/selftests: exercise minor fault handling shmem support
userfaultfd/shmem: modify shmem_mcopy_atomic_pte to use install_ptes
fs/userfaultfd.c | 6 +-
include/linux/hugetlb.h | 4 +-
include/linux/shmem_fs.h | 15 +-
include/linux/userfaultfd_k.h | 5 +
include/uapi/linux/userfaultfd.h | 7 +-
mm/hugetlb.c | 1 +
mm/memory.c | 8 +-
mm/shmem.c | 112 +++------
mm/userfaultfd.c | 183 ++++++++++-----
tools/testing/selftests/vm/userfaultfd.c | 280 +++++++++++++++--------
10 files changed, 377 insertions(+), 244 deletions(-)
--
2.31.1.295.g9ea45b61b8-goog