A BPF application, e.g., a TCP congestion control, might benefit from or
even require precise (=hardware) packet timestamps. These timestamps are
already available through __sk_buff.hwtstamp and
bpf_sock_ops.skb_hwtstamp, but could not be requested: BPF programs were
not allowed to set SO_TIMESTAMPING* on sockets.
Enable BPF programs to actively request the generation of timestamps
from a stream socket. The also required ioctl(SIOCSHWTSTAMP) on the
network device must still be done separately, in user space.
This patch had previously been submitted in a two-part series (first
link below). The second patch has been independently applied in commit
7f6ca95d16b9 ("net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)")
(second link below).
On the earlier submission, there was the open question whether to only
allow, thus enforce, SO_TIMESTAMPING_NEW in this patch:
For a BPF program, this won't make a difference: A timestamp, when
accessed through the fields mentioned above, is directly read from
skb_shared_info.hwtstamps, independent of the places where NEW/OLD is
relevant. See bpf_convert_ctx_access() besides others.
I am unsure, though, when it comes to the interconnection of user space
and BPF "space", when both are interested in the timestamps. I think it
would cause an unsolvable conflict when user space is bound to use
SO_TIMESTAMPING_OLD with a BPF program only allowed to set
SO_TIMESTAMPING_NEW *on the same socket*? Please correct me if I'm
mistaken.
Link: https://lore.kernel.org/lkml/20230703175048.151683-1-jthinz@mailbox.tu-berl…
Link: https://lore.kernel.org/all/20231221231901.67003-1-jthinz@mailbox.tu-berlin…
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel(a)gmail.com>
Signed-off-by: Jörn-Thorben Hinz <j-t.hinz(a)alumni.tu-berlin.de>
---
include/uapi/linux/bpf.h | 3 ++-
net/core/filter.c | 2 ++
tools/include/uapi/linux/bpf.h | 3 ++-
tools/testing/selftests/bpf/progs/bpf_tracing_net.h | 2 ++
tools/testing/selftests/bpf/progs/setget_sockopt.c | 4 ++++
5 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 754e68ca8744..8825d0648efe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/net/core/filter.c b/net/core/filter.c
index 8c9f67c81e22..4f5280874fd8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5144,6 +5144,8 @@ static int sol_socket_sockopt(struct sock *sk, int optname,
case SO_MAX_PACING_RATE:
case SO_BINDTOIFINDEX:
case SO_TXREHASH:
+ case SO_TIMESTAMPING_NEW:
+ case SO_TIMESTAMPING_OLD:
if (*optlen != sizeof(int))
return -EINVAL;
break;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7f24d898efbb..09eaafa6ab43 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2734,7 +2734,8 @@ union bpf_attr {
* **SO_RCVBUF**, **SO_SNDBUF**, **SO_MAX_PACING_RATE**,
* **SO_PRIORITY**, **SO_RCVLOWAT**, **SO_MARK**,
* **SO_BINDTODEVICE**, **SO_KEEPALIVE**, **SO_REUSEADDR**,
- * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**.
+ * **SO_REUSEPORT**, **SO_BINDTOIFINDEX**, **SO_TXREHASH**,
+ * **SO_TIMESTAMPING_NEW**, **SO_TIMESTAMPING_OLD**.
* * **IPPROTO_TCP**, which supports the following *optname*\ s:
* **TCP_CONGESTION**, **TCP_BPF_IW**,
* **TCP_BPF_SNDCWND_CLAMP**, **TCP_SAVE_SYN**,
diff --git a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
index 1bdc680b0e0e..95f5f169819e 100644
--- a/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
+++ b/tools/testing/selftests/bpf/progs/bpf_tracing_net.h
@@ -15,8 +15,10 @@
#define SO_RCVLOWAT 18
#define SO_BINDTODEVICE 25
#define SO_MARK 36
+#define SO_TIMESTAMPING_OLD 37
#define SO_MAX_PACING_RATE 47
#define SO_BINDTOIFINDEX 62
+#define SO_TIMESTAMPING_NEW 65
#define SO_TXREHASH 74
#define __SO_ACCEPTCON (1 << 16)
diff --git a/tools/testing/selftests/bpf/progs/setget_sockopt.c b/tools/testing/selftests/bpf/progs/setget_sockopt.c
index 7a438600ae98..54205d10793c 100644
--- a/tools/testing/selftests/bpf/progs/setget_sockopt.c
+++ b/tools/testing/selftests/bpf/progs/setget_sockopt.c
@@ -48,6 +48,10 @@ static const struct sockopt_test sol_socket_tests[] = {
{ .opt = SO_MARK, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_MAX_PACING_RATE, .new = 0xeb9f, .expected = 0xeb9f, },
{ .opt = SO_TXREHASH, .flip = 1, },
+ { .opt = SO_TIMESTAMPING_NEW, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
+ { .opt = SO_TIMESTAMPING_OLD, .new = SOF_TIMESTAMPING_RX_HARDWARE,
+ .expected = SOF_TIMESTAMPING_RX_HARDWARE, },
{ .opt = 0, },
};
--
2.39.2
This extends the KVM RISC-V ONE_REG interface to report more ISA extensions
namely: Zbz, scalar crypto, vector crypto, Zfh[min], Zihintntl, Zvfh[min],
and Zfa.
This series depends upon the "riscv: report more ISA extensions through
hwprobe" series.from Clement.
(Link: https://lore.kernel.org/lkml/20231114141256.126749-1-cleger@rivosinc.com/)
To test these patches, use KVMTOOL from the riscv_more_exts_v1 branch at:
https://github.com/avpatel/kvmtool.git
These patches can also be found in the riscv_kvm_more_exts_v1 branch at:
https://github.com/avpatel/linux.git
Anup Patel (15):
KVM: riscv: selftests: Generate ISA extension reg_list using macros
RISC-V: KVM: Allow Zbc extension for Guest/VM
KVM: riscv: selftests: Add Zbc extension to get-reg-list test
RISC-V: KVM: Allow scalar crypto extensions for Guest/VM
KVM: riscv: selftests: Add scaler crypto extensions to get-reg-list
test
RISC-V: KVM: Allow vector crypto extensions for Guest/VM
KVM: riscv: selftests: Add vector crypto extensions to get-reg-list
test
RISC-V: KVM: Allow Zfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zihintntl extension for Guest/VM
KVM: riscv: selftests: Add Zihintntl extension to get-reg-list test
RISC-V: KVM: Allow Zvfh[min] extensions for Guest/VM
KVM: riscv: selftests: Add Zvfh[min] extensions to get-reg-list test
RISC-V: KVM: Allow Zfa extension for Guest/VM
KVM: riscv: selftests: Add Zfa extension to get-reg-list test
arch/riscv/include/uapi/asm/kvm.h | 27 ++
arch/riscv/kvm/vcpu_onereg.c | 54 +++
.../selftests/kvm/riscv/get-reg-list.c | 439 ++++++++----------
3 files changed, 265 insertions(+), 255 deletions(-)
--
2.34.1
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.
For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.
Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6
Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options
Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon
Suggested-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier(a)nvidia.com>
---
tools/testing/selftests/drivers/net/bonding/config | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index f85b16fc5128..899d7fb6ea8e 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -1,5 +1,10 @@
CONFIG_BONDING=y
CONFIG_BRIDGE=y
CONFIG_DUMMY=y
+CONFIG_IPV6=y
CONFIG_MACVLAN=y
+CONFIG_NET_ACT_GACT=y
+CONFIG_NET_CLS_FLOWER=y
+CONFIG_NET_SCH_INGRESS=y
+CONFIG_NLMON=y
CONFIG_VETH=y
--
2.43.0
The device is exported with a fuzz of 4, meaning that the `+ t` here
is removed by the fuzz algorithm, making those tests failing.
Not sure why, but when I run this locally it was passing, but not in the
VM.
Link: https://gitlab.freedesktop.org/bentiss/hid/-/jobs/53692957#L3315
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Over the break the test suite wasn't properly running on my runner,
and this small issue sneaked in.
---
tools/testing/selftests/hid/tests/test_wacom_generic.py | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/hid/tests/test_wacom_generic.py b/tools/testing/selftests/hid/tests/test_wacom_generic.py
index 352fc39f3c6c..b62c7dba6777 100644
--- a/tools/testing/selftests/hid/tests/test_wacom_generic.py
+++ b/tools/testing/selftests/hid/tests/test_wacom_generic.py
@@ -880,8 +880,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
does not overlap with other contacts. The value of `t` may be
incremented over time to move the point along a linear path.
"""
- x = 50 + 10 * contact_id + t
- y = 100 + 100 * contact_id + t
+ x = 50 + 10 * contact_id + t * 11
+ y = 100 + 100 * contact_id + t * 11
return test_multitouch.Touch(contact_id, x, y)
def make_contacts(self, n, t=0):
@@ -902,8 +902,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
tracking_id = contact_ids.tracking_id
slot_num = contact_ids.slot_num
- x = 50 + 10 * contact_id + t
- y = 100 + 100 * contact_id + t
+ x = 50 + 10 * contact_id + t * 11
+ y = 100 + 100 * contact_id + t * 11
# If the data isn't supposed to be stored in any slots, there is
# nothing we can check for in the evdev stream.
---
base-commit: 80d5a73edcfbd1d8d6a4c2b755873c5d63a1ebd7
change-id: 20240117-b4-wip-wacom-tests-fixes-298b50bea47f
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
On Wed, Jan 17, 2024 at 7:12 PM Jason Gerecke <killertofu(a)gmail.com> wrote:
>
> LGTM. Acked-by: Jason Gerecke <jason.gerecke(a)wacom.com>
Thanks!
I'll add a:
Fixes: b0fb904d074e ("HID: wacom: Add additional tests of confidence behavior")
And send to Linus in the next round for 6.8 so we also fix the future
for-6.9 branches
Cheers,
Benjamin
>
>
> Jason
> ---
> Now instead of four in the eights place /
> you’ve got three, ‘Cause you added one /
> (That is to say, eight) to the two, /
> But you can’t take seven from three, /
> So you look at the sixty-fours....
>
>
>
> On Wed, Jan 17, 2024 at 5:27 AM Benjamin Tissoires <bentiss(a)kernel.org> wrote:
>>
>> The device is exported with a fuzz of 4, meaning that the `+ t` here
>> is removed by the fuzz algorithm, making those tests failing.
>>
>> Not sure why, but when I run this locally it was passing, but not in the
>> VM.
>>
>> Link: https://gitlab.freedesktop.org/bentiss/hid/-/jobs/53692957#L3315
>> Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
>> ---
>> Over the break the test suite wasn't properly running on my runner,
>> and this small issue sneaked in.
>> ---
>> tools/testing/selftests/hid/tests/test_wacom_generic.py | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/testing/selftests/hid/tests/test_wacom_generic.py b/tools/testing/selftests/hid/tests/test_wacom_generic.py
>> index 352fc39f3c6c..b62c7dba6777 100644
>> --- a/tools/testing/selftests/hid/tests/test_wacom_generic.py
>> +++ b/tools/testing/selftests/hid/tests/test_wacom_generic.py
>> @@ -880,8 +880,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
>> does not overlap with other contacts. The value of `t` may be
>> incremented over time to move the point along a linear path.
>> """
>> - x = 50 + 10 * contact_id + t
>> - y = 100 + 100 * contact_id + t
>> + x = 50 + 10 * contact_id + t * 11
>> + y = 100 + 100 * contact_id + t * 11
>> return test_multitouch.Touch(contact_id, x, y)
>>
>> def make_contacts(self, n, t=0):
>> @@ -902,8 +902,8 @@ class TestDTH2452Tablet(test_multitouch.BaseTest.TestMultitouch, TouchTabletTest
>> tracking_id = contact_ids.tracking_id
>> slot_num = contact_ids.slot_num
>>
>> - x = 50 + 10 * contact_id + t
>> - y = 100 + 100 * contact_id + t
>> + x = 50 + 10 * contact_id + t * 11
>> + y = 100 + 100 * contact_id + t * 11
>>
>> # If the data isn't supposed to be stored in any slots, there is
>> # nothing we can check for in the evdev stream.
>>
>> ---
>> base-commit: 80d5a73edcfbd1d8d6a4c2b755873c5d63a1ebd7
>> change-id: 20240117-b4-wip-wacom-tests-fixes-298b50bea47f
>>
>> Best regards,
>> --
>> Benjamin Tissoires <bentiss(a)kernel.org>
>>
>>
Hi Mohammad,
On 1/16/24 21:48, Mohammad Nassiri wrote:
> The end_server() function only operates in the server thread
> and always takes an accept socket instead of a listen socket as
> its input argument. To align with this, invert the boolean values
> used when calling verify_counters() within the end_server() function.
>
> Fixes: ("3c3ead555648 selftests/net: Add TCP-AO key-management test")
> Signed-off-by: Mohammad Nassiri <mnassiri(a)ciena.com>
> Link: https://lore.kernel.org/all/934627c5-eebb-4626-be23-cfb134c01d1a@arista.com/
As I've written you off-list, the patch probably was not delivered to
mailing lists due to SPF check not passing. Please, fix the send-email
setup when/if you want to send more patches.
Related to this patch: I'm going to carry and resend it together with 2
more patches, as this fix made 3 selftests fail and I've looked into that.
> ---
> tools/testing/selftests/net/tcp_ao/key-management.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/net/tcp_ao/key-management.c b/tools/testing/selftests/net/tcp_ao/key-management.c
> index c48b4970ca17..f6a9395e3cd7 100644
> --- a/tools/testing/selftests/net/tcp_ao/key-management.c
> +++ b/tools/testing/selftests/net/tcp_ao/key-management.c
> @@ -843,7 +843,7 @@ static void end_server(const char *tst_name, int sk,
> synchronize_threads(); /* 4: verified => closed */
> close(sk);
>
> - verify_counters(tst_name, true, false, begin, &end);
> + verify_counters(tst_name, false, true, begin, &end);
> synchronize_threads(); /* 5: counters */
> }
>
Thanks,
Dmitry
When running with CATEGORY= (thp | hugetlb) we see a large numbers of
tests failing. These failures are due to not being able to allocate a
hugepage and normally occur on memory contrainted systems or when using
large page sizes.
drop_cache and compact_memory before the tests for a higher chance at a
successful hugepage allocation.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/run_vmtests.sh | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 246d53a5d7f2..040f27e21f47 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -206,6 +206,15 @@ pretty_name() {
# Usage: run_test [test binary] [arbitrary test arguments...]
run_test() {
if test_selected ${CATEGORY}; then
+ # On memory constrainted systems some tests can fail to allocate hugepages.
+ # perform some cleanup before the test for a higher success rate.
+ if [ ${CATEGORY} == "thp" ] | [ ${CATEGORY} == "hugetlb" ]; then
+ echo 3 > /proc/sys/vm/drop_caches
+ sleep 2
+ echo 1 > /proc/sys/vm/compact_memory
+ sleep 2
+ fi
+
local test=$(pretty_name "$*")
local title="running $*"
local sep=$(echo -n "$title" | tr "[:graph:][:space:]" -)
--
2.43.0
hugetlb_madv_vs_map selftest was not part of the mm test-suite since we
didn't have a fix for the problem it found.
Now that the problem is already fixed (see previous commit), let's
enable this selftest in the default test-suite.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
tools/testing/selftests/mm/run_vmtests.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index a5e6ba8d3579..f41e1978e4d4 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -256,6 +256,7 @@ nr_hugepages_tmp=$(cat /proc/sys/vm/nr_hugepages)
# For this test, we need one and just one huge page
echo 1 > /proc/sys/vm/nr_hugepages
CATEGORY="hugetlb" run_test ./hugetlb_fault_after_madv
+CATEGORY="hugetlb" run_test ./hugetlb_madv_vs_map
# Restore the previous number of huge pages, since further tests rely on it
echo "$nr_hugepages_tmp" > /proc/sys/vm/nr_hugepages
--
2.34.1
From: Amit Cohen <amcohen(a)nvidia.com>
'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
using a shaper somewhere in the flow of the packets. In this area, the
buffer is smaller than the buffer at the beginning of the flow, so it fills
up until there is no more space left. The test configures there PFC
which is supposed to notice that the headroom is filling up and send PFC
Xoff to indicate the transmitter to stop sending traffic for the priorities
sharing this PG.
The Xon/Xoff threshold is auto-configured and always equal to
2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
traffic will keep arriving until the transmitter receives and processes
the PFC packet. This amount of traffic is known as the PFC delay allowance.
Currently the buffer for the delay traffic is configured as 100KB. The
MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
This allows 80KB extra to be stored in this buffer.
8-lane ports use two buffers among which the configured buffer is split,
the Xoff threshold then applies to each buffer in parallel.
The test does not take into account the behavior of 8-lane ports, when the
ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
packets are dropped and the test fails.
Check if the relevant ports use 8 lanes, in such case double the size of
the buffer, as the headroom is split half-half.
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kselftest(a)vger.kernel.org
Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test")
Signed-off-by: Amit Cohen <amcohen(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
---
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
index 49bef76083b8..0f0f4f05807c 100755
--- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
+++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh
@@ -119,6 +119,9 @@ h2_destroy()
switch_create()
{
+ local lanes_swp4
+ local pg1_size
+
# pools
# -----
@@ -228,7 +231,20 @@ switch_create()
dcb pfc set dev $swp4 prio-pfc all:off 1:on
# PG0 will get autoconfigured to Xoff, give PG1 arbitrarily 100K, which
# is (-2*MTU) about 80K of delay provision.
- dcb buffer set dev $swp4 buffer-size all:0 1:$_100KB
+ pg1_size=$_100KB
+
+ setup_wait_dev_with_timeout $swp4
+
+ lanes_swp4=$(ethtool $swp4 | grep 'Lanes:')
+ lanes_swp4=${lanes_swp4#*"Lanes: "}
+
+ # 8-lane ports use two buffers among which the configured buffer
+ # is split, so double the size to get twice (20K + 80K).
+ if [[ $lanes_swp4 -eq 8 ]]; then
+ pg1_size=$((pg1_size * 2))
+ fi
+
+ dcb buffer set dev $swp4 buffer-size all:0 1:$pg1_size
# bridges
# -------
--
2.42.0
One build issue comes up due to both mount.h included dev_in_maps.c
In file included from dev_in_maps.c:10:
/usr/include/sys/mount.h:35:3: error: expected identifier before numeric constant
35 | MS_RDONLY = 1, /* Mount read-only. */
| ^~~~~~~~~
In file included from dev_in_maps.c:13:
Remove one of them to solve conflict, another error comes up:
dev_in_maps.c:170:6: error: implicit declaration of function ‘mount’ [-Werror=implicit-function-declaration]
170 | if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
| ^~~~~
cc1: all warnings being treated as errors
and then , add sys_mount definition to solve it
After both above, dev_in_maps.c can be built correctly on my mache(gcc 10.2,glibc-2.32,kernel-5.10)
Signed-off-by: Hu Yadi <hu.yadi(a)h3c.com>
---
.../selftests/filesystems/overlayfs/dev_in_maps.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
index e19ab0e85709..759f86e7d263 100644
--- a/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
+++ b/tools/testing/selftests/filesystems/overlayfs/dev_in_maps.c
@@ -10,7 +10,6 @@
#include <linux/mount.h>
#include <sys/syscall.h>
#include <sys/stat.h>
-#include <sys/mount.h>
#include <sys/mman.h>
#include <sched.h>
#include <fcntl.h>
@@ -32,7 +31,11 @@ static int sys_fsmount(int fd, unsigned int flags, unsigned int attr_flags)
{
return syscall(__NR_fsmount, fd, flags, attr_flags);
}
-
+static int sys_mount(const char *src, const char *tgt, const char *fst,
+ unsigned long flags, const void *data)
+{
+ return syscall(__NR_mount, src, tgt, fst, flags, data);
+}
static int sys_move_mount(int from_dfd, const char *from_pathname,
int to_dfd, const char *to_pathname,
unsigned int flags)
@@ -166,8 +169,7 @@ int main(int argc, char **argv)
ksft_test_result_skip("unable to create a new mount namespace\n");
return 1;
}
-
- if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
+ if (sys_mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) == -1) {
pr_perror("mount");
return 1;
}
--
2.39.3
Running charge_reserved_hugetlb.sh generates errors if sh is set to
dash:
./charge_reserved_hugetlb.sh: 9: [[: not found
./charge_reserved_hugetlb.sh: 19: [[: not found
./charge_reserved_hugetlb.sh: 27: [[: not found
./charge_reserved_hugetlb.sh: 37: [[: not found
./charge_reserved_hugetlb.sh: 45: Syntax error: "(" unexpected
Switch to using /bin/bash instead of /bin/sh. Make the switch for
write_hugetlb_memory.sh as well which is called from
charge_reserved_hugetlb.sh.
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/mm/write_hugetlb_memory.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index 0899019a7fcb..e14bdd4455f2 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
# Kselftest framework requirement - SKIP code is 4.
diff --git a/tools/testing/selftests/mm/write_hugetlb_memory.sh b/tools/testing/selftests/mm/write_hugetlb_memory.sh
index 70a02301f4c2..3d2d2eb9d6ff 100755
--- a/tools/testing/selftests/mm/write_hugetlb_memory.sh
+++ b/tools/testing/selftests/mm/write_hugetlb_memory.sh
@@ -1,4 +1,4 @@
-#!/bin/sh
+#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
set -e
--
2.42.0
From: Jeff Xu <jeffxu(a)chromium.org>
This is V4 of the patch, the patch has improved significantly since V1,
thanks to diverse inputs, a few discussions remain, please read those
in the open discussion section of v4 of change history.
-----------------------------------------------------------------
This patchset proposes a new mseal() syscall for the Linux kernel.
In a nutshell, mseal() protects the VMAs of a given virtual memory
range against modifications, such as changes to their permission bits.
Modern CPUs support memory permissions, such as the read/write (RW)
and no-execute (NX) bits. Linux has supported NX since the release of
kernel version 2.6.8 in August 2004 [1]. The memory permission feature
improves the security stance on memory corruption bugs, as an attacker
cannot simply write to arbitrary memory and point the code to it. The
memory must be marked with the X bit, or else an exception will occur.
Internally, the kernel maintains the memory permissions in a data
structure called VMA (vm_area_struct). mseal() additionally protects
the VMA itself against modifications of the selected seal type.
Memory sealing is useful to mitigate memory corruption issues where a
corrupted pointer is passed to a memory management system. For
example, such an attacker primitive can break control-flow integrity
guarantees since read-only memory that is supposed to be trusted can
become writable or .text pages can get remapped. Memory sealing can
automatically be applied by the runtime loader to seal .text and
.rodata pages and applications can additionally seal security critical
data at runtime. A similar feature already exists in the XNU kernel
with the VM_FLAGS_PERMANENT [3] flag and on OpenBSD with the
mimmutable syscall [4]. Also, Chrome wants to adopt this feature for
their CFI work [2] and this patchset has been designed to be
compatible with the Chrome use case.
Two system calls are involved in sealing the map: mmap() and mseal().
The new mseal() is an syscall on 64 bit CPU, and with
following signature:
int mseal(void addr, size_t len, unsigned long flags)
addr/len: memory range.
flags: reserved.
mseal() blocks following operations for the given memory range.
1> Unmapping, moving to another location, and shrinking the size,
via munmap() and mremap(), can leave an empty space, therefore can
be replaced with a VMA with a new set of attributes.
2> Moving or expanding a different VMA into the current location,
via mremap().
3> Modifying a VMA via mmap(MAP_FIXED).
4> Size expansion, via mremap(), does not appear to pose any specific
risks to sealed VMAs. It is included anyway because the use case is
unclear. In any case, users can rely on merging to expand a sealed VMA.
5> mprotect() and pkey_mprotect().
6> Some destructive madvice() behaviors (e.g. MADV_DONTNEED) for anonymous
memory, when users don't have write permission to the memory. Those
behaviors can alter region contents by discarding pages, effectively a
memset(0) for anonymous memory.
In addition: mmap() has two related changes.
The PROT_SEAL bit in prot field of mmap(). When present, it marks
the map sealed since creation.
The MAP_SEALABLE bit in the flags field of mmap(). When present, it marks
the map as sealable. A map created without MAP_SEALABLE will not support
sealing, i.e. mseal() will fail.
Applications that don't care about sealing will expect their behavior
unchanged. For those that need sealing support, opt-in by adding
MAP_SEALABLE in mmap().
The idea that inspired this patch comes from Stephen Röttger’s work in
V8 CFI [5]. Chrome browser in ChromeOS will be the first user of this
API.
Indeed, the Chrome browser has very specific requirements for sealing,
which are distinct from those of most applications. For example, in
the case of libc, sealing is only applied to read-only (RO) or
read-execute (RX) memory segments (such as .text and .RELRO) to
prevent them from becoming writable, the lifetime of those mappings
are tied to the lifetime of the process.
Chrome wants to seal two large address space reservations that are
managed by different allocators. The memory is mapped RW- and RWX
respectively but write access to it is restricted using pkeys (or in
the future ARM permission overlay extensions). The lifetime of those
mappings are not tied to the lifetime of the process, therefore, while
the memory is sealed, the allocators still need to free or discard the
unused memory. For example, with madvise(DONTNEED).
However, always allowing madvise(DONTNEED) on this range poses a
security risk. For example if a jump instruction crosses a page
boundary and the second page gets discarded, it will overwrite the
target bytes with zeros and change the control flow. Checking
write-permission before the discard operation allows us to control
when the operation is valid. In this case, the madvise will only
succeed if the executing thread has PKEY write permissions and PKRU
changes are protected in software by control-flow integrity.
Although the initial version of this patch series is targeting the
Chrome browser as its first user, it became evident during upstream
discussions that we would also want to ensure that the patch set
eventually is a complete solution for memory sealing and compatible
with other use cases. The specific scenario currently in mind is
glibc's use case of loading and sealing ELF executables. To this end,
Stephen is working on a change to glibc to add sealing support to the
dynamic linker, which will seal all non-writable segments at startup.
Once this work is completed, all applications will be able to
automatically benefit from these new protections.
--------------------------------------------------------------------
Change history:
===============
V4:
(Suggested by Linus Torvalds)
- new signature: mseal(start,len,flags)
- 32 bit is not supported. vm_seal is removed, use vm_flags instead.
- single bit in vm_flags for sealed state.
- CONFIG_MSEAL kernel config is removed.
- single bit of PROT_SEAL in the "Prot" field of mmap().
Other changes:
- update selftest (Suggested by Muhammad Usama Anjum)
- update documentation.
Open discussions:
=================
Below discussion were brought up in V3, and did not receive any input:
the one important to this patch is MAP_SEALABLE in mmap(), which is in
current version of patch, list here for input/comments.
---------------------------------------------------------------------
During the development of V3, I had new questions and thoughts and
wished to discuss.
1> shm/aio
From reading the code, it seems to me that aio/shm can mmap/munmap
maps on behalf of userspace, e.g. ksys_shmdt() in shm.c. The lifetime
of those mapping are not tied to the lifetime of the process. If those
memories are sealed from userspace, then unmap will fail. This isn’t a
huge problem, since the memory will eventually be freed at exit or
exec. However, it feels like the solution is not complete, because of
the leaks in VMA address space during the lifetime of the process.
2> Brk (heap/stack)
Currently, userspace applications can seal parts of the heap by
calling malloc() and mseal(). This raises the question of what the
expected behavior is when sealing the heap is attempted.
let's assume following calls from user space:
ptr = malloc(size);
mprotect(ptr, size, RO);
mseal(ptr, size, SEAL_PROT_PKEY);
free(ptr);
Technically, before mseal() is added, the user can change the
protection of the heap by calling mprotect(RO). As long as the user
changes the protection back to RW before free(), the memory can be
reused.
Adding mseal() into picture, however, the heap is then sealed
partially, user can still free it, but the memory remains to be RO,
and the result of brk-shrink is nondeterministic, depending on if
munmap() will try to free the sealed memory.(brk uses munmap to shrink
the heap).
3> Above two cases led to the third topic:
There one option to address the problem mentioned above.
Option 1: A “MAP_SEALABLE” flag in mmap().
If a map is created without this flag, the mseal() operation will
fail. Applications that are not concerned with sealing will expect
their behavior to be unchanged. For those that are concerned, adding a
flag at mmap time to opt in is not difficult. For the short term, this
solves problems 1 and 2 above. The memory in shm/aio/brk will not have
the MAP_SEALABLE flag at mmap(), and the same is true for the heap.
If we choose not to go with path, all mapping will by default
sealable. We could document above mentioned limitations so devs are
more careful at the time to choose what memory to seal. I think
deny of service through mseal() by attacker is probably not a concern,
if attackers have access to mseal() and unsealed memory, then they can
also do other harmful thing to the memory, such as munmap, etc.
4>
I think it might be possible to seal the stack or other special
mappings created at runtime (vdso, vsyscall, vvar). This means we can
enforce and seal W^X for certain types of application. For instance,
the stack is typically used in read-write mode, but in some cases, it
can become executable. To defend against unintented addition of executable
bit to stack, we could let the application to seal it.
Sealing the heap (for adding X) requires special handling, since the
heap can shrink, and shrink is implemented through munmap().
Indeed, it might be possible that all virtual memory accessible to user
space, regardless of its usage pattern, could be sealed. However, this
would require additional research and development work.
=====================================================================
V3:
- Abandon per-syscall approach, (Suggested by Linus Torvalds).
- Organize sealing types around their functionality, such as
MM_SEAL_BASE, MM_SEAL_PROT_PKEY.
- Extend the scope of sealing from calls originated in userspace to
both kernel and userspace. (Suggested by Linus Torvalds)
- Add seal type support in mmap(). (Suggested by Pedro Falcato)
- Add a new sealing type: MM_SEAL_DISCARD_RO_ANON to prevent
destructive operations of madvise. (Suggested by Jann Horn and
Stephen Röttger)
- Make sealed VMAs mergeable. (Suggested by Jann Horn)
- Add MAP_SEALABLE to mmap()
- Add documentation - mseal.rst
https://lore.kernel.org/linux-mm/20231212231706.2680890-2-jeffxu@chromium.o…
v2:
Use _BITUL to define MM_SEAL_XX type.
Use unsigned long for seal type in sys_mseal() and other functions.
Remove internal VM_SEAL_XX type and convert_user_seal_type().
Remove MM_ACTION_XX type.
Remove caller_origin(ON_BEHALF_OF_XX) and replace with sealing bitmask.
Add more comments in code.
Add a detailed commit message.
https://lore.kernel.org/lkml/20231017090815.1067790-1-jeffxu@chromium.org/
v1:
https://lore.kernel.org/lkml/20231016143828.647848-1-jeffxu@chromium.org/
----------------------------------------------------------------
[1] https://kernelnewbies.org/Linux_2_6_8
[2] https://v8.dev/blog/control-flow-integrity
[3] https://github.com/apple-oss-distributions/xnu/blob/1031c584a5e37aff177559b…
[4] https://man.openbsd.org/mimmutable.2
[5] https://docs.google.com/document/d/1O2jwK4dxI3nRcOJuPYkonhTkNQfbmwdvxQMyXge…
[6] https://lore.kernel.org/lkml/CAG48ez3ShUYey+ZAFsU2i1RpQn0a5eOs2hzQ426Fkcgnf…
[7] https://lore.kernel.org/lkml/20230515130553.2311248-1-jeffxu@chromium.org/
Jeff Xu (4):
mseal: Wire up mseal syscall
mseal: add mseal syscall
selftest mm/mseal memory sealing
mseal:add documentation
Documentation/userspace-api/mseal.rst | 181 ++
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/mm.h | 60 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/mman-common.h | 7 +
include/uapi/asm-generic/unistd.h | 5 +-
kernel/sys_ni.c | 1 +
mm/Makefile | 4 +
mm/madvise.c | 12 +
mm/mmap.c | 27 +
mm/mprotect.c | 10 +
mm/mremap.c | 31 +
mm/mseal.c | 330 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/mseal_test.c | 1971 +++++++++++++++++++
32 files changed, 2659 insertions(+), 2 deletions(-)
create mode 100644 Documentation/userspace-api/mseal.rst
create mode 100644 mm/mseal.c
create mode 100644 tools/testing/selftests/mm/mseal_test.c
--
2.43.0.195.gebba966016-goog
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
From: Yonghong Song <yonghong.song(a)linux.dev>
[ Upstream commit 100888fb6d8a185866b1520031ee7e3182b173de ]
With latest clang18 (main branch of llvm-project repo), when building bpf selftests,
[~/work/bpf-next (master)]$ make -C tools/testing/selftests/bpf LLVM=1 -j
The following compilation error happens:
fatal error: error in backend: Branch target out of insn range
...
Stack dump:
0. Program arguments: clang -g -Wall -Werror -D__TARGET_ARCH_x86 -mlittle-endian
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/include
-I/home/yhs/work/bpf-next/tools/testing/selftests/bpf -I/home/yhs/work/bpf-next/tools/include/uapi
-I/home/yhs/work/bpf-next/tools/testing/selftests/usr/include -idirafter
/home/yhs/work/llvm-project/llvm/build.18/install/lib/clang/18/include -idirafter /usr/local/include
-idirafter /usr/include -Wno-compare-distinct-pointer-types -DENABLE_ATOMICS_TESTS -O2 --target=bpf
-c progs/pyperf180.c -mcpu=v3 -o /home/yhs/work/bpf-next/tools/testing/selftests/bpf/pyperf180.bpf.o
1. <eof> parser at end of file
2. Code generation
...
The compilation failure only happens to cpu=v2 and cpu=v3. cpu=v4 is okay
since cpu=v4 supports 32-bit branch target offset.
The above failure is due to upstream llvm patch [1] where some inlining behavior
are changed in clang18.
To workaround the issue, previously all 180 loop iterations are fully unrolled.
The bpf macro __BPF_CPU_VERSION__ (implemented in clang18 recently) is used to avoid
unrolling changes if cpu=v4. If __BPF_CPU_VERSION__ is not available and the
compiler is clang18, the unrollng amount is unconditionally reduced.
[1] https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Tested-by: Alan Maguire <alan.maguire(a)oracle.com>
Link: https://lore.kernel.org/bpf/20231110193644.3130906-1-yonghong.song@linux.dev
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/progs/pyperf180.c | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/tools/testing/selftests/bpf/progs/pyperf180.c b/tools/testing/selftests/bpf/progs/pyperf180.c
index c39f559d3100..42c4a8b62e36 100644
--- a/tools/testing/selftests/bpf/progs/pyperf180.c
+++ b/tools/testing/selftests/bpf/progs/pyperf180.c
@@ -1,4 +1,26 @@
// SPDX-License-Identifier: GPL-2.0
// Copyright (c) 2019 Facebook
#define STACK_MAX_LEN 180
+
+/* llvm upstream commit at clang18
+ * https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
+ * changed inlining behavior and caused compilation failure as some branch
+ * target distance exceeded 16bit representation which is the maximum for
+ * cpu v1/v2/v3. Macro __BPF_CPU_VERSION__ is later implemented in clang18
+ * to specify which cpu version is used for compilation. So a smaller
+ * unroll_count can be set if __BPF_CPU_VERSION__ is less than 4, which
+ * reduced some branch target distances and resolved the compilation failure.
+ *
+ * To capture the case where a developer/ci uses clang18 but the corresponding
+ * repo checkpoint does not have __BPF_CPU_VERSION__, a smaller unroll_count
+ * will be set as well to prevent potential compilation failures.
+ */
+#ifdef __BPF_CPU_VERSION__
+#if __BPF_CPU_VERSION__ < 4
+#define UNROLL_COUNT 90
+#endif
+#elif __clang_major__ == 18
+#define UNROLL_COUNT 90
+#endif
+
#include "pyperf.h"
--
2.43.0
Rae has been shouldering a lot of the KUnit review burden for the last
year, and will continue to do so in the future. Thanks!
Signed-off-by: David Gow <davidgow(a)google.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index f8efcb72ad4b..2316d89806dd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11599,6 +11599,7 @@ F: fs/smb/server/
KERNEL UNIT TESTING FRAMEWORK (KUnit)
M: Brendan Higgins <brendanhiggins(a)google.com>
M: David Gow <davidgow(a)google.com>
+R: Rae Moar <rmoar(a)google.com>
L: linux-kselftest(a)vger.kernel.org
L: kunit-dev(a)googlegroups.com
S: Maintained
--
2.43.0.275.g3460e3d667-goog
Number of tests are failing when netdev renaming is active
on the system. Add udevadm settle in logic determining
the names.
Fixes: 242aaf03dc9b ("selftests: add a test for ethtool pause stats")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: saeedm(a)nvidia.com
CC: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh | 1 +
tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh | 1 +
2 files changed, 2 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
index 922744059aaa..80160579e0cc 100644
--- a/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/ethtool-common.sh
@@ -51,6 +51,7 @@ function make_netdev {
fi
echo $NSIM_ID $@ > /sys/bus/netdevsim/new_device
+ udevadm settle
# get new device name
ls /sys/bus/netdevsim/devices/netdevsim${NSIM_ID}/net/
}
diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
index 1b08e042cf94..4855ef597a15 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh
@@ -233,6 +233,7 @@ function print_tables {
function get_netdev_name {
local -n old=$1
+ udevadm settle
new=$(ls /sys/class/net)
for netdev in $new; do
--
2.43.0
Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |---------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
This series is based on the first part which was merged [1], this series is to
add the cache invalidation interface or the userspace to invalidate cache after
modifying the stage-1 page table. This includes both the iommufd changes and the
VT-d driver changes.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.
[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1
Change log:
v11:
- Drop hw_error field in vtd cache invalidation uapi. devTLB invalidation
error is a serious security emergency requiring the host kernel to handle.
No need to expose it to userspace (especially given existing VMs doesn't
issue devTLB invalidation at all).
- The vtd qi_submit_sync() and related callers are reverted back to the
original state due to above drop.
- Align with the vtd path, drop the hw_error reporting in mock driver and
selftest as well since selftest is a demo of the real driver.
- Drop iommu_respond_struct_to_user_array() since no more driver want to
respond single entry in the user_array.
- Two typos from Wubinbin
v10: https://lore.kernel.org/all/20240102143834.146165-1-yi.l.liu@intel.com/
- Minor tweak to patch 07 (Kevin)
- Rebase on top of 6.7-rc8
v9: https://lore.kernel.org/linux-iommu/20231228150629.13149-1-yi.l.liu@intel.c…
- Add a test case which sets both IOMMU_TEST_INVALIDATE_FLAG_ALL and
IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR in flags, and expect to succeed
and see an 'error'. (Kevin)
- Returns -ETIMEOUT in qi_check_fault() if caller is interested with the
fault when timeout happens. If not, the qi_submit_sync() will keep retry
hence unable to report the error back to user. For now, only the user cache
invalidation path has interest on the time out error. So this change only
affects the user cache invalidation path. Other path will still hang in
qi_submit_sync() when timeout happens. (Kevin)
v8: https://lore.kernel.org/linux-iommu/20231227161354.67701-1-yi.l.liu@intel.c…
- Pass invalidation hint to the cache invalidation helper in the cache_invalidate_user
op path (Kevin)
- Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
- Clear *fault per restart in qi_submit_sync() to avoid acroos submission error
accumulation. (Kevin)
- Define the vtd cache invalidation uapi structure in separate patch (Kevin)
- Rename inv_error to be hw_error (Kevin)
- Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be 'data_uptr',
'data_type', "entry_len' and 'entry_num" (Kevin)
- Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR
in the same time (Kevin)
v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
- Remove domain->ops->cache_invalidate_user check in hwpt alloc path due
to failure in bisect (Baolu)
- Remove out_driver_error_code from struct iommu_hwpt_invalidate after
discussion in v6. Should expect per-entry error code.
- Rework the selftest cache invalidation part to report a per-entry error
- Allow user to pass in an empty array to have a try-and-fail mechanism for
user to check if a given req_type is supported by the kernel (Jason)
- Define a separate enum type for cache invalidation data (Jason)
- Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field before
returning (Nicolin)
- Merge the VT-d nesting part 2/2
https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
into this series to avoid defining empty enum in the middle of the series.
The major difference is adding the VT-d related invalidation uapi structures
together with the generic data structures in patch 02 of this series.
- VT-d driver was refined to report ICE/ITE error from the bottom cache
invalidation submit helpers, hence the cache_invalidate_user op could
report such errors via the per-entry error field to user. VT-d driver
will not stop the invalidation array walking due to the ICE/ITE errors
as such errors are defined by VT-d spec, userspace should be able to
handle it and let the real user (say Virtual Machine) know about it.
But for other errors like invalid uapi data structure configuration,
memory copy failure, such errors should stop the array walking as it
may have more issues if go on.
- Minor fixes per Jason and Kevin's review comments
v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
- No much change, just rebase on top of 6.7-rc1 as part 1/2 is merged
v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
- Split the iommufd nesting series into two parts of alloc_user and
invalidation (Jason)
- Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED, and
do the same with the structures/alloc()/abort()/destroy(). Reworked the
selftest accordingly too. (Jason)
- Move hwpt/data_type into struct iommu_user_data from standalone op
arguments. (Jason)
- Rename hwpt_type to be data_type, the HWPT_TYPE to be HWPT_ALLOC_DATA,
_TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
- Rename iommu_copy_user_data() to iommu_copy_struct_from_user() (Kevin)
- Add macro to the iommu_copy_struct_from_user() to calculate min_size
(Jason)
- Fix two bugs spotted by ZhaoYan
v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
- Separate HWPT alloc/destroy/abort functions between user-managed HWPTs
and kernel-managed HWPTs
- Rework invalidate uAPI to be a multi-request array-based design
- Add a struct iommu_user_data_array and a helper for driver to sanitize
and copy the entry data from user space invalidation array
- Add a patch fixing TEST_LENGTH() in selftest program
- Drop IOMMU_RESV_IOVA_RANGES patches
- Update kdoc and inline comments
- Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation,
this does not change the rule that resv regions should only be added to the
kernel-managed HWPT. The IOMMU_RESV_SW_MSI stuff will be added in later series
as it is needed only by SMMU so far.
v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
- Add new uAPI things in alphabetical order
- Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user for
sanity, replacing the previous op->domain_alloc_user_data_len solution
- Return ERR_PTR from domain_alloc_user instead of NULL
- Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested translation (Kevin)
- Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace hence
userspace is able to exclude the ranges in the stage-1 HWPT (e.g. guest I/O
page table). (Kevin)
- Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
- Minor changes per Kevin's inputs
v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
- Add union iommu_domain_user_data to include all user data structures to avoid
passing void * in kernel APIs.
- Add iommu op to return user data length for user domain allocation
- Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
- Store the invalidation data length in iommu_domain_ops::cache_invalidate_user_data_len
- Convert cache_invalidate_user op to be int instead of void
- Remove @data_type in struct iommu_hwpt_invalidate
- Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch 08 of v1
v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add cache_invalidate_user op
iommu/vt-d: Add iotlb flush for nested domain
Nicolin Chen (4):
iommu: Add iommu_copy_struct_from_user_array helper
iommufd/selftest: Add mock_domain_cache_invalidate_user support
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (2):
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd: Add data structure for Intel VT-d stage-1 cache invalidation
drivers/iommu/intel/nested.c | 88 ++++++++++
drivers/iommu/iommufd/hw_pagetable.c | 41 +++++
drivers/iommu/iommufd/iommufd_private.h | 10 ++
drivers/iommu/iommufd/iommufd_test.h | 23 +++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 76 +++++++++
include/linux/iommu.h | 79 +++++++++
include/uapi/linux/iommufd.h | 79 +++++++++
tools/testing/selftests/iommu/iommufd.c | 152 ++++++++++++++++++
tools/testing/selftests/iommu/iommufd_utils.h | 57 +++++++
10 files changed, 608 insertions(+)
--
2.34.1
From: Rae Moar <rmoar(a)google.com>
[ Upstream commit 8ae27bc7fff4ef467a7964821a6cedb34a05d3b2 ]
Add parsing of attributes as diagnostic data. Fixes issue with test plan
being parsed incorrectly as diagnostic data when located after
suite-level attributes.
Note that if there does not exist a test plan line, the diagnostic lines
between the suite header and the first result will be saved in the suite
log rather than the first test case log.
Signed-off-by: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_parser.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 79d8832c862a..ce34be15c929 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]:
Log of diagnostic lines
"""
log = [] # type: List[str]
- non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START]
+ non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN]
while lines and not any(re.match(lines.peek())
for re in non_diagnostic_lines):
log.append(lines.pop())
@@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# test plan
test.name = "main"
ktap_line = parse_ktap_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
else:
@@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
if parent_test:
# If KTAP version line and/or subtest header is found, attempt
# to parse test plan and print test header
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
print_test_header(test)
expected_count = test.expected_count
--
2.43.0
From: Thomas Weißschuh <linux(a)weissschuh.net>
[ Upstream commit bdeeeaba83682225a7bf5f100fe8652a59590d33 ]
qemu for LoongArch does not work properly with direct kernel boot.
The kernel will panic during initialization and hang without any output.
When booting in EFI mode everything work correctly.
While users most likely don't have the LoongArch EFI binary installed at
least an explicit error about 'file not found' is better than a hanging
test without output that can never succeed.
Link: https://lore.kernel.org/loongarch/1738d60a-df3a-4102-b1da-d16a29b6e06a@t-8c…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/r/20231031-nolibc-out-of-tree-v1-1-47c92f73590a@wei…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index dfe66776a331..be7711014ade 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -88,6 +88,13 @@ QEMU_ARCH_s390 = s390x
QEMU_ARCH_loongarch = loongarch64
QEMU_ARCH = $(QEMU_ARCH_$(XARCH))
+QEMU_BIOS_DIR = /usr/share/edk2/
+QEMU_BIOS_loongarch = $(QEMU_BIOS_DIR)/loongarch64/OVMF_CODE.fd
+
+ifneq ($(QEMU_BIOS_$(XARCH)),)
+QEMU_ARGS_BIOS = -bios $(QEMU_BIOS_$(XARCH))
+endif
+
# QEMU_ARGS : some arch-specific args to pass to qemu
QEMU_ARGS_i386 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_x86_64 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
@@ -101,7 +108,7 @@ QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC
QEMU_ARGS_riscv = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_EXTRA)
+QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
# OUTPUT is only set when run from the main makefile, otherwise
# it defaults to this nolibc directory.
--
2.43.0
From: Michal Wajdeczko <michal.wajdeczko(a)intel.com>
[ Upstream commit 342fb9789267ee3908959bfa136b82e88e2ce918 ]
If we run parameterized test that uses test->priv to prepare some
custom data, then value of test->priv will leak to the next param
iteration and may be unexpected. This could be easily seen if
we promote example_priv_test to parameterized test as then only
first test iteration will be successful:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_priv*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] ==================== example_priv_test ====================
[ ] [PASSED] example value 3
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 2
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 1
[ ] # example_priv_test: initializing
[ ] # example_priv_test: ASSERTION FAILED at lib/kunit/kunit-example-test.c:230
[ ] Expected test->priv == ((void *)0), but
[ ] test->priv == 0000000060dfe290
[ ] ((void *)0) == 0000000000000000
[ ] # example_priv_test: cleaning up
[ ] [FAILED] example value 0
[ ] # example_priv_test: initializing
[ ] # example_priv_test: cleaning up
[ ] # example_priv_test: pass:1 fail:3 skip:0 total:4
[ ] ================ [FAILED] example_priv_test ================
[ ] # example: initializing suite
[ ] # module: kunit_example_test
[ ] # example: exiting suite
[ ] # Totals: pass:1 fail:3 skip:0 total:4
[ ] ===================== [FAILED] example =====================
Fix that by resetting test->priv after each param iteration, in
similar way what we did for the test->status.
Signed-off-by: Michal Wajdeczko <michal.wajdeczko(a)intel.com>
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
lib/kunit/test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 7aceb07a1af9..1cdc405daa30 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -660,6 +660,7 @@ int kunit_run_tests(struct kunit_suite *suite)
test.param_index++;
test.status = KUNIT_SUCCESS;
test.status_comment[0] = '\0';
+ test.priv = NULL;
}
}
--
2.43.0
From: Rae Moar <rmoar(a)google.com>
[ Upstream commit 8ae27bc7fff4ef467a7964821a6cedb34a05d3b2 ]
Add parsing of attributes as diagnostic data. Fixes issue with test plan
being parsed incorrectly as diagnostic data when located after
suite-level attributes.
Note that if there does not exist a test plan line, the diagnostic lines
between the suite header and the first result will be saved in the suite
log rather than the first test case log.
Signed-off-by: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_parser.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 79d8832c862a..ce34be15c929 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -450,7 +450,7 @@ def parse_diagnostic(lines: LineStream) -> List[str]:
Log of diagnostic lines
"""
log = [] # type: List[str]
- non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START]
+ non_diagnostic_lines = [TEST_RESULT, TEST_HEADER, KTAP_START, TAP_START, TEST_PLAN]
while lines and not any(re.match(lines.peek())
for re in non_diagnostic_lines):
log.append(lines.pop())
@@ -726,6 +726,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# test plan
test.name = "main"
ktap_line = parse_ktap_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
else:
@@ -737,6 +738,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
if parent_test:
# If KTAP version line and/or subtest header is found, attempt
# to parse test plan and print test header
+ test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
print_test_header(test)
expected_count = test.expected_count
--
2.43.0
From: Thomas Weißschuh <linux(a)weissschuh.net>
[ Upstream commit bdeeeaba83682225a7bf5f100fe8652a59590d33 ]
qemu for LoongArch does not work properly with direct kernel boot.
The kernel will panic during initialization and hang without any output.
When booting in EFI mode everything work correctly.
While users most likely don't have the LoongArch EFI binary installed at
least an explicit error about 'file not found' is better than a hanging
test without output that can never succeed.
Link: https://lore.kernel.org/loongarch/1738d60a-df3a-4102-b1da-d16a29b6e06a@t-8c…
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Acked-by: Willy Tarreau <w(a)1wt.eu>
Link: https://lore.kernel.org/r/20231031-nolibc-out-of-tree-v1-1-47c92f73590a@wei…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/nolibc/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index a0fc07253baf..eb258ae1d948 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -88,6 +88,13 @@ QEMU_ARCH_s390 = s390x
QEMU_ARCH_loongarch = loongarch64
QEMU_ARCH = $(QEMU_ARCH_$(XARCH))
+QEMU_BIOS_DIR = /usr/share/edk2/
+QEMU_BIOS_loongarch = $(QEMU_BIOS_DIR)/loongarch64/OVMF_CODE.fd
+
+ifneq ($(QEMU_BIOS_$(XARCH)),)
+QEMU_ARGS_BIOS = -bios $(QEMU_BIOS_$(XARCH))
+endif
+
# QEMU_ARGS : some arch-specific args to pass to qemu
QEMU_ARGS_i386 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_x86_64 = -M pc -append "console=ttyS0,9600 i8042.noaux panic=-1 $(TEST:%=NOLIBC_TEST=%)"
@@ -101,7 +108,7 @@ QEMU_ARGS_ppc64le = -M powernv -append "console=hvc0 panic=-1 $(TEST:%=NOLIBC
QEMU_ARGS_riscv = -M virt -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_s390 = -M s390-ccw-virtio -m 1G -append "console=ttyS0 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
QEMU_ARGS_loongarch = -M virt -append "console=ttyS0,115200 panic=-1 $(TEST:%=NOLIBC_TEST=%)"
-QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_EXTRA)
+QEMU_ARGS = $(QEMU_ARGS_$(XARCH)) $(QEMU_ARGS_BIOS) $(QEMU_ARGS_EXTRA)
# OUTPUT is only set when run from the main makefile, otherwise
# it defaults to this nolibc directory.
--
2.43.0