Change the parameter expected_buf from (const void *) to (const char *)
in the function __recvpair() as per the feedback in v1.
Add Fixes tag as per feedback in v1.
This change fixes the below warnings during test compilation:
```
In file included from msg_oob.c:14:
msg_oob.c: In function ‘__recvpair’:
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument
of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
```
v1:
lore.kernel.org/netdev/20240810134037.669765-1-jain.abhinav177@gmail.com
Fixes: d098d77232c3 ("selftest: af_unix: Add msg_oob.c.")
Signed-off-by: Abhinav Jain <jain.abhinav177(a)gmail.com>
---
tools/testing/selftests/net/af_unix/msg_oob.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/af_unix/msg_oob.c b/tools/testing/selftests/net/af_unix/msg_oob.c
index 16d0c172eaeb..535eb2c3d7d1 100644
--- a/tools/testing/selftests/net/af_unix/msg_oob.c
+++ b/tools/testing/selftests/net/af_unix/msg_oob.c
@@ -209,7 +209,7 @@ static void __sendpair(struct __test_metadata *_metadata,
static void __recvpair(struct __test_metadata *_metadata,
FIXTURE_DATA(msg_oob) *self,
- const void *expected_buf, int expected_len,
+ const char *expected_buf, int expected_len,
int buf_len, int flags)
{
int i, ret[2], recv_errno[2], expected_errno = 0;
--
2.34.1
There are 2 issues for the current udpgro test. The first one is the testing
doesn't record all the failures, which may report pass but the test actually
failed. e.g.
https://netdev-3.bots.linux.dev/vmksft-net/results/725661/45-udpgro-sh/stdo…
The other one is after commit d7db7775ea2e ("net: veth: do not manipulate
GRO when using XDP"), there is no need to load xdp program to enable GRO
on veth device.
Hangbin Liu (2):
selftests: udpgro: report error when receive failed
selftests: udpgro: no need to load xdp for gro
tools/testing/selftests/net/udpgro.sh | 50 +++++++++++++--------------
1 file changed, 25 insertions(+), 25 deletions(-)
--
2.45.0
Add documentation outlining the usage and details of devmem TCP.
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
Reviewed-by: Donald Hunter <donald.hunter(a)gmail.com>
---
v16:
- Add documentation on unbinding the NIC from dmabuf (Donald).
- Add note that any dmabuf should work (Donald).
v9: https://lore.kernel.org/netdev/20240403002053.2376017-14-almasrymina@google…
- Bagas doc suggestions.
v8:
- Applied docs suggestions (Randy). Thanks!
v7:
- Applied docs suggestions (Jakub).
v2:
- Missing spdx (simon)
- add to index.rst (simon)
---
Documentation/networking/devmem.rst | 269 ++++++++++++++++++++++++++++
Documentation/networking/index.rst | 1 +
2 files changed, 270 insertions(+)
create mode 100644 Documentation/networking/devmem.rst
diff --git a/Documentation/networking/devmem.rst b/Documentation/networking/devmem.rst
new file mode 100644
index 000000000000..417fc977844e
--- /dev/null
+++ b/Documentation/networking/devmem.rst
@@ -0,0 +1,269 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=================
+Device Memory TCP
+=================
+
+
+Intro
+=====
+
+Device memory TCP (devmem TCP) enables receiving data directly into device
+memory (dmabuf). The feature is currently implemented for TCP sockets.
+
+
+Opportunity
+-----------
+
+A large number of data transfers have device memory as the source and/or
+destination. Accelerators drastically increased the prevalence of such
+transfers. Some examples include:
+
+- Distributed training, where ML accelerators, such as GPUs on different hosts,
+ exchange data.
+
+- Distributed raw block storage applications transfer large amounts of data with
+ remote SSDs. Much of this data does not require host processing.
+
+Typically the Device-to-Device data transfers in the network are implemented as
+the following low-level operations: Device-to-Host copy, Host-to-Host network
+transfer, and Host-to-Device copy.
+
+The flow involving host copies is suboptimal, especially for bulk data transfers,
+and can put significant strains on system resources such as host memory
+bandwidth and PCIe bandwidth.
+
+Devmem TCP optimizes this use case by implementing socket APIs that enable
+the user to receive incoming network packets directly into device memory.
+
+Packet payloads go directly from the NIC to device memory.
+
+Packet headers go to host memory and are processed by the TCP/IP stack
+normally. The NIC must support header split to achieve this.
+
+Advantages:
+
+- Alleviate host memory bandwidth pressure, compared to existing
+ network-transfer + device-copy semantics.
+
+- Alleviate PCIe bandwidth pressure, by limiting data transfer to the lowest
+ level of the PCIe tree, compared to the traditional path which sends data
+ through the root complex.
+
+
+More Info
+---------
+
+ slides, video
+ https://netdevconf.org/0x17/sessions/talk/device-memory-tcp.html
+
+ patchset
+ [RFC PATCH v6 00/12] Device Memory TCP
+ https://lore.kernel.org/netdev/20240305020153.2787423-1-almasrymina@google.…
+
+
+Interface
+=========
+
+
+Example
+-------
+
+tools/testing/selftests/net/ncdevmem.c:do_server shows an example of setting up
+the RX path of this API.
+
+
+NIC Setup
+---------
+
+Header split, flow steering, & RSS are required features for devmem TCP.
+
+Header split is used to split incoming packets into a header buffer in host
+memory, and a payload buffer in device memory.
+
+Flow steering & RSS are used to ensure that only flows targeting devmem land on
+an RX queue bound to devmem.
+
+Enable header split & flow steering::
+
+ # enable header split
+ ethtool -G eth1 tcp-data-split on
+
+
+ # enable flow steering
+ ethtool -K eth1 ntuple on
+
+Configure RSS to steer all traffic away from the target RX queue (queue 15 in
+this example)::
+
+ ethtool --set-rxfh-indir eth1 equal 15
+
+
+The user must bind a dmabuf to any number of RX queues on a given NIC using
+the netlink API::
+
+ /* Bind dmabuf to NIC RX queue 15 */
+ struct netdev_queue *queues;
+ queues = malloc(sizeof(*queues) * 1);
+
+ queues[0]._present.type = 1;
+ queues[0]._present.idx = 1;
+ queues[0].type = NETDEV_RX_QUEUE_TYPE_RX;
+ queues[0].idx = 15;
+
+ *ys = ynl_sock_create(&ynl_netdev_family, &yerr);
+
+ req = netdev_bind_rx_req_alloc();
+ netdev_bind_rx_req_set_ifindex(req, 1 /* ifindex */);
+ netdev_bind_rx_req_set_dmabuf_fd(req, dmabuf_fd);
+ __netdev_bind_rx_req_set_queues(req, queues, n_queue_index);
+
+ rsp = netdev_bind_rx(*ys, req);
+
+ dmabuf_id = rsp->dmabuf_id;
+
+
+The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf
+that has been bound.
+
+The user can unbind the dmabuf from the netdevice by closing the netlink socket
+that established the binding. We do this so that the binding is automatically
+unbound even if the userspace process crashes.
+
+Note that any reasonably well-behaved dmabuf from any exporter should work with
+devmem TCP, even if the dmabuf is not actually backed by devmem. An example of
+this is udmabuf, which wraps user memory (non-devmem) in a dmabuf.
+
+
+Socket Setup
+------------
+
+The socket must be flow steered to the dmabuf bound RX queue::
+
+ ethtool -N eth1 flow-type tcp4 ... queue 15,
+
+
+Receiving data
+--------------
+
+The user application must signal to the kernel that it is capable of receiving
+devmem data by passing the MSG_SOCK_DEVMEM flag to recvmsg::
+
+ ret = recvmsg(fd, &msg, MSG_SOCK_DEVMEM);
+
+Applications that do not specify the MSG_SOCK_DEVMEM flag will receive an EFAULT
+on devmem data.
+
+Devmem data is received directly into the dmabuf bound to the NIC in 'NIC
+Setup', and the kernel signals such to the user via the SCM_DEVMEM_* cmsgs::
+
+ for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+ if (cm->cmsg_level != SOL_SOCKET ||
+ (cm->cmsg_type != SCM_DEVMEM_DMABUF &&
+ cm->cmsg_type != SCM_DEVMEM_LINEAR))
+ continue;
+
+ dmabuf_cmsg = (struct dmabuf_cmsg *)CMSG_DATA(cm);
+
+ if (cm->cmsg_type == SCM_DEVMEM_DMABUF) {
+ /* Frag landed in dmabuf.
+ *
+ * dmabuf_cmsg->dmabuf_id is the dmabuf the
+ * frag landed on.
+ *
+ * dmabuf_cmsg->frag_offset is the offset into
+ * the dmabuf where the frag starts.
+ *
+ * dmabuf_cmsg->frag_size is the size of the
+ * frag.
+ *
+ * dmabuf_cmsg->frag_token is a token used to
+ * refer to this frag for later freeing.
+ */
+
+ struct dmabuf_token token;
+ token.token_start = dmabuf_cmsg->frag_token;
+ token.token_count = 1;
+ continue;
+ }
+
+ if (cm->cmsg_type == SCM_DEVMEM_LINEAR)
+ /* Frag landed in linear buffer.
+ *
+ * dmabuf_cmsg->frag_size is the size of the
+ * frag.
+ */
+ continue;
+
+ }
+
+Applications may receive 2 cmsgs:
+
+- SCM_DEVMEM_DMABUF: this indicates the fragment landed in the dmabuf indicated
+ by dmabuf_id.
+
+- SCM_DEVMEM_LINEAR: this indicates the fragment landed in the linear buffer.
+ This typically happens when the NIC is unable to split the packet at the
+ header boundary, such that part (or all) of the payload landed in host
+ memory.
+
+Applications may receive no SO_DEVMEM_* cmsgs. That indicates non-devmem,
+regular TCP data that landed on an RX queue not bound to a dmabuf.
+
+
+Freeing frags
+-------------
+
+Frags received via SCM_DEVMEM_DMABUF are pinned by the kernel while the user
+processes the frag. The user must return the frag to the kernel via
+SO_DEVMEM_DONTNEED::
+
+ ret = setsockopt(client_fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &token,
+ sizeof(token));
+
+The user must ensure the tokens are returned to the kernel in a timely manner.
+Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
+and will lead to packet drops.
+
+
+Implementation & Caveats
+========================
+
+Unreadable skbs
+---------------
+
+Devmem payloads are inaccessible to the kernel processing the packets. This
+results in a few quirks for payloads of devmem skbs:
+
+- Loopback is not functional. Loopback relies on copying the payload, which is
+ not possible with devmem skbs.
+
+- Software checksum calculation fails.
+
+- TCP Dump and bpf can't access devmem packet payloads.
+
+
+Testing
+=======
+
+More realistic example code can be found in the kernel source under
+tools/testing/selftests/net/ncdevmem.c
+
+ncdevmem is a devmem TCP netcat. It works very similarly to netcat, but
+receives data directly into a udmabuf.
+
+To run ncdevmem, you need to run it on a server on the machine under test, and
+you need to run netcat on a peer to provide the TX data.
+
+ncdevmem has a validation mode as well that expects a repeating pattern of
+incoming data and validates it as such. For example, you can launch
+ncdevmem on the server by::
+
+ ncdevmem -s <server IP> -c <client IP> -f eth1 -d 3 -n 0000:06:00.0 -l \
+ -p 5201 -v 7
+
+On client side, use regular netcat to send TX data to ncdevmem process
+on the server::
+
+ yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | \
+ tr \\n \\0 | head -c 5G | nc <server IP> 5201 -p 5201
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index d1af04b952f8..0be9924db642 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -49,6 +49,7 @@ Contents:
cdc_mbim
dccp
dctcp
+ devmem
dns_resolver
driver
eql
--
2.46.0.76.ge559c4bf1a-goog
Add netdev_rx_queue_restart(), which resets an rx queue using the
queue API recently merged[1].
The queue API was merged to enable the core net stack to reset individual
rx queues to actuate changes in the rx queue's configuration. In later
patches in this series, we will use netdev_rx_queue_restart() to reset
rx queues after binding or unbinding dmabuf configuration, which will
cause reallocation of the page_pool to repopulate its memory using the
new configuration.
[1] https://lore.kernel.org/netdev/20240430231420.699177-1-shailend@google.com/…
Signed-off-by: David Wei <dw(a)davidwei.uk>
Signed-off-by: Mina Almasry <almasrymina(a)google.com>
Reviewed-by: Pavel Begunkov <asml.silence(a)gmail.com>
Reviewed-by: Jakub Kicinski <kuba(a)kernel.org>
---
v18:
- Add more color to commit message (Xuan Zhuo).
v17:
- Use ASSERT_RTNL() (Jakub).
v13:
- Add reviewed-by from Pavel (thanks!)
- Fixed comment (Pavel)
v11:
- Fix not checking dev->queue_mgmt_ops (Pavel).
- Fix ndo_queue_mem_free call that passed the wrong pointer (David).
v9: https://lore.kernel.org/all/20240502045410.3524155-4-dw@davidwei.uk/
(submitted by David).
- fixed SPDX license identifier (Simon).
- Rebased on top of merged queue API definition, and changed
implementation to match that.
- Replace rtnl_lock() with rtnl_is_locked() to make it useable from my
netlink code where rtnl is already locked.
---
include/net/netdev_rx_queue.h | 3 ++
net/core/Makefile | 1 +
net/core/netdev_rx_queue.c | 74 +++++++++++++++++++++++++++++++++++
3 files changed, 78 insertions(+)
create mode 100644 net/core/netdev_rx_queue.c
diff --git a/include/net/netdev_rx_queue.h b/include/net/netdev_rx_queue.h
index aa1716fb0e53..e78ca52d67fb 100644
--- a/include/net/netdev_rx_queue.h
+++ b/include/net/netdev_rx_queue.h
@@ -54,4 +54,7 @@ get_netdev_rx_queue_index(struct netdev_rx_queue *queue)
return index;
}
#endif
+
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq);
+
#endif
diff --git a/net/core/Makefile b/net/core/Makefile
index 62be9aef2528..f82232b358a2 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -19,6 +19,7 @@ obj-$(CONFIG_NETDEV_ADDR_LIST_TEST) += dev_addr_lists_test.o
obj-y += net-sysfs.o
obj-y += hotdata.o
+obj-y += netdev_rx_queue.o
obj-$(CONFIG_PAGE_POOL) += page_pool.o page_pool_user.o
obj-$(CONFIG_PROC_FS) += net-procfs.o
obj-$(CONFIG_NET_PKTGEN) += pktgen.o
diff --git a/net/core/netdev_rx_queue.c b/net/core/netdev_rx_queue.c
new file mode 100644
index 000000000000..da11720a5983
--- /dev/null
+++ b/net/core/netdev_rx_queue.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/netdevice.h>
+#include <net/netdev_queues.h>
+#include <net/netdev_rx_queue.h>
+
+int netdev_rx_queue_restart(struct net_device *dev, unsigned int rxq_idx)
+{
+ void *new_mem, *old_mem;
+ int err;
+
+ if (!dev->queue_mgmt_ops || !dev->queue_mgmt_ops->ndo_queue_stop ||
+ !dev->queue_mgmt_ops->ndo_queue_mem_free ||
+ !dev->queue_mgmt_ops->ndo_queue_mem_alloc ||
+ !dev->queue_mgmt_ops->ndo_queue_start)
+ return -EOPNOTSUPP;
+
+ ASSERT_RTNL();
+
+ new_mem = kvzalloc(dev->queue_mgmt_ops->ndo_queue_mem_size, GFP_KERNEL);
+ if (!new_mem)
+ return -ENOMEM;
+
+ old_mem = kvzalloc(dev->queue_mgmt_ops->ndo_queue_mem_size, GFP_KERNEL);
+ if (!old_mem) {
+ err = -ENOMEM;
+ goto err_free_new_mem;
+ }
+
+ err = dev->queue_mgmt_ops->ndo_queue_mem_alloc(dev, new_mem, rxq_idx);
+ if (err)
+ goto err_free_old_mem;
+
+ err = dev->queue_mgmt_ops->ndo_queue_stop(dev, old_mem, rxq_idx);
+ if (err)
+ goto err_free_new_queue_mem;
+
+ err = dev->queue_mgmt_ops->ndo_queue_start(dev, new_mem, rxq_idx);
+ if (err)
+ goto err_start_queue;
+
+ dev->queue_mgmt_ops->ndo_queue_mem_free(dev, old_mem);
+
+ kvfree(old_mem);
+ kvfree(new_mem);
+
+ return 0;
+
+err_start_queue:
+ /* Restarting the queue with old_mem should be successful as we haven't
+ * changed any of the queue configuration, and there is not much we can
+ * do to recover from a failure here.
+ *
+ * WARN if we fail to recover the old rx queue, and at least free
+ * old_mem so we don't also leak that.
+ */
+ if (dev->queue_mgmt_ops->ndo_queue_start(dev, old_mem, rxq_idx)) {
+ WARN(1,
+ "Failed to restart old queue in error path. RX queue %d may be unhealthy.",
+ rxq_idx);
+ dev->queue_mgmt_ops->ndo_queue_mem_free(dev, old_mem);
+ }
+
+err_free_new_queue_mem:
+ dev->queue_mgmt_ops->ndo_queue_mem_free(dev, new_mem);
+
+err_free_old_mem:
+ kvfree(old_mem);
+
+err_free_new_mem:
+ kvfree(new_mem);
+
+ return err;
+}
--
2.46.0.76.ge559c4bf1a-goog
GCC 13.2.0 reported warning about (void *) being used as a param where (char *)
is expected:
In file included from msg_oob.c:14:
msg_oob.c: In function ‘__recvpair’:
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char *’, \
but argument 6 has type ‘const void *’ [-Wformat=]
106 | fprintf(TH_LOG_STREAM, "# %s:%d:%s:" fmt "\n", \
| ^~~~~~~~~~~~~
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
101 | __TH_LOG(fmt, ##__VA_ARGS__); \
| ^~~~~~~~
msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
235 | TH_LOG("Expected:%s", expected_errno ? strerror(expected_errno) : expected_buf);
| ^~~~~~
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char *’, \
but argument 6 has type ‘const void *’ [-Wformat=]
106 | fprintf(TH_LOG_STREAM, "# %s:%d:%s:" fmt "\n", \
| ^~~~~~~~~~~~~
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’
101 | __TH_LOG(fmt, ##__VA_ARGS__); \
| ^~~~~~~~
msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’
259 | TH_LOG("Expected:%s", expected_errno ? strerror(expected_errno) : expected_buf);
| ^~~~~~
As Simon suggested, all calls to __recvpair() have char * as expected_buf param, so
it is safe to change param type from (const void *) to (const char *), which silences
the warning.
Fixes: d098d77232c37 ("selftest: af_unix: Add msg_oob.c.")
Reported-by: Mirsad Todorovac <mtodorovac69(a)gmail.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Kuniyuki Iwashima <kuniyu(a)amazon.com>
Cc: netdev(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Suggested-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: Mirsad Todorovac <mtodorovac69(a)gmail.com>
---
v1 -> v2:
fixed a typo.
change funct param type rather than making two casts, as Simon suggested.
changed Subject: line to reflect the modification.
minor formatting changes.
v1:
initial version to fix the compiler warning.
tools/testing/selftests/net/af_unix/msg_oob.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/af_unix/msg_oob.c b/tools/testing/selftests/net/af_unix/msg_oob.c
index 16d0c172eaeb..535eb2c3d7d1 100644
--- a/tools/testing/selftests/net/af_unix/msg_oob.c
+++ b/tools/testing/selftests/net/af_unix/msg_oob.c
@@ -209,7 +209,7 @@ static void __sendpair(struct __test_metadata *_metadata,
static void __recvpair(struct __test_metadata *_metadata,
FIXTURE_DATA(msg_oob) *self,
- const void *expected_buf, int expected_len,
+ const char *expected_buf, int expected_len,
int buf_len, int flags)
{
int i, ret[2], recv_errno[2], expected_errno = 0;
--
2.43.0
The BPF tracing infrastructure has undergone significant evolution,
leading to the introduction of more robust and efficient APIs.
However, some of the existing tests in the samples/bpf directory have
not kept pace with these developments. These outdated tests not only
create confusion among users but also increase maintenance overhead.
For starter, this patchset focuses on cleaning up outdated 'tracing'
related tests within the BPF testing framework. The goal is to
modernize and streamline selftests by removing obsolete tests and
migrating necessaries to more appropriate locations.
Daniel T. Lee (3):
selftests/bpf: migrate tracepoint overhead test to prog_tests
selftests/bpf: add rename tracepoint bench test
samples/bpf: remove obsolete tracing related tests
samples/bpf/Makefile | 12 -
samples/bpf/test_overhead_kprobe.bpf.c | 41 ----
samples/bpf/test_overhead_raw_tp.bpf.c | 17 --
samples/bpf/test_overhead_tp.bpf.c | 23 --
samples/bpf/test_overhead_user.c | 225 ------------------
samples/bpf/test_override_return.sh | 16 --
samples/bpf/test_probe_write_user.bpf.c | 52 ----
samples/bpf/test_probe_write_user_user.c | 108 ---------
samples/bpf/tracex7.bpf.c | 15 --
samples/bpf/tracex7_user.c | 56 -----
tools/testing/selftests/bpf/bench.c | 2 +
.../selftests/bpf/benchs/bench_rename.c | 16 ++
.../selftests/bpf/benchs/run_bench_rename.sh | 2 +-
.../selftests/bpf/prog_tests/test_overhead.c | 14 +-
.../selftests/bpf/progs/test_overhead.c | 11 +-
15 files changed, 39 insertions(+), 571 deletions(-)
delete mode 100644 samples/bpf/test_overhead_kprobe.bpf.c
delete mode 100644 samples/bpf/test_overhead_raw_tp.bpf.c
delete mode 100644 samples/bpf/test_overhead_tp.bpf.c
delete mode 100644 samples/bpf/test_overhead_user.c
delete mode 100755 samples/bpf/test_override_return.sh
delete mode 100644 samples/bpf/test_probe_write_user.bpf.c
delete mode 100644 samples/bpf/test_probe_write_user_user.c
delete mode 100644 samples/bpf/tracex7.bpf.c
delete mode 100644 samples/bpf/tracex7_user.c
--
2.43.0
This patchset introduces SEV-SNP test to the kernel selftest framework.
It also adds negative testing of SEV, ES and SNP VM types.
Patch 1 - Extend the sev smoke tests to use the SNP specific ioctl
calls and sets up memory to boot a SNP guest VM
Patch 2 - Cleanup patch that decouples the ioctl calls from the sev
selftest library with its test assert and status counterparts. No
functional change introduced
Patch 3 - Introduce ioctl test for SEV, ES
Patch 4 - Introduce positive and negative ioctl test for SEV-SNP
Patch 5 - Adds the X86_FEATURE_SEV_SNP vm type test for KVM_SEV_INIT2
Any feedback/review on the approach and design is highly appreciated!
Pratik R. Sampat (5):
selftests: KVM: Add a basic SNP smoke test
selftests: KVM: Decouple SEV ioctls from asserts
selftests: KVM: SEV IOCTL test
selftests: KVM: SNP IOCTL test
selftests: KVM: SEV-SNP test for KVM_SEV_INIT2
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/include/x86_64/sev.h | 39 ++-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
.../selftests/kvm/lib/x86_64/processor.c | 6 +-
tools/testing/selftests/kvm/lib/x86_64/sev.c | 181 +++++++++++---
.../selftests/kvm/x86_64/sev_init2_tests.c | 13 +
.../selftests/kvm/x86_64/sev_smoke_test.c | 223 +++++++++++++++++-
7 files changed, 418 insertions(+), 52 deletions(-)
--
2.34.1
v18: https://patchwork.kernel.org/project/netdevbpf/list/?series=874848&state=*
====
v17 got minor feedback: (a) to beef up the description on patch 1 and (b)
to remove the leading underscores in the header definition.
I applied (a). (b) seems to be against current conventions so I did not
apply before further discussion.
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v17/
v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=*
====
v16 also got a very thorough review and some testing (thanks again!).
Thes version addresses all the concerns reported on v15, in terms of
feedback and issues reported.
Major changes:
- Use ASSERT_RTNL.
- Moved around some of the page_pool helpers definitions so I can hide
some netmem helpers in private files as Jakub suggested.
- Don't make every net_iov hold a ref on the binding as Jakub suggested.
- Fix issue reported by Taehee where we access queues after they have
been freed.
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v17/
v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====
v15 got a thorough review and some testing, and this version addresses almost
all the feedback. Some more minor comments where the authors said it
could be done later, I left out.
Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignements suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the
page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost
from v13+).
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v16/
One caveat: Taehee reproduced a KASAN warning and reported it here:
https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…
I estimate the issue to be minor and easily fixable:
https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…
I hope to be able to follow up with a fix to net tree as net-next closes
imminently, but if this iteration doesn't make it in, I will repost with
a fix squashed after net-next reopens, no problem.
v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====
No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v15/
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about how the specifics of plugging io uring memory through
the page pool, but not relevant to content in this particular patchset,
AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Taehee Yoo <ap420073(a)gmail.com>
Cc: Donald Hunter <donald.hunter(a)gmail.com>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helpers (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
preprocessor macro defined and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (14):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: move dmaddr helpers to .c file
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
netdev: add dmabuf introspection
Documentation/netlink/specs/netdev.yaml | 61 +++
Documentation/networking/devmem.rst | 269 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 9 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 128 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 164 +++++++-
include/net/page_pool/helpers.h | 42 +-
include/net/page_pool/types.h | 8 +
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 12 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 13 +
include/uapi/linux/uio.h | 17 +
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 7 +-
net/core/devmem.c | 378 +++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 111 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/netmem_priv.h | 36 ++
net/core/page_pool.c | 147 +++++--
net/core/page_pool_priv.h | 3 +
net/core/page_pool_user.c | 4 +
net/core/skbuff.c | 77 +++-
net/core/sock.c | 68 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 13 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 9 +
tools/testing/selftests/net/ncdevmem.c | 536 ++++++++++++++++++++++++
49 files changed, 2563 insertions(+), 121 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 net/core/netmem_priv.h
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.46.0.rc2.264.g509ed76dc8-goog
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as userspace
controlled VM or in short ucontrol VM.
These VMs delegates the management of the VM to userspace instead
of handling most events within the kernel. Consequently the userspace
has to manage interrupts, memory allocation etc.
Before this patch set this functionality lacks any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
the CONFIG_KVM_S390_UCONTROL enabled. The users with sys_admin capability
can then create a new ucontrol VM providing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
The kernels existing selftest helper functions can only be partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
* Assert that memory region operations are rejected for ucontrol VMs
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set does also contain some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
---
v5:
- PATCH 7: Simplify segment and page table generation
- PATCH 8: Remove irrelevant code in skey test
Add some comments to skey test
v4:
- PATCH 5: Remove not yet used include for debug print functions
- PATCH 6: Add include for debug print functions (removed from patch 5)
Remove no longer needed code since stopped but is reset
before starting since v3 (thanks Janosch)
Adjust test output to use leading zeros instead of spaces in sieic
- PATCH 7: Rename constant to PGM_SEGMENT_TRANSLATION (thanks Janosch)
Put comments on their own lines
v3:
- Remove stopped bit before starting the VM (no initial stop in multiple
test cases) (thanks Janosch)
- PATCH 2: Clarified SIE control block vs SIE instruction (thanks
Janosch)
- PATCH 3: Make use of CAP_TO_MASK(CAP_SYS_ADMIN) instead of custom
define (thanks Janosch)
Removed Reviewed-By: Claudio
- PATCH 4: Remove erroneous 1MB offset from self->base_hva (thanks
Janosch)
- PATCH 6-8: Change name of test program _pgm to _asm to prevent confusion
- PATCH 10: Move KVM_S390_UCONTROL default option to actual debug config
(thanks Christian)
v2:
- add ucontrol to s390 debug config (new patch)
- PATCH 2: changed atomic_t to __u32 (thanks Claudio)
- PATCH 4: reformatted comment in FIXTURE_SETUP(uc_kvm)
- PATCH 5: refactored to display 8 byte blocks + more internal reuse
(thanks Claudio)
- PATCH 7: make use of more declarative defines instead of magic values
- PATCH 8: make use of more declarative defines instead of magic values
(thanks Claudio)
- PATCH 9: add reference to fix verified by the test case
Christoph Schlameuss (10):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
arch/s390/configs/debug_defconfig | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 69 ++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 598 ++++++++++++++++++
13 files changed, 931 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: de9c2c66ad8e787abec7c9d7eff4f8c3cdd28aed
--
2.45.2
This test doesn't have support for other architectures. Altough resctrl
is supported on x86 and ARM, but arch_supports_noncont_cat() shows that
only x86 for AMD and Intel are supported by the test. We get build
errors when built for ARM and ARM64.
Hence add support in the Makefile to build this suite only for x86 and
x86_64 architectures.
Fixes: b733143cc455 ("selftests/resctrl: Make resctrl_tests run using kselftest framework")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/resctrl/Makefile | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/resctrl/Makefile b/tools/testing/selftests/resctrl/Makefile
index f408bd6bfc3d4..d5cf96315ef9b 100644
--- a/tools/testing/selftests/resctrl/Makefile
+++ b/tools/testing/selftests/resctrl/Makefile
@@ -3,7 +3,9 @@
CFLAGS = -g -Wall -O2 -D_FORTIFY_SOURCE=2
CFLAGS += $(KHDR_INCLUDES)
+ifeq ($(ARCH),$(filter $(ARCH),x86 x86_64))
TEST_GEN_PROGS := resctrl_tests
+endif
LOCAL_HDRS += $(wildcard *.h)
--
2.39.2
There are multiple possible timer sources which could be useful for
the sound stream synchronization: hrtimers, hardware clocks (e.g. PTP),
timer wheels (jiffies). Currently, using one of them to synchronize
the audio stream of snd-aloop module would require writing a
kernel-space driver which exports an ALSA timer through the
snd_timer interface.
However, it is not really convenient for application developers, who may
want to define their custom timer sources for audio synchronization.
For instance, we could have a network application which receives frames
and sends them to snd-aloop pcm device, and another application
listening on the other end of snd-aloop. It makes sense to transfer a
new period of data only when certain amount of frames is received
through the network, but definitely not when a certain amount of jiffies
on a local system elapses. Since all of the devices are purely virtual
it won't introduce any glitches and will help the application developers
to avoid using sample-rate conversion.
This patch series introduces userspace-driven ALSA timers: virtual
timers which are created and controlled from userspace. The timer can
be created from the userspace using the new ioctl SNDRV_TIMER_IOCTL_CREATE.
After creating a timer, it becomes available for use system-wide, so it
can be passed to snd-aloop as a timer source (timer_source parameter
would be "-1.SNDRV_TIMER_GLOBAL_UDRIVEN.{timer_id}"). When the userspace
app decides to trigger a timer, it calls another ioctl
SNDRV_TIMER_IOCTL_TRIGGER on the file descriptor of a timer. It
initiates a transfer of a new period of data.
Userspace-driven timers are associated with file descriptors. If the
application wishes to destroy the timer, it can simply release the file
descriptor of a virtual timer.
I believe introducing new ioctl calls is quite inconvenient (as we have
a limited amount of them), but other possible ways of app <-> kernel
communication (like virtual FS) seem completely inappropriate for this
task (but I'd love to discuss alternative solutions).
This patch series also updates the snd-aloop module so the global timers
can be used as a timer_source for it (it allows using userspace-driven
timers as timer source).
V1 -> V2:
- Fix some problems found by Christophe Jaillet
<christophe.jaillet(a)wanadoo.fr>
V2 -> V3:
- Add improvements suggested by Takashi Iwai <tiwai(a)suse.de>
V3 -> V4:
- Address comments from Jaroslav Kysela <perex(a)perex.cz> and Mark Brown
<broonie(a)kernel.org>
Please, find the patch-specific changelog in the following patches.
Ivan Orlov (4):
ALSA: aloop: Allow using global timers
Docs/sound: Add documentation for userspace-driven ALSA timers
ALSA: timer: Introduce virtual userspace-driven timers
selftests: ALSA: Cover userspace-driven timers with test
Documentation/sound/index.rst | 1 +
Documentation/sound/utimers.rst | 125 ++++++++++++
include/uapi/sound/asound.h | 16 +-
sound/core/Kconfig | 10 +
sound/core/timer.c | 210 ++++++++++++++++++++
sound/drivers/aloop.c | 2 +
tools/testing/selftests/alsa/Makefile | 4 +-
tools/testing/selftests/alsa/global-timer.c | 87 ++++++++
tools/testing/selftests/alsa/utimer-test.c | 164 +++++++++++++++
9 files changed, 616 insertions(+), 3 deletions(-)
create mode 100644 Documentation/sound/utimers.rst
create mode 100644 tools/testing/selftests/alsa/global-timer.c
create mode 100644 tools/testing/selftests/alsa/utimer-test.c
--
2.34.1
This patch series implements a new char misc driver, /dev/ntsync, which is used
to implement Windows NT synchronization primitives.
NT synchronization primitives are unique in that the wait functions both are
vectored, operate on multiple types of object with different behaviour (mutex,
semaphore, event), and affect the state of the objects they wait on. This model
is not compatible with existing kernel synchronization objects or interfaces,
and therefore the ntsync driver implements its own wait queues and locking.
This patch series is rebased against the "char-misc-next" branch of
gregkh/char-misc.git.
== Background ==
The Wine project emulates the Windows API in user space. One particular part of
that API, namely the NT synchronization primitives, have historically been
implemented via RPC to a dedicated "kernel" process. However, more recent
applications use these APIs more strenuously, and the overhead of RPC has become
a bottleneck.
The NT synchronization APIs are too complex to implement on top of existing
primitives without sacrificing correctness. Certain operations, such as
NtPulseEvent() or the "wait-for-all" mode of NtWaitForMultipleObjects(), require
direct control over the underlying wait queue, and implementing a wait queue
sufficiently robust for Wine in user space is not possible. This proposed
driver, therefore, implements the problematic interfaces directly in the Linux
kernel.
This driver was presented at Linux Plumbers Conference 2023. For those further
interested in the history of synchronization in Wine and past attempts to solve
this problem in user space, a recording of the presentation can be viewed here:
https://www.youtube.com/watch?v=NjU4nyWyhU8
== Performance ==
The performance measurements described below are copied from earlier versions of
the patch set. While some of the code has changed, I do not currently anticipate
that it has changed drastically enough to affect those measurements.
The gain in performance varies wildly depending on the application in question
and the user's hardware. For some games NT synchronization is not a bottleneck
and no change can be observed, but for others frame rate improvements of 50 to
150 percent are not atypical. The following table lists frame rate measurements
from a variety of games on a variety of hardware, taken by users Dmitry
Skvortsov, FuzzyQuils, OnMars, and myself:
Game Upstream ntsync improvement
===========================================================================
Anger Foot 69 99 43%
Call of Juarez 99.8 224.1 125%
Dirt 3 110.6 860.7 678%
Forza Horizon 5 108 160 48%
Lara Croft: Temple of Osiris 141 326 131%
Metro 2033 164.4 199.2 21%
Resident Evil 2 26 77 196%
The Crew 26 51 96%
Tiny Tina's Wonderlands 130 360 177%
Total War Saga: Troy 109 146 34%
===========================================================================
== Patches ==
The intended semantics of the patches are broadly intended to match those of the
corresponding Windows functions. For those not already familiar with the Windows
functions (or their undocumented behaviour), patch 27/28 provides a detailed
specification, and individual patches also include a brief description of the
API they are implementing.
The patches making use of this driver in Wine can be retrieved or browsed here:
https://repo.or.cz/wine/zf.git/shortlog/refs/heads/ntsync5
== Previous versions ==
Changes from v4:
* Rework wait-all locking code to avoid taking more than one spinlock at a time,
and also to fix a race where the wait-all lock would not be not correctly
taken. The new locking mechanism involves taking a simple spinlock for normal
"any" waits, and taking a device-wide mutex for "all" waits or when locking
any object that is involved in an "all" wait. The mechanism was written by
Peter Zijlstra.
* Try to reword or clarify various parts of the documentation (patch 27), per
Peter Zijlstra.
* I did not rename NTSYNC_IOC_SEM_POST to RELEASE (like NT) although this was
suggested by Peter Zijlstra, mostly because it's not clear to me that renaming
an already committed ioctl would be fine. The API committed isn't actually
usable yet, though, so if altering it would be fine on those grounds, I can
revise this series to rename the function accordingly.
* Similarly, I did not change the create ioctls to return the fd directly,
although this was suggested and would be a bit simpler and cleaner, because
NTSYNC_IOC_CREATE_SEM already exists upstream and returns the fd through a
struct. I can make this change in the next revision if that'd be preferable. I
also still would appreciate a clarification on the advice in [1].
[1] https://docs.kernel.org/driver-api/ioctl.html#return-code
* Link to v4: https://lore.kernel.org/lkml/20240416010837.333694-1-zfigura@codeweavers.co…
* Link to v3: https://lore.kernel.org/lkml/20240329000621.148791-1-zfigura@codeweavers.co…
* Link to v2: https://lore.kernel.org/lkml/20240219223833.95710-1-zfigura@codeweavers.com/
* Link to v1: https://lore.kernel.org/lkml/20240214233645.9273-1-zfigura@codeweavers.com/
* Link to RFC v2: https://lore.kernel.org/lkml/20240131021356.10322-1-zfigura@codeweavers.com/
* Link to RFC v1: https://lore.kernel.org/lkml/20240124004028.16826-1-zfigura@codeweavers.com/
Elizabeth Figura (28):
ntsync: Introduce NTSYNC_IOC_WAIT_ANY.
ntsync: Introduce NTSYNC_IOC_WAIT_ALL.
ntsync: Introduce NTSYNC_IOC_CREATE_MUTEX.
ntsync: Introduce NTSYNC_IOC_MUTEX_UNLOCK.
ntsync: Introduce NTSYNC_IOC_MUTEX_KILL.
ntsync: Introduce NTSYNC_IOC_CREATE_EVENT.
ntsync: Introduce NTSYNC_IOC_EVENT_SET.
ntsync: Introduce NTSYNC_IOC_EVENT_RESET.
ntsync: Introduce NTSYNC_IOC_EVENT_PULSE.
ntsync: Introduce NTSYNC_IOC_SEM_READ.
ntsync: Introduce NTSYNC_IOC_MUTEX_READ.
ntsync: Introduce NTSYNC_IOC_EVENT_READ.
ntsync: Introduce alertable waits.
selftests: ntsync: Add some tests for semaphore state.
selftests: ntsync: Add some tests for mutex state.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for NTSYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ANY.
selftests: ntsync: Add some tests for wakeup signaling with
WINESYNC_IOC_WAIT_ALL.
selftests: ntsync: Add some tests for manual-reset event state.
selftests: ntsync: Add some tests for auto-reset event state.
selftests: ntsync: Add some tests for wakeup signaling with events.
selftests: ntsync: Add tests for alertable waits.
selftests: ntsync: Add some tests for wakeup signaling via alerts.
selftests: ntsync: Add a stress test for contended waits.
maintainers: Add an entry for ntsync.
docs: ntsync: Add documentation for the ntsync uAPI.
ntsync: No longer depend on BROKEN.
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/ntsync.rst | 398 +++++
MAINTAINERS | 9 +
drivers/misc/Kconfig | 1 -
drivers/misc/ntsync.c | 989 +++++++++++-
include/uapi/linux/ntsync.h | 39 +
tools/testing/selftests/Makefile | 1 +
.../selftests/drivers/ntsync/.gitignore | 1 +
.../testing/selftests/drivers/ntsync/Makefile | 7 +
tools/testing/selftests/drivers/ntsync/config | 1 +
.../testing/selftests/drivers/ntsync/ntsync.c | 1407 +++++++++++++++++
11 files changed, 2850 insertions(+), 4 deletions(-)
create mode 100644 Documentation/userspace-api/ntsync.rst
create mode 100644 tools/testing/selftests/drivers/ntsync/.gitignore
create mode 100644 tools/testing/selftests/drivers/ntsync/Makefile
create mode 100644 tools/testing/selftests/drivers/ntsync/config
create mode 100644 tools/testing/selftests/drivers/ntsync/ntsync.c
base-commit: f5b335dc025cfee90957efa90dc72fada0d5abb4
--
2.43.0
Hello,
this series brings a new set of test converted to the test_progs framework.
Since the tests are quite small, I chose to group three tests conversion in
the same series, but feel free to let me know if I should keep one series
per test. The series focuses on cgroup testing and converts the following
tests:
- get_cgroup_id_user
- cgroup_storage
- test_skb_cgroup_id_user
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v3:
- Fixed multiple leaks on cgroup file descriptors and sockets
- Used dedicated network namespaces for tests involving network
- Link to v2: https://lore.kernel.org/r/20240806-convert_cgroup_tests-v2-0-180c57e5b710@b…
Changes in v2:
- Use global variables instead of maps when possible
- Collect review tags from Alan
- Link to v1: https://lore.kernel.org/r/20240731-convert_cgroup_tests-v1-0-14cbc51b6947@b…
---
Alexis Lothoré (eBPF Foundation) (4):
selftests/bpf: convert get_current_cgroup_id_user to test_progs
selftests/bpf: convert test_cgroup_storage to test_progs
selftests/bpf: add proper section name to bpf prog and rename it
selftests/bpf: convert test_skb_cgroup_id_user to test_progs
tools/testing/selftests/bpf/.gitignore | 3 -
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/get_cgroup_id_user.c | 151 -----------------
.../selftests/bpf/prog_tests/cgroup_ancestor.c | 169 +++++++++++++++++++
.../bpf/prog_tests/cgroup_get_current_cgroup_id.c | 46 ++++++
.../selftests/bpf/prog_tests/cgroup_storage.c | 94 +++++++++++
...test_skb_cgroup_id_kern.c => cgroup_ancestor.c} | 14 +-
tools/testing/selftests/bpf/progs/cgroup_storage.c | 24 +++
.../selftests/bpf/progs/get_cgroup_id_kern.c | 26 +--
tools/testing/selftests/bpf/test_cgroup_storage.c | 174 --------------------
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 63 -------
.../selftests/bpf/test_skb_cgroup_id_user.c | 183 ---------------------
12 files changed, 342 insertions(+), 613 deletions(-)
---
base-commit: ab2c4aa104050a184c3411a973b165285549f732
change-id: 20240725-convert_cgroup_tests-d07c66053225
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Hello,
this series brings a new set of test converted to the test_progs framework.
Since the tests are quite small, I chose to group three tests conversion in
the same series, but feel free to let me know if I should keep one series
per test. The series focuses on cgroup testing and converts the following
tests:
- get_cgroup_id_user
- cgroup_storage
- test_skb_cgroup_id_user
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Changes in v2:
- Use global variables instead of maps when possible
- Collect review tags from Alan
- Link to v1: https://lore.kernel.org/r/20240731-convert_cgroup_tests-v1-0-14cbc51b6947@b…
---
Alexis Lothoré (eBPF Foundation) (4):
selftests/bpf: convert get_current_cgroup_id_user to test_progs
selftests/bpf: convert test_cgroup_storage to test_progs
selftests/bpf: add proper section name to bpf prog and rename it
selftests/bpf: convert test_skb_cgroup_id_user to test_progs
tools/testing/selftests/bpf/.gitignore | 3 -
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/get_cgroup_id_user.c | 151 -----------------
.../selftests/bpf/prog_tests/cgroup_ancestor.c | 154 +++++++++++++++++
.../bpf/prog_tests/cgroup_get_current_cgroup_id.c | 45 +++++
.../selftests/bpf/prog_tests/cgroup_storage.c | 65 ++++++++
...test_skb_cgroup_id_kern.c => cgroup_ancestor.c} | 14 +-
tools/testing/selftests/bpf/progs/cgroup_storage.c | 24 +++
.../selftests/bpf/progs/get_cgroup_id_kern.c | 26 +--
tools/testing/selftests/bpf/test_cgroup_storage.c | 174 --------------------
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 63 -------
.../selftests/bpf/test_skb_cgroup_id_user.c | 183 ---------------------
12 files changed, 297 insertions(+), 613 deletions(-)
---
base-commit: 34dbece299dfc462db4504268a697f29750d2932
change-id: 20240725-convert_cgroup_tests-d07c66053225
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
[1] mentions that memfd_secret is only supported on arm64, riscv, x86
and x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise
on KernelCI:
memfd_secret.c: In function 'memfd_secret':
memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function);
did you mean 'memfd_secret'?
42 | return syscall(__NR_memfd_secret, flags);
| ^~~~~~~~~~~~~~~~~
| memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on
supported architectures.
Also check in run_vmtests script if memfd_secret binary is present
before executing it.
[1] https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/
Cc: stable(a)vger.kernel.org
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Mike Rapoport (Microsoft) <rppt(a)kernel.org>
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes since v1:
- Add error message to the patch and reviewed-by tags
---
tools/testing/selftests/mm/Makefile | 2 ++
tools/testing/selftests/mm/run_vmtests.sh | 3 +++
2 files changed, 5 insertions(+)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 1a83b70e84535..4ea188be0588a 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce6..36045edb10dea 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
--
2.39.2
If adding multiple config files to the merge_config.sh script and
rust/config is the fist one, then the last config fragment in this file
and the first config fragment in the second file wont be set, since
there isn't a newline in this file, so those two fragements end up at
the same row like:
CONFIG_SAMPLE_RUST_PRINT=mCONFIG_FRAGMENT=y
And non of those will be enabled when running 'olddefconfig' after.
Fixing the issue by adding a newline to the file.
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/rust/config | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rust/config b/tools/testing/selftests/rust/config
index b4002acd40bc..fa06cebae232 100644
--- a/tools/testing/selftests/rust/config
+++ b/tools/testing/selftests/rust/config
@@ -2,4 +2,4 @@ CONFIG_RUST=y
CONFIG_SAMPLES=y
CONFIG_SAMPLES_RUST=y
CONFIG_SAMPLE_RUST_MINIMAL=m
-CONFIG_SAMPLE_RUST_PRINT=m
\ No newline at end of file
+CONFIG_SAMPLE_RUST_PRINT=m
--
2.43.0
[1] mentions that memfd_secret is only supported on arm64, riscv, x86
and x86_64 for now. It doesn't support other architectures. I found the
build error on arm and decided to send the fix as it was creating noise
on KernelCI. Hence I'm adding condition that memfd_secret should only be
compiled on supported architectures.
Also check in run_vmtests script if memfd_secret binary is present
before executing it.
[1] https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/
Cc: stable(a)vger.kernel.org
Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)")
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
tools/testing/selftests/mm/Makefile | 2 ++
tools/testing/selftests/mm/run_vmtests.sh | 3 +++
2 files changed, 5 insertions(+)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 1a83b70e84535..4ea188be0588a 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -53,7 +53,9 @@ TEST_GEN_FILES += madv_populate
TEST_GEN_FILES += map_fixed_noreplace
TEST_GEN_FILES += map_hugetlb
TEST_GEN_FILES += map_populate
+ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64))
TEST_GEN_FILES += memfd_secret
+endif
TEST_GEN_FILES += migration
TEST_GEN_FILES += mkdirty
TEST_GEN_FILES += mlock-random-test
diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
index 03ac4f2e1cce6..36045edb10dea 100755
--- a/tools/testing/selftests/mm/run_vmtests.sh
+++ b/tools/testing/selftests/mm/run_vmtests.sh
@@ -374,8 +374,11 @@ CATEGORY="hmm" run_test bash ./test_hmm.sh smoke
# MADV_POPULATE_READ and MADV_POPULATE_WRITE tests
CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ]
+then
(echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix
CATEGORY="memfd_secret" run_test ./memfd_secret
+fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100
CATEGORY="ksm" run_test ./ksm_tests -H -s 100
--
2.39.2
This small series includes fixes for creation of veth pairs for
networkless kernels & adds tests for turning the different network
interface features on and off in selftests/net/netdevice.sh script.
Changes in v5:
Rectify the syntax for ip add link.
Fix the veth_created condition check.
Changes in v4:
https://lore.kernel.org/all/20240807175717.7775-1-jain.abhinav177@gmail.com/
Move veth creation/removal to the main shell script.
Tested using vng on a networkless kernel and the script works, sample
output below the changes.
Changes in v3:
https://lore.kernel.org/all/20240614113240.41550-1-jain.abhinav177@gmail.co…
Add a check for netdev, create veth pair for testing.
Restore feature to its initial state.
Changes in v2:
https://lore.kernel.org/all/20240609132124.51683-1-jain.abhinav177@gmail.co…
Remove tail usage; use read to parse the features from temp file.
v1:
https://lore.kernel.org/all/20240606212714.27472-1-jain.abhinav177@gmail.co…
```
# selftests: net: netdevice.sh
# No valid network device found, creating veth pair
# PASS: veth0: set interface up
# PASS: veth0: set MAC address
# SKIP: veth0: set IP address
# PASS: veth0: ethtool list features
# PASS: veth0: Turned off feature: rx-checksumming
# PASS: veth0: Turned on feature: rx-checksumming
# PASS: veth0: Restore feature rx-checksumming to initial state on
# Actual changes:
# tx-checksum-ip-generic: off
# tx-tcp-segmentation: off [not requested]
....
....
....
# PASS: veth1: Restore feature tx-nocache-copy to initial state off
# PASS: veth1: Turned off feature: tx-vlan-stag-hw-insert
# PASS: veth1: Turned on feature: tx-vlan-stag-hw-insert
# PASS: veth1: Restore feature tx-vlan-stag-hw-insert to initial state on
# PASS: veth1: Turned off feature: rx-vlan-stag-hw-parse
# PASS: veth1: Turned on feature: rx-vlan-stag-hw-parse
# PASS: veth1: Restore feature rx-vlan-stag-hw-parse to initial state on
# PASS: veth1: Turned off feature: rx-gro-list
# PASS: veth1: Turned on feature: rx-gro-list
# PASS: veth1: Restore feature rx-gro-list to initial state off
# PASS: veth1: Turned off feature: rx-udp-gro-forwarding
# PASS: veth1: Turned on feature: rx-udp-gro-forwarding
# PASS: veth1: Restore feature rx-udp-gro-forwarding to initial state off
# Cannot get register dump: Operation not supported
# SKIP: veth1: ethtool dump not supported
# PASS: veth1: ethtool stats
# PASS: veth1: stop interface
# Removed veth pair
ok 12 selftests: net: netdevice.sh
```
Abhinav Jain (2):
selftests: net: Create veth pair for testing in networkless kernel
selftests: net: Add on/off checks for non-fixed features of interface
tools/testing/selftests/net/netdevice.sh | 53 +++++++++++++++++++++++-
1 file changed, 52 insertions(+), 1 deletion(-)
--
2.34.1
The current support for LLVM and clang in nolibc and its testsuite is
very limited.
* Various architectures plain do not compile
* The user *has* to specify "-Os" otherwise the program crashes
* Cross-compilation of the tests does not work
* Using clang is not wired up in run-tests.sh
This series extends this support.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Add support for all architectures
- powerpc: "selftests/nolibc: don't use libgcc when building with clang"
- mips: "tools/nolibc: mips: load current function to $t9"
- s390: "selftests/nolibc: use correct clang target for s390/powerz"
- Expand commit messages
- Use __nolibc_ prefix for custom macros
- Link to v1: https://lore.kernel.org/r/20240728-nolibc-llvm-v1-0-bc384269bc35@weissschuh…
---
Thomas Weißschuh (15):
tools/nolibc: arm: use clang-compatible asm syntax
tools/nolibc: mips: load current function to $t9
tools/nolibc: powerpc: limit stack-protector workaround to GCC
tools/nolibc: compiler: introduce __nolibc_has_attribute()
tools/nolibc: move entrypoint specifics to compiler.h
tools/nolibc: compiler: use attribute((naked)) if available
selftests/nolibc: report failure if no testcase passed
selftests/nolibc: avoid passing NULL to printf("%s")
selftests/nolibc: determine $(srctree) first
selftests/nolibc: add support for LLVM= parameter
selftests/nolibc: add cc-option compatible with clang cross builds
selftests/nolibc: run-tests.sh: avoid overwriting CFLAGS_EXTRA
selftests/nolibc: don't use libgcc when building with clang
selftests/nolibc: use correct clang target for s390/powerz
selftests/nolibc: run-tests.sh: allow building through LLVM
tools/include/nolibc/arch-aarch64.h | 4 +--
tools/include/nolibc/arch-arm.h | 8 +++---
tools/include/nolibc/arch-i386.h | 4 +--
tools/include/nolibc/arch-loongarch.h | 4 +--
tools/include/nolibc/arch-mips.h | 8 ++++--
tools/include/nolibc/arch-powerpc.h | 6 ++--
tools/include/nolibc/arch-riscv.h | 4 +--
tools/include/nolibc/arch-s390.h | 4 +--
tools/include/nolibc/arch-x86_64.h | 4 +--
tools/include/nolibc/compiler.h | 24 +++++++++++-----
tools/testing/selftests/nolibc/Makefile | 41 +++++++++++++++++++---------
tools/testing/selftests/nolibc/nolibc-test.c | 4 +--
tools/testing/selftests/nolibc/run-tests.sh | 16 ++++++++---
13 files changed, 83 insertions(+), 48 deletions(-)
---
base-commit: ae1f550efc11eaf1496c431d9c6e784cb49124c5
change-id: 20240727-nolibc-llvm-3fad68590d4c
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hello Andrii Nakryiko,
This is a semi-automatic email about new static checker warnings.
Commit 8863238993e2 ("selftests/bpf: BPF register range bounds
tester") from Nov 11, 2023, leads to the following Smatch complaint:
./tools/testing/selftests/bpf/prog_tests/reg_bounds.c:1121 parse_reg_state()
warn: variable dereferenced before check 'p' (see line 1119)
./tools/testing/selftests/bpf/prog_tests/reg_bounds.c
1118 p = strpbrk(p, ",)");
1119 if (*p == ')')
1120 break;
1121 if (p)
Was this NULL check supposed to be checking for the NUL terminator?
1122 p++;
1123 }
regards,
dan carpenter
The "initial_nr_hugepages" variable is unsigned long so it takes up to
20 characters to print, plus 1 more character for the NUL terminator.
Unfortunately, this buffer is not quite large enough for the terminator
to fit. Also use snprintf() for a belt and suspenders approach.
Fixes: fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
---
tools/testing/selftests/mm/compaction_test.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index e140558e6f53..2c3a0eb6b22d 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -89,9 +89,10 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
int fd, ret = -1;
int compaction_index = 0;
char nr_hugepages[20] = {0};
- char init_nr_hugepages[20] = {0};
+ char init_nr_hugepages[24] = {0};
- sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
+ snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
+ "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
--
2.43.0
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead in incorrect resolution when the test executables are run
from elsewhere. Changing it to $ORIGIN makes it resolve relative
to the directory in which the executables reside, which is supposedly
the desired behaviour.
Discovered by the check-rpaths script[1][2] that checks for insecure
RPATH/RUNPATH[3], such as relative directories, during an attempt
to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/alsa/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/Makefile b/tools/testing/selftests/alsa/Makefile
index c1ce39874e2b..0f204da9ea8e 100644
--- a/tools/testing/selftests/alsa/Makefile
+++ b/tools/testing/selftests/alsa/Makefile
@@ -6,7 +6,7 @@ LDLIBS += $(shell pkg-config --libs alsa)
ifeq ($(LDLIBS),)
LDLIBS += -lasound
endif
-CFLAGS += -L$(OUTPUT) -Wl,-rpath=./
+CFLAGS += -L$(OUTPUT) -Wl,-rpath=\$$ORIGIN/
LDLIBS+=-lpthread
--
2.28.0
Hello Andrii Nakryiko,
This is a semi-automatic email about new static checker warnings.
Commit c381203eadb7 ("selftests/bpf: add trusted global subprog arg
tests") from Jan 29, 2024, leads to the following Smatch complaint:
./tools/testing/selftests/bpf/progs/verifier_global_ptr_args.c:88 trusted_task_arg_nonnull_fail2()
warn: variable dereferenced before check 'nullable' (see line 86)
./tools/testing/selftests/bpf/progs/verifier_global_ptr_args.c
85 /* should fail, PTR_TO_BTF_ID_OR_NULL */
86 res = subprog_trusted_task_nonnull(nullable);
^^^^^^^^
This is dereferenced
87
88 if (nullable)
^^^^^^^^
NULL check is too late
89 bpf_task_release(nullable);
90
regards,
dan carpenter
Hello,
KernelCI is hosting a bi-weekly call on Thursday to discuss improvements
to existing upstream tests, the development of new tests to increase
kernel testing coverage, and the enablement of these tests in KernelCI.
Below is a list of the tests the community has been working on and their
latest status updates, as discussed in the last meeting held on
2024-08-08:
*KTAP performance counters*
- Upcoming presentation @LPC2024 by Tim Bird on adding benchmark results
to KTAP: https://lpc.events/event/18/contributions/1791/
- Proposing new system to handle benchmark data, composed of 3 parts:
adding syntax to KTAP to support benchmark values, using a set of
external criteria for interpreting benchmark results, an automated tool
to determine and set the reference values used in these criteria.
- One related topic for discussion is where to store the reference files
and the test configuration, including all details that might impact the
results.
*Missing devices kselftest*
- Proposing new kselftest to report devices that go missing in the system:
https://lore.kernel.org/all/20240724-kselftest-dev-exist-v1-1-9bc21aa761b5@…
- Received feedback on the usability of the test and main unknowns about
management of reference files in tests
*Boot time test*
- RFC:
https://lore.kernel.org/all/20240725110622.96301-1-laura.nao@collabora.com/…
- Got feedback on the potential use of scripts from pm-graph. The
bootgraph.py script could be adapted for the test with some
modifications, such as adding support for machine-readable output and
different ftrace configurations.
*Suspend/resume in cpufreq kselftest*
- Sent v2 for patch adding RTC wakeup alarm in the cpufreq kselftest:
https://lore.kernel.org/lkml/20240715192634.19272-1-shreeya.patel@collabora…
- Looking into using the sleepgraph.py script from pm-graph to calculate
suspend/hibernation and resume time:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
- Would require additional configs to be enabled in an automated
environment
*TAP conformance in kselftests and other fixes*
- Some dead selftests were removed:
https://lore.kernel.org/all/20240725110817.659099-1-usama.anjum@collabora.c…
and
https://lore.kernel.org/all/20240725121212.808206-1-usama.anjum@collabora.c…
- Bitmap test module conversion to KUnit was not accepted:
https://lore.kernel.org/lkml/49108735-c776-4b6f-8264-62a827dd7b26@collabora…
- Fixing error when kvm suite is built for unsupported architecure:
https://lore.kernel.org/kvm/c2aaa06e-e86d-4af9-bce4-6067e53cdf39@collabora.…
Please reply to this thread if you'd like to join the call or discuss any
of the topics further. We look forward to collaborating with the community
to improve upstream tests and expand coverage to more areas of interest
within the kernel.
Best regards,
Laura Nao
From: Allison Henderson <allison.henderson(a)oracle.com>
Hi All,
This series is a new selftest that Vegard, Chuck and myself have been
working on to provide some test coverage for rds. I've modified the
scripts to include the feedback from the last version, but let me know
if there's anything missed. Questions and comments appreciated.
Thanks everyone!
Allison
Changes in v2:
- Removed qemu vm creation and related code
- Updated README.txt with examples of running the test with virtme
- Removed init.sh. run.sh now directly calls test.py
- Some clean up done with the return code handling since there is no
vm between the scripts anymore
- Imported ip python function in
tools/testing/selftests/net/lib/py/utils.py into test.py
- Adapted test.py to use the imported ip function, and removed the
local ip wrapper
- Some line wrap clean up
- Link to v1:
https://lore.kernel.org/netdev/20240626012834.5678-3-allison.henderson@orac…
Vegard Nossum (3):
.gitignore: add .gcda files
net: rds: add option for GCOV profiling
selftests: rds: add testing infrastructure
.gitignore | 1 +
Documentation/dev-tools/gcov.rst | 11 +
MAINTAINERS | 1 +
net/rds/Kconfig | 9 +
net/rds/Makefile | 5 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/net/rds/Makefile | 12 +
tools/testing/selftests/net/rds/README.txt | 41 ++++
tools/testing/selftests/net/rds/config.sh | 53 +++++
tools/testing/selftests/net/rds/run.sh | 224 ++++++++++++++++++
tools/testing/selftests/net/rds/test.py | 262 +++++++++++++++++++++
11 files changed, 620 insertions(+)
create mode 100644 tools/testing/selftests/net/rds/Makefile
create mode 100644 tools/testing/selftests/net/rds/README.txt
create mode 100755 tools/testing/selftests/net/rds/config.sh
create mode 100755 tools/testing/selftests/net/rds/run.sh
create mode 100644 tools/testing/selftests/net/rds/test.py
--
2.25.1
v17: https://patchwork.kernel.org/project/netdevbpf/list/?series=869900&state=*
====
v16 also got a very thorough review and some testing (thanks again!).
Thes version addresses all the concerns reported on v15, in terms of
feedback and issues reported.
Major changes:
- Use ASSERT_RTNL.
- Moved around some of the page_pool helpers definitions so I can hide
some netmem helpers in private files as Jakub suggested.
- Don't make every net_iov hold a ref on the binding as Jakub suggested.
- Fix issue reported by Taehee where we access queues after they have
been freed.
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v17/
v16: https://patchwork.kernel.org/project/netdevbpf/list/?series=866353&state=*
====
v15 got a thorough review and some testing, and this version addresses almost
all the feedback. Some more minor comments where the authors said it
could be done later, I left out.
Major changes:
- Addition of dma-buf introspection to page-pool-get and queue-get.
- Fixes to selftests suggested by Taehee.
- Fixes to documentation suggested by Donald.
- A couple of suggestions and fixes to TCP patches by Eric and David.
- Fixes to number assignements suggested by Arnd.
- Use rtnl_lock()ing to guard against queue reconfiguration while the
page_pool initialization is happening. (Jakub).
- Fixes to a few warnings reproduced by Taehee.
- Fixes to dma-buf binding suggested by Taehee and Jakub.
- Fixes to netlink UAPI suggested by Jakub
- Applied a number of Reviewed-bys and Acked-bys (including ones I lost
from v13+).
Full devmem TCP changes including the full GVE driver implementation is
here:
https://github.com/mina/linux/commits/tcpdevmem-v16/
One caveat: Taehee reproduced a KASAN warning and reported it here:
https://lore.kernel.org/netdev/CAMArcTUdCxOBYGF3vpbq=eBvqZfnc44KBaQTN7H-wqd…
I estimate the issue to be minor and easily fixable:
https://lore.kernel.org/netdev/CAHS8izNgaqC--GGE2xd85QB=utUnOHmioCsDd1TNxJW…
I hope to be able to follow up with a fix to net tree as net-next closes
imminently, but if this iteration doesn't make it in, I will repost with
a fix squashed after net-next reopens, no problem.
v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====
No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v15/
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about how the specifics of plugging io uring memory through
the page pool, but not relevant to content in this particular patchset,
AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
Cc: Taehee Yoo <ap420073(a)gmail.com>
Cc: Donald Hunter <donald.hunter(a)gmail.com>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helpers (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
preprocessor macro defined and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Mina Almasry (14):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: move dmaddr helpers to .c file
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
netdev: add dmabuf introspection
Documentation/netlink/specs/netdev.yaml | 61 +++
Documentation/networking/devmem.rst | 269 ++++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 9 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 128 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 164 +++++++-
include/net/page_pool/helpers.h | 42 +-
include/net/page_pool/types.h | 8 +
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 12 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 13 +
include/uapi/linux/uio.h | 17 +
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 7 +-
net/core/devmem.c | 378 +++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 111 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/netmem_priv.h | 36 ++
net/core/page_pool.c | 147 +++++--
net/core/page_pool_priv.h | 3 +
net/core/page_pool_user.c | 4 +
net/core/skbuff.c | 77 +++-
net/core/sock.c | 68 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 13 +
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 9 +
tools/testing/selftests/net/ncdevmem.c | 536 ++++++++++++++++++++++++
49 files changed, 2563 insertions(+), 121 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 net/core/netmem_priv.h
create mode 100644 tools/testing/selftests/net/ncdevmem.c
--
2.46.0.rc1.232.g9752f9e123-goog
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead in incorrect resolution when the test executable is run
from elsewhere. Changing it to $ORIGIN makes it resolve relative
to the directory in which the executable resides, which is supposedly
the desired behaviour.
Discovered by the check-rpaths script[1][2] that checks for insecure
RPATH/RUNPATH[3], such as relative directories, during an attempt
to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/bpf/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index dd49c1d23a60..6a3dc9b99159 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -241,7 +241,7 @@ $(OUTPUT)/urandom_read: urandom_read.c urandom_read_aux.c $(OUTPUT)/liburandom_r
$(filter-out -static,$(CFLAGS) $(LDFLAGS)) $(filter %.c,$^) \
-lurandom_read $(filter-out -static,$(LDLIBS)) -L$(OUTPUT) \
-fuse-ld=$(LLD) -Wl,-znoseparate-code -Wl,--build-id=sha1 \
- -Wl,-rpath=. -o $@
+ -Wl,-rpath=\$$ORIGIN/ -o $@
$(OUTPUT)/sign-file: ../../../../scripts/sign-file.c
$(call msg,SIGN-FILE,,$@)
--
2.28.0
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead in incorrect resolution when the test executables are run
from elsewhere. However, the sole sched test (cs_prctl_test)
does not require any locally-built libraries to run, so the RPATH
directive can be removed.
Discovered by the /usr/lib/rpm/check-rpaths script[1][2] that checks
for insecure RPATH/RUNPATH[3], such as containing relative directories,
during an attempt to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/sched/Makefile | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/tools/testing/selftests/sched/Makefile b/tools/testing/selftests/sched/Makefile
index 099ee9213557..0e4581ded9d6 100644
--- a/tools/testing/selftests/sched/Makefile
+++ b/tools/testing/selftests/sched/Makefile
@@ -4,8 +4,7 @@ ifneq ($(shell $(CC) --version 2>&1 | head -n 1 | grep clang),)
CLANG_FLAGS += -no-integrated-as
endif
-CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -Wl,-rpath=./ \
- $(CLANG_FLAGS)
+CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) $(CLANG_FLAGS)
LDLIBS += -lpthread
TEST_GEN_FILES := cs_prctl_test
--
2.28.0
The relative RPATH ("./") supplied to linker options in CFLAGS is resolved
relative to current working directory and not the executable directory,
which will lead in incorrect resolution when the test executables are run
from elsewhere. Changing it to $ORIGIN makes it resolve relative
to the directory in which the executables reside, which is supposedly
the desired behaviour.
Discovered by the /usr/lib/rpm/check-rpaths script[1][2] that checks
for insecure RPATH/RUNPATH[3], such as containing relative directories,
during an attempt to package BPF selftests for later use in CI:
ERROR 0004: file '/usr/libexec/kselftests/bpf/urandom_read' contains an insecure runpath '.' in [.]
[1] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[2] https://github.com/rpm-software-management/rpm/blob/master/scripts/check-rp…
[3] https://cwe.mitre.org/data/definitions/426.html
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
---
tools/testing/selftests/rseq/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rseq/Makefile b/tools/testing/selftests/rseq/Makefile
index 5a3432fceb58..27544a67d6f0 100644
--- a/tools/testing/selftests/rseq/Makefile
+++ b/tools/testing/selftests/rseq/Makefile
@@ -6,7 +6,7 @@ endif
top_srcdir = ../../../..
-CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=./ \
+CFLAGS += -O2 -Wall -g -I./ $(KHDR_INCLUDES) -L$(OUTPUT) -Wl,-rpath=\$$ORIGIN/ \
$(CLANG_FLAGS) -I$(top_srcdir)/tools/include
LDLIBS += -lpthread -ldl
--
2.28.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
So many "Address not found" messages occur at the end of forwarding tests
when using "ip address del" command for an invalid address:
TEST: FDB limits interacting with FDB type local [ OK ]
Error: ipv4: Address not found.
... ...
TEST: IGMPv3 S,G port entry automatic add to a *,G port [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
... ...
TEST: Isolated port flooding [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
... ...
TEST: Externally learned FDB entry - ageing & roaming [ OK ]
Error: ipv4: Address not found.
Error: ipv6: address not found.
This patch gnores these messages and redirects them to /dev/null in
__addr_add_del().
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
tools/testing/selftests/net/forwarding/lib.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh
index ff96bb7535ff..8670b6053cde 100644
--- a/tools/testing/selftests/net/forwarding/lib.sh
+++ b/tools/testing/selftests/net/forwarding/lib.sh
@@ -839,7 +839,7 @@ __addr_add_del()
array=("${@}")
for addrstr in "${array[@]}"; do
- ip address $add_del $addrstr dev $if_name
+ ip address $add_del $addrstr dev $if_name &> /dev/null
done
}
--
2.43.0
This small series includes fixes for creation of veth pairs for
networkless kernels & adds tests for turning the different network
interface features on and off in selftests/net/netdevice.sh script.
Changes in v4:
Move veth creation/removal to the main shell script.
Tested using vng on a networkless kernel and the script works, sample
output below the changes.
Changes in v3:
https://lore.kernel.org/all/20240614113240.41550-1-jain.abhinav177@gmail.co…
Add a check for netdev, create veth pair for testing.
Restore feature to its initial state.
Changes in v2:
https://lore.kernel.org/all/20240609132124.51683-1-jain.abhinav177@gmail.co…
Remove tail usage; use read to parse the features from temp file.
v1:
https://lore.kernel.org/all/20240606212714.27472-1-jain.abhinav177@gmail.co…
```
# selftests: net: netdevice.sh
# No valid network device found, creating veth pair
# PASS: veth0: set interface up
# PASS: veth0: set MAC address
# SKIP: veth0: set IP address
# PASS: veth0: ethtool list features
# PASS: veth0: Turned off feature: rx-checksumming
# PASS: veth0: Turned on feature: rx-checksumming
# PASS: veth0: Restore feature rx-checksumming to initial state on
# Actual changes:
# tx-checksum-ip-generic: off
# tx-tcp-segmentation: off [not requested]
....
....
....
# PASS: veth1: Restore feature tx-nocache-copy to initial state off
# PASS: veth1: Turned off feature: tx-vlan-stag-hw-insert
# PASS: veth1: Turned on feature: tx-vlan-stag-hw-insert
# PASS: veth1: Restore feature tx-vlan-stag-hw-insert to initial state on
# PASS: veth1: Turned off feature: rx-vlan-stag-hw-parse
# PASS: veth1: Turned on feature: rx-vlan-stag-hw-parse
# PASS: veth1: Restore feature rx-vlan-stag-hw-parse to initial state on
# PASS: veth1: Turned off feature: rx-gro-list
# PASS: veth1: Turned on feature: rx-gro-list
# PASS: veth1: Restore feature rx-gro-list to initial state off
# PASS: veth1: Turned off feature: rx-udp-gro-forwarding
# PASS: veth1: Turned on feature: rx-udp-gro-forwarding
# PASS: veth1: Restore feature rx-udp-gro-forwarding to initial state off
# Cannot get register dump: Operation not supported
# SKIP: veth1: ethtool dump not supported
# PASS: veth1: ethtool stats
# PASS: veth1: stop interface
# Removed veth pair
ok 12 selftests: net: netdevice.sh
```
Abhinav Jain (2):
selftests: net: Create veth pair for testing in networkless kernel
selftests: net: Add on/off checks for non-fixed features of interface
tools/testing/selftests/net/netdevice.sh | 55 +++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
--
2.34.1
Add a new kselftest to detect and report slowdowns in key boot events. The
test uses ftrace to track timings for specific boot events and compares
these timestamps against reference values provided in YAML format.
The test includes the following files:
- `bootconfig` file: configures ftrace and lists reference key boot
events.
- `config` fragment: enables boot time tracing and attaches the
bootconfig file to the kernel image.
- `kprobe_timestamps_to_yaml.py` script: parses the current trace file to
extract event names and timestamps and writes them to a YAML file. The
script is intended to be run once to generate initial reference values;
the generated file is not meant to be stored in the kernel sources but
should be provided as input to the test itself. YAML format was chosen
to allow easy integration with per-platform data used in other tests,
such as the discoverable devices probe test in
tools/testing/selftests/devices. Another option is to use JSON, as the
file is not intended for manual editing and JSON is already supported
by the Python standard library.
- `test_boot_time.py` script: parses the current trace file and compares
timestamps against the values in the YAML file provided as input.
Reports a failure if any timestamp differs from the reference value by
more than the specified delta.
- `trace_utils.py` file: utility functions to mount debugfs and parse the
trace file to extract relevant information.
The bootconfig file provided is an initial draft with some reference kprobe
events to showcase how the test works. I would appreciate feedback from
those interested in running this test on which boot events should be added.
Different key events might be relevant depending on the platform and its
boot time requirements. This file should serve as a common ground and be
populated with critical events and functions common to different platforms.
Feedback on the overall approach of this test and suggestions for
additional boot events to trace would be greatly appreciated.
Example output with a deliberately small delta of 0.01 to demonstrate failures:
TAP version 13
1..4
ok 1 populate_rootfs_begin
# 'run_init_process_begin' differs by 0.033990 seconds.
not ok 2 run_init_process_begin
# 'run_init_process_end' differs by 0.033796 seconds.
not ok 3 run_init_process_end
ok 4 unpack_to_rootfs_begin
# Totals: pass:2 fail:2 xfail:0 xpass:0 skip:0 error:0
This patch depends on "kselftest: Move ksft helper module to common
directory":
https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1…
which was picked through the usb tree and is queued for 6.11-rc1.
Best,
Laura
Laura Nao (1):
kselftests: Add test to detect boot event slowdowns
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/boot-time/Makefile | 17 ++++
tools/testing/selftests/boot-time/bootconfig | 8 ++
tools/testing/selftests/boot-time/config | 4 +
.../boot-time/kprobe_timestamps_to_yaml.py | 55 +++++++++++
.../selftests/boot-time/test_boot_time.py | 94 +++++++++++++++++++
.../selftests/boot-time/trace_utils.py | 63 +++++++++++++
7 files changed, 242 insertions(+)
create mode 100644 tools/testing/selftests/boot-time/Makefile
create mode 100644 tools/testing/selftests/boot-time/bootconfig
create mode 100644 tools/testing/selftests/boot-time/config
create mode 100755 tools/testing/selftests/boot-time/kprobe_timestamps_to_yaml.py
create mode 100755 tools/testing/selftests/boot-time/test_boot_time.py
create mode 100644 tools/testing/selftests/boot-time/trace_utils.py
--
2.30.2
There are multiple possible timer sources which could be useful for
the sound stream synchronization: hrtimers, hardware clocks (e.g. PTP),
timer wheels (jiffies). Currently, using one of them to synchronize
the audio stream of snd-aloop module would require writing a
kernel-space driver which exports an ALSA timer through the
snd_timer interface.
However, it is not really convenient for application developers, who may
want to define their custom timer sources for audio synchronization.
For instance, we could have a network application which receives frames
and sends them to snd-aloop pcm device, and another application
listening on the other end of snd-aloop. It makes sense to transfer a
new period of data only when certain amount of frames is received
through the network, but definitely not when a certain amount of jiffies
on a local system elapses. Since all of the devices are purely virtual
it won't introduce any glitches and will help the application developers
to avoid using sample-rate conversion.
This patch series introduces userspace-driven ALSA timers: virtual
timers which are created and controlled from userspace. The timer can
be created from the userspace using the new ioctl SNDRV_TIMER_IOCTL_CREATE.
After creating a timer, it becomes available for use system-wide, so it
can be passed to snd-aloop as a timer source (timer_source parameter
would be "-1.SNDRV_TIMER_GLOBAL_UDRIVEN.{timer_id}"). When the userspace
app decides to trigger a timer, it calls another ioctl
SNDRV_TIMER_IOCTL_TRIGGER on the file descriptor of a timer. It
initiates a transfer of a new period of data.
Userspace-driven timers are associated with file descriptors. If the
application wishes to destroy the timer, it can simply release the file
descriptor of a virtual timer.
I believe introducing new ioctl calls is quite inconvenient (as we have
a limited amount of them), but other possible ways of app <-> kernel
communication (like virtual FS) seem completely inappropriate for this
task (but I'd love to discuss alternative solutions).
This patch series also updates the snd-aloop module so the global timers
can be used as a timer_source for it (it allows using userspace-driven
timers as timer source).
V1 -> V2:
- Fix some problems found by Christophe Jaillet
<christophe.jaillet(a)wanadoo.fr>
V2 -> V3:
- Add improvements suggested by Takashi Iwai <tiwai(a)suse.de>
Please, find the patch-specific changelog in the following patches.
Ivan Orlov (4):
ALSA: aloop: Allow using global timers
Docs/sound: Add documentation for userspace-driven ALSA timers
ALSA: timer: Introduce virtual userspace-driven timers
selftests: ALSA: Cover userspace-driven timers with test
Documentation/sound/index.rst | 1 +
Documentation/sound/utimers.rst | 120 +++++++++++
include/uapi/sound/asound.h | 20 +-
sound/core/Kconfig | 10 +
sound/core/timer.c | 221 ++++++++++++++++++++
sound/drivers/aloop.c | 2 +
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/alsa/global-timer.c | 87 ++++++++
tools/testing/selftests/alsa/utimer-test.c | 170 +++++++++++++++
9 files changed, 631 insertions(+), 2 deletions(-)
create mode 100644 Documentation/sound/utimers.rst
create mode 100644 tools/testing/selftests/alsa/global-timer.c
create mode 100644 tools/testing/selftests/alsa/utimer-test.c
--
2.34.1
Hi Kees and All,
There are several tests in kselftest subsystem which load modules to tests
the internals of the kernel. Most of these test modules are just loaded by
the kselftest, their status isn't read and reported to the user logs. Hence
they don't provide benefit of executing those tests.
I've found patches from Kees where he has been converting such kselftests
to kunit tests [1]. The probable motivation is to move tests output of
kselftest subsystem which only triggers tests without correctly reporting
the results. On the other hand, kunit is there to test the kernel's
internal functions which can't be done by userspace.
Kselftest: Test user facing APIs from userspace
Kunit: Test kernel's internal functions from kernelspace
This brings me to conclusion that kselftest which are loading modules to
test kernelspace should be converted to kunit tests. I've noted several
such kselftests.
This is just my understanding. Please mention if I'm correct above or more
reasons to support kselftest test modules transformation into kunit test.
[1] https://lore.kernel.org/all/20221018082824.never.845-kees@kernel.org/
--
BR,
Muhammad Usama Anjum
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). The user must provide a shadow stack
address and size, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with a shadow stack token at the top of the
stack.
Please note that the x86 portions of this code are build tested only, I
don't appear to have a system that can run CET avaible to me, I have
done testing with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
Please further note that the token consumption done by clone3() is not
currently implemented in an atomic fashion, Rick indicated that he would
look into fixing this if people are OK with the implementation.
A new architecture feature Kconfig option for shadow stacks is added as
here, this was suggested as part of the review comments for the arm64
GCS series and since we need to detect if shadow stacks are supported it
seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (9):
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Explicitly handle child exits due to signals
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 41 ++++
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 104 +++++++---
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 13 ++
include/uapi/linux/sched.h | 13 +-
kernel/fork.c | 76 ++++++--
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 224 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 40 +++-
tools/testing/selftests/ksft_shstk.h | 63 ++++++
15 files changed, 511 insertions(+), 88 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A regression happened where running the ownership test passes on the first
iteration but fails running it a second time. This was caught and fixed,
but a later change brought it back. The regression was missed because the
automated tests only run the tests once per boot.
Change the ownership test to iterate through the tests twice, as this will
catch the regression with a single run.
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
.../ftrace/test.d/00basic/test_ownership.tc | 34 +++++++++++--------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
index c45094d1e1d2..71e43a92352a 100644
--- a/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
+++ b/tools/testing/selftests/ftrace/test.d/00basic/test_ownership.tc
@@ -83,32 +83,38 @@ run_tests() {
done
}
-mount -o remount,"$new_options" .
+# Run the tests twice as leftovers can cause issues
+for loop in 1 2 ; do
-run_tests
+ echo "Running iteration $loop"
-mount -o remount,"$mount_options" .
+ mount -o remount,"$new_options" .
-for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
- test "$d" $original_group
-done
+ run_tests
+
+ mount -o remount,"$mount_options" .
+
+ for d in "." "events" "events/sched" "events/sched/sched_switch" "events/sched/sched_switch/enable" $canary; do
+ test "$d" $original_group
+ done
# check instances as well
-chgrp $other_group instances
+ chgrp $other_group instances
-instance="$(mktemp -u test-XXXXXX)"
+ instance="$(mktemp -u test-XXXXXX)"
-mkdir instances/$instance
+ mkdir instances/$instance
-cd instances/$instance
+ cd instances/$instance
-run_tests
+ run_tests
-cd ../..
+ cd ../..
-rmdir instances/$instance
+ rmdir instances/$instance
-chgrp $original_group instances
+ chgrp $original_group instances
+done
exit 0
--
2.43.0
This small series fixes is_madv_discard() and adds a small sanity check
test to selftests/mm/mseal_test. Without this patch, is_madv_discard()
erroneously thinks innocent ops like MADV_RANDOM are discard operations
(which they are not, and are supposed to be allowed, per the overall
design).
Based on Linus's tree and taken from my mseal depessimization series[1].
[1]: https://lore.kernel.org/all/20240806212808.1885309-1-pedro.falcato@gmail.co…
Pedro Falcato (2):
mseal: Fix is_madv_discard()
selftests/mm: Add mseal test for no-discard madvise
mm/mseal.c | 14 +++++++---
tools/testing/selftests/mm/mseal_test.c | 34 +++++++++++++++++++++++++
2 files changed, 45 insertions(+), 3 deletions(-)
--
2.46.0
The dma-iommu needs to find the correct domain for MSI mapping. With an
IOMMU_DOMAIN_NESTED, the mapping resides in its parent paging domain.
Add a get_msi_mapping_domain op for drivers to return paging domains.
Add an iommufd selftest coverage for that, by doing a loopback test.
Add arm_smmu_get_msi_mapping_domain in the SMMUv3 driver so its nesting
feature could work with MSI correctly.
This is based on top of the reserved-IOVA change:
https://lore.kernel.org/all/20240802053458.2754673-1-nicolinc@nvidia.com/
And Jason's SMMUv3 nesting series:
https://lore.kernel.org/all/0-v1-54e734311a7f+14f72-smmuv3_nesting_jgg@nvid…
This series is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_nesting_sw_msi/
[changelog]
v3:
* Refined PATCH-2 commit message
* Added domain->ops check in PATCH-2
* Added PATCH-4 to implement in SMMUv3 driver
v2:
https://lore.kernel.org/all/cover.1722644866.git.nicolinc@nvidia.com/
* Resent with a proper bug fix.
Thanks
Nicolin
Nicolin Chen (3):
iommufd: Reorder include files
iommufd/selftest: Add coverage for IOMMU_RESV_SW_MSI
iommu/arm-smmu-v3: Implement arm_smmu_get_msi_mapping_domain
Robin Murphy (1):
iommu/dma: Support MSIs through nested domains
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++
drivers/iommu/dma-iommu.c | 18 +++-
drivers/iommu/iommufd/device.c | 4 +-
drivers/iommu/iommufd/fault.c | 4 +-
drivers/iommu/iommufd/io_pagetable.c | 8 +-
drivers/iommu/iommufd/io_pagetable.h | 2 +-
drivers/iommu/iommufd/ioas.c | 2 +-
drivers/iommu/iommufd/iommufd_private.h | 9 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +-
drivers/iommu/iommufd/iova_bitmap.c | 2 +-
drivers/iommu/iommufd/main.c | 8 +-
drivers/iommu/iommufd/pages.c | 10 +-
drivers/iommu/iommufd/selftest.c | 92 ++++++++++++++++++-
include/linux/iommu.h | 4 +
include/linux/iommufd.h | 4 +-
include/uapi/linux/iommufd.h | 2 +-
tools/testing/selftests/iommu/iommufd_utils.h | 9 ++
17 files changed, 160 insertions(+), 34 deletions(-)
--
2.43.0
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as userspace
controlled VM or in short ucontrol VM.
These VMs delegates the management of the VM to userspace instead
of handling most events within the kernel. Consequently the userspace
has to manage interrupts, memory allocation etc.
Before this patch set this functionality lacks any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
the CONFIG_KVM_S390_UCONTROL enabled. The users with sys_admin capability
can then create a new ucontrol VM providing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
The kernels existing selftest helper functions can only be partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set does also contain some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
---
v4:
- PATCH 5: Remove not yet used include for debug print functions
- PATCH 6: Add include for debug print functions (removed from patch 5)
Remove no longer needed code since stopped but is reset
before starting since v3 (thanks Janosch)
Adjust test output to use leading zeros instead of spaces in sieic
- PATCH 7: Rename constant to PGM_SEGMENT_TRANSLATION (thanks Janosch)
Put comments on their own lines
v3:
- Remove stopped bit before starting the VM (no initial stop in multiple
test cases) (thanks Janosch)
- PATCH 2: Clarified SIE control block vs SIE instruction (thanks
Janosch)
- PATCH 3: Make use of CAP_TO_MASK(CAP_SYS_ADMIN) instead of custom
define (thanks Janosch)
Removed Reviewed-By: Claudio
- PATCH 4: Remove erroneous 1MB offset from self->base_hva (thanks
Janosch)
- PATCH 6-8: Change name of test program _pgm to _asm to prevent confusion
- PATCH 10: Move KVM_S390_UCONTROL default option to actual debug config
(thanks Christian)
v2:
- add ucontrol to s390 debug config (new patch)
- PATCH 2: changed atomic_t to __u32 (thanks Claudio)
- PATCH 4: reformatted comment in FIXTURE_SETUP(uc_kvm)
- PATCH 5: refactored to display 8 byte blocks + more internal reuse
(thanks Claudio)
- PATCH 7: make use of more declarative defines instead of magic values
- PATCH 8: make use of more declarative defines instead of magic values
(thanks Claudio)
- PATCH 9: add reference to fix verified by the test case
Christoph Schlameuss (10):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
arch/s390/configs/debug_defconfig | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 69 ++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 596 ++++++++++++++++++
13 files changed, 929 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: c0ecd6388360d930440cc5554026818895199923
--
2.45.2
First 4 patches are more-or-less cleanups/preparations.
Patch 5 was sent to me/contributed off-list by Mohammad, who wants 32-bit
kernels to run TCP-AO.
Patch 6 is a workaround/fix for slow VMs. Albeit, I can't reproduce
the issue, but I hope it will fix netdev flakes for connect-deny-*
tests.
And the biggest change is adding TCP-AO tracepoints to selftests.
I think it's a good addition by the following reasons:
- The related tracepoints are now tested;
- It allows tcp-ao selftests to raise expectations on the kernel
behavior - up from the syscalls exit statuses + net counters.
- Provides tracepoints usage samples.
As tracepoints are not a stable ABI, any kernel changes done to them
will be reflected to the selftests, which also will allow users
to see how to change their code. It's quite better than parsing dmesg
(what BGP was doing pre-tracepoints, ugh).
Somewhat arguably, the code parses trace_pipe, rather than uses
libtraceevent (which any sane user should do). The reason behind that is
the same as for rt-netlink macros instead of libmnl: I'm trying
to minimize the library dependencies of the selftests. And the
performance of formatting text in kernel and parsing it again in a test
is not critical.
Current output sample:
> ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4]
Previously, tracepoints selftests were part of kernel tcp tracepoints
submission [1], but since then the code was quite changed:
- Now generic tracing setup is in lib/ftrace.c, separate from
lib/ftrace-tcp.c which utilizes TCP trace points. This separation
allows future selftests to trace non-TCP events, i.e. to find out
an skb's drop reason, which was useful in the creation of TCP-CLOSE
stress-test (not in this patch set, but used in attempt to reproduce
the issue from [2]).
- Another change is that in the previous submission the trace events
where used only to detect unexpected TCP-AO/TCP-MD5 events. In this
version the selftests will fail if an expected trace event didn't
appear.
Let's see how reliable this is on the netdev bot - it obviously passes
on my testing, but potentially may require a temporary XFAIL patch
if it misbehaves on a slow VM.
[1] https://lore.kernel.org/lkml/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3…
Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
---
Changes in v2:
- Fixed two issues with parsing TCP-AO events: the socket state and TCP
segment flags. Hopefully, won't fail on netdev.
- Reword patch 1 & 2 messages to be more informative and at some degree
formal (Paolo)
- Since commit e33a02ed6a4f ("selftests: Add printf attribute to
kselftest prints") it's possible to use __printf instead of "raw" gcc
attribute - switch using that, as checkpatch suggests.
- Link to v1: https://lore.kernel.org/r/20240730-tcp-ao-selftests-upd-6-12-v1-0-ffd4bf15d…
---
Dmitry Safonov (6):
selftests/net: Clean-up double assignment
selftests/net: Provide test_snprintf() helper
selftests/net: Be consistent in kconfig checks
selftests/net: Don't forget to close nsfd after switch_save_ns()
selftests/net: Synchronize client/server before counters checks
selftests/net: Add trace events matching to tcp_ao
Mohammad Nassiri (1):
selftests/tcp_ao: Fix printing format for uint64_t
tools/testing/selftests/net/tcp_ao/Makefile | 3 +-
tools/testing/selftests/net/tcp_ao/bench-lookups.c | 2 +-
tools/testing/selftests/net/tcp_ao/config | 1 +
tools/testing/selftests/net/tcp_ao/connect-deny.c | 25 +-
tools/testing/selftests/net/tcp_ao/connect.c | 6 +-
tools/testing/selftests/net/tcp_ao/icmps-discard.c | 2 +-
.../testing/selftests/net/tcp_ao/key-management.c | 18 +-
tools/testing/selftests/net/tcp_ao/lib/aolib.h | 176 ++++++-
.../testing/selftests/net/tcp_ao/lib/ftrace-tcp.c | 549 +++++++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/ftrace.c | 466 +++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/kconfig.c | 31 +-
tools/testing/selftests/net/tcp_ao/lib/setup.c | 15 +-
tools/testing/selftests/net/tcp_ao/lib/sock.c | 1 -
tools/testing/selftests/net/tcp_ao/lib/utils.c | 26 +
tools/testing/selftests/net/tcp_ao/restore.c | 30 +-
tools/testing/selftests/net/tcp_ao/rst.c | 2 +-
tools/testing/selftests/net/tcp_ao/self-connect.c | 19 +-
tools/testing/selftests/net/tcp_ao/seq-ext.c | 28 +-
.../selftests/net/tcp_ao/setsockopt-closed.c | 6 +-
tools/testing/selftests/net/tcp_ao/unsigned-md5.c | 35 +-
20 files changed, 1375 insertions(+), 66 deletions(-)
---
base-commit: 3361a6eae59664ffae640ff7a838f5bd89c24461
change-id: 20240730-tcp-ao-selftests-upd-6-12-4d3e53a74f3f
Best regards,
--
Dmitry Safonov <0x7f454c46(a)gmail.com>
This revision only updates the tests from the previous revision[1], and
integrates an Acked-by[2] and a Reviewed-By[3] into the first commit
message.
Documentation/admin-guide/cgroup-v2.rst | 22 ++-
include/linux/cgroup-defs.h | 5 +
include/linux/cgroup.h | 3 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 11 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 +
mm/memcontrol.c | 116 +++++++++++++--
mm/page_counter.c | 30 +++-
tools/testing/selftests/cgroup/cgroup_util.c | 22 +++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 264 ++++++++++++++++++++++++++++++++-
12 files changed, 454 insertions(+), 35 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/
[2]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
[3]: https://lore.kernel.org/cgroups/20240729143743.34236-1-davidf@vimeo.com/T/#…
Thank you all for the support and reviews so far!
David Finkel
Senior Principal Software Engineer
Vimeo Inc.
Hello,
this series brings a new set of test converted to the test_progs framework.
Since the tests are quite small, I chose to group three tests conversion in
the same series, but feel free to let me know if I should keep one series
per test. The series focuses on cgroup testing and converts the following
tests:
- get_cgroup_id_user
- cgroup_storage
- test_skb_cgroup_id_user
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore(a)bootlin.com>
---
Alexis Lothoré (eBPF Foundation) (4):
selftests/bpf: convert get_current_cgroup_id_user to test_progs
selftests/bpf: convert test_cgroup_storage to test_progs
selftests/bpf: add proper section name to bpf prog and rename it
selftests/bpf: convert test_skb_cgroup_id_user to test_progs
tools/testing/selftests/bpf/.gitignore | 3 -
tools/testing/selftests/bpf/Makefile | 8 +-
tools/testing/selftests/bpf/get_cgroup_id_user.c | 151 -----------------
.../selftests/bpf/prog_tests/cgroup_ancestor.c | 159 ++++++++++++++++++
.../bpf/prog_tests/cgroup_get_current_cgroup_id.c | 58 +++++++
.../selftests/bpf/prog_tests/cgroup_storage.c | 65 ++++++++
...test_skb_cgroup_id_kern.c => cgroup_ancestor.c} | 2 +-
tools/testing/selftests/bpf/progs/cgroup_storage.c | 24 +++
tools/testing/selftests/bpf/test_cgroup_storage.c | 174 --------------------
tools/testing/selftests/bpf/test_skb_cgroup_id.sh | 63 -------
.../selftests/bpf/test_skb_cgroup_id_user.c | 183 ---------------------
11 files changed, 309 insertions(+), 581 deletions(-)
---
base-commit: 0e2eaf4b33f65e904b69bae6b956f3f610dbba9a
change-id: 20240725-convert_cgroup_tests-d07c66053225
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
The system register definitions in the arm64 get-reg-list are all done
with directly specified magic numbers rather than using the definitions
we import from the main kernel. This is error prone, and requires us to
audit the additions to get-reg-list separately to what we do when
specifying the registers for the main kernel. Since Marc has indicated
that this isn't a deliberate or desired choice let's start using the
constants we have defined.
We first manually update the data used to filter registers based on ID
register fields to use a simplified macro that specifies the register
and ID field in a muc more compact fashion. This is done first since
there is an error in the ID register field for the S1PIE registers. We
then replace all the remaining named system register specifications with
use of the existing KVM_ARM64_SYS_REG() macro.
This is just a first step, there's a bunch more work we could be doing
here, the main thing being making use of the encodings in
arch/arm64/tools/sysreg to convert more of the registers (including
updating as more registers are converted to use the generator).
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Add use of designated initalisers when converting filtering macros.
- Manual handling of CNTV_CTL_EL0 and CNTV_CVAL_EL0.
- Commit message tweaks.
- Link to v1: https://lore.kernel.org/r/20240802-kvm-arm64-get-reg-list-v1-0-3a5bf8f80765…
---
Mark Brown (3):
KVM: selftests: arm64: Simplify specification of filtered registers
KVM: selftests: arm64: Use symbolic definitions for incorrect encodings
KVM: selftests: arm64: Use generated defines for named system registers
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 244 ++++++++++-----------
1 file changed, 122 insertions(+), 122 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240802-kvm-arm64-get-reg-list-a86a37460bdd
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The system register definitions in the arm64 get-reg-list are all done
with directly specified magic numbers rather than using the definitions
we import from the main kernel. This is error prone, and requires us to
audit the additions to get-reg-list separately to what we do when
specifying the registers for the main kernel. Since Marc has indicated
that this isn't a deliberate or desired choice let's start using the
constants we have defined.
We first manually update the data used to filter registers based on ID
register fields to use a simplified macro that specifies the register
and ID field in a muc more compact fashion. This is done first since
there is an error in the ID register field for the S1PIE registers. We
then replace all the remaining named system register specifications with
use of the existing KVM_ARM64_SYS_REG() macro.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
KVM: selftests: arm64: Simplify specification of filtered registers
KVM: selftests: arm64: Use generated defines for named system registers
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 237 ++++++++++-----------
1 file changed, 115 insertions(+), 122 deletions(-)
---
base-commit: 8400291e289ee6b2bf9779ff1c83a291501f017b
change-id: 20240802-kvm-arm64-get-reg-list-a86a37460bdd
Best regards,
--
Mark Brown <broonie(a)kernel.org>
In commit 7d3c33b290b1 ("kunit: Device wrappers should also manage driver name"),
the kunit_kstrdup_const() and kunit_kfree_const() were introduced as an
optimisation of kunit_kstrdup(), which only copy/free strings from the
kernel rodata.
However, these are inline functions, and is_kernel_rodata() only works
for built-in code. This causes problems in two cases:
- If kunit is built as a module, __{start,end}_rodata is not defined.
- If a kunit test using these functions is built as a module, it will
suffer the same fate.
Restrict the is_kernel_rodata() case to when KUnit is built as a module,
which fixes the first case, at the cost of losing the optimisation.
Also, make kunit_{kstrdup,kfree}_const non-inline, so that other modules
using them will not accidentally depend on is_kernel_rodata(). If KUnit
is built-in, they'll benefit from the optimisation, if KUnit is not,
they won't, but the string will be properly duplicated.
(And fix a couple of typos in the doc comment, too.)
Reported-by: Nico Pache <npache(a)redhat.com>
Closes: https://lore.kernel.org/all/CAA1CXcDKht4vOL-acxrARbm6JhGna8_k8wjYJ-vHONink8…
Fixes: 7d3c33b290b1 ("kunit: Device wrappers should also manage driver name")
Signed-off-by: David Gow <davidgow(a)google.com>
---
include/kunit/test.h | 16 +++-------------
lib/kunit/test.c | 19 +++++++++++++++++++
2 files changed, 22 insertions(+), 13 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index da9e84de14c0..5ac237c949a0 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -489,11 +489,7 @@ static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp
* Calls kunit_kfree() only if @x is not in .rodata section.
* See kunit_kstrdup_const() for more information.
*/
-static inline void kunit_kfree_const(struct kunit *test, const void *x)
-{
- if (!is_kernel_rodata((unsigned long)x))
- kunit_kfree(test, x);
-}
+void kunit_kfree_const(struct kunit *test, const void *x);
/**
* kunit_kstrdup() - Duplicates a string into a test managed allocation.
@@ -527,16 +523,10 @@ static inline char *kunit_kstrdup(struct kunit *test, const char *str, gfp_t gfp
* @gfp: flags passed to underlying kmalloc().
*
* Calls kunit_kstrdup() only if @str is not in the rodata section. Must be freed with
- * kunit_free_const() -- not kunit_free().
+ * kunit_kfree_const() -- not kunit_kfree().
* See kstrdup_const() and kunit_kmalloc_array() for more information.
*/
-static inline const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
-{
- if (is_kernel_rodata((unsigned long)str))
- return str;
-
- return kunit_kstrdup(test, str, gfp);
-}
+const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp);
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e8b1b52a19ab..089c832e3cdb 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -874,6 +874,25 @@ void kunit_kfree(struct kunit *test, const void *ptr)
}
EXPORT_SYMBOL_GPL(kunit_kfree);
+void kunit_kfree_const(struct kunit *test, const void *x)
+{
+#if !IS_MODULE(CONFIG_KUNIT)
+ if (!is_kernel_rodata((unsigned long)x))
+#endif
+ kunit_kfree(test, x);
+}
+EXPORT_SYMBOL_GPL(kunit_kfree_const);
+
+const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
+{
+#if !IS_MODULE(CONFIG_KUNIT)
+ if (is_kernel_rodata((unsigned long)str))
+ return str;
+#endif
+ return kunit_kstrdup(test, str, gfp);
+}
+EXPORT_SYMBOL_GPL(kunit_kstrdup_const);
+
void kunit_cleanup(struct kunit *test)
{
struct kunit_resource *res;
--
2.46.0.rc2.264.g509ed76dc8-goog
kunit_driver_create() accepts a name for the driver, but does not copy
it, so if that name is either on the stack, or otherwise freed, we end
up with a use-after-free when the driver is cleaned up.
Instead, strdup() the name, and manage it as another KUnit allocation.
As there was no existing kunit_kstrdup(), we add one. Further, add a
kunit_ variant of strdup_const() and kfree_const(), so we don't need to
allocate and manage the string in the majority of cases where it's a
constant.
This fixes a KASAN splat with overflow.overflow_allocation_test, when
built as a module.
Fixes: d03c720e03bd ("kunit: Add APIs for managing devices")
Reported-by: Nico Pache <npache(a)redhat.com>
Closes: https://groups.google.com/g/kunit-dev/c/81V9b9QYON0
Signed-off-by: David Gow <davidgow(a)google.com>
Reviewed-by: Kees Cook <kees(a)kernel.org>
---
There's some more serious changes since the RFC I sent, so please take a
closer look.
Thanks,
-- David
Changes since RFC:
https://groups.google.com/g/kunit-dev/c/81V9b9QYON0/m/PFKNKDKAAAAJ
- Add and use the kunit_kstrdup_const() and kunit_free_const()
functions.
- Fix a typo in the doc comments.
---
include/kunit/test.h | 58 ++++++++++++++++++++++++++++++++++++++++++++
lib/kunit/device.c | 7 ++++--
2 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index e2a1f0928e8b..da9e84de14c0 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -28,6 +28,7 @@
#include <linux/types.h>
#include <asm/rwonce.h>
+#include <asm/sections.h>
/* Static key: true if any KUnit tests are currently running */
DECLARE_STATIC_KEY_FALSE(kunit_running);
@@ -480,6 +481,63 @@ static inline void *kunit_kcalloc(struct kunit *test, size_t n, size_t size, gfp
return kunit_kmalloc_array(test, n, size, gfp | __GFP_ZERO);
}
+
+/**
+ * kunit_kfree_const() - conditionally free test managed memory
+ * @x: pointer to the memory
+ *
+ * Calls kunit_kfree() only if @x is not in .rodata section.
+ * See kunit_kstrdup_const() for more information.
+ */
+static inline void kunit_kfree_const(struct kunit *test, const void *x)
+{
+ if (!is_kernel_rodata((unsigned long)x))
+ kunit_kfree(test, x);
+}
+
+/**
+ * kunit_kstrdup() - Duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * See kstrdup() and kunit_kmalloc_array() for more information.
+ */
+static inline char *kunit_kstrdup(struct kunit *test, const char *str, gfp_t gfp)
+{
+ size_t len;
+ char *buf;
+
+ if (!str)
+ return NULL;
+
+ len = strlen(str) + 1;
+ buf = kunit_kmalloc(test, len, gfp);
+ if (buf)
+ memcpy(buf, str, len);
+ return buf;
+}
+
+/**
+ * kunit_kstrdup_const() - Conditionally duplicates a string into a test managed allocation.
+ *
+ * @test: The test context object.
+ * @str: The NULL-terminated string to duplicate.
+ * @gfp: flags passed to underlying kmalloc().
+ *
+ * Calls kunit_kstrdup() only if @str is not in the rodata section. Must be freed with
+ * kunit_free_const() -- not kunit_free().
+ * See kstrdup_const() and kunit_kmalloc_array() for more information.
+ */
+static inline const char *kunit_kstrdup_const(struct kunit *test, const char *str, gfp_t gfp)
+{
+ if (is_kernel_rodata((unsigned long)str))
+ return str;
+
+ return kunit_kstrdup(test, str, gfp);
+}
+
/**
* kunit_vm_mmap() - Allocate KUnit-tracked vm_mmap() area
* @test: The test context object.
diff --git a/lib/kunit/device.c b/lib/kunit/device.c
index 25c81ed465fb..520c1fccee8a 100644
--- a/lib/kunit/device.c
+++ b/lib/kunit/device.c
@@ -89,7 +89,7 @@ struct device_driver *kunit_driver_create(struct kunit *test, const char *name)
if (!driver)
return ERR_PTR(err);
- driver->name = name;
+ driver->name = kunit_kstrdup_const(test, name, GFP_KERNEL);
driver->bus = &kunit_bus_type;
driver->owner = THIS_MODULE;
@@ -192,8 +192,11 @@ void kunit_device_unregister(struct kunit *test, struct device *dev)
const struct device_driver *driver = to_kunit_device(dev)->driver;
kunit_release_action(test, device_unregister_wrapper, dev);
- if (driver)
+ if (driver) {
+ const char *driver_name = driver->name;
kunit_release_action(test, driver_unregister_wrapper, (void *)driver);
+ kunit_kfree_const(test, driver_name);
+ }
}
EXPORT_SYMBOL_GPL(kunit_device_unregister);
--
2.46.0.rc1.232.g9752f9e123-goog
In arm64 pKVM and QuIC's Gunyah protected VM model, we want to support
grabbing shmem user pages instead of using KVM's guestmemfd. These
hypervisors provide a different isolation model than the CoCo
implementations from x86. KVM's guest_memfd is focused on providing
memory that is more isolated than AVF requires. Some specific examples
include ability to pre-load data onto guest-private pages, dynamically
sharing/isolating guest pages without copy, and (future) migrating
guest-private pages. In sum of those differences after a discussion in
[1] and at PUCK, we want to try to stick with existing shmem and extend
GUP to support the isolation needs for arm64 pKVM and Gunyah. To that
end, we introduce the concept of "exclusive GUP pinning", which enforces
that only one pin of any kind is allowed when using the FOLL_EXCLUSIVE
flag is set. This behavior doesn't affect FOLL_GET or any other folio
refcount operations that don't go through the FOLL_PIN path.
[1]: https://lore.kernel.org/all/20240319143119.GA2736@willie-the-truck/
Tree with patches at:
https://git.codelinaro.org/clo/linux-kernel/gunyah-linux/-/tree/sent/exclus…
anup(a)brainfault.org, paul.walmsley(a)sifive.com,
palmer(a)dabbelt.com, aou(a)eecs.berkeley.edu, seanjc(a)google.com,
viro(a)zeniv.linux.org.uk, brauner(a)kernel.org,
willy(a)infradead.org, akpm(a)linux-foundation.org,
xiaoyao.li(a)intel.com, yilun.xu(a)intel.com,
chao.p.peng(a)linux.intel.com, jarkko(a)kernel.org,
amoorthy(a)google.com, dmatlack(a)google.com,
yu.c.zhang(a)linux.intel.com, isaku.yamahata(a)intel.com,
mic(a)digikod.net, vbabka(a)suse.cz, vannapurve(a)google.com,
ackerleytng(a)google.com, mail(a)maciej.szmigiero.name,
david(a)redhat.com, michael.roth(a)amd.com, wei.w.wang(a)intel.com,
liam.merwick(a)oracle.com, isaku.yamahata(a)gmail.com,
kirill.shutemov(a)linux.intel.com, suzuki.poulose(a)arm.com,
steven.price(a)arm.com, quic_eberman(a)quicinc.com,
quic_mnalajal(a)quicinc.com, quic_tsoni(a)quicinc.com,
quic_svaddagi(a)quicinc.com, quic_cvanscha(a)quicinc.com,
quic_pderrin(a)quicinc.com, quic_pheragu(a)quicinc.com,
catalin.marinas(a)arm.com, james.morse(a)arm.com,
yuzenghui(a)huawei.com, oliver.upton(a)linux.dev, maz(a)kernel.org,
will(a)kernel.org, qperret(a)google.com, keirf(a)google.com,
tabba(a)google.com
Signed-off-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Elliot Berman (2):
mm/gup-test: Verify exclusive pinned
mm/gup_test: Verify GUP grabs same pages twice
Fuad Tabba (3):
mm/gup: Move GUP_PIN_COUNTING_BIAS to page_ref.h
mm/gup: Add an option for obtaining an exclusive pin
mm/gup: Add support for re-pinning a normal pinned page as exclusive
include/linux/mm.h | 57 ++++----
include/linux/mm_types.h | 2 +
include/linux/page_ref.h | 74 ++++++++++
mm/Kconfig | 5 +
mm/gup.c | 265 ++++++++++++++++++++++++++++++----
mm/gup_test.c | 108 ++++++++++++++
mm/gup_test.h | 1 +
tools/testing/selftests/mm/gup_test.c | 5 +-
8 files changed, 457 insertions(+), 60 deletions(-)
---
base-commit: 6ba59ff4227927d3a8530fc2973b80e94b54d58f
change-id: 20240509-exclusive-gup-66259138bbff
Best regards,
--
Elliot Berman <quic_eberman(a)quicinc.com>
Hi Linus,
Please pull the kselftest fixes update for Linux 6.11-rc3.
This kselftest fixes update consists of a single fix to the conditional
in ksft.py script which incorrectly flags a test suite failed when there
are skipped tests in the mix. The logic is fixed to take skipped tests
into account and report the test as passed.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 8400291e289ee6b2bf9779ff1c83a291501f017b:
Linux 6.11-rc1 (2024-07-28 14:19:55 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux_kselftest-fixes-6.11-rc3
for you to fetch changes up to 170c966cbe274e664288cfc12ee919d5e706dc50:
selftests: ksft: Fix finished() helper exit code on skipped tests (2024-07-31 11:38:56 -0600)
----------------------------------------------------------------
linux_kselftest-fixes-6.11-rc3
This kselftest fixes update consists of a single fix to the conditional
in ksft.py script which incorrectly flags a test suite failed when there
are skipped tests in the mix. The logic is fixed to take skipped tests
into account and report the test as passed.
----------------------------------------------------------------
Laura Nao (1):
selftests: ksft: Fix finished() helper exit code on skipped tests
tools/testing/selftests/kselftest/ksft.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------
The first 2 patches in this series fix cpuset bugs found by Chen Ridong.
Patch 3 streamlines the sched domain rebuild process for hotplug
operation and eliminates the use of intermediate cpuset states for
sched domain generation. Patch 4 modifies generate_sched_domains()
to check the correctness of partition roots with non-overlapping CPUs.
Patch 5 adds new test cases to cover the bugs fixed in patches 1 and 2.
Chen Ridong (1):
cgroup/cpuset: fix panic caused by partcmd_update
Waiman Long (4):
cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if
cpus.exclusive not set
cgroup/cpuset: Eliminate unncessary sched domains rebuilds in hotplug
cgroup/cpuset: Check for partition roots with overlapping CPUs
selftest/cgroup: Add new test cases to test_cpuset_prs.sh
kernel/cgroup/cpuset.c | 70 ++++++++++---------
.../selftests/cgroup/test_cpuset_prs.sh | 12 +++-
2 files changed, 49 insertions(+), 33 deletions(-)
--
2.43.5
The current support for LLVM and clang in nolibc and its testsuite is
very limited.
* Various architectures plain do not compile
* The user *has* to specify "-Os" otherwise the program crashes
* Cross-compilation of the tests does not work
* Using clang is not wired up in run-tests.sh
This series extends this support.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (12):
tools/nolibc: use clang-compatible asm syntax in arch-arm.h
tools/nolibc: limit powerpc stack-protector workaround to GCC
tools/nolibc: move entrypoint specifics to compiler.h
tools/nolibc: use attribute((naked)) if available
selftests/nolibc: report failure if no testcase passed
selftests/nolibc: avoid passing NULL to printf("%s")
selftests/nolibc: determine $(srctree) first
selftests/nolibc: setup objtree without Makefile.include
selftests/nolibc: add support for LLVM= parameter
selftests/nolibc: add cc-option compatible with clang cross builds
selftests/nolibc: run-tests.sh: avoid overwriting CFLAGS_EXTRA
selftests/nolibc: run-tests.sh: allow building through LLVM
tools/include/nolibc/arch-aarch64.h | 4 ++--
tools/include/nolibc/arch-arm.h | 8 ++++----
tools/include/nolibc/arch-i386.h | 4 ++--
tools/include/nolibc/arch-loongarch.h | 4 ++--
tools/include/nolibc/arch-mips.h | 4 ++--
tools/include/nolibc/arch-powerpc.h | 6 +++---
tools/include/nolibc/arch-riscv.h | 4 ++--
tools/include/nolibc/arch-s390.h | 4 ++--
tools/include/nolibc/arch-x86_64.h | 4 ++--
tools/include/nolibc/compiler.h | 12 ++++++++++++
tools/testing/selftests/nolibc/Makefile | 27 ++++++++++++++++-----------
tools/testing/selftests/nolibc/nolibc-test.c | 4 ++--
tools/testing/selftests/nolibc/run-tests.sh | 20 ++++++++++++++++----
13 files changed, 67 insertions(+), 38 deletions(-)
---
base-commit: 0db287736bc586fcd5a2925518ef09eec6924803
change-id: 20240727-nolibc-llvm-3fad68590d4c
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Don't print that 88 sub-tests are going to be executed, but then skip.
This is against TAP compliance. Instead check pre-requisites first
before printing total number of tests.
Old non-tap compliant output:
TAP version 13
1..88
ok 2 # SKIP all tests require euid == 0
# Planned tests != run tests (88 != 1)
# Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0
New and correct output:
TAP version 13
1..0 # SKIP all tests require euid == 0
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes since v1:
- Remove simplifying if condition lines
- Update the patch message
---
tools/testing/selftests/openat2/resolve_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/openat2/resolve_test.c b/tools/testing/selftests/openat2/resolve_test.c
index bbafad440893c..85a4c64ee950d 100644
--- a/tools/testing/selftests/openat2/resolve_test.c
+++ b/tools/testing/selftests/openat2/resolve_test.c
@@ -508,12 +508,13 @@ void test_openat2_opath_tests(void)
int main(int argc, char **argv)
{
ksft_print_header();
- ksft_set_plan(NUM_TESTS);
/* NOTE: We should be checking for CAP_SYS_ADMIN here... */
if (geteuid() != 0)
ksft_exit_skip("all tests require euid == 0\n");
+ ksft_set_plan(NUM_TESTS);
+
test_openat2_opath_tests();
if (ksft_get_fail_cnt() + ksft_get_error_cnt() > 0)
--
2.39.2
A small bugfix for "run-user XARCH=ppc64le" and run-user support for
run-tests.sh.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: introduce QEMU_ARCH_USER
selftests/nolibc: run-tests.sh: enable testing via qemu-user
tools/testing/selftests/nolibc/Makefile | 5 ++++-
tools/testing/selftests/nolibc/run-tests.sh | 22 +++++++++++++++++++---
2 files changed, 23 insertions(+), 4 deletions(-)
---
base-commit: ba335752620565c25c3028fff9496bb8ef373602
change-id: 20770915-nolibc-run-user-845375a3ec4f
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Extend pmu_counters_test to AMD CPUs.
As the AMD PMU is quite different from Intel with different events and
feature sets, this series introduces a new code path to test it,
specifically focusing on the core counters including the
PerfCtrExtCore and PerfMonV2 features. Northbridge counters and cache
counters exist, but are not as important and can be deferred to a
later series.
The first patch is a bug fix that could be submitted separately.
The series has been tested on both Intel and AMD machines, but I have
not found an AMD machine old enough to lack PerfCtrExtCore. I have
made efforts that no part of the code has any dependency on its
presence.
I am aware of similar work in this direction done by Jinrong Liang
[1]. He told me he is not working on it currently and I am not
intruding by making my own submission.
[1] https://lore.kernel.org/kvm/20231121115457.76269-1-cloudliang@tencent.com/
Colton Lewis (6):
KVM: x86: selftests: Fix typos in macro variable use
KVM: x86: selftests: Define AMD PMU CPUID leaves
KVM: x86: selftests: Set up AMD VM in pmu_counters_test
KVM: x86: selftests: Test read/write core counters
KVM: x86: selftests: Test core events
KVM: x86: selftests: Test PerfMonV2
.../selftests/kvm/include/x86_64/processor.h | 7 +
.../selftests/kvm/x86_64/pmu_counters_test.c | 267 ++++++++++++++++--
2 files changed, 249 insertions(+), 25 deletions(-)
--
2.46.0.rc2.264.g509ed76dc8-goog
Introduce a new test to identify regressions causing devices to go
missing on the system.
For each bus and class on the system the test checks the number of
devices present against a reference file, which needs to have been
generated by the program at a previous point on a known-good kernel, and
if there are missing devices they are reported.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Hi,
Key points about this test:
* Goal: Identify regressions causing devices to go missing on the system
* Focus:
* Ease of maintenance: the reference file is generated programatically
* Minimum of false-positives: the script makes as few assumptions as possible
about the stability of device identifiers to ensure renames/refactors don't
trigger false-positives
* How it works: For each bus and class on the system the test checks the number
of devices present against a reference file, which needs to have been
generated by the program at a previous point on a known-good kernel, and if
there are missing devices they are reported.
* Comparison to other tests: It might be possible(*) to replace the discoverable
devices test [1] with this. The benefits of this test is that it's easier
to setup and maintain and has wider coverage of devices.
Additional detail:
* Having more devices on the running system than the reference does not cause a
failure, but a warning is printed in that case to suggest that the reference
be updated.
* Missing devices are detected per bus/class based on the number of devices.
When the test fails, the known metadata for each of the expected and detected
devices is printed and some simple similitarity comparison is done to suggest
the devices that are the most likely to be missing.
* The proposed place to store the generated reference files is the
'platform-test-parameters' repository in KernelCI [2].
Example output: This is an example of a failing test case when one of the two
devices in the nvmem bus went missing:
# Missing devices for subsystem 'nvmem': 1 (Expected 2, found 1)
# =================
# Devices expected:
#
# uevent:
# OF_NAME=efuse
# OF_FULLNAME=/soc/efuse@11c10000
# OF_COMPATIBLE_0=mediatek,mt8195-efuse
# OF_COMPATIBLE_1=mediatek,efuse
# OF_COMPATIBLE_N=2
#
# uevent:
# OF_NAME=flash
# OF_FULLNAME=/soc/spi@1132c000/flash@0
# OF_COMPATIBLE_0=jedec,spi-nor
# OF_COMPATIBLE_N=1
#
# -----------------
# Devices found:
#
# uevent:
# OF_NAME=efuse
# OF_FULLNAME=/soc/efuse@11c10000
# OF_COMPATIBLE_0=mediatek,mt8195-efuse
# OF_COMPATIBLE_1=mediatek,efuse
# OF_COMPATIBLE_N=2
#
# -----------------
# Devices missing (best guess):
#
# uevent:
# OF_NAME=flash
# OF_FULLNAME=/soc/spi@1132c000/flash@0
# OF_COMPATIBLE_0=jedec,spi-nor
# OF_COMPATIBLE_N=1
#
# =================
not ok 19 bus.nvmem
Example of how the data for these devices is encoded in the reference file:
bus:
...
nvmem:
count: 2
devices:
- info:
uevent: 'OF_NAME=efuse
OF_FULLNAME=/soc/efuse@11c10000
OF_COMPATIBLE_0=mediatek,mt8195-efuse
OF_COMPATIBLE_1=mediatek,efuse
OF_COMPATIBLE_N=2
'
- info:
uevent: 'OF_NAME=flash
OF_FULLNAME=/soc/spi@1132c000/flash@0
OF_COMPATIBLE_0=jedec,spi-nor
OF_COMPATIBLE_N=1
'
(Full reference file: http://0x0.st/Xp60.yaml;)
Caveat: Relying only on the count of devices in a subsystem makes the test
susceptible to false-negatives eg. if a device goes missing and another in the
same subsystem is added the count will be the same so this regression won't be
reported. In order to avoid this we may include properties that must match
individual devices, but we must be very careful (and it's why I haven't done it)
since matching against properties that aren't guaranteed to be stable will
introduce false-positives (ie. detecting false regressions) due to eventual
renames.
Some things to improve in the near future / gather feedback on:
* (*): Currently this test only checks for the existence of devices. We could
extend it to also encode into the reference which devices are bound to drivers
to be able to completely replace the discoverable devices probe kselftest [1].
* Expanding identifying properties: Currently the properties that are stored
(when present) in the reference for each device to be used for identification
in the result output are uevent, device/uevent, firmware_node/uevent and name.
Suggestions of others properties to add are welcome.
* Adding more filtering to reduce noise:
* Ignoring buses/classes: Currently the devlink class is ignored by the test
since it seems like a kernel internal detail that userspace doesn't actually
care about. We should add others that are similar.
* Ignoring non-devices: There can be entries in /sys/class/ that aren't
devices. For now we're filtering down to only symlinks, but there might be a
better way.
* As mentioned in the caveat section above we may want to add actual matching
of devices based on properties to avoid false-negatives if we identify
suitable properties.
* It would be nice to have an option in the program to compare a newer reference
to an older one to make it easier for the user to see the differences and
decide if the new reference is ok.
* Since the reference file is not supposed to be manually edited, JSON might be
a better choice than YAML since it is included in the python standard library.
Let me know your thoughts.
Thanks,
Nícolas
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/too…
[2] https://github.com/kernelci/platform-test-parameters
---
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/devices/exist/Makefile | 3 +
tools/testing/selftests/devices/exist/exist.py | 268 +++++++++++++++++++++++++
3 files changed, 272 insertions(+)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index bc8fe9e8f7f2..9c49b5ec5bef 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -14,6 +14,7 @@ TARGETS += cpufreq
TARGETS += cpu-hotplug
TARGETS += damon
TARGETS += devices/error_logs
+TARGETS += devices/exist
TARGETS += devices/probe
TARGETS += dmabuf-heaps
TARGETS += drivers/dma-buf
diff --git a/tools/testing/selftests/devices/exist/Makefile b/tools/testing/selftests/devices/exist/Makefile
new file mode 100644
index 000000000000..3075cac32092
--- /dev/null
+++ b/tools/testing/selftests/devices/exist/Makefile
@@ -0,0 +1,3 @@
+TEST_PROGS := exist.py
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/devices/exist/exist.py b/tools/testing/selftests/devices/exist/exist.py
new file mode 100755
index 000000000000..8241b2fabc8e
--- /dev/null
+++ b/tools/testing/selftests/devices/exist/exist.py
@@ -0,0 +1,268 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Collabora Ltd
+
+# * Goal: Identify regressions causing devices to go missing on the system
+# * Focus:
+# * Ease of maintenance: the reference file is generated programatically
+# * Minimum of false-positives: the script makes as few assumptions as
+# possible about the stability of device identifiers to ensure
+# renames/refactors don't trigger false-positives
+# * How it works: For each bus and class on the system the test checks the
+# number of devices present against a reference file, which needs to have been
+# generated by the program at a previous point on a known-good kernel, and if
+# there are missing devices they are reported.
+
+import os
+import sys
+import argparse
+
+import yaml
+
+# Allow ksft module to be imported from different directory
+this_dir = os.path.dirname(os.path.realpath(__file__))
+sys.path.append(os.path.join(this_dir, "../../kselftest/"))
+
+import ksft
+
+
+def generate_devs_obj():
+ obj = {}
+
+ device_sources = [
+ {
+ "base_dir": "/sys/class",
+ "add_path": "",
+ "key_name": "class",
+ "ignored": ["devlink"],
+ },
+ {
+ "base_dir": "/sys/bus",
+ "add_path": "devices",
+ "key_name": "bus",
+ "ignored": [],
+ },
+ ]
+
+ properties = sorted(["uevent", "device/uevent", "firmware_node/uevent", "name"])
+
+ for source in device_sources:
+ source_subsystems = {}
+ for subsystem in sorted(os.listdir(source["base_dir"])):
+ if subsystem in source["ignored"]:
+ continue
+
+ devs_path = os.path.join(source["base_dir"], subsystem, source["add_path"])
+ dev_dirs = [dev for dev in os.scandir(devs_path) if dev.is_symlink()]
+ devs_data = []
+ for dev_dir in dev_dirs:
+ dev_path = os.path.join(devs_path, dev_dir)
+ dev_data = {"info": {}}
+ for prop in properties:
+ if os.path.isfile(os.path.join(dev_path, prop)):
+ with open(os.path.join(dev_path, prop)) as f:
+ dev_data["info"][prop] = f.read()
+ devs_data.append(dev_data)
+ if len(dev_dirs):
+ source_subsystems[subsystem] = {
+ "count": len(dev_dirs),
+ "devices": devs_data,
+ }
+ obj[source["key_name"]] = source_subsystems
+
+ return obj
+
+
+def commented(s):
+ return s.replace("\n", "\n# ")
+
+
+def indented(s, n):
+ return " " * n + s.replace("\n", "\n" + " " * n)
+
+
+def stripped(s):
+ return s.strip("\n")
+
+
+def devices_difference(dev1, dev2):
+ difference = 0
+
+ for prop in dev1["info"].keys():
+ for l1, l2 in zip(
+ dev1["info"].get(prop, "").split("\n"),
+ dev2["info"].get(prop, "").split("\n"),
+ ):
+ if l1 != l2:
+ difference += 1
+ return difference
+
+
+def guess_missing_devices(cur_devs_subsystem, ref_devs_subsystem):
+ # Detect what devices on the current system are the most similar to devices
+ # on the reference one by one until the leftovers are the most dissimilar
+ # devices and therefore most likely the missing ones.
+ found_count = cur_devs_subsystem["count"]
+ expected_count = ref_devs_subsystem["count"]
+ missing_count = found_count - expected_count
+
+ diffs = []
+ for cur_d in cur_devs_subsystem["devices"]:
+ for ref_d in ref_devs_subsystem["devices"]:
+ diffs.append((devices_difference(cur_d, ref_d), cur_d, ref_d))
+
+ diffs.sort(key=lambda x: x[0])
+
+ assigned_ref_devs = []
+ assigned_cur_devs = []
+ for diff in diffs:
+ if len(assigned_ref_devs) >= expected_count - missing_count:
+ break
+ if diff[1] in assigned_cur_devs or diff[2] in assigned_ref_devs:
+ continue
+ assigned_cur_devs.append(diff[1])
+ assigned_ref_devs.append(diff[2])
+
+ missing_devices = []
+ for d in ref_devs_subsystem["devices"]:
+ if d not in assigned_ref_devs:
+ missing_devices.append(d)
+
+ return missing_devices
+
+
+def dump_devices_info(cur_devs_subsystem, ref_devs_subsystem):
+ def dump_device_info(dev):
+ for name, val in dev["info"].items():
+ ksft.print_msg(indented(name + ":", 2))
+ val = stripped(val)
+ if val:
+ ksft.print_msg(commented(indented(val, 4)))
+ ksft.print_msg("")
+
+ ksft.print_msg("=================")
+ ksft.print_msg("Devices expected:")
+ ksft.print_msg("")
+ for d in ref_devs_subsystem["devices"]:
+ dump_device_info(d)
+ ksft.print_msg("-----------------")
+ ksft.print_msg("Devices found:")
+ ksft.print_msg("")
+ for d in cur_devs_subsystem["devices"]:
+ dump_device_info(d)
+ ksft.print_msg("-----------------")
+ ksft.print_msg("Devices missing (best guess):")
+ ksft.print_msg("")
+ missing_devices = guess_missing_devices(cur_devs_subsystem, ref_devs_subsystem)
+ for d in missing_devices:
+ dump_device_info(d)
+ ksft.print_msg("=================")
+
+
+def run_test(ref_filename):
+ ksft.print_msg(f"Using reference file: {ref_filename}")
+
+ with open(ref_filename) as f:
+ ref_devs_obj = yaml.safe_load(f)
+
+ num_tests = 0
+ for dev_source in ref_devs_obj.values():
+ num_tests += len(dev_source)
+ ksft.set_plan(num_tests)
+
+ cur_devs_obj = generate_devs_obj()
+
+ reference_outdated = False
+
+ for source, ref_devs_source_obj in ref_devs_obj.items():
+ for subsystem, ref_devs_subsystem_obj in ref_devs_source_obj.items():
+ test_name = f"{source}.{subsystem}"
+ if not (
+ cur_devs_obj.get(source) and cur_devs_obj.get(source).get(subsystem)
+ ):
+ ksft.print_msg(f"Device subsystem '{subsystem}' missing")
+ ksft.test_result_fail(test_name)
+ continue
+ cur_devs_subsystem_obj = cur_devs_obj[source][subsystem]
+
+ found_count = cur_devs_subsystem_obj["count"]
+ expected_count = ref_devs_subsystem_obj["count"]
+ if found_count < expected_count:
+ ksft.print_msg(
+ f"Missing devices for subsystem '{subsystem}': {expected_count - found_count} (Expected {expected_count}, found {found_count})"
+ )
+ dump_devices_info(cur_devs_subsystem_obj, ref_devs_subsystem_obj)
+ ksft.test_result_fail(test_name)
+ else:
+ ksft.test_result_pass(test_name)
+ if found_count > expected_count:
+ reference_outdated = True
+
+ if len(cur_devs_obj[source]) > len(ref_devs_source_obj):
+ reference_outdated = True
+
+ if reference_outdated:
+ ksft.print_msg(
+ "Warning: The current system contains more devices and/or subsystems than the reference. Updating the reference is recommended."
+ )
+
+
+def get_possible_ref_filenames():
+ filenames = []
+
+ dt_board_compatible_file = "/proc/device-tree/compatible"
+ if os.path.exists(dt_board_compatible_file):
+ with open(dt_board_compatible_file) as f:
+ for line in f:
+ compatibles = [compat for compat in line.split("\0") if compat]
+ filenames.extend(compatibles)
+ else:
+ dmi_id_dir = "/sys/devices/virtual/dmi/id"
+ vendor_dmi_file = os.path.join(dmi_id_dir, "sys_vendor")
+ product_dmi_file = os.path.join(dmi_id_dir, "product_name")
+
+ with open(vendor_dmi_file) as f:
+ vendor = f.read().replace("\n", "")
+ with open(product_dmi_file) as f:
+ product = f.read().replace("\n", "")
+
+ filenames = [vendor + "," + product]
+
+ return filenames
+
+
+def get_ref_filename(ref_dir):
+ chosen_ref_filename = ""
+ full_ref_paths = [os.path.join(ref_dir, f + ".yaml") for f in get_possible_ref_filenames()]
+ for path in full_ref_paths:
+ if os.path.exists(path):
+ chosen_ref_filename = path
+ break
+
+ if not chosen_ref_filename:
+ tried_paths = ",".join(["'" + p + "'" for p in full_ref_paths])
+ ksft.print_msg(f"No matching reference file found (tried {tried_paths})")
+ ksft.exit_fail()
+
+ return chosen_ref_filename
+
+
+parser = argparse.ArgumentParser()
+parser.add_argument(
+ "--reference-dir", default=".", help="Directory containing the reference files"
+)
+parser.add_argument("--generate-reference", action="store_true", help="Generate a reference file with the devices on the running system")
+args = parser.parse_args()
+
+if args.generate_reference:
+ print(f"# Kernel version: {os.uname().release}")
+ print(yaml.dump(generate_devs_obj()))
+ sys.exit(0)
+
+ksft.print_header()
+
+ref_filename = get_ref_filename(args.reference_dir)
+
+run_test(ref_filename)
+
+ksft.finished()
---
base-commit: 73399b58e5e5a1b28a04baf42e321cfcfc663c2f
change-id: 20240724-kselftest-dev-exist-bb1bcf884654
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Hello Davide,
In the following commit:
commit ca22da2fbd693b54dc8e3b7b54ccc9f7e9ba3640
Author: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri Jan 20 18:01:40 2023 +0100
act_mirred: use the backlog for nested calls to mirred ingress
you added the mirred_egress_to_ingress_tcp_test kselftest that, I noticed, hangs with "Ncat: TIMEOUT." if openvswitch module is loaded.
Is this the right behaviour or was such configuration not intended?
Regards,
Lenar Khannanov
Adds a selftest that creates two virtual interfaces, assigns one to a
new namespace, and assigns IP addresses to both.
It listens on the destination interface using netcat and configures a
dynamic target on netconsole, pointing to the destination IP address.
The test then checks if the message was received properly on the
destination interface.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
MAINTAINERS | 1 +
.../net/netconsole/basic_integration_test.sh | 153 ++++++++++++++++++
2 files changed, 154 insertions(+)
create mode 100755 tools/testing/selftests/net/netconsole/basic_integration_test.sh
diff --git a/MAINTAINERS b/MAINTAINERS
index c0a3d9e93689..59207365c9f5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15768,6 +15768,7 @@ M: Breno Leitao <leitao(a)debian.org>
S: Maintained
F: Documentation/networking/netconsole.rst
F: drivers/net/netconsole.c
+F: tools/testing/selftests/net/netconsole/
NETDEVSIM
M: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/netconsole/basic_integration_test.sh b/tools/testing/selftests/net/netconsole/basic_integration_test.sh
new file mode 100755
index 000000000000..fbabbc633451
--- /dev/null
+++ b/tools/testing/selftests/net/netconsole/basic_integration_test.sh
@@ -0,0 +1,153 @@
+#!/usr/bin/env bash
+# SPDX-License-Identifier: GPL-2.0
+
+# This test creates two virtual interfaces, assigns one of them (the "destination
+# interface") to a new namespace, and assigns IP addresses to both interfaces.
+#
+# It listens on the destination interface using netcat (nc) and configures a
+# dynamic target on netconsole, pointing to the destination IP address.
+#
+# Finally, it checks whether the message was received properly on the
+# destination interface. Note that this test may pollute the kernel log buffer
+# (dmesg) and relies on dynamic configuration and namespaces being configured."
+#
+# Author: Breno Leitao <leitao(a)debian.org>
+
+SCRIPTDIR=$(dirname "$(readlink -e "${BASH_SOURCE[0]}")")
+
+# Simple script to test dynamic targets in netconsole
+SRCIF="veth0"
+SRCIP=192.168.1.1
+
+DSTIF="veth1"
+DSTIP=192.168.1.2
+
+PORT="6666"
+MSG="netconsole selftest"
+TARGET=$(mktemp -u netcons_XXXXX)
+NETCONS_CONFIGFS="/sys/kernel/config/netconsole"
+FULLPATH="${NETCONS_CONFIGFS}"/"${TARGET}"
+# This will be have some tmp values appened to it in set_network()
+NAMESPACE="netconsns"
+
+# Used to create and delete namespaces
+source "${SCRIPTDIR}"/../lib.sh
+
+function set_network() {
+ # this is coming from lib.sh
+ setup_ns "${NAMESPACE}"
+ NAMESPACE=${NS_LIST[0]}
+ ip link add "${SRCIF}" type veth peer name "${DSTIF}"
+
+ # "${DSTIF}"
+ ip link set "${DSTIF}" netns "${NAMESPACE}"
+ ip netns exec "${NAMESPACE}" ip addr add "${DSTIP}"/24 dev "${DSTIF}"
+ ip netns exec "${NAMESPACE}" ip link set "${DSTIF}" up
+
+ # later, configure "${SRCIF}"
+ ip addr add "${SRCIP}"/24 dev "${SRCIF}"
+ ip link set "${SRCIF}" up
+}
+
+function create_dynamic_target() {
+ DSTMAC=$(ip netns exec "${NAMESPACE}" ip link show "${DSTIF}" | awk '/ether/ {print $2}')
+
+ # Create a dynamic target
+ mkdir ${FULLPATH}
+
+ echo ${DSTIP} > ${FULLPATH}/remote_ip
+ echo ${SRCIP} > ${FULLPATH}/local_ip
+ echo ${DSTMAC} > ${FULLPATH}/remote_mac
+ echo "${SRCIF}" > ${FULLPATH}/dev_name
+
+ echo 1 > ${FULLPATH}/enabled
+}
+
+function cleanup() {
+ echo 0 > "${FULLPATH}"/enabled
+ rmdir "${FULLPATH}"
+ # This will delete DSTIF also
+ ip link del "${SRCIF}"
+ # this is coming from lib.sh
+ cleanup_all_ns
+}
+
+function listen_port() {
+ OUTPUT=${1}
+ echo "Saving content in ${OUTPUT}"
+ timeout 2 ip netns exec "${NAMESPACE}" nc -u -l "${PORT}" | sed '/^$/q' > ${OUTPUT}
+}
+
+function validate_result() {
+ TMPFILENAME=/tmp/"${TARGET}"
+
+ # sleep until the file isc reated
+ sleep 1
+ # Check if the file exists
+ if [ ! -f "$TMPFILENAME" ]; then
+ echo "FAIL: File was not generate." >&2
+ return ${ksft_fail}
+ fi
+
+ if ! grep -q "${MSG}" "${TMPFILENAME}"; then
+ echo "FAIL: ${MSG} not found in ${TMPFILENAME}" >&2
+ cat ${TMPFILENAME} >&2
+ return ${ksft_fail}
+ fi
+
+ rm ${TMPFILENAME}
+ return ${ksft_pass}
+}
+
+function check_for_dependencies() {
+ if [ "$(id -u)" -ne 0 ]; then
+ echo "This script must be run as root" >&2
+ exit 1
+ fi
+
+ if ! which nc > /dev/null ; then
+ echo "SKIP: nc(1) is not available" >&2
+ exit ${ksft_skip}
+ fi
+
+ if ! which ip > /dev/null ; then
+ echo "SKIP: ip(1) is not available" >&2
+ exit ${ksft_skip}
+ fi
+
+ if [ ! -d "${NETCONS_CONFIGFS}" ]; then
+ echo "SKIP: directory ${NETCONS_CONFIGFS} does not exist. Check if NETCONSOLE_DYNAMIC is enabled" >&2
+ exit ${ksft_skip}
+ fi
+
+ if ip link show veth0 2> /dev/null; then
+ echo "SKIP: interface veth0 exists in the system. Not overwriting it."
+ exit ${ksft_skip}
+ fi
+}
+
+# ========== #
+# Start here #
+# ========== #
+
+# Check for basic system dependency and exit if not found
+check_for_dependencies
+# Create one namespace and two interfaces
+set_network
+# Create a dynamic target for netconsole
+create_dynamic_target
+# Listed for netconsole port inside the namespace and destination interface
+listen_port /tmp/"${TARGET}" &
+# Wait for nc to start and listen to the port.
+sleep 1
+
+# Send the message
+echo "${MSG}: ${TARGET}" > /dev/kmsg
+
+# Make sure the message was received in the dst part
+validate_result
+ret=$?
+# Remove the namespace, interfaces and netconsole target
+cleanup
+
+exit ${ret}
--
2.43.0
When looking at improving the user experience around the MPTCP endpoints
setup, I noticed that setting an endpoint with both the 'signal' and the
'subflow' flags -- as it has been done in the past by users according to
bug reports we got -- was resulting on only announcing the endpoint, but
not using it to create subflows: the 'subflow' flag was then ignored.
My initial thought was to modify IPRoute2 to warn the user when the two
flags were set, but it doesn't sound normal to ignore one of them. I
then looked at modifying the kernel not to allow having the two flags
set, but when discussing about that with Mat, we thought it was maybe
not ideal to do that, as there might be use-cases, we might break some
configs. Then I saw it was working before v5.17. So instead, I fixed the
support on the kernel side (patch 5) using Paolo's suggestion. This also
includes a fix on the options side (patch 1: for v5.11+), an explicit
deny of some options combinations (patch 2: for v5.18+), and some
refactoring (patches 3 and 4) to ease the inclusion of the patch 5.
While at it, I added a new selftest (patch 7) to validate this case --
including a modification of the chk_add_nr helper to inverse the sides
were the counters are checked (patch 6) -- and allowed ADD_ADDR echo
just after the MP_JOIN 3WHS.
The selftests modification have the same Fixes tag as the previous
commit, but no 'Cc: Stable': if the backport can work, that's good --
but it still need to be verified by running the selftests -- if not, no
need to worry, many CIs will use the selftests from the last stable
version to validate previous stable releases.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (7):
mptcp: fully established after ADD_ADDR echo on MPJ
mptcp: pm: deny endp with signal + subflow + port
mptcp: pm: reduce indentation blocks
mptcp: pm: don't try to create sf if alloc failed
mptcp: pm: do not ignore 'subflow' if 'signal' flag is also set
selftests: mptcp: join: ability to invert ADD_ADDR check
selftests: mptcp: join: test both signal & subflow
net/mptcp/options.c | 3 +-
net/mptcp/pm_netlink.c | 47 +++++++++++++--------
tools/testing/selftests/net/mptcp/mptcp_join.sh | 55 ++++++++++++++++++-------
3 files changed, 73 insertions(+), 32 deletions(-)
---
base-commit: 0bf50cead4c4710d9f704778c32ab8af47ddf070
change-id: 20240731-upstream-net-20240731-mptcp-endp-subflow-signal-181d640cf5e8
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Corrected the typographical of the word "different"
in the "name" field of the JSON object with ID "4319".
Signed-off-by: Karan Sanghavi <karansanghvi98(a)gmail.com>
---
tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
index 03723cf84..6897ff5ad 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/filters/cgroup.json
@@ -1189,7 +1189,7 @@
},
{
"id": "4319",
- "name": "Replace cgroup filter with diffferent match",
+ "name": "Replace cgroup filter with different match",
"category": [
"filter",
"cgroup"
--
2.43.0
From: Zijian Zhang <zijianzhang(a)bytedance.com>
Original notification mechanism needs poll + recvmmsg which is not
easy for applcations to accommodate. And, it also incurs unignorable
overhead including extra system calls.
While making maximum reuse of the existing MSG_ZEROCOPY related code,
this patch set introduces a new zerocopy socket notification mechanism.
Users of sendmsg pass a control message as a placeholder for the incoming
notifications. Upon returning, kernel embeds notifications directly into
user arguments passed in. By doing so, we can reduce the complexity and
the overhead for managing notifications.
We also have the logic related to copying cmsg to the userspace in sendmsg
generic for any possible uses cases in the future. However, it introduces
ABI change of sendmsg.
Changelog:
v1 -> v2:
- Reuse errormsg queue in the new notification mechanism,
users can actually use these two mechanisms in hybrid way
if they want to do so.
- Update case SCM_ZC_NOTIFICATION in __sock_cmsg_send
1. Regardless of 32-bit, 64-bit program, we will always handle
u64 type user address.
2. The size of data to copy_to_user is precisely calculated
in case of kernel stack leak.
- fix (kbuild-bot)
1. Add SCM_ZC_NOTIFICATION to arch-specific header files.
2. header file types.h in include/uapi/linux/socket.h
v2 -> v3:
- 1. Users can now pass in the address of the zc_info_elem directly
with appropriate cmsg_len instead of the ugly user interface. Plus,
the handler is now compatible with MSG_CMSG_COMPAT and 32-bit
pointer.
- 2. Suggested by Willem, another strategy of getting zc info is
briefly taking the lock of sk_error_queue and move to a private
list, like net_rx_action. I thought sk_error_queue is protected by
sock_lock, so that it's impossible for the handling of zc info and
users recvmsg from the sk_error_queue at the same time.
However, sk_error_queue is protected by its own lock. I am afraid
that during the time it is handling the private list, users may
fail to get other error messages in the queue via recvmsg. Thus,
I don't implement the splice logic in this version. Any comments?
v3 -> v4:
- 1. Change SOCK_ZC_INFO_MAX to 64 to avoid large stack frame size.
- 2. Fix minor typos.
- 3. Change cfg_zerocopy from int to enum in msg_zerocopy.c
Initially, we expect users to pass the user address of the user array
as a data in cmsg, so that the kernel can copy_to_user to this address
directly.
As Willem commented,
> The main design issue with this series is this indirection, rather
> than passing the array of notifications as cmsg.
> This trick circumvents having to deal with compat issues and having to
> figure out copy_to_user in ____sys_sendmsg (as msg_control is an
> in-kernel copy).
> This is quite hacky, from an API design PoV.
> As is passing a pointer, but expecting msg_controllen to hold the
> length not of the pointer, but of the pointed to user buffer.
> I had also hoped for more significant savings. Especially with the
> higher syscall overhead due to meltdown and spectre mitigations vs
> when MSG_ZEROCOPY was introduced and I last tried this optimization.
We solve it by supporting put_cmsg to userspace in sendmsg path starting
from v5.
v4 -> v5:
- 1. Passing user address directly to kernel raises concerns about
ABI. In this version, we support put_cmsg to userspace in TX path
to solve this problem.
v5 -> v6:
- 1. Cleanly copy cmsg to user upon returning of ___sys_sendmsg
v6 -> v7:
- 1. Remove flag MSG_CMSG_COPY_TO_USER, use a member in msghdr instead
- 2. Pass msg to __sock_cmsg_send.
- 3. sendmsg_copy_cmsg_to_user should be put at the end of
____sys_sendmsg to make sure msg_sys->msg_control is a valid pointer.
- 4. Add struct zc_info to contain the array of zc_info_elem, so that
the kernel can update the zc_info->size. Another possible solution is
updating the cmsg_len directly, but it will break for_each_cmsghdr.
- 5. Update selftest to make cfg_notification_limit have the same
semantics in both methods for better comparison.
v7 -> v8:
- 1. Add a static_branch in ____sys_sendmsg to avoid overhead in the
hot path.
- 2. Add ZC_NOTIFICATION_MAX to limit the max size of zc_info->arr.
- 3. Minimize the code in SCM_ZC_NOTIFICATION handler by adding a local
sk_buff_head.
* Performance
We update selftests/net/msg_zerocopy.c to accommodate the new mechanism,
cfg_notification_limit has the same semantics for both methods. Test
results are as follows, we update skb_orphan_frags_rx to the same as
skb_orphan_frags to support zerocopy in the localhost test.
cfg_notification_limit = 1, both method get notifications after 1 calling
of sendmsg. In this case, the new method has around 17% cpu savings in TCP
and 23% cpu savings in UDP.
+----------------------+---------+---------+---------+---------+
| Test Type / Protocol | TCP v4 | TCP v6 | UDP v4 | UDP v6 |
+----------------------+---------+---------+---------+---------+
| ZCopy (MB) | 7523 | 7706 | 7489 | 7304 |
+----------------------+---------+---------+---------+---------+
| New ZCopy (MB) | 8834 | 8993 | 9053 | 9228 |
+----------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy | 117.42% | 116.70% | 120.88% | 126.34% |
+----------------------+---------+---------+---------+---------+
cfg_notification_limit = 32, both get notifications after 32 calling of
sendmsg, which means more chances to coalesce notifications, and less
overhead of poll + recvmsg for the original method. In this case, the new
method has around 7% cpu savings in TCP and slightly better cpu usage in
UDP. In the env of selftest, notifications of TCP are more likely to be
out of order than UDP, it's easier to coalesce more notifications in UDP.
The original method can get one notification with range of 32 in a recvmsg
most of the time. In TCP, most notifications' range is around 2, so the
original method needs around 16 recvmsgs to get notified in one round.
That's the reason for the "New ZCopy / ZCopy" diff in TCP and UDP here.
+----------------------+---------+---------+---------+---------+
| Test Type / Protocol | TCP v4 | TCP v6 | UDP v4 | UDP v6 |
+----------------------+---------+---------+---------+---------+
| ZCopy (MB) | 8842 | 8735 | 10072 | 9380 |
+----------------------+---------+---------+---------+---------+
| New ZCopy (MB) | 9366 | 9477 | 10108 | 9385 |
+----------------------+---------+---------+---------+---------+
| New ZCopy / ZCopy | 106.00% | 108.28% | 100.31% | 100.01% |
+----------------------+---------+---------+---------+---------+
In conclusion, when notification interval is small or notifications are
hard to be coalesced, the new mechanism is highly recommended. Otherwise,
the performance gain from the new mechanism is very limited.
Zijian Zhang (3):
sock: support copying cmsgs to the user space in sendmsg
sock: add MSG_ZEROCOPY notification mechanism based on msg_control
selftests: add MSG_ZEROCOPY msg_control notification test
arch/alpha/include/uapi/asm/socket.h | 2 +
arch/mips/include/uapi/asm/socket.h | 2 +
arch/parisc/include/uapi/asm/socket.h | 2 +
arch/sparc/include/uapi/asm/socket.h | 2 +
include/linux/socket.h | 8 ++
include/net/sock.h | 2 +-
include/uapi/asm-generic/socket.h | 2 +
include/uapi/linux/socket.h | 23 +++++
net/core/sock.c | 72 +++++++++++++-
net/ipv4/ip_sockglue.c | 2 +-
net/ipv6/datagram.c | 2 +-
net/socket.c | 63 +++++++++++-
tools/testing/selftests/net/msg_zerocopy.c | 101 ++++++++++++++++++--
tools/testing/selftests/net/msg_zerocopy.sh | 1 +
14 files changed, 265 insertions(+), 19 deletions(-)
--
2.20.1
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as userspace
controlled VM or in short ucontrol VM.
These VMs delegates the management of the VM to userspace instead
of handling most events within the kernel. Consequently the userspace
has to manage interrupts, memory allocation etc.
Before this patch set this functionality lacks any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
the CONFIG_KVM_S390_UCONTROL enabled. The users with sys_admin capability
can then create a new ucontrol VM providing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
The kernels existing selftest helper functions can only be partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set does also contain some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
---
V1 -> V2:
- add ucontrol to s390 debug config (new patch)
- PATCH 2: changed atomic_t to __u32 (thanks Claudio)
- PATCH 4: reformatted comment in FIXTURE_SETUP(uc_kvm)
- PATCH 5: refactored to display 8 byte blocks + more internal reuse
(thanksClaudio)
- PATCH 7: make use of more declarative defines instead of magic values
- PATCH 8: make use of more declarative defines instead of magic values
(thanks Claudio)
- PATCH 9: add reference to fix verified by the test case
V2 -> V3:
- Remove stopped bit before starting the VM (no initial stop in multiple
test cases) (thanks Janosch)
- PATCH 2:
- Clarified SIE control block vs SIE instruction (thanks Janosch)
- PATCH 3:
- Make use of CAP_TO_MASK(CAP_SYS_ADMIN) instead of custom define (thanks
Janosch)
- Removed Reviewed-By: Claudio
- PATCH 4:
- Remove erroneous 1MB offset from self->base_hva (thanks Janosch)
- PATCH 6-8: Change name of test program _pgm to _asm to prevent confusion
- PATCH 10: Move KVM_S390_UCONTROL default option to actual debug config
(thanks Christian)
Christoph Schlameuss (10):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
arch/s390/configs/debug_defconfig | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 69 ++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 596 ++++++++++++++++++
13 files changed, 929 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: 94ede2a3e9135764736221c080ac7c0ad993dc2d
--
2.45.2
In order to validate ITS-MSI hwirq entry in the /proc/interrupts, we
have created a shell script to check is there any duplicated ITS-MSI
hwirq entry.
Joseph Jang (1):
selftest: drivers: Add support its msi hwirq checking
tools/testing/selftests/drivers/irq/Makefile | 5 +++++
.../selftests/drivers/irq/its-msi-irq-test.sh | 20 +++++++++++++++++++
2 files changed, 25 insertions(+)
create mode 100644 tools/testing/selftests/drivers/irq/Makefile
create mode 100755 tools/testing/selftests/drivers/irq/its-msi-irq-test.sh
--
2.34.1
On 31/07/2024 15:07, Karan Sanghavi wrote:
> Corrected the typographical of the word "different"
> in the "name" field of the JSON object with ID "4319".
>
> Signed-off-by: Karan Sanghavi <karansanghvi98(a)gmail.com>
Thank you for the patch.
I'd suggest you transform this patch and the other one you sent, which
has the same subject, into a patch series.
Also please change the subject to target net-next and to specify which
test files your patches are touching.
cheers,
Victor
Hello,
this small series aims to integrate test_dev_cgroup in test_progs so it
could be run automatically in CI. The new version brings a few differences
with the current one:
- test now uses directly syscalls instead of wrapping commandline tools
into system() calls
- test_progs manipulates /dev/null (eg: redirecting test logs into it), so
disabling access to it in the bpf program confuses the tests. To fix this,
the first commit modifies the bpf program to allow access to char devices
1:3 (/dev/null), and disable access to char devices 1:5 (/dev/zero)
- once test is converted, add a small subtest to also check for device type
interpretation (char or block)
- paths used in mknod tests are now in /dev instead of /tmp: due to the CI
runner organisation and mountpoints manipulations, trying to create nodes
in /tmp leads to errors unrelated to the test (ie, mknod calls refused by
kernel, not the bpf program). I don't understand exactly the root cause
at the deepest point (all I see in CI is an -ENXIO error on mknod when trying to
create the node in tmp, and I can not make sense out of it neither
replicate it locally), so I would gladly take inputs from anyone more
educated than me about this.
The new test_progs part has been tested in a local qemu environment as well
as in upstream CI:
./test_progs -a cgroup_dev
47/1 cgroup_dev/allow-mknod:OK
47/2 cgroup_dev/allow-read:OK
47/3 cgroup_dev/allow-write:OK
47/4 cgroup_dev/deny-mknod:OK
47/5 cgroup_dev/deny-read:OK
47/6 cgroup_dev/deny-write:OK
47/7 cgroup_dev/deny-mknod-wrong-type:OK
47 cgroup_dev:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
---
Changes in v4:
- Fix mixup between ret and errno by testing both
- Properly apply ack tag from Stanislas
- Link to v3: https://lore.kernel.org/r/20240730-convert_dev_cgroup-v3-0-93e573b74357@boo…
Changes in v3:
- delete mknod file only if it has been created
- use bpf_program__attach_cgroup() instead of bpf_prog_attach
- reorganize subtests order
- collect review/ack tags from Alan and Stanislas
- Link to v2: https://lore.kernel.org/r/20240729-convert_dev_cgroup-v2-0-4c1fc0520545@boo…
Changes in v2:
- directly pass expected ret code to subtests instead of boolean pass/not
pass
- fix faulty fd check in subtest expected to fail on open
- fix wrong subtest name
- pass test buffer and corresponding size to read/write subtests
- use correct series prefix
- Link to v1: https://lore.kernel.org/r/20240725-convert_dev_cgroup-v1-0-2c8cbd487c44@boo…
---
Alexis Lothoré (eBPF Foundation) (3):
selftests/bpf: do not disable /dev/null device access in cgroup dev test
selftests/bpf: convert test_dev_cgroup to test_progs
selftests/bpf: add wrong type test to cgroup dev
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 2 -
.../testing/selftests/bpf/prog_tests/cgroup_dev.c | 125 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/dev_cgroup.c | 4 +-
tools/testing/selftests/bpf/test_dev_cgroup.c | 85 --------------
5 files changed, 127 insertions(+), 90 deletions(-)
---
base-commit: 2107cb4bff1c21110ebf7a17458a918282c1a8c9
change-id: 20240723-convert_dev_cgroup-6464b0d37f1a
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
The Python finished() helper currently exits with KSFT_FAIL when there
are only passed and skipped tests. Fix the logic to exit with KSFT_PASS
instead, making it consistent with its C and bash counterparts
(ksft_finished() and ktap_finished() respectively).
Reviewed-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Fixes: dacf1d7a78bf ("kselftest: Add test to verify probe of devices from discoverable buses")
Signed-off-by: Laura Nao <laura.nao(a)collabora.com>
---
This is a revised version of the patch initially submitted as "[PATCH]
selftests: ksft: Track skipped tests when finishing the test suite":
https://lore.kernel.org/all/20240722154319.619944-1-laura.nao@collabora.com/
Depends on "[PATCH v2 2/3] kselftest: Move ksft helper module to common
directory":
https://lore.kernel.org/all/20240705-dev-err-log-selftest-v2-2-163b9cd7b3c1…
(picked through the usb tree, queued for 6.11-rc1)
Changes in v2:
- Reworded the commit title and message to more accurately describe the
incorrect behavior of the finished() helper addressed by the patch.
---
tools/testing/selftests/kselftest/ksft.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/ksft.py b/tools/testing/selftests/kselftest/ksft.py
index cd89fb2bc10e..bf215790a89d 100644
--- a/tools/testing/selftests/kselftest/ksft.py
+++ b/tools/testing/selftests/kselftest/ksft.py
@@ -70,7 +70,7 @@ def test_result(condition, description=""):
def finished():
- if ksft_cnt["pass"] == ksft_num_tests:
+ if ksft_cnt["pass"] + ksft_cnt["skip"] == ksft_num_tests:
exit_code = KSFT_PASS
else:
exit_code = KSFT_FAIL
--
2.30.2
Series takes care of few bugs and missing features with the aim to improve
the test coverage of sockmap/sockhash.
Last patch is a create_pair() rewrite making use of
__attribute__((cleanup)) to handle socket fd lifetime.
v0: https://lore.kernel.org/netdev/027fdb41-ee11-4be0-a493-22f28a1abd7c@rbox.co/
- No declarations in function body (Jakub)
- Don't touch output arguments until function succeeds (Jakub)
Signed-off-by: Michal Luczaj <mhal(a)rbox.co>
---
Michal Luczaj (6):
selftest/bpf: Support more socket types in create_pair()
selftest/bpf: Socket pair creation, cleanups
selftest/bpf: Simplify inet_socketpair() and vsock_unix_redir_connectible()
selftest/bpf: Respect the sotype of af_unix redir tests
selftest/bpf: Exercise SOCK_STREAM unix_inet_redir_to_connected()
selftest/bpf: Introduce __attribute__((cleanup)) in create_pair()
.../selftests/bpf/prog_tests/sockmap_basic.c | 28 ++--
.../selftests/bpf/prog_tests/sockmap_helpers.h | 145 ++++++++++++++-------
.../selftests/bpf/prog_tests/sockmap_listen.c | 117 ++---------------
3 files changed, 120 insertions(+), 170 deletions(-)
---
base-commit: 13c9b702e6cb8e406d5fa6b2dca422fa42d2f13e
change-id: 20240723-sockmap-selftest-fixes-666769755137
Best regards,
--
Michal Luczaj <mhal(a)rbox.co>
Hello,
this small series aims to integrate test_dev_cgroup in test_progs so it
could be run automatically in CI. The new version brings a few differences
with the current one:
- test now uses directly syscalls instead of wrapping commandline tools
into system() calls
- test_progs manipulates /dev/null (eg: redirecting test logs into it), so
disabling access to it in the bpf program confuses the tests. To fix this,
the first commit modifies the bpf program to allow access to char devices
1:3 (/dev/null), and disable access to char devices 1:5 (/dev/zero)
- once test is converted, add a small subtest to also check for device type
interpretation (char or block)
- paths used in mknod tests are now in /dev instead of /tmp: due to the CI
runner organisation and mountpoints manipulations, trying to create nodes
in /tmp leads to errors unrelated to the test (ie, mknod calls refused by
kernel, not the bpf program). I don't understand exactly the root cause
at the deepest point (all I see in CI is an -ENXIO error on mknod when trying to
create the node in tmp, and I can not make sense out of it neither
replicate it locally), so I would gladly take inputs from anyone more
educated than me about this.
The new test_progs part has been tested in a local qemu environment as well
as in upstream CI:
./test_progs -a cgroup_dev
47/1 cgroup_dev/allow-mknod:OK
47/2 cgroup_dev/allow-read:OK
47/3 cgroup_dev/allow-write:OK
47/4 cgroup_dev/deny-mknod:OK
47/5 cgroup_dev/deny-read:OK
47/6 cgroup_dev/deny-write:OK
47/7 cgroup_dev/deny-mknod-wrong-type:OK
47 cgroup_dev:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
---
Changes in v3:
- delete mknod file only if it has been created
- use bpf_program__attach_cgroup() instead of bpf_prog_attach
- reorganize subtests order
- collect review/ack tags from Alan and Stanislas
- Link to v2: https://lore.kernel.org/r/20240729-convert_dev_cgroup-v2-0-4c1fc0520545@boo…
Changes in v2:
- directly pass expected ret code to subtests instead of boolean pass/not
pass
- fix faulty fd check in subtest expected to fail on open
- fix wrong subtest name
- pass test buffer and corresponding size to read/write subtests
- use correct series prefix
- Link to v1: https://lore.kernel.org/r/20240725-convert_dev_cgroup-v1-0-2c8cbd487c44@boo…
---
Alexis Lothoré (eBPF Foundation) (3):
selftests/bpf: do not disable /dev/null device access in cgroup dev test
selftests/bpf: convert test_dev_cgroup to test_progs
selftests/bpf: add wrong type test to cgroup dev
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 2 -
.../testing/selftests/bpf/prog_tests/cgroup_dev.c | 118 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/dev_cgroup.c | 4 +-
tools/testing/selftests/bpf/test_dev_cgroup.c | 85 ---------------
5 files changed, 120 insertions(+), 90 deletions(-)
---
base-commit: 2107cb4bff1c21110ebf7a17458a918282c1a8c9
change-id: 20240723-convert_dev_cgroup-6464b0d37f1a
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
From: Tony Ambardar <tony.ambardar(a)gmail.com>
Hello all,
This is part 2 of a series of fixes for libc-related issues encountered
building for musl-based systems. The series has been tested with the
kernel-patches/bpf CI and locally using mips64el-gcc/musl-libc and QEMU
with an OpenWrt rootfs.
The patches cover a few areas of portability issues:
- improper libc usage (strtok_r(), reserved identifiers)
- gcc compile errors (include header ordering, sequence-point errors)
- POSIX vs GNU basename()
- missing GNU extensions (<execinfo.h>, C++ <stdbool.h>)
- Y2038 and setsockopt() / SO_TIMESTAMPNS_NEW
Feedback and suggestions are appreciated!
Thanks,
Tony
Tony Ambardar (8):
selftests/bpf: Use portable POSIX basename()
selftests/bpf: Fix arg parsing in veristat, test_progs
selftests/bpf: Fix error compiling test_lru_map.c
selftests/bpf: Fix C++ compile error from missing _Bool type
selftests/bpf: Fix order-of-include compile errors in lwt_reroute.c
selftests/bpf: Fix compile if backtrace support missing in libc
selftests/bpf: Fix using stdout, stderr as struct field names
selftests/bpf: Fix error compiling tc_redirect.c with musl libc
.../selftests/bpf/prog_tests/lwt_helpers.h | 3 +-
.../selftests/bpf/prog_tests/reg_bounds.c | 2 +-
.../selftests/bpf/prog_tests/tc_redirect.c | 12 +--
tools/testing/selftests/bpf/test_cpp.cpp | 4 +
tools/testing/selftests/bpf/test_lru_map.c | 3 +-
tools/testing/selftests/bpf/test_progs.c | 75 ++++++++++---------
tools/testing/selftests/bpf/test_progs.h | 8 +-
tools/testing/selftests/bpf/testing_helpers.c | 2 +-
tools/testing/selftests/bpf/veristat.c | 12 +--
tools/testing/selftests/bpf/xskxceiver.c | 1 +
10 files changed, 68 insertions(+), 54 deletions(-)
--
2.34.1
First 4 patches are more-or-less cleanups/preparations.
Patch 5 was sent to me/contributed off-list by Mohammad, who wants 32-bit
kernels to run TCP-AO.
Patch 6 is a workaround/fix for slow VMs. Albeit, I can't reproduce
the issue, but I hope it will fix netdev flakes for connect-deny-*
tests.
And the biggest change is adding TCP-AO tracepoints to selftests.
I think it's a good addition by the following reasons:
- The related tracepoints are now tested;
- It allows tcp-ao selftests to raise expectations on the kernel
behavior - up from the syscalls exit statuses + net counters.
- Provides tracepoints usage samples.
As tracepoints are not a stable ABI, any kernel changes done to them
will be reflected to the selftests, which also will allow users
to see how to change their code. It's quite better than parsing dmesg
(what BGP was doing pre-tracepoints, ugh).
Somewhat arguably, the code parses trace_pipe, rather than uses
libtraceevent (which any sane user should do). The reason behind that is
the same as for rt-netlink macros instead of libmnl: I'm trying
to minimize the library dependencies of the selftests. And the
performance of formatting text in kernel and parsing it again in a test
is not critical.
Current output sample:
> ok 73 Trace events matched expectations: 13 tcp_hash_md5_required[2] tcp_hash_md5_unexpected[4] tcp_hash_ao_required[3] tcp_ao_key_not_found[4]
Previously, tracepoints selftests were part of kernel tcp tracepoints
submission [1], but since then the code was quite changed:
- Now generic tracing setup is in lib/ftrace.c, separate from
lib/ftrace-tcp.c which utilizes TCP trace points. This separation
allows future selftests to trace non-TCP events, i.e. to find out
an skb's drop reason, which was useful in the creation of TCP-CLOSE
stress-test (not in this patch set, but used in attempt to reproduce
the issue from [2]).
- Another change is that in the previous submission the trace events
where used only to detect unexpected TCP-AO/TCP-MD5 events. In this
version the selftests will fail if an expected trace event didn't
appear.
Let's see how reliable this is on the netdev bot - it obviously passes
on my testing, but potentially may require a temporary XFAIL patch
if it misbehaves on a slow VM.
[1] https://lore.kernel.org/lkml/20240224-tcp-ao-tracepoints-v1-0-15f31b7f30a7@…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=3…
Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com>
---
Dmitry Safonov (6):
selftests/net: Clean-up double assignment
selftests/net: Provide test_snprintf() helper
selftests/net: Be consistent in kconfig checks
selftests/net: Don't forget to close nsfd after switch_save_ns()
selftests/net: Synchronize client/server before counters checks
selftests/net: Add trace events matching to tcp_ao
Mohammad Nassiri (1):
selftests/tcp_ao: Fix printing format for uint64_t
tools/testing/selftests/net/tcp_ao/Makefile | 3 +-
tools/testing/selftests/net/tcp_ao/bench-lookups.c | 2 +-
tools/testing/selftests/net/tcp_ao/config | 1 +
tools/testing/selftests/net/tcp_ao/connect-deny.c | 25 +-
tools/testing/selftests/net/tcp_ao/connect.c | 6 +-
tools/testing/selftests/net/tcp_ao/icmps-discard.c | 2 +-
.../testing/selftests/net/tcp_ao/key-management.c | 18 +-
tools/testing/selftests/net/tcp_ao/lib/aolib.h | 173 ++++++-
.../testing/selftests/net/tcp_ao/lib/ftrace-tcp.c | 549 +++++++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/ftrace.c | 466 +++++++++++++++++
tools/testing/selftests/net/tcp_ao/lib/kconfig.c | 31 +-
tools/testing/selftests/net/tcp_ao/lib/setup.c | 15 +-
tools/testing/selftests/net/tcp_ao/lib/sock.c | 1 -
tools/testing/selftests/net/tcp_ao/lib/utils.c | 26 +
tools/testing/selftests/net/tcp_ao/restore.c | 30 +-
tools/testing/selftests/net/tcp_ao/rst.c | 2 +-
tools/testing/selftests/net/tcp_ao/self-connect.c | 19 +-
tools/testing/selftests/net/tcp_ao/seq-ext.c | 28 +-
.../selftests/net/tcp_ao/setsockopt-closed.c | 6 +-
tools/testing/selftests/net/tcp_ao/unsigned-md5.c | 35 +-
20 files changed, 1374 insertions(+), 64 deletions(-)
---
base-commit: 1722389b0d863056d78287a120a1d6cadb8d4f7b
change-id: 20240730-tcp-ao-selftests-upd-6-12-4d3e53a74f3f
Best regards,
--
Dmitry Safonov <0x7f454c46(a)gmail.com>
Hello,
This is v4 of the patch series for TDX selftests.
It has been updated for Intel’s v17 of the TDX host patches which was
proposed here:
https://lore.kernel.org/all/cover.1699368322.git.isaku.yamahata@intel.com/
The tree can be found at:
https://github.com/googleprodkernel/linux-cc/tree/tdx-selftests-rfc-v5
Changes from RFC v4:
Added patch to propagate KVM_EXIT_MEMORY_FAULT to userspace.
Minor tweaks to align the tests to the new TDX 1.5 spec such as changes
in the expected values in TDG.VP.INFO.
In RFCv5, TDX selftest code is organized into:
+ headers in tools/testing/selftests/kvm/include/x86_64/tdx/
+ common code in tools/testing/selftests/kvm/lib/x86_64/tdx/
+ selftests in tools/testing/selftests/kvm/x86_64/tdx_*
Dependencies
+ Peter’s patches, which provide functions for the host to allocate
and track protected memory in the guest.
https://lore.kernel.org/all/20230110175057.715453-1-pgonda@google.com/
Further work for this patch series/TODOs
+ Sean’s comments for the non-confidential UPM selftests patch series
at https://lore.kernel.org/lkml/Y8dC8WDwEmYixJqt@google.com/T/#u apply
here as well
+ Add ucall support for TDX selftests
I would also like to acknowledge the following people, who helped
review or test patches in previous versions:
+ Sean Christopherson <seanjc(a)google.com>
+ Zhenzhong Duan <zhenzhong.duan(a)intel.com>
+ Peter Gonda <pgonda(a)google.com>
+ Andrew Jones <drjones(a)redhat.com>
+ Maxim Levitsky <mlevitsk(a)redhat.com>
+ Xiaoyao Li <xiaoyao.li(a)intel.com>
+ David Matlack <dmatlack(a)google.com>
+ Marc Orr <marcorr(a)google.com>
+ Isaku Yamahata <isaku.yamahata(a)gmail.com>
+ Maciej S. Szmigiero <maciej.szmigiero(a)oracle.com>
Links to earlier patch series
+ RFC v1: https://lore.kernel.org/lkml/20210726183816.1343022-1-erdemaktas@google.com…
+ RFC v2: https://lore.kernel.org/lkml/20220830222000.709028-1-sagis@google.com/T/#u
+ RFC v3: https://lore.kernel.org/lkml/20230121001542.2472357-1-ackerleytng@google.co…
+ RFC v4: https://lore.kernel.org/lkml/20230725220132.2310657-1-afranji@google.com/
*** BLURB HERE ***
Ackerley Tng (12):
KVM: selftests: Add function to allow one-to-one GVA to GPA mappings
KVM: selftests: Expose function that sets up sregs based on VM's mode
KVM: selftests: Store initial stack address in struct kvm_vcpu
KVM: selftests: Refactor steps in vCPU descriptor table initialization
KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
attribute configuration
KVM: selftests: TDX: Update load_td_memory_region for VM memory backed
by guest memfd
KVM: selftests: Add functions to allow mapping as shared
KVM: selftests: Expose _vm_vaddr_alloc
KVM: selftests: TDX: Add support for TDG.MEM.PAGE.ACCEPT
KVM: selftests: TDX: Add support for TDG.VP.VEINFO.GET
KVM: selftests: TDX: Add TDX UPM selftest
KVM: selftests: TDX: Add TDX UPM selftests for implicit conversion
Erdem Aktas (3):
KVM: selftests: Add helper functions to create TDX VMs
KVM: selftests: TDX: Add TDX lifecycle test
KVM: selftests: TDX: Adding test case for TDX port IO
Roger Wang (1):
KVM: selftests: TDX: Add TDG.VP.INFO test
Ryan Afranji (2):
KVM: selftests: TDX: Verify the behavior when host consumes a TD
private memory
KVM: selftests: TDX: Add shared memory test
Sagi Shahar (11):
KVM: selftests: TDX: Add report_fatal_error test
KVM: selftests: TDX: Add basic TDX CPUID test
KVM: selftests: TDX: Add basic get_td_vmcall_info test
KVM: selftests: TDX: Add TDX IO writes test
KVM: selftests: TDX: Add TDX IO reads test
KVM: selftests: TDX: Add TDX MSR read/write tests
KVM: selftests: TDX: Add TDX HLT exit test
KVM: selftests: TDX: Add TDX MMIO reads test
KVM: selftests: TDX: Add TDX MMIO writes test
KVM: selftests: TDX: Add TDX CPUID TDVMCALL test
KVM: selftests: Propagate KVM_EXIT_MEMORY_FAULT to userspace
tools/testing/selftests/kvm/Makefile | 8 +
.../selftests/kvm/include/kvm_util_base.h | 30 +
.../selftests/kvm/include/x86_64/processor.h | 4 +
.../kvm/include/x86_64/tdx/td_boot.h | 82 +
.../kvm/include/x86_64/tdx/td_boot_asm.h | 16 +
.../selftests/kvm/include/x86_64/tdx/tdcall.h | 59 +
.../selftests/kvm/include/x86_64/tdx/tdx.h | 65 +
.../kvm/include/x86_64/tdx/tdx_util.h | 19 +
.../kvm/include/x86_64/tdx/test_util.h | 164 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 101 +-
.../selftests/kvm/lib/x86_64/processor.c | 77 +-
.../selftests/kvm/lib/x86_64/tdx/td_boot.S | 101 ++
.../selftests/kvm/lib/x86_64/tdx/tdcall.S | 158 ++
.../selftests/kvm/lib/x86_64/tdx/tdx.c | 262 ++++
.../selftests/kvm/lib/x86_64/tdx/tdx_util.c | 558 +++++++
.../selftests/kvm/lib/x86_64/tdx/test_util.c | 101 ++
.../kvm/x86_64/tdx_shared_mem_test.c | 135 ++
.../selftests/kvm/x86_64/tdx_upm_test.c | 469 ++++++
.../selftests/kvm/x86_64/tdx_vm_tests.c | 1319 +++++++++++++++++
19 files changed, 3693 insertions(+), 35 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/td_boot_asm.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdcall.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/tdx_util.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/tdx/test_util.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/td_boot.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdcall.S
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/tdx_util.c
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/tdx/test_util.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_shared_mem_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_upm_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/tdx_vm_tests.c
--
2.43.0.472.g3155946c3a-goog
Verify that total device stats don't decrease after it has been turned down.
Also make sure the device doesn't crash when we access per-queue stats
when it's down (in case it tries to access some pointers that are NULL).
KTAP version 1
1..5
ok 1 stats.check_pause
ok 2 stats.check_fec
ok 3 stats.pkt_byte_sum
ok 4 stats.qstat_by_ifindex
ok 5 stats.check_down
\# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Stanislav Fomichev <sdf(a)fomichev.me>
--
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Joe Damato <jdamato(a)fastly.com>
Cc: Petr Machata <petrm(a)nvidia.com>
Cc: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/drivers/net/stats.py | 31 +++++++++++++++++++-
1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/stats.py b/tools/testing/selftests/drivers/net/stats.py
index 820b8e0a22c6..6f8bef379565 100755
--- a/tools/testing/selftests/drivers/net/stats.py
+++ b/tools/testing/selftests/drivers/net/stats.py
@@ -5,6 +5,7 @@ from lib.py import ksft_run, ksft_exit, ksft_pr
from lib.py import ksft_ge, ksft_eq, ksft_in, ksft_true, ksft_raises, KsftSkipEx, KsftXfailEx
from lib.py import EthtoolFamily, NetdevFamily, RtnlFamily, NlError
from lib.py import NetDrvEnv
+from lib.py import ip
ethnl = EthtoolFamily()
netfam = NetdevFamily()
@@ -133,9 +134,37 @@ rtnl = RtnlFamily()
ksft_eq(cm.exception.nl_msg.extack['bad-attr'], '.ifindex')
+def check_down(cfg) -> None:
+ try:
+ qstat = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)
+ except NlError as e:
+ if e.error == 95:
+ raise KsftSkipEx("qstats not supported by the device")
+ raise
+
+ ip(f"link set dev {cfg.dev['ifname']} down")
+
+ try:
+ qstat = qstat[0]
+ qstat2 = netfam.qstats_get({"ifindex": cfg.ifindex}, dump=True)[0]
+ for k, v in qstat.items():
+ if k not in qstat2:
+ # skip the stats that are not globally preserved
+ continue
+ if qstat2[k] < qstat[k]:
+ raise Exception(f"{k} ({qstat2[k]}) should be preserved but has lower value ({qstat[k]}) when the device is down")
+
+ # exercise per-queue API to make sure that "device down" state
+ # is handled correctly and doesn't crash
+ netfam.qstats_get({"ifindex": cfg.ifindex, "scope": "queue"}, dump=True)
+ finally:
+ ip(f"link set dev {cfg.dev['ifname']} up")
+
+
def main() -> None:
with NetDrvEnv(__file__) as cfg:
- ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex],
+ ksft_run([check_pause, check_fec, pkt_byte_sum, qstat_by_ifindex,
+ check_down],
args=(cfg, ))
ksft_exit()
--
2.45.2
This is an updated patchset rebasing onto -torvalds master (post
6.11-rc1), and addressing comments from Michal and Tejun.
As requested by Tejun and Johannes, I've removed the explicit check for
the string "reset", so it now allows any non-empty string. (Empty
strings get filtered before our write handler executes)
I've also made several of the field reads and writes atomic with
{READ,WRITE}_ONCE, and adjusted more of the types to be unsigned.
Documentation/admin-guide/cgroup-v2.rst | 22 ++--
include/linux/cgroup-defs.h | 5 +
include/linux/cgroup.h | 3 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 11 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 +
mm/memcontrol.c | 116 +++++++++++++++--
mm/page_counter.c | 30 +++--
tools/testing/selftests/cgroup/cgroup_util.c | 22 ++++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 229 +++++++++++++++++++++++++++++++--
12 files changed, 419 insertions(+), 35 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240724161942.3448841-3-davidf@vimeo.com/T/
In all the MPTCP backup related tests, the backup flag was set on one
side, and the expected behaviour is to have both sides respecting this
decision. That's also the "natural" way, and what the users seem to
expect.
On the scheduler side, only the 'backup' field was checked, which is
supposed to be set only if the other peer flagged a subflow as backup.
But in various places, this flag was also set when the local host
flagged the subflow as backup, certainly to have the expected behaviour
mentioned above.
Patch 1 modifies the packet scheduler to check if the backup flag has
been set on both directions, not to change its behaviour after having
applied the following patches. That's what the default packet scheduler
should have done since the beginning in v5.7.
Patch 2 fixes the backup flag being mirrored on the MPJ+SYN+ACK by
accident since its introduction in v5.7. Instead, the received and sent
backup flags are properly distinguished in requests.
Patch 3 stops setting the received backup flag as well when sending an
MP_PRIO, something that was done since the MP_PRIO support in v5.12.
Patch 4 adds related and missing MIB counters to be able to easily check
if MP_JOIN are sent with a backup flag. Certainly because these counters
were not there, the behaviour that is fixed by patches here was not
properly verified.
Patch 5 validates the previous patch by extending the MPTCP Join
selftest.
Patch 6 fixes the backup support in signal endpoints: if a signal
endpoint had the backup flag, it was not set in the MPJ+SYN+ACK as
expected. It was only set for ongoing connections, but not future ones
as expected, since the introduction of the backup flag in endpoints in
v5.10.
Patch 7 validates the previous patch by extending the MPTCP Join
selftest as well.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Matthieu Baerts (NGI0) (7):
mptcp: sched: check both directions for backup
mptcp: distinguish rcv vs sent backup flag in requests
mptcp: pm: only set request_bkup flag when sending MP_PRIO
mptcp: mib: count MPJ with backup flag
selftests: mptcp: join: validate backup in MPJ
mptcp: pm: fix backup support in signal endpoints
selftests: mptcp: join: check backup support in signal endp
include/trace/events/mptcp.h | 2 +-
net/mptcp/mib.c | 2 +
net/mptcp/mib.h | 2 +
net/mptcp/options.c | 2 +-
net/mptcp/pm.c | 12 +++++
net/mptcp/pm_netlink.c | 19 ++++++-
net/mptcp/pm_userspace.c | 18 +++++++
net/mptcp/protocol.c | 10 ++--
net/mptcp/protocol.h | 4 ++
net/mptcp/subflow.c | 10 ++++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 72 ++++++++++++++++++++-----
11 files changed, 132 insertions(+), 21 deletions(-)
---
base-commit: 301927d2d2eb8e541357ba850bc7a1a74dbbd670
change-id: 20240727-upstream-net-20240727-mptcp-backup-signal-948235f2ad08
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Hello,
this small series aims to integrate test_dev_cgroup in test_progs so it
could be run automatically in CI. The new version brings a few differences
with the current one:
- test now uses directly syscalls instead of wrapping commandline tools
into system() calls
- test_progs manipulates /dev/null (eg: redirecting test logs into it), so
disabling access to it in the bpf program confuses the tests. To fix this,
the first commit modifies the bpf program to allow access to char devices
1:3 (/dev/null), and disable access to char devices 1:5 (/dev/zero)
- once test is converted, add a small subtest to also check for device type
interpretation (char or block)
- paths used in mknod tests are now in /dev instead of /tmp: due to the CI
runner organisation and mountpoints manipulations, trying to create nodes
in /tmp leads to errors unrelated to the test (ie, mknod calls refused by
kernel, not the bpf program). I don't understand exactly the root cause
at the deepest point (all I see in CI is an -ENXIO error on mknod when trying to
create the node in tmp, and I can not make sense out of it neither
replicate it locally), so I would gladly take inputs from anyone more
educated than me about this.
The new test_progs part has been tested in a local qemu environment as well
as in upstream CI:
./test_progs -a cgroup_dev
47/1 cgroup_dev/deny-mknod:OK
47/2 cgroup_dev/allow-mknod:OK
47/3 cgroup_dev/deny-mknod-wrong-type:OK
47/4 cgroup_dev/allow-read:OK
47/5 cgroup_dev/allow-write:OK
47/6 cgroup_dev/deny-read:OK
47/7 cgroup_dev/deny-write:OK
47 cgroup_dev:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
---
Changes in v2:
- directly pass expected ret code to subtests instead of boolean pass/not
pass
- fix faulty fd check in subtest expected to fail on open
- fix wrong subtest name
- pass test buffer and corresponding size to read/write subtests
- use correct series prefix
- Link to v1: https://lore.kernel.org/r/20240725-convert_dev_cgroup-v1-0-2c8cbd487c44@boo…
---
Alexis Lothoré (eBPF Foundation) (3):
selftests/bpf: do not disable /dev/null device access in cgroup dev test
selftests/bpf: convert test_dev_cgroup to test_progs
selftests/bpf: add wrong type test to cgroup dev
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 2 -
.../testing/selftests/bpf/prog_tests/cgroup_dev.c | 113 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/dev_cgroup.c | 4 +-
tools/testing/selftests/bpf/test_dev_cgroup.c | 85 ----------------
5 files changed, 115 insertions(+), 90 deletions(-)
---
base-commit: 3e9fef7751a84a7d02bbe14a67d3a5d301cbd156
change-id: 20240723-convert_dev_cgroup-6464b0d37f1a
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Include dmabuf-heaps selftests in the correct entry so that updates to it
can be sent to the right place.
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 42decde38320..b7f24c9fb0e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6660,6 +6660,7 @@ F: drivers/dma-buf/dma-heap.c
F: drivers/dma-buf/heaps/*
F: include/linux/dma-heap.h
F: include/uapi/linux/dma-heap.h
+F: tools/testing/selftests/dmabuf-heaps/
DMC FREQUENCY DRIVER FOR SAMSUNG EXYNOS5422
M: Lukasz Luba <lukasz.luba(a)arm.com>
--
2.33.0
This makes pids.events:max affine to pids.max limit.
How are the new events supposed to be useful?
- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events:*
- "only" directs top-down search to cgroups of interest
Changes from v4 (https://lore.kernel.org/r/20240416142014.27630-1-mkoutny@suse.com)
- rebased on cgroup/for-6.10 (rather cgroup/for-next, there's no rush)
- introduce pids_files_legacy at one place (Tejun)
- more descriptive Documentation/ (Tejun)
Changes from v3 (https://lore.kernel.org/r/20240405170548.15234-1-mkoutny@suse.com)
- use existing functions for TAP output in selftest (Muhammad)
- formatting in selftest (Muhammad)
- remove pids.events:max.imposed event, keep it internal (Johannes)
- allow legacy behavior with a mount option
- detach migration charging patches
- drop RFC prefix
Changes from v2 (https://lore.kernel.org/r/20200205134426.10570-1-mkoutny@suse.com)
- implemented pids.events.local (Tejun)
- added migration charging
[1] https://lore.kernel.org/r/20230202155626.1829121-1-hannes@cmpxchg.org/
Michal Koutný (5):
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 21 ++-
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 15 +-
kernel/cgroup/pids.c | 129 +++++++++++---
tools/testing/selftests/cgroup/.gitignore | 11 +-
tools/testing/selftests/cgroup/Makefile | 25 +--
tools/testing/selftests/cgroup/test_pids.c | 178 +++++++++++++++++++
8 files changed, 346 insertions(+), 43 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
base-commit: 21c38a3bd4ee3fb7337d013a638302fb5e5f9dc2
--
2.44.0
There are multiple possible timer sources which could be useful for
the sound stream synchronization: hrtimers, hardware clocks (e.g. PTP),
timer wheels (jiffies). Currently, using one of them to synchronize
the audio stream of snd-aloop module would require writing a
kernel-space driver which exports an ALSA timer through the
snd_timer interface.
However, it is not really convenient for application developers, who may
want to define their custom timer sources for audio synchronization.
For instance, we could have a network application which receives frames
and sends them to snd-aloop pcm device, and another application
listening on the other end of snd-aloop. It makes sense to transfer a
new period of data only when certain amount of frames is received
through the network, but definitely not when a certain amount of jiffies
on a local system elapses. Since all of the devices are purely virtual
it won't introduce any glitches and will help the application developers
to avoid using sample-rate conversion.
This patch series introduces userspace-driven ALSA timers: virtual
timers which are created and controlled from userspace. The timer can
be created from the userspace using the new ioctl SNDRV_TIMER_IOCTL_CREATE.
After creating a timer, it becomes available for use system-wide, so it
can be passed to snd-aloop as a timer source (timer_source parameter
would be "-1.SNDRV_TIMER_GLOBAL_UDRIVEN.{timer_id}"). When the userspace
app decides to trigger a timer, it calls another ioctl
SNDRV_TIMER_IOCTL_TRIGGER on the file descriptor of a timer. It
initiates a transfer of a new period of data.
Userspace-driven timers are associated with file descriptors. If the
application wishes to destroy the timer, it can simply release the file
descriptor of a virtual timer.
I believe introducing new ioctl calls is quite inconvenient (as we have
a limited amount of them), but other possible ways of app <-> kernel
communication (like virtual FS) seem completely inappropriate for this
task (but I'd love to discuss alternative solutions).
This patch series also updates the snd-aloop module so the global timers
can be used as a timer_source for it (it allows using userspace-driven
timers as timer source).
V2 of this patch series fixes some problems found by Christophe Jaillet
<christophe.jaillet(a)wanadoo.fr>. Please, find the patch-specific
changelog in the following patches.
Ivan Orlov (4):
ALSA: aloop: Allow using global timers
Docs/sound: Add documentation for userspace-driven ALSA timers
ALSA: timer: Introduce virtual userspace-driven timers
selftests: ALSA: Cover userspace-driven timers with test
Documentation/sound/index.rst | 1 +
Documentation/sound/utimers.rst | 120 +++++++++++
include/uapi/sound/asound.h | 17 ++
sound/core/Kconfig | 10 +
sound/core/timer.c | 213 ++++++++++++++++++++
sound/drivers/aloop.c | 2 +
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/alsa/global-timer.c | 87 ++++++++
tools/testing/selftests/alsa/utimer-test.c | 137 +++++++++++++
9 files changed, 588 insertions(+), 1 deletion(-)
create mode 100644 Documentation/sound/utimers.rst
create mode 100644 tools/testing/selftests/alsa/global-timer.c
create mode 100644 tools/testing/selftests/alsa/utimer-test.c
--
2.34.1
There are a couple of spelling mistakes in ksft_test_result_fail messages.
Fix them.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/riscv/vector/v_initval_nolibc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/riscv/vector/v_initval_nolibc.c b/tools/testing/selftests/riscv/vector/v_initval_nolibc.c
index 1dd94197da30..6838c561e4c9 100644
--- a/tools/testing/selftests/riscv/vector/v_initval_nolibc.c
+++ b/tools/testing/selftests/riscv/vector/v_initval_nolibc.c
@@ -49,14 +49,14 @@ int main(void)
ksft_print_msg("vl = %lu\n", vl);
if (datap[0] != 0x00 && datap[0] != 0xff) {
- ksft_test_result_fail("v-regesters are not properly initialized\n");
+ ksft_test_result_fail("v-registers are not properly initialized\n");
dump(datap, vl * 4);
exit(-1);
}
for (i = 1; i < vl * 4; i++) {
if (datap[i] != datap[0]) {
- ksft_test_result_fail("detect stale values on v-regesters\n");
+ ksft_test_result_fail("detect stale values on v-registers\n");
dump(datap, vl * 4);
exit(-2);
}
--
2.39.2
This is an update of a previous series[1] addressing Johannes' comments,
and rebasing on top of linus's master.
Unfortunately, linus's master doesn't seem to be bootable at the
moment, so I haven't re-run the tests on this change yet. I'll see about
re-running everything in the morning. (root= resolution seems to be
failing both for x86-64 (in qemu) and usermode linux)
Documentation/admin-guide/cgroup-v2.rst | 26 ++--
include/linux/cgroup-defs.h | 5 +
include/linux/cgroup.h | 3 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 6 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 ++
mm/memcontrol.c | 117 ++++++++++++++++--
mm/page_counter.c | 30 +++--
tools/testing/selftests/cgroup/cgroup_util.c | 22 ++++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 226 ++++++++++++++++++++++++++++++++--
12 files changed, 416 insertions(+), 35 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240722235554.2911971-1-davidf@vimeo.com/T/
Thanks for all the constructive comments and discussion!
David Finkel
Senior Principal Engineer, Core Services
Vimeo Inc.
This patch revision addresses a few comments from Longman and Johannes
here[1], and rebases onto master again.
(I still have issues with starting my UML instance to run the tests.
I'll see if I an figure out what's going on with that after I get some
other work done today)
[1]: https://lore.kernel.org/cgroups/20240723233149.3226636-1-davidf@vimeo.com/T/
Thanks again,
David Finkel
Senior Principal Software Engineer, Core Service
Vimeo Inc.
Issue #501 [1] showed that the Netlink PM currently doesn't correctly
support removal and re-add of signal endpoints.
Patches 1 and 2 address the issue: the first one in the userspace path-
manager, introduced in v5.19 ; and the second one in the in-kernel path-
manager, introduced in v5.7.
Patch 3 introduces a related selftest. There is no 'Fixes' tag, because
it might be hard to backport it automatically, as missing helpers in
Bash will not be caught when compiling the kernel or the selftests.
The last two patches address two small issues in the MPTCP selftests,
one introduced in v6.6., and the other one in v5.17.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/501 [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Liu Jing (1):
selftests: mptcp: always close input's FD if opened
Paolo Abeni (4):
mptcp: fix user-space PM announced address accounting
mptcp: fix NL PM announced address accounting
selftests: mptcp: add explicit test case for remove/readd
selftests: mptcp: fix error path
net/mptcp/pm_netlink.c | 27 ++++++++++++++------
tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 +++---
tools/testing/selftests/net/mptcp/mptcp_join.sh | 31 ++++++++++++++++++++++-
3 files changed, 53 insertions(+), 13 deletions(-)
---
base-commit: 301927d2d2eb8e541357ba850bc7a1a74dbbd670
change-id: 20240726-upstream-net-20240726-mptcp-fix-signal-readd-f3c72bbcbceb
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Even if a vgem device is configured in, we will skip the import_vgem_fd()
test almost every time.
TAP version 13
1..11
# Testing heap: system
# =======================================
# Testing allocation and importing:
ok 1 # SKIP Could not open vgem -1
The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
version information but leave the name field a non-null-terminated string.
Terminate it properly to actually test against the vgem device.
While at it, let's check the length of the driver name is exactly 4 bytes
and return early otherwise (in case there is a name like "vgemfoo" that
gets converted to "vgem\0" unexpectedly).
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
---
* From v1 [1]:
- Check version.name_len is exactly 4 bytes and return early otherwise
[1] https://lore.kernel.org/r/20240708134654.1725-1-yuzenghui@huawei.com
P.S., Maybe worth including the kselftests file into "DMA-BUF HEAPS
FRAMEWORK" MAINTAINERS entry?
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
index 5f541522364f..5d0a809dc2df 100644
--- a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
+++ b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
@@ -29,9 +29,11 @@ static int check_vgem(int fd)
version.name = name;
ret = ioctl(fd, DRM_IOCTL_VERSION, &version);
- if (ret)
+ if (ret || version.name_len != 4)
return 0;
+ name[4] = '\0';
+
return !strcmp(name, "vgem");
}
--
2.33.0
Hi all,
This is part of a hackathon organized by LKCAMP[1], focused on writing
tests using KUnit. We reached out a while ago asking for advice on what would
be a useful contribution[2] and ended up choosing data structures that did
not yet have tests.
This patch adds tests for the llist data structure, defined in
include/linux/llist.h, and is inspired by the KUnit tests for the doubly
linked list in lib/list-test.c[3].
[1] https://lkcamp.dev/about/
[2] https://lore.kernel.org/all/Zktnt7rjKryTh9-N@arch/
[3] https://elixir.bootlin.com/linux/latest/source/lib/list-test.c
Artur Alves (1):
lib/llist_kunit.c: add KUnit tests for llist
lib/Kconfig.debug | 11 ++
lib/Makefile | 1 +
lib/llist_kunit.c | 360 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 372 insertions(+)
create mode 100644 lib/llist_kunit.c
--
2.45.2
There are multiple possible timer sources which could be useful for
the sound stream synchronization: hrtimers, hardware clocks (e.g. PTP),
timer wheels (jiffies). Currently, using one of them to synchronize
the audio stream of snd-aloop module would require writing a
kernel-space driver which exports an ALSA timer through the
snd_timer interface.
However, it is not really convenient for application developers, who may
want to define their custom timer sources for audio synchronization.
For instance, we could have a network application which receives frames
and sends them to snd-aloop pcm device, and another application
listening on the other end of snd-aloop. It makes sense to transfer a
new period of data only when certain amount of frames is received
through the network, but definitely not when a certain amount of jiffies
on a local system elapses. Since all of the devices are purely virtual
it won't introduce any glitches and will help the application developers
to avoid using sample-rate conversion.
This patch series introduces userspace-driven ALSA timers: virtual
timers which are created and controlled from userspace. The timer can
be created from the userspace using the new ioctl SNDRV_TIMER_IOCTL_CREATE.
After creating a timer, it becomes available for use system-wide, so it
can be passed to snd-aloop as a timer source (timer_source parameter
would be "-1.SNDRV_TIMER_GLOBAL_UDRIVEN.{timer_id}"). When the userspace
app decides to trigger a timer, it calls another ioctl
SNDRV_TIMER_IOCTL_TRIGGER on the file descriptor of a timer. It
initiates a transfer of a new period of data.
Userspace-driven timers are associated with file descriptors. If the
application wishes to destroy the timer, it can simply release the file
descriptor of a virtual timer.
I believe introducing new ioctl calls is quite inconvenient (as we have
a limited amount of them), but other possible ways of app <-> kernel
communication (like virtual FS) seem completely inappropriate for this
task (but I'd love to discuss alternative solutions).
This patch series also updates the snd-aloop module so the global timers
can be used as a timer_source for it (it allows using userspace-driven
timers as timer source).
Ivan Orlov (4):
ALSA: aloop: Allow using global timers
Docs/sound: Add documentation for userspace-driven ALSA timers
ALSA: timer: Introduce virtual userspace-driven timers
selftests: ALSA: Cover userspace-driven timers with test
Documentation/sound/index.rst | 1 +
Documentation/sound/utimers.rst | 120 +++++++++++
include/uapi/sound/asound.h | 17 ++
sound/core/Kconfig | 11 +
sound/core/timer.c | 226 ++++++++++++++++++++
sound/drivers/aloop.c | 2 +
tools/testing/selftests/alsa/Makefile | 2 +-
tools/testing/selftests/alsa/global-timer.c | 87 ++++++++
tools/testing/selftests/alsa/utimer-test.c | 133 ++++++++++++
9 files changed, 598 insertions(+), 1 deletion(-)
create mode 100644 Documentation/sound/utimers.rst
create mode 100644 tools/testing/selftests/alsa/global-timer.c
create mode 100644 tools/testing/selftests/alsa/utimer-test.c
--
2.34.1
Hello,
this small series aims to integrate test_dev_cgroup in test_progs so it
could be run automatically in CI. The new version brings a few differences
with the current one:
- test now uses directly syscalls instead of wrapping commandline tools
into system() calls
- test_progs manipulates /dev/null (eg: redirecting test logs into it), so
disabling access to it in the bpf program confuses the tests. To fix this,
the first commit modifies the bpf program to allow access to char devices
1:3 (/dev/null), and disable access to char devices 1:5 (/dev/zero)
- once test is converted, add a small subtest to also check for device type
interpretation (char or block)
- paths used in mknod tests are now in /dev instead of /tmp: due to the CI
runner organisation and mountpoints manipulations, trying to create nodes
in /tmp leads to errors unrelated to the test (ie, mknod calls refused by
kernel, not the bpf program). I don't understand exactly the root cause
at the deepest point (all I see in CI is an -ENXIO error on mknod when trying to
create the node in tmp, and I can not make sense out of it neither
replicate it locally), so I would gladly take inputs from anyone more
educated than me about this.
The new test_progs part has been tested in a local qemu environment as well
as in upstream CI:
./test_progs -a cgroup_dev
47/1 cgroup_dev/deny-mknod:OK
47/2 cgroup_dev/allow-mknod:OK
47/3 cgroup_dev/deny-mknod-wrong-type:OK
47/4 cgroup_dev/allow-read:OK
47/5 cgroup_dev/allow-write:OK
47/6 cgroup_dev/deny-read:OK
47/7 cgroup_dev/deny-write:OK
47 cgroup_dev:OK
Summary: 1/7 PASSED, 0 SKIPPED, 0 FAILED
---
Alexis Lothoré (eBPF Foundation) (3):
selftests/bpf: do not disable /dev/null device access in cgroup dev test
selftests/bpf: convert test_dev_cgroup to test_progs
selftests/bpf: add wrong type test to cgroup dev
tools/testing/selftests/bpf/.gitignore | 1 -
tools/testing/selftests/bpf/Makefile | 2 -
.../testing/selftests/bpf/prog_tests/cgroup_dev.c | 123 +++++++++++++++++++++
tools/testing/selftests/bpf/progs/dev_cgroup.c | 4 +-
tools/testing/selftests/bpf/test_dev_cgroup.c | 85 --------------
5 files changed, 125 insertions(+), 90 deletions(-)
---
base-commit: c90e2d4a7738a24fbb3657092dbd1ca20c040ed1
change-id: 20240723-convert_dev_cgroup-6464b0d37f1a
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
The format specifier in fprintf is "%u", that "%u" should use
unsigned int type instead.the problem is discovered by reading code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
v1->v2:
modify commit info add how to find the problem in the log
v2->v3:
Seems this can use macro WTERMSIG like those above usage, rather than
changing the print format.
v3->v4:
Now the commit summary doesn't match the change you are making.
Also WTERMSIG() is incorrect for this conditional code path.
See comments below in the code path.
I would leave the v2 code intact. How are you testing this change?
Please include the details in the change log.
v4->v5:
Compile the kernel for testing using make
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index e05ac8261046..675b8f43e148 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -910,7 +910,7 @@ void __wait_for_test(struct __test_metadata *t)
.sa_flags = SA_SIGINFO,
};
struct sigaction saved_action;
- int status;
+ unsigned int status;
if (sigaction(SIGALRM, &action, &saved_action)) {
t->passed = 0;
--
2.17.1
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/lkml/20240719-support_vendor_extensions-v3-4-0af758…
---
Changes in v8:
- Rebase onto palmer's for-next
- Link to v7: https://lore.kernel.org/r/20240724-xtheadvector-v7-0-b741910ada3e@rivosinc.…
Changes in v7:
- Add defs for has_xtheadvector_no_alternatives() and has_xtheadvector()
when vector disabled. (Palmer)
- Link to v6: https://lore.kernel.org/r/20240722-xtheadvector-v6-0-c9af0130fa00@rivosinc.…
Changes in v6:
- Fix return type of is_vector_supported()/is_xthead_supported() to be bool
- Link to v5: https://lore.kernel.org/r/20240719-xtheadvector-v5-0-4b485fc7d55f@rivosinc.…
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 ++
arch/riscv/include/asm/hwprobe.h | 3 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 225 ++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 51 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 68 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 8 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 890 insertions(+), 269 deletions(-)
---
base-commit: 2709e400c2e06ddae9ad120f301a5254f629cf3e
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Add RTC wakeup alarm for devices to resume after specific time interval.
This improvement in the test will help in enabling this test
in the CI systems and will eliminate the need of manual intervention
for resuming back the devices after suspend/hibernation.
Signed-off-by: Shreeya Patel <shreeya.patel(a)collabora.com>
---
Changes in v2
- Use rtcwake utility instead of sysfs for setting up
a RTC wakeup alarm
tools/testing/selftests/cpufreq/cpufreq.sh | 15 +++++++++++++++
tools/testing/selftests/cpufreq/main.sh | 13 ++++++++++++-
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cpufreq/cpufreq.sh b/tools/testing/selftests/cpufreq/cpufreq.sh
index b583a2fb4504..a427de1f9e08 100755
--- a/tools/testing/selftests/cpufreq/cpufreq.sh
+++ b/tools/testing/selftests/cpufreq/cpufreq.sh
@@ -232,6 +232,21 @@ do_suspend()
for i in `seq 1 $2`; do
printf "Starting $1\n"
+
+ if [ "$3" = "rtc" ]; then
+ if ! command -v rtcwake &> /dev/null; then
+ printf "rtcwake could not be found, please install it.\n"
+ return 1
+ fi
+
+ rtcwake -m $filename -s 15
+
+ if [ $? -ne 0 ]; then
+ printf "Failed to suspend using RTC wake alarm\n"
+ return 1
+ fi
+ fi
+
echo $filename > $SYSFS/power/state
printf "Came out of $1\n"
diff --git a/tools/testing/selftests/cpufreq/main.sh b/tools/testing/selftests/cpufreq/main.sh
index 60ce18ed0666..b1ca4147a5e6 100755
--- a/tools/testing/selftests/cpufreq/main.sh
+++ b/tools/testing/selftests/cpufreq/main.sh
@@ -24,6 +24,8 @@ helpme()
[-t <basic: Basic cpufreq testing
suspend: suspend/resume,
hibernate: hibernate/resume,
+ suspend_rtc: suspend/resume back using the RTC wakeup alarm,
+ hibernate_rtc: hibernate/resume back using the RTC wakeup alarm,
modtest: test driver or governor modules. Only to be used with -d or -g options,
sptest1: Simple governor switch to produce lockdep.
sptest2: Concurrent governor switch to produce lockdep.
@@ -76,7 +78,8 @@ parse_arguments()
helpme
;;
- t) # --func_type (Function to perform: basic, suspend, hibernate, modtest, sptest1/2/3/4 (default: basic))
+ t) # --func_type (Function to perform: basic, suspend, hibernate,
+ # suspend_rtc, hibernate_rtc, modtest, sptest1/2/3/4 (default: basic))
FUNC=$OPTARG
;;
@@ -122,6 +125,14 @@ do_test()
do_suspend "hibernate" 1
;;
+ "suspend_rtc")
+ do_suspend "suspend" 1 rtc
+ ;;
+
+ "hibernate_rtc")
+ do_suspend "hibernate" 1 rtc
+ ;;
+
"modtest")
# Do we have modules in place?
if [ -z $DRIVER_MOD ] && [ -z $GOVERNOR_MOD ]; then
--
2.39.2
After HID-BPF struct_ops was merged into 6.11-rc0, there are a few
mishaps:
- the bpf_wq API changed and needs to be updated here
- libbpf now auto-attach all the struct_ops it sees in the bpf object,
leading to attempting at attaching them multiple times
Fix the selftests but also prevent the same struct_ops to be attached
more than once as this enters various locks, confusions, and kernel
oopses.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Benjamin Tissoires (4):
selftests/hid: fix bpf_wq new API
selftests/hid: disable struct_ops auto-attach
HID: bpf: prevent the same struct_ops to be attached more than once
selftests/hid: add test for attaching multiple time the same struct_ops
drivers/hid/bpf/hid_bpf_struct_ops.c | 5 +++++
tools/testing/selftests/hid/hid_bpf.c | 26 ++++++++++++++++++++++
tools/testing/selftests/hid/progs/hid.c | 2 +-
.../testing/selftests/hid/progs/hid_bpf_helpers.h | 2 +-
4 files changed, 33 insertions(+), 2 deletions(-)
---
base-commit: 6e504d2c61244a01226c5100c835e44fb9b85ca8
change-id: 20240723-fix-6-11-bpf-cfa63dcda5bc
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
The format specifier in fprintf is "%u", that "%u" should use
unsigned int type instead.the problem is discovered by reading code.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
v1->v2:
modify commit info add how to find the problem in the log
v2->v3:
Seems this can use macro WTERMSIG like those above usage, rather than
changing the print format.
v3->v4:
Now the commit summary doesn't match the change you are making.
Also WTERMSIG() is incorrect for this conditional code path.
See comments below in the code path.
I would leave the v2 code intact. How are you testing this change?
Please include the details in the change log.
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index e05ac8261046..675b8f43e148 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -910,7 +910,7 @@ void __wait_for_test(struct __test_metadata *t)
.sa_flags = SA_SIGINFO,
};
struct sigaction saved_action;
- int status;
+ unsigned int status;
if (sigaction(SIGALRM, &action, &saved_action)) {
t->passed = 0;
--
2.17.1
This patch series adds a selftest suite to validate the s390x
architecture specific ucontrol KVM interface.
When creating a VM on s390x it is possible to create it as userspace
controlled VM or in short ucontrol VM.
These VMs delegates the management of the VM to userspace instead
of handling most events within the kernel. Consequently the userspace
has to manage interrupts, memory allocation etc.
Before this patch set this functionality lacks any public test cases.
It is desirable to add test cases for this interface to be able to
reduce the risk of breaking changes in the future.
In order to provision a ucontrol VM the kernel needs to be compiled with
the CONFIG_KVM_S390_UCONTROL enabled. The users with sys_admin capability
can then create a new ucontrol VM providing the KVM_VM_S390_UCONTROL
parameter to the KVM_CREATE_VM ioctl.
The kernels existing selftest helper functions can only be partially be
reused for these tests.
The test cases cover existing special handling of ucontrol VMs within the
implementation and basic VM creation and handling cases:
* Reject setting HPAGE when VM is ucontrol
* Assert KVM_GET_DIRTY_LOG is rejected
* Assert KVM_S390_VM_MEM_LIMIT_SIZE is rejected
* Assert state of initial SIE flags setup by the kernel
* Run simple program in VM with and without DAT
* Assert KVM_EXIT_S390_UCONTROL exit on not mapped memory access
* Assert functionality of storage keys in ucontrol VM
Running the test cases requires sys_admin capabilities to start the
ucontrol VM.
This can be achieved by running as root or with a command like:
sudo setpriv --reuid nobody --inh-caps -all,+sys_admin \
--ambient-caps -all,+sys_admin --bounding-set -all,+sys_admin \
./ucontrol_test
The patch set does also contain some code cleanup / consolidation of
architecture specific defines that are now used in multiple test cases.
---
V1 -> V2:
- add ucontrol to s390 debug config (new patch)
- PATCH 2: changed atomic_t to __u32 (thanks Claudio)
- PATCH 4: reformatted comment in FIXTURE_SETUP(uc_kvm)
- PATCH 5: refactored to display 8 byte blocks + more internal reuse
(thanks Claudio)
- PATCH 7: make use of more declarative defines instead of magic values
- PATCH 8: make use of more declarative defines instead of magic values
(thanks Claudio)
- PATCH 9: add reference to fix verified by the test case
Christoph Schlameuss (10):
selftests: kvm: s390: Define page sizes in shared header
selftests: kvm: s390: Add kvm_s390_sie_block definition for userspace
tests
selftests: kvm: s390: Add s390x ucontrol test suite with hpage test
selftests: kvm: s390: Add test fixture and simple VM setup tests
selftests: kvm: s390: Add debug print functions
selftests: kvm: s390: Add VM run test case
selftests: kvm: s390: Add uc_map_unmap VM test case
selftests: kvm: s390: Add uc_skey VM test case
selftests: kvm: s390: Verify reject memory region operations for
ucontrol VMs
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
arch/s390/Kconfig.debug | 3 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/s390x/debug_print.h | 69 ++
.../selftests/kvm/include/s390x/processor.h | 5 +
.../testing/selftests/kvm/include/s390x/sie.h | 240 +++++++
.../selftests/kvm/lib/s390x/processor.c | 10 +-
tools/testing/selftests/kvm/s390x/cmma_test.c | 7 +-
tools/testing/selftests/kvm/s390x/config | 2 +
.../testing/selftests/kvm/s390x/debug_test.c | 4 +-
tools/testing/selftests/kvm/s390x/memop.c | 4 +-
tools/testing/selftests/kvm/s390x/tprot.c | 5 +-
.../selftests/kvm/s390x/ucontrol_test.c | 614 ++++++++++++++++++
13 files changed, 949 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/s390x/debug_print.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/sie.h
create mode 100644 tools/testing/selftests/kvm/s390x/config
create mode 100644 tools/testing/selftests/kvm/s390x/ucontrol_test.c
base-commit: 66ebbdfdeb093e097399b1883390079cd4c3022b
--
2.45.2
Hello everyone,
this small series is a first step in a larger effort aiming to help improve
eBPF selftests and the testing coverage in CI. It focuses for now on
test_xdp_veth.sh, a small test which is not integrated yet in test_progs.
The series is mostly about a rewrite of test_xdp_veth.sh to make it able to
run under test_progs, relying on libbpf to manipulate bpf programs involved
in the test.
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
---
Changes in v4:
- add missing netns_close in an error path during programs attach
- Link to v3: https://lore.kernel.org/r/20240716-convert_test_xdp_veth-v3-0-7b01389e3cb3@…
Changes in v3:
- Fix doc style in the new test
- Collect acked-by tags
- Link to v2: https://lore.kernel.org/r/20240715-convert_test_xdp_veth-v2-0-46290b82f6d2@…
Changes in v2:
- fix many formatting issues raised by checkpatch
- use static namespaces instead of random ones
- use SYS_NOFAIL instead of snprintf() + system ()
- squashed the new test addition patch and the old test removal patch
- Link to v1: https://lore.kernel.org/r/20240711-convert_test_xdp_veth-v1-0-868accb0a727@…
---
Alexis Lothoré (eBPF Foundation) (2):
selftests/bpf: update xdp_redirect_map prog sections for libbpf
selftests/bpf: integrate test_xdp_veth into test_progs
tools/testing/selftests/bpf/Makefile | 1 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 213 +++++++++++++++++++++
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 6 +-
tools/testing/selftests/bpf/test_xdp_veth.sh | 121 ------------
4 files changed, 216 insertions(+), 125 deletions(-)
---
base-commit: 4837cbaa1365cdb213b58577197c5b10f6e2aa81
change-id: 20240710-convert_test_xdp_veth-04cc05f5557d
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/lkml/20240719-support_vendor_extensions-v3-4-0af758…
---
Changes in v7:
- Add defs for has_xtheadvector_no_alternatives() and has_xtheadvector()
when vector disabled. (Palmer)
- Link to v6: https://lore.kernel.org/r/20240722-xtheadvector-v6-0-c9af0130fa00@rivosinc.…
Changes in v6:
- Fix return type of is_vector_supported()/is_xthead_supported() to be bool
- Link to v5: https://lore.kernel.org/r/20240719-xtheadvector-v5-0-4b485fc7d55f@rivosinc.…
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 ++
arch/riscv/include/asm/hwprobe.h | 5 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 225 ++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 54 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 68 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 8 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 891 insertions(+), 273 deletions(-)
---
base-commit: 554462ced9ac97487c8f725fe466a9c20ed87521
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Hello all,
This series includes the bulk of libc-related compile fixes accumulated to
support systems using musl, with smaller numbers to follow. These patches
are simple and straightforward, and the series has been tested with the
kernel-patches/bpf CI and locally using mips64el-gcc/musl-libc and QEMU
with an OpenWrt rootfs.
The patches address a few general categories of libc portability issues:
- missing, redundant or incorrect include headers
- disabled GNU header extensions (i.e. missing #define _GNU_SOURCE)
- issues with types and casting
Feedback and suggestions for improvement are welcome!
Thanks,
Tony
Tony Ambardar (19):
selftests/bpf: Use pid_t consistently in test_progs.c
selftests/bpf: Fix compile error from rlim_t in sk_storage_map.c
selftests/bpf: Fix error compiling bpf_iter_setsockopt.c with musl
libc
selftests/bpf: Drop unneeded include in unpriv_helpers.c
selftests/bpf: Drop unneeded include in sk_lookup.c
selftests/bpf: Drop unneeded include in flow_dissector.c
selftests/bpf: Fix missing ARRAY_SIZE() definition in bench.c
selftests/bpf: Fix missing UINT_MAX definitions in benchmarks
selftests/bpf: Fix missing BUILD_BUG_ON() declaration
selftests/bpf: Fix include of <sys/fcntl.h>
selftests/bpf: Fix compiling parse_tcp_hdr_opt.c with musl-libc
selftests/bpf: Fix compiling kfree_skb.c with musl-libc
selftests/bpf: Fix compiling flow_dissector.c with musl-libc
selftests/bpf: Fix compiling tcp_rtt.c with musl-libc
selftests/bpf: Fix compiling core_reloc.c with musl-libc
selftests/bpf: Fix errors compiling lwt_redirect.c with musl libc
selftests/bpf: Fix errors compiling decap_sanity.c with musl libc
selftests/bpf: Fix errors compiling crypto_sanity.c with musl libc
selftests/bpf: Fix errors compiling cg_storage_multi.h with musl libc
tools/testing/selftests/bpf/bench.c | 1 +
tools/testing/selftests/bpf/bench.h | 1 +
tools/testing/selftests/bpf/map_tests/sk_storage_map.c | 2 +-
tools/testing/selftests/bpf/prog_tests/bpf_iter_setsockopt.c | 2 +-
tools/testing/selftests/bpf/prog_tests/core_reloc.c | 1 +
tools/testing/selftests/bpf/prog_tests/crypto_sanity.c | 1 -
tools/testing/selftests/bpf/prog_tests/decap_sanity.c | 1 -
tools/testing/selftests/bpf/prog_tests/flow_dissector.c | 2 +-
tools/testing/selftests/bpf/prog_tests/kfree_skb.c | 1 +
tools/testing/selftests/bpf/prog_tests/lwt_redirect.c | 1 -
tools/testing/selftests/bpf/prog_tests/ns_current_pid_tgid.c | 2 +-
tools/testing/selftests/bpf/prog_tests/parse_tcp_hdr_opt.c | 1 +
tools/testing/selftests/bpf/prog_tests/sk_lookup.c | 1 -
tools/testing/selftests/bpf/prog_tests/tcp_rtt.c | 1 +
tools/testing/selftests/bpf/prog_tests/user_ringbuf.c | 1 +
tools/testing/selftests/bpf/progs/cg_storage_multi.h | 2 --
tools/testing/selftests/bpf/test_progs.c | 2 +-
tools/testing/selftests/bpf/unpriv_helpers.c | 1 -
18 files changed, 12 insertions(+), 12 deletions(-)
--
2.34.1
commit 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
changed the env variable for the architecture from MACHINE to ARCH.
This is preventing 3 required TEST_GEN_FILES from being included when
cross compiling s390x and errors when trying to run the test suite.
This is due to the ARCH variable already being set and the arch folder
name being s390.
Add "s390" to the filtered list to cover this case and have the 3 files
included in the build.
Fixes: 0518dbe97fe6 ("selftests/mm: fix cross compilation with LLVM")
Cc: stable(a)kernel.org
Cc: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 901e0d07765b..7b8a5def54a1 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -110,7 +110,7 @@ endif
endif
-ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64))
+ifneq (,$(filter $(ARCH),arm64 ia64 mips64 parisc64 powerpc riscv64 s390x sparc64 x86_64 s390))
TEST_GEN_FILES += va_high_addr_switch
TEST_GEN_FILES += virtual_address_range
TEST_GEN_FILES += write_to_hugetlbfs
--
2.45.2
'%u' in format string requires 'unsigned int' in __wait_for_test()
but the argument type is 'signed int' that this problem was discovered
by reading code.use macro WTERMSIG like those above usage to
fix the wrong format specifier.
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
Changes
v1->v2:
modify commit info add how to find the problem in the log
v2->v3:
Seems this can use macro WTERMSIG like those above usage, rather than
changing the print format.
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index dbbbcc6c04ee..f41f4435e9a4 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -1086,7 +1086,7 @@ void __wait_for_test(struct __test_metadata *t)
fprintf(TH_LOG_STREAM,
"# %s: Test ended in some other way [%d]\n",
t->name,
- status);
+ WTERMSIG(status));
}
}
--
2.17.1
In this series from Geliang, modifying MPTCP BPF selftests, we have:
- A new MPTCP subflow BPF program setting socket options per subflow: it
looks better to have this old test program in the BPF selftests to
track regressions and to serve as example.
Note: Nicolas is no longer working for Tessares, but he did this work
while working for them, and his email address is no longer available.
- A new symlink to MPTCP's pm_nl_ctl tool is added in BPF selftests, to
be able to use it instead of 'ip mptcp' which is not supported by the
BPF CI running IPRoute 5.5.0.
- A new MPTCP BPF subtest validating the new BPF program added in the
first patch.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v3:
- Sorry for the delay between v2 and v3, this series was conflicting
with the "add netns helpers", but it looks like it is on hold:
https://lore.kernel.org/cover.1715821541.git.tanggeliang@kylinos.cn
- Patch 1/3 includes "bpf_tracing_net.h", introduced in between.
- New patch 2/3: "selftests/bpf: Add mptcp pm_nl_ctl link".
- Patch 3/3: use the tool introduced in patch 2/3 + SYS_NOFAIL() helper.
- Link to v2: https://lore.kernel.org/r/20240509-upstream-bpf-next-20240506-mptcp-subflow…
Changes in v2:
- Previous patches 1/4 and 2/4 have been dropped from this series:
- 1/4: "selftests/bpf: Handle SIGINT when creating netns":
- A new version, more generic and no longer specific to MPTCP BPF
selftest will be sent later, as part of a new series. (Alexei)
- 2/4: "selftests/bpf: Add RUN_MPTCP_TEST macro":
- Removed, not to hide helper functions in macros. (Alexei)
- The commit message of patch 1/2 has been clarified to avoid some
possible confusions spot by Alexei.
- Link to v1: https://lore.kernel.org/r/20240507-upstream-bpf-next-20240506-mptcp-subflow…
---
Geliang Tang (2):
selftests/bpf: Add mptcp pm_nl_ctl link
selftests/bpf: Add mptcp subflow subtest
Nicolas Rybowski (1):
selftests/bpf: Add mptcp subflow example
MAINTAINERS | 1 +
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/mptcp_pm_nl_ctl.c | 1 +
tools/testing/selftests/bpf/prog_tests/mptcp.c | 104 ++++++++++++++++++++++
tools/testing/selftests/bpf/progs/mptcp_subflow.c | 59 ++++++++++++
5 files changed, 167 insertions(+), 1 deletion(-)
---
base-commit: fd8db07705c55a995c42b1e71afc42faad675b0b
change-id: 20240506-upstream-bpf-next-20240506-mptcp-subflow-test-faef6654bfa3
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
v15: https://patchwork.kernel.org/project/netdevbpf/list/?series=865481&state=*
====
No material changes in this version, only a fix to linking against
libynl.a from the last version. Per Jakub's instructions I've pulled one
of his patches into this series, and now use the new libynl.a correctly,
I hope.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v15/
v14: https://patchwork.kernel.org/project/netdevbpf/list/?series=865135&archive=…
====
No material changes in this version. Only rebase and re-verification on
top of net-next. v13, I think, raced with commit ebad6d0334793
("net/ipv4: Use nested-BH locking for ipv4_tcp_sk.") being merged to
net-next that caused a patchwork failure to apply. This series should
apply cleanly on commit c4532232fa2a4 ("selftests: net: remove unneeded
IP_GRE config").
I did not wait the customary 24hr as Jakub said it's OK to repost as soon
as I build test the rebased version:
https://lore.kernel.org/netdev/20240625075926.146d769d@kernel.org/
v13: https://patchwork.kernel.org/project/netdevbpf/list/?series=861406&archive=…
====
Major changes:
--------------
This iteration addresses Pavel's review comments, applies his
reviewed-by's, and seeks to fix the patchwork build error (sorry!).
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v13/
v12: https://patchwork.kernel.org/project/netdevbpf/list/?series=859747&state=*
====
Major changes:
--------------
This iteration only addresses one minor comment from Pavel with regards
to the trace printing of netmem, and the patchwork build error
introduced in v11 because I missed doing an allmodconfig build, sorry.
Other than that v11, AFAICT, received no feedback. There is one
discussion about how the specifics of plugging io uring memory through
the page pool, but not relevant to content in this particular patchset,
AFAICT.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v12/
v11: https://patchwork.kernel.org/project/netdevbpf/list/?series=857457&state=*
====
Major Changes:
--------------
v11 addresses feedback received in v10. The major change is the removal
of the memory provider ops as requested by Christoph. We still
accomplish the same thing, but utilizing direct function calls with if
statements rather than generic ops.
Additionally address sparse warnings, bugs and review comments from
folks that reviewed.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v11/
Detailed changelog:
-------------------
- Fixes in netdev_rx_queue_restart() from Pavel & David.
- Remove commit e650e8c3a36f5 ("net: page_pool: create hooks for
custom page providers") from the series to address Christoph's
feedback and rebased other patches on the series on this change.
- Fixed build errors with CONFIG_DMA_SHARED_BUFFER &&
!CONFIG_GENERIC_ALLOCATOR build.
- Fixed sparse warnings pointed out by Paolo.
- Drop unnecessary gro_pull_from_frag0 checks.
- Added Bagas reviewed-by to docs.
Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Nikolay Aleksandrov <razor(a)blackwall.org>
v10: https://patchwork.kernel.org/project/netdevbpf/list/?series=852422&state=*
====
Major Changes:
--------------
v9 was sent right before the merge window closed (sorry!). v10 is almost
a re-send of the series now that the merge window re-opened. Only
rebased to latest net-next and addressed some minor iterative comments
received on v9.
As usual, the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v10/
Detailed changelog:
-------------------
- Fixed tokens leaking in DONTNEED setsockopt (Nikolay).
- Moved net_iov_dma_addr() to devmem.c and made it a devmem specific
helpers (David).
- Rename hook alloc_pages to alloc_netmems as alloc_pages is now
preprocessor macro defined and causes a build error.
v9:
===
Major Changes:
--------------
GVE queue API has been merged. Submitting this version as non-RFC after
rebasing on top of the merged API, and dropped the out of tree queue API
I was carrying on github. Addressed the little feedback v8 has received.
Detailed changelog:
------------------
- Added new patch from David Wei to this series for
netdev_rx_queue_restart()
- Fixed sparse error.
- Removed CONFIG_ checks in netmem_is_net_iov()
- Flipped skb->readable to skb->unreadable
- Minor fixes to selftests & docs.
RFC v8:
=======
Major Changes:
--------------
- Fixed build error generated by patch-by-patch build.
- Applied docs suggestions from Randy.
RFC v7:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the feedback
RFCv6 received from folks, namely Jakub, Yunsheng, Arnd, David, & Pavel.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v7/
Detailed changelog:
- Use admin-perm in netlink API.
- Addressed feedback from Jakub with regards to netlink API
implementation.
- Renamed devmem.c functions to something more appropriate for that
file.
- Improve the performance seen through the page_pool benchmark.
- Fix the value definition of all the SO_DEVMEM_* uapi.
- Various fixes to documentation.
Perf - page-pool benchmark:
---------------------------
Improved performance of bench_page_pool_simple.ko tests compared to v6:
https://pastebin.com/raw/v5dYRg8L
net-next base: 8 cycle fast path.
RFC v6: 10 cycle fast path.
RFC v7: 9 cycle fast path.
RFC v7 with CONFIG_DMA_SHARED_BUFFER disabled: 8 cycle fast path,
same as baseline.
Perf - Devmem TCP benchmark:
---------------------
Perf is about the same regardless of the changes in v7, namely the
removal of the static_branch_unlikely to improve the page_pool benchmark
performance:
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
RFC v6:
=======
Major Changes:
--------------
This revision largely rebases on top of net-next and addresses the little
feedback RFCv5 received.
The series remains in RFC because the queue-API ndos defined in this
series are not yet implemented. I have a GVE implementation I carry out
of tree for my testing. A upstreamable GVE implementation is in the
works. Aside from that, in my estimation all the patches are ready for
review/merge. Please do take a look.
As usual the full devmem TCP changes including the full GVE driver
implementation is here:
https://github.com/mina/linux/commits/tcpdevmem-v6/
This version also comes with some performance data recorded in the cover
letter (see below changelog).
Detailed changelog:
- Rebased on top of the merged netmem_ref changes.
- Converted skb->dmabuf to skb->readable (Pavel). Pavel's original
suggestion was to remove the skb->dmabuf flag entirely, but when I
looked into it closely, I found the issue that if we remove the flag
we have to dereference the shinfo(skb) pointer to obtain the first
frag to tell whether an skb is readable or not. This can cause a
performance regression if it dirties the cache line when the
shinfo(skb) was not really needed. Instead, I converted the skb->dmabuf
flag into a generic skb->readable flag which can be re-used by io_uring
0-copy RX.
- Squashed a few locking optimizations from Eric Dumazet in the RX path
and the DEVMEM_DONTNEED setsockopt.
- Expanded the tests a bit. Added validation for invalid scenarios and
added some more coverage.
Perf - page-pool benchmark:
---------------------------
bench_page_pool_simple.ko tests with and without these changes:
https://pastebin.com/raw/ncHDwAbn
AFAIK the number that really matters in the perf tests is the
'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
cycles without the changes but there is some 1 cycle noise in some
results.
With the patches this regresses to 9 cycles with the changes but there
is 1 cycle noise occasionally running this test repeatedly.
Lastly I tried disable the static_branch_unlikely() in
netmem_is_net_iov() check. To my surprise disabling the
static_branch_unlikely() check reduces the fast path back to 8 cycles,
but the 1 cycle noise remains.
Perf - Devmem TCP benchmark:
---------------------
189/200gbps bi-directional throughput with RX devmem TCP and regular TCP
TX i.e. ~95% line rate.
Major changes in RFC v5:
========================
1. Rebased on top of 'Abstract page from net stack' series and used the
new netmem type to refer to LSB set pointers instead of re-using
struct page.
2. Downgraded this series back to RFC and called it RFC v5. This is
because this series is now dependent on 'Abstract page from net
stack'[1] and the queue API. Both are removed from the series to
reduce the patch # and those bits are fairly independent or
pre-requisite work.
3. Reworked the page_pool devmem support to use netmem and for some
more unified handling.
4. Reworked the reference counting of net_iov (renamed from
page_pool_iov) to use pp_ref_count for refcounting.
The full changes including the dependent series and GVE page pool
support is here:
https://github.com/mina/linux/commits/tcpdevmem-rfcv5/
[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774
Major changes in v1:
====================
1. Implemented MVP queue API ndos to remove the userspace-visible
driver reset.
2. Fixed issues in the napi_pp_put_page() devmem frag unref path.
3. Removed RFC tag.
Many smaller addressed comments across all the patches (patches have
individual change log).
Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1
Changes in RFC v3:
==================
1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make the
series reviewable and mergeable.
2. Implemented multi-rx-queue binding which was a todo in v2.
3. Fix to cmsg handling.
The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested as I understand is a subset of the per-queue management ops
Jakub suggested or similar:
https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/
This is not addressed in this revision, because:
1. This point was discussed at netconf & netdev and there is openness to
using the current approach of requiring a device reset.
2. Implementing individual queue resetting seems to be difficult for my
test bed with GVE. My prototype to test this ran into issues with the
rx-queues not coming back up properly if reset individually. At the
moment I'm unsure if it's a mistake in the POC or a genuine issue in
the virtualization stack behind GVE, which currently doesn't test
individual rx-queue restart.
3. Our usecases are not bothered by requiring a device reset to refill
the buffer queues, and we'd like to support NICs that run into this
limitation with resetting individual queues.
My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.
The same approach with device resets is presented again for consideration
with other sticking points addressed.
This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support &
device memory support:
https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3
[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…
Changes in RFC v2:
==================
The sticking point in RFC v1[1] was the dma-buf pages approach we used to
deliver the device memory to the TCP stack. RFC v2 is a proof-of-concept
that attempts to resolve this by implementing scatterlist support in the
networking stack, such that we can import the dma-buf scatterlist
directly. This is the approach proposed at a high level here[2].
Detailed changes:
1. Replaced dma-buf pages approach with importing scatterlist into the
page pool.
2. Replace the dma-buf pages centric API with a netlink API.
3. Removed the TX path implementation - there is no issue with
implementing the TX path with scatterlist approach, but leaving
out the TX path makes it easier to review.
4. Functionality is tested with this proposal, but I have not conducted
perf testing yet. I'm not sure there are regressions, but I removed
perf claims from the cover letter until they can be re-confirmed.
5. Added Signed-off-by: contributors to the implementation.
6. Fixed some bugs with the RX path since RFC v1.
Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:
1. Feedback on the scatterlist-based approach in general.
2. Netlink API (Patch 1 & 2).
3. Approach to handle all the drivers that expect to receive pages from
the page pool (Patch 6).
[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…
==================
* TL;DR:
Device memory TCP (devmem TCP) is a proposal for transferring data to and/or
from device memory efficiently, without bouncing the data to a host memory
buffer.
* Problem:
A large amount of data transfers have device memory as the source and/or
destination. Accelerators drastically increased the volume of such transfers.
Some examples include:
- ML accelerators transferring large amounts of training data from storage into
GPU/TPU memory. In some cases ML training setup time can be as long as 50% of
TPU compute time, improving data transfer throughput & efficiency can help
improving GPU/TPU utilization.
- Distributed training, where ML accelerators, such as GPUs on different hosts,
exchange data among them.
- Distributed raw block storage applications transfer large amounts of data with
remote SSDs, much of this data does not require host processing.
Today, the majority of the Device-to-Device data transfers the network are
implemented as the following low level operations: Device-to-Host copy,
Host-to-Host network transfer, and Host-to-Device copy.
The implementation is suboptimal, especially for bulk data transfers, and can
put significant strains on system resources, such as host memory bandwidth,
PCIe bandwidth, etc. One important reason behind the current state is the
kernel’s lack of semantics to express device to network transfers.
* Proposal:
In this patch series we attempt to optimize this use case by implementing
socket APIs that enable the user to:
1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.
Packet _payloads_ go directly from the NIC to device memory for receive and from
device memory to NIC for transmit.
Packet _headers_ go to/from host memory and are processed by the TCP/IP stack
normally. The NIC _must_ support header split to achieve this.
Advantages:
- Alleviate host memory bandwidth pressure, compared to existing
network-transfer + device-copy semantics.
- Alleviate PCIe BW pressure, by limiting data transfer to the lowest level
of the PCIe tree, compared to traditional path which sends data through the
root complex.
* Patch overview:
** Part 1: netlink API
Gives user ability to bind dma-buf to an RX queue.
** Part 2: scatterlist support
Currently the standard for device memory sharing is DMABUF, which doesn't
generate struct pages. On the other hand, networking stack (skbs, drivers, and
page pool) operate on pages. We have 2 options:
1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.
Approach #1 was attempted in RFC v1. RFC v2 implements approach #2.
** part 3: page pool support
We piggy back on page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers
It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.
** part 4: support for unreadable skb frags
Page pool iovs are not accessible by the host; we implement changes
throughput the networking stack to correctly handle skbs with unreadable
frags.
** Part 5: recvmsg() APIs
We define user APIs for the user to send and receive device memory.
Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem
This series is built on top of net-next with Jakub's pp-providers changes
cherry-picked.
* NIC dependencies:
1. (strict) Devmem TCP require the NIC to support header split, i.e. the
capability to split incoming packets into a header + payload and to put
each into a separate buffer. Devmem TCP works by using device memory
for the packet payload, and host memory for the packet headers.
2. (optional) Devmem TCP works better with flow steering support & RSS support,
i.e. the NIC's ability to steer flows into certain rx queues. This allows the
sysadmin to enable devmem TCP on a subset of the rx queues, and steer
devmem TCP traffic onto these queues and non devmem TCP elsewhere.
The NIC I have access to with these properties is the GVE with DQO support
running in Google Cloud, but any NIC that supports these features would suffice.
I may be able to help reviewers bring up devmem TCP on their NICs.
* Testing:
The series includes a udmabuf kselftest that show a simple use case of
devmem TCP and validates the entire data path end to end without
a dependency on a specific dmabuf provider.
** Test Setup
Kernel: net-next with this series and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeel.butt(a)linux.dev>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>
Jakub Kicinski (1):
tools: net: package libynl for use in selftests
Mina Almasry (13):
netdev: add netdev_rx_queue_restart()
net: netdev netlink api to bind dma-buf to a net device
netdev: support binding dma-buf to netdevice
netdev: netdevice devmem allocator
page_pool: convert to use netmem
page_pool: devmem support
memory-provider: dmabuf devmem memory provider
net: support non paged skb frags
net: add support for skbs with unreadable frags
tcp: RX path for devmem TCP
net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
net: add devmem TCP documentation
selftests: add ncdevmem, netcat for devmem TCP
Documentation/netlink/specs/netdev.yaml | 57 +++
Documentation/networking/devmem.rst | 258 +++++++++++
Documentation/networking/index.rst | 1 +
arch/alpha/include/uapi/asm/socket.h | 6 +
arch/mips/include/uapi/asm/socket.h | 6 +
arch/parisc/include/uapi/asm/socket.h | 6 +
arch/sparc/include/uapi/asm/socket.h | 6 +
include/linux/skbuff.h | 61 ++-
include/linux/skbuff_ref.h | 11 +-
include/linux/socket.h | 1 +
include/net/devmem.h | 124 ++++++
include/net/mp_dmabuf_devmem.h | 44 ++
include/net/netdev_rx_queue.h | 5 +
include/net/netmem.h | 208 ++++++++-
include/net/page_pool/helpers.h | 124 ++++--
include/net/page_pool/types.h | 22 +-
include/net/sock.h | 2 +
include/net/tcp.h | 5 +-
include/trace/events/page_pool.h | 30 +-
include/uapi/asm-generic/socket.h | 6 +
include/uapi/linux/netdev.h | 19 +
include/uapi/linux/uio.h | 17 +
net/bpf/test_run.c | 5 +-
net/core/Makefile | 3 +-
net/core/datagram.c | 6 +
net/core/dev.c | 6 +-
net/core/devmem.c | 376 ++++++++++++++++
net/core/gro.c | 3 +-
net/core/netdev-genl-gen.c | 23 +
net/core/netdev-genl-gen.h | 6 +
net/core/netdev-genl.c | 103 +++++
net/core/netdev_rx_queue.c | 74 ++++
net/core/page_pool.c | 362 +++++++++-------
net/core/skbuff.c | 83 +++-
net/core/sock.c | 61 +++
net/ipv4/esp4.c | 3 +-
net/ipv4/tcp.c | 261 +++++++++++-
net/ipv4/tcp_input.c | 13 +-
net/ipv4/tcp_ipv4.c | 16 +
net/ipv4/tcp_minisocks.c | 2 +
net/ipv4/tcp_output.c | 5 +-
net/ipv6/esp6.c | 3 +-
net/packet/af_packet.c | 4 +-
tools/include/uapi/linux/netdev.h | 19 +
tools/net/ynl/Makefile | 6 +-
tools/net/ynl/lib/Makefile | 4 +-
tools/testing/selftests/net/.gitignore | 1 +
tools/testing/selftests/net/Makefile | 9 +
tools/testing/selftests/net/ncdevmem.c | 542 ++++++++++++++++++++++++
tools/testing/selftests/net/ynl.mk | 21 +
50 files changed, 2786 insertions(+), 253 deletions(-)
create mode 100644 Documentation/networking/devmem.rst
create mode 100644 include/net/devmem.h
create mode 100644 include/net/mp_dmabuf_devmem.h
create mode 100644 net/core/devmem.c
create mode 100644 net/core/netdev_rx_queue.c
create mode 100644 tools/testing/selftests/net/ncdevmem.c
create mode 100644 tools/testing/selftests/net/ynl.mk
--
2.45.2.803.g4e1b14247a-goog
If the testing kernel doesn't support setting fdb_max_learned or show
fdb_n_learned, just skip it. Or we will get errors like
./bridge_fdb_learning_limit.sh: line 218: [: null: integer expression expected
./bridge_fdb_learning_limit.sh: line 225: [: null: integer expression expected
Fixes: 6f84090333bb ("selftests: forwarding: bridge_fdb_learning_limit: Add a new selftest")
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
.../forwarding/bridge_fdb_learning_limit.sh | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh b/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh
index 0760a34b7114..a21b7085da2e 100755
--- a/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_fdb_learning_limit.sh
@@ -178,6 +178,22 @@ fdb_del()
check_err $? "Failed to remove a FDB entry of type ${type}"
}
+check_fdb_n_learned_support()
+{
+ if ! ip link help bridge 2>&1 | grep -q "fdb_max_learned"; then
+ echo "SKIP: iproute2 too old, missing bridge max learned support"
+ exit $ksft_skip
+ fi
+
+ ip link add dev br0 type bridge
+ local learned=$(fdb_get_n_learned)
+ ip link del dev br0
+ if [ "$learned" == "null" ]; then
+ echo "SKIP: kernel too old; bridge fdb_n_learned feature not supported."
+ exit $ksft_skip
+ fi
+}
+
check_accounting_one_type()
{
local type=$1 is_counted=$2 overrides_learned=$3
@@ -274,6 +290,8 @@ check_limit()
done
}
+check_fdb_n_learned_support
+
trap cleanup EXIT
setup_prepare
--
2.45.0
Based on feedback from Linus[1] and follow-up discussions, change the
suggested file naming for KUnit tests.
Link: https://lore.kernel.org/lkml/CAHk-=wgim6pNiGTBMhP8Kd3tsB7_JTAuvNJ=XYd3wPvvk… [1]
Signed-off-by: Kees Cook <kees(a)kernel.org>
---
Cc: David Gow <davidgow(a)google.com>
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: Rae Moar <rmoar(a)google.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: linux-doc(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-hardening(a)vger.kernel.org
---
Documentation/dev-tools/kunit/style.rst | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/Documentation/dev-tools/kunit/style.rst b/Documentation/dev-tools/kunit/style.rst
index b6d0d7359f00..1538835cd0e2 100644
--- a/Documentation/dev-tools/kunit/style.rst
+++ b/Documentation/dev-tools/kunit/style.rst
@@ -188,15 +188,20 @@ For example, a Kconfig entry might look like:
Test File and Module Names
==========================
-KUnit tests can often be compiled as a module. These modules should be named
-after the test suite, followed by ``_test``. If this is likely to conflict with
-non-KUnit tests, the suffix ``_kunit`` can also be used.
-
-The easiest way of achieving this is to name the file containing the test suite
-``<suite>_test.c`` (or, as above, ``<suite>_kunit.c``). This file should be
-placed next to the code under test.
+Whether a KUnit test is compiled as a separate module or via an
+``#include`` in a core kernel source file, the file should be named
+after the test suite, followed by ``_kunit``, and live in a ``tests``
+subdirectory to avoid conflicting with regular modules (e.g. if "foobar"
+is the core module, then "foobar_kunit" is the KUnit test module) or the
+core kernel source file names (e.g. for tab-completion). Many existing
+tests use a ``_test`` suffix, but this is considered deprecated.
+
+So for the common case, name the file containing the test suite
+``tests/<suite>_kunit.c``. The ``tests`` directory should be placed at
+the same level as the code under test. For example, tests for
+``lib/string.c`` live in ``lib/tests/string_kunit.c``.
If the suite name contains some or all of the name of the test's parent
-directory, it may make sense to modify the source filename to reduce redundancy.
-For example, a ``foo_firmware`` suite could be in the ``foo/firmware_test.c``
-file.
+directory, it may make sense to modify the source filename to reduce
+redundancy. For example, a ``foo_firmware`` suite could be in the
+``tests/foo/firmware_kunit.c`` file.
--
2.34.1
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This set is part 11 of series "use network helpers" all BPF selftests
wide.
Finally something new in this set.
The helper make_sockaddr is extended to support sockets of AF_PACKET,
AF_ALG and AF_VSOCK families. Then these types of sockets can be used
to start_server_str() helper too.
Imitating connect_to_* interfaces, send_to_* interfaces are added to
support sendto() with given FD or the address string.
Add more conditions to control listen: nolisten flag, listen_support()
helper and clear "type" bits for listen.
Patch 1 for AF_UNIX socket:
Patch 1 uses start_server_str for a AF_UNIX socket.
Patches 2-6 for AF_PACKET sockets:
Patch 2 adds AF_PACKET support for make_sockaddr.
Patch 3 uses start_server_str for a AF_PACKET socket.
Patches 4-5 adds send_to_fd_opts/send_to_addr_str helpers.
Patch 6 uses send_to_addr_str for a AF_PACKET socket.
Patches 7-9 for AF_ALG sockets:
Patch 7 adds AF_ALG support for make_sockaddr.
Patch 8 add nolisten flag, needed by patch 9.
Patch 9 uses send_to_addr_str for a AF_ALG socket.
Patches 10-15 for AF_VSOCK sockets:
Patch 10 adds AF_VSOCK support for make_sockaddr.
Patches 11-12 uses make_sockaddr for AF_VSOCK sockets.
Patches 13-14 adds more conditions to control listen.
Patch 15 uses start_server_str for AF_VSOCK sockets.
Geliang Tang (15):
selftests/bpf: Use start_server_str in skc_to_unix_sock
selftests/bpf: AF_PACKET support for make_sockaddr
selftests/bpf: Use start_server_str in lwt_redirect
selftests/bpf: Add send_to_fd_opts helper
selftests/bpf: Add send_to_addr_str helper
selftests/bpf: Use send_to_addr_str in xdp_bonding
selftests/bpf: AF_ALG support for make_sockaddr
selftests/bpf: Add nolisten for network_helper_opts
selftests/bpf: Use start_server_str in crypto_sanity
selftests/bpf: AF_VSOCK support for make_sockaddr
selftests/bpf: Add loopback_addr_str helper
selftests/bpf: Use make_sockaddr in sockmap_helpers
selftests/bpf: Check listen support for start_server_addr
selftests/bpf: Clear type bits for start_server_addr
selftests/bpf: Use start_server_str in sockmap_helpers
tools/testing/selftests/bpf/network_helpers.c | 144 +++++++++++++++---
tools/testing/selftests/bpf/network_helpers.h | 21 +++
.../selftests/bpf/prog_tests/crypto_sanity.c | 12 +-
.../selftests/bpf/prog_tests/lwt_redirect.c | 21 +--
.../bpf/prog_tests/migrate_reuseport.c | 2 +-
.../bpf/prog_tests/skc_to_unix_sock.c | 22 +--
.../bpf/prog_tests/sockmap_helpers.h | 101 +++---------
.../selftests/bpf/prog_tests/xdp_bonding.c | 20 +--
8 files changed, 186 insertions(+), 157 deletions(-)
--
2.43.0
'%u' in format string requires 'unsigned int' in __wait_for_test()
but the argument type is 'signed int' that this problem was discovered
by reading code
Signed-off-by: Zhu Jun <zhujun2(a)cmss.chinamobile.com>
---
Changes in v2:
- modify commit info add how to find the problem in the log
tools/testing/selftests/kselftest_harness.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h
index b634969cbb6f..dbbbcc6c04ee 100644
--- a/tools/testing/selftests/kselftest_harness.h
+++ b/tools/testing/selftests/kselftest_harness.h
@@ -1084,7 +1084,7 @@ void __wait_for_test(struct __test_metadata *t)
}
} else {
fprintf(TH_LOG_STREAM,
- "# %s: Test ended in some other way [%u]\n",
+ "# %s: Test ended in some other way [%d]\n",
t->name,
status);
}
--
2.17.1
Hello,
This series includes two fixes to support builds targeting MIPS systems.
The patches have been tested both with the kernel-patches/bpf CI and
locally using mips64el-gcc/musl-libc and QEMU with an OpenWrt rootfs.
Patch 1 adds support for MIPS system includes when compiling BPF.
Patch 2 fixes a MIPS GOT issue when linking uprobe_multi.
Feedback and suggestions for improvement are welcome!
Thanks,
Tony
Tony Ambardar (2):
selftests/bpf: Add missing system defines for mips
selftests/bpf: Fix error linking uprobe_multi on mips
tools/testing/selftests/bpf/Makefile | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
--
2.34.1
This is an updated patchset following some excellent comments from Roman
and Longman. [1]
As suggested, I've broken this into two commits:
1) with the implementation changes
2) extending the tests tests
I haven't been able to induce anything problematic, but I'm a bit
unclear as to whether there's reference counting on cgroups such that
we don't need to handle the case where the cgroup is freed before the
one of the peak files is closed.
Documentation/admin-guide/cgroup-v2.rst | 26 ++-
include/linux/cgroup.h | 8 +
include/linux/memcontrol.h | 5 +
include/linux/page_counter.h | 11 +-
kernel/cgroup/cgroup-internal.h | 2 +
kernel/cgroup/cgroup.c | 7 +
mm/memcontrol.c | 129 +++++++++++++--
mm/page_counter.c | 36 ++++-
tools/testing/selftests/cgroup/cgroup_util.c | 22 +++
tools/testing/selftests/cgroup/cgroup_util.h | 2 +
tools/testing/selftests/cgroup/test_memcontrol.c | 227 ++++++++++++++++++++++++++-
11 files changed, 444 insertions(+), 31 deletions(-)
[1]: https://lore.kernel.org/cgroups/20240722151713.2724855-1-davidf@vimeo.com/T/
Thank you for your efforts and reviews,
David Finkel
Senior Principal Software Engineer
Vimeo Inc.
From: Geliang Tang <tanggeliang(a)kylinos.cn>
This set is part 10 of series "use network helpers" all BPF selftests
wide.
Patches 1-3 drop local functions make_client(), make_socket() and
inetaddr_len() in sk_lookup.c. Patch 4 drops a useless function
__start_server() in network_helpers.c.
Geliang Tang (4):
selftests/bpf: Drop make_client in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
selftests/bpf: Drop inetaddr_len in sk_lookup
selftests/bpf: Drop __start_server in network_helpers
tools/testing/selftests/bpf/network_helpers.c | 26 ++---
.../selftests/bpf/prog_tests/sk_lookup.c | 110 +++++-------------
2 files changed, 40 insertions(+), 96 deletions(-)
--
2.43.0
From: Xu Kuohai <xukuohai(a)huawei.com>
LSM BPF prog returning a positive number attached to the hook
file_alloc_security makes kernel panic.
Here is a panic log:
[ 441.235774] BUG: kernel NULL pointer dereference, address: 00000000000009
[ 441.236748] #PF: supervisor write access in kernel mode
[ 441.237429] #PF: error_code(0x0002) - not-present page
[ 441.238119] PGD 800000000b02f067 P4D 800000000b02f067 PUD b031067 PMD 0
[ 441.238990] Oops: 0002 [#1] PREEMPT SMP PTI
[ 441.239546] CPU: 0 PID: 347 Comm: loader Not tainted 6.8.0-rc6-gafe0cbf23373 #22
[ 441.240496] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b4
[ 441.241933] RIP: 0010:alloc_file+0x4b/0x190
[ 441.242485] Code: 8b 04 25 c0 3c 1f 00 48 8b b0 30 0c 00 00 e8 9c fe ff ff 48 3d 00 f0 ff fb
[ 441.244820] RSP: 0018:ffffc90000c67c40 EFLAGS: 00010203
[ 441.245484] RAX: ffff888006a891a0 RBX: ffffffff8223bd00 RCX: 0000000035b08000
[ 441.246391] RDX: ffff88800b95f7b0 RSI: 00000000001fc110 RDI: f089cd0b8088ffff
[ 441.247294] RBP: ffffc90000c67c58 R08: 0000000000000001 R09: 0000000000000001
[ 441.248209] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000001
[ 441.249108] R13: ffffc90000c67c78 R14: ffffffff8223bd00 R15: fffffffffffffff4
[ 441.250007] FS: 00000000005f3300(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
[ 441.251053] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 441.251788] CR2: 00000000000001a9 CR3: 000000000bdc4003 CR4: 0000000000170ef0
[ 441.252688] Call Trace:
[ 441.253011] <TASK>
[ 441.253296] ? __die+0x24/0x70
[ 441.253702] ? page_fault_oops+0x15b/0x480
[ 441.254236] ? fixup_exception+0x26/0x330
[ 441.254750] ? exc_page_fault+0x6d/0x1c0
[ 441.255257] ? asm_exc_page_fault+0x26/0x30
[ 441.255792] ? alloc_file+0x4b/0x190
[ 441.256257] alloc_file_pseudo+0x9f/0xf0
[ 441.256760] __anon_inode_getfile+0x87/0x190
[ 441.257311] ? lock_release+0x14e/0x3f0
[ 441.257808] bpf_link_prime+0xe8/0x1d0
[ 441.258315] bpf_tracing_prog_attach+0x311/0x570
[ 441.258916] ? __pfx_bpf_lsm_file_alloc_security+0x10/0x10
[ 441.259605] __sys_bpf+0x1bb7/0x2dc0
[ 441.260070] __x64_sys_bpf+0x20/0x30
[ 441.260533] do_syscall_64+0x72/0x140
[ 441.261004] entry_SYSCALL_64_after_hwframe+0x6e/0x76
[ 441.261643] RIP: 0033:0x4b0349
[ 441.262045] Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 88
[ 441.264355] RSP: 002b:00007fff74daee38 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
[ 441.265293] RAX: ffffffffffffffda RBX: 00007fff74daef30 RCX: 00000000004b0349
[ 441.266187] RDX: 0000000000000040 RSI: 00007fff74daee50 RDI: 000000000000001c
[ 441.267114] RBP: 000000000000001b R08: 00000000005ef820 R09: 0000000000000000
[ 441.268018] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000004
[ 441.268907] R13: 0000000000000004 R14: 00000000005ef018 R15: 00000000004004e8
This is because the filesystem uses IS_ERR to check if the return value
is an error code. If it is not, the filesystem takes the return value
as a file pointer. Since the positive number returned by the BPF prog
is not a real file pointer, this misinterpretation causes a panic.
Since other LSM modules always return either a negative error code
or a valid pointer, this specific issue only exists in BPF LSM. The
proposed solution is to reject LSM BPF progs returning unexpected
values in the verifier. This patch set adds return value check to
ensure only BPF progs returning expected values are accepted.
Since each LSM hook has different excepted return values, we need to
know the expected return values for each individual hook to do the
check. Earlier versions of the patch set used LSM hook annotations
to specify the return value range for each hook. Based on Paul's
suggestion, current version gets rid of such annotations and instead
converts hook return values to a common pattern: return 0 on success
and negative error code on failure.
Basically, LSM hooks are divided into two types: hooks that return a
negative error code and zero or other values, and hooks that do not
return a negative error code. This patch set converts all hooks of the
first type and part of the second type to return 0 on success and a
negative error code on failure (see patches 1-10). For certain hooks,
like ismaclabel and inode_xattr_skipcap, the hook name already imply
that returning 0 or 1 is the best choice, so they are not converted.
There are four unconverted hooks. Except for ismaclabel, which is not
used by BPF LSM, the other three are specified with a BTF ID list to
only return 0 or 1.
v4:
1. remove LSM_HOOK return value annotaion and convert LSM hook return
value to a common patern: return 0 on success and negative error code
on failure (patch 1-10)
2. enable BPF LSM progs to read and write output params (patch 12)
3. add a special case for bitwise AND on range [-1, 0] (patch 16)
4. add a 32-bit comparing flag for retval_range_within (patch 15)
5. collect ACKs, style fix, etc
v3: https://lore.kernel.org/bpf/20240411122752.2873562-1-xukuohai@huaweicloud.c…
1. Fix incorrect lsm hook return value ranges, and add disabled hook
list for bpf lsm, and merge two LSM_RET_INT patches. (KP Singh)
2. Avoid bpf lsm progs attached to different hooks to call each other
with tail call
3. Fix a CI failure caused by false rejection of AND operation
4. Add tests
v2: https://lore.kernel.org/bpf/20240325095653.1720123-1-xukuohai@huaweicloud.c…
fix bpf ci failure
v1: https://lore.kernel.org/bpf/20240316122359.1073787-1-xukuohai@huaweicloud.c…
Xu Kuohai (20):
lsm: Refactor return value of LSM hook vm_enough_memory
lsm: Refactor return value of LSM hook inode_need_killpriv
lsm: Refactor return value of LSM hook inode_getsecurity
lsm: Refactor return value of LSM hook inode_listsecurity
lsm: Refactor return value of LSM hook inode_copy_up_xattr
lsm: Refactor return value of LSM hook getselfattr
lsm: Refactor return value of LSM hook setprocattr
lsm: Refactor return value of LSM hook getprocattr
lsm: Refactor return value of LSM hook key_getsecurity
lsm: Refactor return value of LSM hook audit_rule_match
bpf, lsm: Add disabled BPF LSM hook list
bpf, lsm: Enable BPF LSM prog to read/write return value parameters
bpf, lsm: Add check for BPF LSM return value
bpf: Prevent tail call between progs attached to different hooks
bpf: Fix compare error in function retval_range_within
bpf: Add a special case for bitwise AND on range [-1, 0]
selftests/bpf: Avoid load failure for token_lsm.c
selftests/bpf: Add return value checks for failed tests
selftests/bpf: Add test for lsm tail call
selftests/bpf: Add verifier tests for bpf lsm
fs/attr.c | 5 +-
fs/inode.c | 4 +-
fs/nfs/nfs4proc.c | 5 +-
fs/overlayfs/copy_up.c | 6 +-
fs/proc/base.c | 10 +-
fs/xattr.c | 24 +-
include/linux/bpf.h | 2 +
include/linux/bpf_lsm.h | 15 +
include/linux/lsm_hook_defs.h | 22 +-
include/linux/security.h | 62 ++--
include/linux/tnum.h | 3 +
kernel/bpf/bpf_lsm.c | 64 +++-
kernel/bpf/btf.c | 21 +-
kernel/bpf/core.c | 21 +-
kernel/bpf/tnum.c | 25 ++
kernel/bpf/verifier.c | 173 ++++++++++-
net/socket.c | 9 +-
security/apparmor/audit.c | 22 +-
security/apparmor/include/audit.h | 2 +-
security/apparmor/lsm.c | 22 +-
security/commoncap.c | 32 +-
security/integrity/evm/evm_main.c | 2 +-
security/keys/keyctl.c | 11 +-
security/lsm_syscalls.c | 6 +-
security/security.c | 167 ++++++++---
security/selinux/hooks.c | 94 +++---
security/selinux/include/audit.h | 8 +-
security/selinux/ss/services.c | 54 ++--
security/smack/smack_lsm.c | 104 ++++---
.../selftests/bpf/prog_tests/test_lsm.c | 46 ++-
.../selftests/bpf/prog_tests/verifier.c | 2 +
tools/testing/selftests/bpf/progs/err.h | 10 +
.../selftests/bpf/progs/lsm_tailcall.c | 34 +++
.../selftests/bpf/progs/test_sig_in_xattr.c | 4 +
.../bpf/progs/test_verify_pkcs7_sig.c | 8 +-
tools/testing/selftests/bpf/progs/token_lsm.c | 4 +-
.../bpf/progs/verifier_global_subprogs.c | 7 +-
.../selftests/bpf/progs/verifier_lsm.c | 274 ++++++++++++++++++
38 files changed, 1098 insertions(+), 286 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/lsm_tailcall.c
create mode 100644 tools/testing/selftests/bpf/progs/verifier_lsm.c
--
2.30.2
This patch series is motivated by the following observation:
Raise a signal, jump to signal handler. The ucontext_t structure dumped
by kernel to userspace has a uc_sigmask field having the mask of blocked
signals. If you run a fresh minimalistic program doing this, this field
is empty, even if you block some signals while registering the handler
with sigaction().
Here is what the man-pages have to say:
sigaction(2): "sa_mask specifies a mask of signals which should be blocked
(i.e., added to the signal mask of the thread in which the signal handler
is invoked) during execution of the signal handler. In addition, the
signal which triggered the handler will be blocked, unless the SA_NODEFER
flag is used."
signal(7): Under "Execution of signal handlers", (1.3) implies:
"The thread's current signal mask is accessible via the ucontext_t
object that is pointed to by the third argument of the signal handler."
But, (1.4) states:
"Any signals specified in act->sa_mask when registering the handler with
sigprocmask(2) are added to the thread's signal mask. The signal being
delivered is also added to the signal mask, unless SA_NODEFER was
specified when registering the handler. These signals are thus blocked
while the handler executes."
There clearly is no distinction being made in the man pages between
"Thread's signal mask" and ucontext_t; this logically should imply
that a signal blocked by populating struct sigaction should be visible
in ucontext_t.
Here is what the kernel code does (for Aarch64):
do_signal() -> handle_signal() -> sigmask_to_save(), which returns
¤t->blocked, is passed to setup_rt_frame() -> setup_sigframe() ->
__copy_to_user(). Hence, ¤t->blocked is copied to ucontext_t
exposed to userspace. Returning back to handle_signal(),
signal_setup_done() -> signal_delivered() -> sigorsets() and
set_current_blocked() are responsible for using information from
struct ksignal ksig, which was populated through the sigaction()
system call in kernel/signal.c:
copy_from_user(&new_sa.sa, act, sizeof(new_sa.sa)),
to update ¤t->blocked; hence, the set of blocked signals for the
current thread is updated AFTER the kernel dumps ucontext_t to
userspace.
Assuming that the above is indeed the intended behaviour, because it
semantically makes sense, since the signals blocked using sigaction()
remain blocked only till the execution of the handler, and not in the
context present before jumping to the handler (but nothing can be
confirmed from the man-pages), the series introduces a test for
mangling with uc_sigmask. I will send a separate series to fix the
man-pages.
The proposed selftest has been tested out on Aarch32, Aarch64 and x86_64.
v3->v4:
- Allocate sigsets as automatic variables to avoid malloc()
v2->v3:
- ucontext describes current state -> ucontext describes interrupted context
- Add a comment for blockage of USR2 even after return from handler
- Describe blockage of signals in a better way
v1->v2:
- Replace all occurrences of SIGPIPE with SIGSEGV
- Fixed a mismatch between code comment and ksft log
- Add a testcase: Raise the same signal again; it must not be queued
- Remove unneeded <assert.h>, <unistd.h>
- Give a detailed test description in the comments; also describe the
exact meaning of delivered and blocked
- Handle errors for all libc functions/syscalls
- Mention tests in Makefile and .gitignore in alphabetical order
v1:
- https://lore.kernel.org/all/20240607122319.768640-1-dev.jain@arm.com/
Dev Jain (2):
selftests: Rename sigaltstack to generic signal
selftests: Add a test mangling with uc_sigmask
tools/testing/selftests/Makefile | 2 +-
.../{sigaltstack => signal}/.gitignore | 3 +-
.../{sigaltstack => signal}/Makefile | 3 +-
.../current_stack_pointer.h | 0
.../selftests/signal/mangle_uc_sigmask.c | 186 ++++++++++++++++++
.../sas.c => signal/sigaltstack.c} | 0
6 files changed, 191 insertions(+), 3 deletions(-)
rename tools/testing/selftests/{sigaltstack => signal}/.gitignore (57%)
rename tools/testing/selftests/{sigaltstack => signal}/Makefile (53%)
rename tools/testing/selftests/{sigaltstack => signal}/current_stack_pointer.h (100%)
create mode 100644 tools/testing/selftests/signal/mangle_uc_sigmask.c
rename tools/testing/selftests/{sigaltstack/sas.c => signal/sigaltstack.c} (100%)
--
2.34.1
Hi everyone,
I'm currently working on optimizing the Linux kernel for an ARM-based project and would appreciate some guidance. I've already gone through the basics of kernel configuration, but I'm looking for more advanced tips. Specifically, I'm interested in:
Best practices for enabling/disabling specific kernel features for ARM.
Effective ways to profile and analyze performance bottlenecks.
Recommendations for debugging tools tailored for ARM architecture.
Any common pitfalls to avoid during the optimization process.
If you have any experiences or resources to share, I'd be very grateful. Looking forward to your insights!
Best,
Selena Thomas
https://www.igmguru.com/data-science-bi/msbi-certification-training/
Hi everyone,
I'm currently working on optimizing the Linux kernel for an ARM-based project and would appreciate some guidance. I've already gone through the basics of kernel configuration, but I'm looking for more advanced tips. Specifically, I'm interested in:
Best practices for enabling/disabling specific kernel features for ARM.
Effective ways to profile and analyze performance bottlenecks.
Recommendations for debugging tools tailored for ARM architecture.
Any common pitfalls to avoid during the optimization process.
If you have any experiences or resources to share, I'd be very grateful. Looking forward to your insights!
Best regards
Selena
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/lkml/20240719-support_vendor_extensions-v3-4-0af758…
---
Changes in v6:
- Fix return type of is_vector_supported()/is_xthead_supported() to be bool
- Link to v5: https://lore.kernel.org/r/20240719-xtheadvector-v5-0-4b485fc7d55f@rivosinc.…
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 ++
arch/riscv/include/asm/hwprobe.h | 5 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 223 ++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 54 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 68 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 8 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
34 files changed, 889 insertions(+), 273 deletions(-)
---
base-commit: 554462ced9ac97487c8f725fe466a9c20ed87521
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
My last patch[1] was met with a general desire for a safer scheme that
avoided global resets, which expose unclear ownership.
Fortunately, Johannes[2] suggested a reasonably simple scheme to provide
an FD-local reset, which eliminates most of those issues.
The one open question I have is whether the cgroup/memcg itself is kept
alive by an open FD, or if we need to update the memcg freeing code to
traverse the new list of "watchers" so they don't try to access freed
memory.
Thank you,
David Finkel
Senior Principal Software Engineer, Core Services
Vimeo Inc.
[1]: https://lore.kernel.org/cgroups/20240715203625.1462309-1-davidf@vimeo.com/
[2]: https://lore.kernel.org/cgroups/20240717170408.GC1321673@cmpxchg.org/
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
The series titled "riscv: Extend cpufeature.c to detect vendor
extensions" is still under development and this series is based on that
series! [4]
I have tested this with an Allwinner Nezha board. I ran into issues
booting the board after 6.9-rc1 so I applied these patches to 6.8. There
are a couple of minor merge conflicts that do arrise when doing that, so
please let me know if you have been able to boot this board with a 6.9
kernel. I used SkiffOS [1] to manage building the image, but upgraded
the U-Boot version to Samuel Holland's more up-to-date version [2] and
changed out the device tree used by U-Boot with the device trees that
are present in upstream linux and this series. Thank you Samuel for all
of the work you did to make this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/linux-riscv/20240609-support_vendor_extensions-v2-0…
---
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (12):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 ++
arch/riscv/include/asm/hwprobe.h | 5 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 223 ++++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 42 +++
.../include/asm/vendor_extensions/thead_hwprobe.h | 18 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/cpufeature.c | 54 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 18 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
tools/testing/selftests/riscv/abi/ptrace | Bin 0 -> 759368 bytes
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 93 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 67 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 7 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 295 ++++++++++++---------
35 files changed, 887 insertions(+), 273 deletions(-)
---
base-commit: 554462ced9ac97487c8f725fe466a9c20ed87521
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
This is a rebase of a patch I sent a few months ago, on which I received
two acks, but the thread petered out:
https://www.spinics.net/lists/cgroups/msg40602.html.
I'm hoping that it can make it into the next LTS (and 6.11 if possible)
Thank you,
David Finkel
Sr. Principal Software Engineer, Vimeo
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.
When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations. The current GCS pointer
can not be directly written to by userspace. When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match. GCS operations may only be
performed on GCS pages, a data abort is generated if they are not.
The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.
This series implements support for use of GCS by userspace, along with
support for use of GCS within KVM guests. It does not enable use of GCS
by either EL1 or EL2, this will be implemented separately. Executables
are started without GCS and must use a prctl() to enable it, it is
expected that this will be done very early in application execution by
the dynamic linker or other startup code. For dynamic linking this will
be done by checking that everything in the executable is marked as GCS
compatible.
x86 has an equivalent feature called shadow stacks, this series depends
on the x86 patches for generic memory management support for the new
guarded/shadow stack page type and shares APIs as much as possible. As
there has been extensive discussion with the wider community around the
ABI for shadow stacks I have as far as practical kept implementation
decisions close to those for x86, anticipating that review would lead to
similar conclusions in the absence of strong reasoning for divergence.
The main divergence I am concious of is that x86 allows shadow stack to
be enabled and disabled repeatedly, freeing the shadow stack for the
thread whenever disabled, while this implementation keeps the GCS
allocated after disable but refuses to reenable it. This is to avoid
races with things actively walking the GCS during a disable, we do
anticipate that some systems will wish to disable GCS at runtime but are
not aware of any demand for subsequently reenabling it.
x86 uses an arch_prctl() to manage enable and disable, since only x86
and S/390 use arch_prctl() a generic prctl() was proposed[1] as part of a
patch set for the equivalent RISC-V Zicfiss feature which I initially
adopted fairly directly but following review feedback has been revised
quite a bit.
We currently maintain the x86 pattern of implicitly allocating a shadow
stack for threads started with shadow stack enabled, there has been some
discussion of removing this support and requiring the use of clone3()
with explicit allocation of shadow stacks instead. I have no strong
feelings either way, implicit allocation is not really consistent with
anything else we do and creates the potential for errors around thread
exit but on the other hand it is existing ABI on x86 and minimises the
changes needed in userspace code.
glibc and bionic changes using this ABI have been implemented and
tested. Headless Android systems have been validated and Ross Burton
has used this code has been used to bring up a Yocto system with GCS
enabed as standard, a test implementation of V8 support has also been
done.
There is an open issue with support for CRIU, on x86 this required the
ability to set the GCS mode via ptrace. This series supports
configuring mode bits other than enable/disable via ptrace but it needs
to be confirmed if this is sufficient.
The series depends on support for shadow stacks in clone3(), that series
includes the addition of ARCH_HAS_USER_SHADOW_STACK.
https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
You can see a branch with the full set of dependencies against Linus'
tree at:
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/misc.git arm64-gcs
[1] https://lore.kernel.org/lkml/20230213045351.3945824-1-debug@rivosinc.com/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org
Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_write() on GCS, it intentionally does not
work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org
Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org
Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
anything there, the behaviour is the same as on previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org
Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org
Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org
Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org
Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org
---
Mark Brown (39):
arm64/mm: Restructure arch_validate_flags() for extensibility
prctl: arch-agnostic prctl for shadow stack
mman: Add map_shadow_stack() flags
arm64: Document boot requirements for Guarded Control Stacks
arm64/gcs: Document the ABI for Guarded Control Stacks
arm64/sysreg: Add definitions for architected GCS caps
arm64/gcs: Add manual encodings of GCS instructions
arm64/gcs: Provide put_user_gcs()
arm64/cpufeature: Runtime detection of Guarded Control Stack (GCS)
arm64/mm: Allocate PIE slots for EL0 guarded control stack
mm: Define VM_SHADOW_STACK for arm64 when we support GCS
arm64/mm: Map pages for guarded control stack
KVM: arm64: Manage GCS registers for guests
arm64/gcs: Allow GCS usage at EL0 and EL1
arm64/idreg: Add overrride for GCS
arm64/hwcap: Add hwcap for GCS
arm64/traps: Handle GCS exceptions
arm64/mm: Handle GCS data aborts
arm64/gcs: Context switch GCS state for EL0
arm64/gcs: Ensure that new threads have a GCS
arm64/gcs: Implement shadow stack prctl() interface
arm64/mm: Implement map_shadow_stack()
arm64/signal: Set up and restore the GCS context for signal handlers
arm64/signal: Expose GCS state in signal frames
arm64/ptrace: Expose GCS via ptrace and core files
arm64: Add Kconfig for Guarded Control Stack (GCS)
kselftest/arm64: Verify the GCS hwcap
kselftest: Provide shadow stack enable helpers for arm64
selftests/clone3: Enable arm64 shadow stack testing
kselftest/arm64: Add GCS as a detected feature in the signal tests
kselftest/arm64: Add framework support for GCS to signal handling tests
kselftest/arm64: Allow signals tests to specify an expected si_code
kselftest/arm64: Always run signals tests with GCS enabled
kselftest/arm64: Add very basic GCS test program
kselftest/arm64: Add a GCS test program built with the system libc
kselftest/arm64: Add test coverage for GCS mode locking
kselftest/arm64: Add GCS signal tests
kselftest/arm64: Add a GCS stress test
kselftest/arm64: Enable GCS for the FP stress tests
Documentation/admin-guide/kernel-parameters.txt | 6 +
Documentation/arch/arm64/booting.rst | 22 +
Documentation/arch/arm64/elf_hwcaps.rst | 2 +
Documentation/arch/arm64/gcs.rst | 233 +++++++
Documentation/arch/arm64/index.rst | 1 +
Documentation/filesystems/proc.rst | 2 +-
arch/arm64/Kconfig | 20 +
arch/arm64/include/asm/cpufeature.h | 6 +
arch/arm64/include/asm/el2_setup.h | 17 +
arch/arm64/include/asm/esr.h | 28 +-
arch/arm64/include/asm/exception.h | 2 +
arch/arm64/include/asm/gcs.h | 107 +++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/asm/kvm_host.h | 14 +
arch/arm64/include/asm/mman.h | 23 +-
arch/arm64/include/asm/pgtable-prot.h | 14 +-
arch/arm64/include/asm/processor.h | 7 +
arch/arm64/include/asm/sysreg.h | 20 +
arch/arm64/include/asm/uaccess.h | 40 ++
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/ptrace.h | 8 +
arch/arm64/include/uapi/asm/sigcontext.h | 9 +
arch/arm64/kernel/cpufeature.c | 19 +
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kernel/entry-common.c | 23 +
arch/arm64/kernel/pi/idreg-override.c | 2 +
arch/arm64/kernel/process.c | 85 +++
arch/arm64/kernel/ptrace.c | 59 ++
arch/arm64/kernel/signal.c | 242 ++++++-
arch/arm64/kernel/traps.c | 11 +
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 48 +-
arch/arm64/kvm/sys_regs.c | 25 +-
arch/arm64/mm/Makefile | 1 +
arch/arm64/mm/fault.c | 43 ++
arch/arm64/mm/gcs.c | 325 +++++++++
arch/arm64/mm/mmap.c | 13 +-
arch/arm64/tools/cpucaps | 1 +
arch/x86/include/uapi/asm/mman.h | 3 -
fs/proc/task_mmu.c | 3 +
include/linux/mm.h | 16 +-
include/uapi/asm-generic/mman.h | 4 +
include/uapi/linux/elf.h | 1 +
include/uapi/linux/prctl.h | 22 +
kernel/sys.c | 30 +
tools/testing/selftests/arm64/Makefile | 2 +-
tools/testing/selftests/arm64/abi/hwcap.c | 19 +
tools/testing/selftests/arm64/fp/assembler.h | 15 +
tools/testing/selftests/arm64/fp/fpsimd-test.S | 2 +
tools/testing/selftests/arm64/fp/sve-test.S | 2 +
tools/testing/selftests/arm64/fp/za-test.S | 2 +
tools/testing/selftests/arm64/fp/zt-test.S | 2 +
tools/testing/selftests/arm64/gcs/.gitignore | 5 +
tools/testing/selftests/arm64/gcs/Makefile | 24 +
tools/testing/selftests/arm64/gcs/asm-offsets.h | 0
tools/testing/selftests/arm64/gcs/basic-gcs.c | 357 ++++++++++
tools/testing/selftests/arm64/gcs/gcs-locking.c | 200 ++++++
.../selftests/arm64/gcs/gcs-stress-thread.S | 311 +++++++++
tools/testing/selftests/arm64/gcs/gcs-stress.c | 532 +++++++++++++++
tools/testing/selftests/arm64/gcs/gcs-util.h | 100 +++
tools/testing/selftests/arm64/gcs/libc-gcs.c | 736 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../testing/selftests/arm64/signal/test_signals.c | 17 +-
.../testing/selftests/arm64/signal/test_signals.h | 6 +
.../selftests/arm64/signal/test_signals_utils.c | 32 +-
.../selftests/arm64/signal/test_signals_utils.h | 39 ++
.../arm64/signal/testcases/gcs_exception_fault.c | 62 ++
.../selftests/arm64/signal/testcases/gcs_frame.c | 88 +++
.../arm64/signal/testcases/gcs_write_fault.c | 67 ++
.../selftests/arm64/signal/testcases/testcases.c | 7 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/clone3/clone3_selftests.h | 26 +
tools/testing/selftests/ksft_shstk.h | 37 ++
73 files changed, 4213 insertions(+), 41 deletions(-)
---
base-commit: 4c8cf8814957090ce50ad18f318f72e6fe0d1a32
change-id: 20230303-arm64-gcs-e311ab0d8729
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: diegodvv <diego.daniel.professional(a)gmail.com>
Hi all,
This is part of a hackathon organized by LKCAMP[1], focused on writing
tests using KUnit. We reached out a while ago asking for advice on what would
be a useful contribution[2] and ended up choosing data structures that did
not yet have tests.
This patch adds tests for the kfifo data structure, defined in
include/linux/kfifo.h, and is inspired by the KUnit tests for the doubly
linked list in lib/list-test.c[3].
[1] https://lkcamp.dev/about/
[2] https://lore.kernel.org/all/Zktnt7rjKryTh9-N@arch/
[3] https://elixir.bootlin.com/linux/latest/source/lib/list-test.c
Diego Vieira (1):
lib/kfifo-test.c: add tests for the kfifo structure
lib/Kconfig.debug | 14 +++
lib/Makefile | 1 +
lib/kfifo-test.c | 222 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 237 insertions(+)
create mode 100644 lib/kfifo-test.c
--
2.34.1
The checksum_32 code was originally written to only handle 2-byte
aligned buffers, but was later extended to support arbitrary alignment.
However, the non-PPro variant doesn't apply the carry before jumping to
the 2- or 4-byte aligned versions, which clear CF.
This causes the new checksum_kunit test to fail, as it runs with a large
number of different possible alignments and both with and without
carries.
For example:
./tools/testing/kunit/kunit.py run --arch i386 --kconfig_add CONFIG_M486=y checksum
Gives:
KTAP version 1
# Subtest: checksum
1..3
ok 1 test_csum_fixed_random_inputs
# test_csum_all_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:267
Expected result == expec, but
result == 65281 (0xff01)
expec == 65280 (0xff00)
not ok 2 test_csum_all_carry_inputs
# test_csum_no_carry_inputs: ASSERTION FAILED at lib/checksum_kunit.c:314
Expected result == expec, but
result == 65535 (0xffff)
expec == 65534 (0xfffe)
not ok 3 test_csum_no_carry_inputs
With this patch, it passes.
KTAP version 1
# Subtest: checksum
1..3
ok 1 test_csum_fixed_random_inputs
ok 2 test_csum_all_carry_inputs
ok 3 test_csum_no_carry_inputs
I also tested it on a real 486DX2, with the same results.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: David Gow <davidgow(a)google.com>
---
Re-sending this from [1]. While there's an argument that the whole
32-bit checksum code could do with rewriting, it's:
(a) worth fixing before someone takes the time to rewrite it, and
(b) worth any future rewrite starting from a point where the tests pass
I don't think there should be any downside to this fix: it only affects
ancient computers, and adds a single instruction which isn't in a loop.
Cheers,
-- David
[1]: https://lore.kernel.org/lkml/20230704083206.693155-2-davidgow@google.com/
---
arch/x86/lib/checksum_32.S | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/lib/checksum_32.S b/arch/x86/lib/checksum_32.S
index 68f7fa3e1322..a5123b29b403 100644
--- a/arch/x86/lib/checksum_32.S
+++ b/arch/x86/lib/checksum_32.S
@@ -62,6 +62,7 @@ SYM_FUNC_START(csum_partial)
jl 8f
movzbl (%esi), %ebx
adcl %ebx, %eax
+ adcl $0, %eax
roll $8, %eax
inc %esi
testl $2, %esi
--
2.45.2.1089.g2a221341d9-goog
Hello everyone,
this small series is a first step in a larger effort aiming to help improve
eBPF selftests and the testing coverage in CI. It focuses for now on
test_xdp_veth.sh, a small test which is not integrated yet in test_progs.
The series is mostly about a rewrite of test_xdp_veth.sh to make it able to
run under test_progs, relying on libbpf to manipulate bpf programs involved
in the test.
Signed-off-by: Alexis Lothoré <alexis.lothore(a)bootlin.com>
---
Changes in v3:
- Fix doc style in the new test
- Collect acked-by tags
- Link to v2: https://lore.kernel.org/r/20240715-convert_test_xdp_veth-v2-0-46290b82f6d2@…
Changes in v2:
- fix many formatting issues raised by checkpatch
- use static namespaces instead of random ones
- use SYS_NOFAIL instead of snprintf() + system ()
- squashed the new test addition patch and the old test removal patch
- Link to v1: https://lore.kernel.org/r/20240711-convert_test_xdp_veth-v1-0-868accb0a727@…
---
Alexis Lothoré (eBPF Foundation) (2):
selftests/bpf: update xdp_redirect_map prog sections for libbpf
selftests/bpf: integrate test_xdp_veth into test_progs
tools/testing/selftests/bpf/Makefile | 1 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 211 +++++++++++++++++++++
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 6 +-
tools/testing/selftests/bpf/test_xdp_veth.sh | 121 ------------
4 files changed, 214 insertions(+), 125 deletions(-)
---
base-commit: 4837cbaa1365cdb213b58577197c5b10f6e2aa81
change-id: 20240710-convert_test_xdp_veth-04cc05f5557d
Best regards,
--
Alexis Lothoré, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
v1:
- patch 2:
- [1/2] add bpf_file_d_path helper
- [2/2] add selftest to it
Hi, we are looking to add the "bpf_file_d_path" helper,
used to retrieve the path from a struct file object.
bpf_file_d_path(void *file, char *dst, u32 size);
It's worth noting that the "file" parameter is defined as "void*" type.
* Our problems *
Previously, we encountered issues
on some user-space operating systems(OS):
1.Difficulty using vmlinux.h
(1) The OS lacks support for bpftool.
We can not use:
"bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h".
Bpftool need a separate complex cross-compilation environment to build.
(2) Many duplicate definitions between OS and vmlinux.h.
(3) The vmlinux.h size is large (2.8MB on arm64/Android),
causing increased ebpf prog size and user space consumption.
2.The "struct file" has many internal variables and definitions,
and maybe change along with Linux version iterations,
making it hard to copy it to OS.
* Benefits of this commit *
1.There is no need to include vmlinux.h or redefine "struct file".
For example, with bpf on kprobe,
we can directly pass param "(void*)PT_REGS_PARM1(pt_regs)"
to "bpf_file_d_path" helper in order to retrieve the path.
Appreciate your review and assistance. Thank you.
Yikai
Lin Yikai (2):
bpf: Add bpf_file_d_path helper
selftests/bpf:Adding test for bpf_file_d_path helper
include/uapi/linux/bpf.h | 20 +++
kernel/trace/bpf_trace.c | 34 ++++++
tools/include/uapi/linux/bpf.h | 20 +++
.../selftests/bpf/prog_tests/file_d_path.c | 115 ++++++++++++++++++
.../selftests/bpf/progs/test_file_d_path.c | 32 +++++
5 files changed, 221 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/file_d_path.c
create mode 100644 tools/testing/selftests/bpf/progs/test_file_d_path.c
--
2.34.1
Currently while accessing debugfs with Secure Boot enabled on PowerPC,
it is causing the kprobe_opt_types.tc test to fail. Below is the snippet
of the error:
+++ grep kernel_clone /sys/kernel/debug/kprobes/list
grep: /sys/kernel/debug/kprobes/list: Operation not permitted
++ PROBE=
+ '[' 2 -ne 0 ']'
+ kill -s 37 7595
++ SIG_RESULT=1
+ eval_result 1
+ case $1 in
+ prlog ' [\033[31mFAIL\033[0m]'
+ newline='\n'
+ '[' ' [\033[31mFAIL\033[0m]' = -n ']'
+ printf ' [\033[31mFAIL\033[0m]\n'
[FAIL]
This is happening when secure boot is enabled, as it enables lockdown
by default. With lockdown, access to certain debug features and
filesystems like debugfs may be restricted or completely disabled.
To fix this, modify the test to check for Secure Boot status using
lsprop /proc/device-tree/ibm,secure-boot. And, skip execution of the
test on PowerPC if Secure Boot is enabled (00000002).
With this patch, test skips as unsupported:
=== Ftrace unit tests ===
[1] Register/unregister optimized probe [UNSUPPORTED]
Signed-off-by: Akanksha J N <akanksha(a)linux.ibm.com>
---
.../selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
index 9f5d99328086..87e2f81e46b8 100644
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
+++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
@@ -10,6 +10,11 @@ x86_64)
arm*)
;;
ppc*)
+lsprop_output=$(lsprop /proc/device-tree/ibm,secure-boot)
+if echo "$lsprop_output" | grep -q "00000002"; then
+ echo "Secure Boot is enabled on PowerPC."
+ exit_unsupported
+fi
;;
*)
echo "Please implement other architecture here"
--
2.39.3 (Apple Git-146)
+cc linux-kselftest(a)vger.kernel.org
On 19/07/2024 11:21 am, Jeremy Szu wrote:
>
>
> James Clark 於 7/17/24 10:48 PM 寫道:
>>
>>
>> On 16/07/2024 3:06 am, Jeremy Szu wrote:
>>> Hi James,
>>>
>>> Thank you for your reply.
>>>
>>> James Clark 於 7/12/24 5:01 PM 寫道:
>>>>
>>>>
>>>> On 11/07/2024 7:03 pm, Catalin Marinas wrote:
>>>>> On Wed, Jul 10, 2024 at 02:27:32PM +0800, Jeremy Szu wrote:
>>>>>> Add a script to test the coresight functionalities by performing the
>>>>>> perf test 'coresight'.
>>>>>>
>>>>>> The script checks the prerequisites and launches a bundle of
>>>>>> Coresight-related tests with a 180-second timeout.
>>>>>>
>>>>
>>>> Hi Jeremy,
>>>>
>>>> On the whole I'm not sure running the Perf tests under kself tests
>>>> is a good fit. We're already running all the Perf tests in various
>>>> CIs, so this is going to duplicate effort. Especially with setup and
>>>> parsing of the results output.
>>>>
>>>> There is also no clean line between what's a kernel issue and whats
>>>> a Perf issue when these fail.
>>>>
>>>> And thirdly why only run the Coresight tests? Most of the Perf tests
>>>> test some part of the kernel, so if we were going to do this I think
>>>> it would make sense to make some kind of proper harness and run them
>>>> all. I have some recollection that someone said it might be
>>>> something we could do, but I can't remember the discussion.
>>>
>>> The idea I'm trying to pursue is to use arm64 kselftest to run as
>>> many test cases as possible for ARM SoCs across different designs and
>>> distros. I believe it could provide an alert if there is an issue,
>>> whether it originates from userspace or kernel, similar to how perf
>>> is used in other categories.
>>>
>>> I'm not sure if all perf tests could be counted in soc
>>> (selftests/arm64) category such as some tests may target to storage,
>>> memory or devices. I
>>
>> Could we not put the Perf tests in .../selftests/perf.sh, then it
>> doesn't really matter which subsystem they're targeting and we can run
>> all the Perf tests?
>>
>
> The .../sefltests/ seems for the kselftest framework only, not sure if
> having a new .../selftests/perf will make more sense?
>
Yeah that sounds better, but it probably requires changing the title of
the patch to "[RFC] kselftest: Run Perf tests under kselftest" or
something like that.
>>> could replace 'arm64/coresight' with 'arm64/perf' if it makes more
>>> sense. I believe it could help users verify functionality more
>>> conveniently.
>>>
>>>>
>>>> Ignoring the main issue above I've left some comments about this
>>>> patch inline below:
>>>>
>>>>>> Signed-off-by: Jeremy Szu <jszu(a)nvidia.com>
>>>>>
>>>>> I have not idea how to test coresight, so adding Suzuki as well.
>>>>>
>>>>>> ---
>>>>>> tools/testing/selftests/arm64/Makefile | 2 +-
>>>>>> .../selftests/arm64/coresight/Makefile | 5 +++
>>>>>> .../selftests/arm64/coresight/coresight.sh | 40
>>>>>> +++++++++++++++++++
>>>>>> .../selftests/arm64/coresight/settings | 1 +
>>>>>> 4 files changed, 47 insertions(+), 1 deletion(-)
>>>>>> create mode 100644 tools/testing/selftests/arm64/coresight/Makefile
>>>>>> create mode 100755
>>>>>> tools/testing/selftests/arm64/coresight/coresight.sh
>>>>>> create mode 100644 tools/testing/selftests/arm64/coresight/settings
>>>>>>
>>>>>> diff --git a/tools/testing/selftests/arm64/Makefile
>>>>>> b/tools/testing/selftests/arm64/Makefile
>>>>>> index 28b93cab8c0dd..2b788d7bab22d 100644
>>>>>> --- a/tools/testing/selftests/arm64/Makefile
>>>>>> +++ b/tools/testing/selftests/arm64/Makefile
>>>>>> @@ -4,7 +4,7 @@
>>>>>> ARCH ?= $(shell uname -m 2>/dev/null || echo not)
>>>>>> ifneq (,$(filter $(ARCH),aarch64 arm64))
>>>>>> -ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi
>>>>>> +ARM64_SUBTARGETS ?= tags signal pauth fp mte bti abi coresight
>>>>>> else
>>>>>> ARM64_SUBTARGETS :=
>>>>>> endif
>>>>>> diff --git a/tools/testing/selftests/arm64/coresight/Makefile
>>>>>> b/tools/testing/selftests/arm64/coresight/Makefile
>>>>>> new file mode 100644
>>>>>> index 0000000000000..1cc8c1f2a997e
>>>>>> --- /dev/null
>>>>>> +++ b/tools/testing/selftests/arm64/coresight/Makefile
>>>>>> @@ -0,0 +1,5 @@
>>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>>> +
>>>>>> +TEST_PROGS := coresight.sh
>>>>>> +
>>>>>> +include ../../lib.mk
>>>>>> diff --git a/tools/testing/selftests/arm64/coresight/coresight.sh
>>>>>> b/tools/testing/selftests/arm64/coresight/coresight.sh
>>>>>> new file mode 100755
>>>>>> index 0000000000000..e550957cf593b
>>>>>> --- /dev/null
>>>>>> +++ b/tools/testing/selftests/arm64/coresight/coresight.sh
>>>>>> @@ -0,0 +1,40 @@
>>>>>> +#!/bin/bash
>>>>>> +# SPDX-License-Identifier: GPL-2.0
>>>>>> +
>>>>>> +skip() {
>>>>>> + echo "SKIP: $1"
>>>>>> + exit 4
>>>>>> +}
>>>>>> +
>>>>>> +fail() {
>>>>>> + echo "FAIL: $1"
>>>>>> + exit 255
>>>>>> +}
>>>>>> +
>>>>>> +is_coresight_supported() {
>>>>>> + if [ -d "/sys/bus/coresight/devices" ]; then
>>>>>> + return 0
>>>>>> + fi
>>>>>> + return 255
>>>>>> +}
>>>>
>>>> The Perf coresight tests already have a skip mechanism built in so
>>>> can we rely on that instead of duplicating it here? There are also
>>>> other scenarios for skipping like Perf not linked with OpenCSD which
>>>> aren't covered here.
>>>>
>>>
>>> Will it return the different error code to indicate if it's failed to
>>> be executed or the coresight is not supported?
>>>
>>
>> I think the exit code is only used for more serious errors. For things
>> like missing tests, SKIP and FAIL it still exits with 0 but you have
>> to grep for the strings. Actually for a missing test it prints nothing
>> and exits 0.
>> >>>>> +
>>>>>> +if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
>>>>>> + [ "$(id -u)" -ne 0 ] && \
>>>>>> + skip "this test must be run as root."
>>>>>> + which perf >/dev/null 2>&1 || \
>>>>>> + skip "perf is not installed."
>>>>>> + perf test list 2>&1 | grep -qi 'coresight' || \
>>>>>> + skip "perf doesn't support testing coresight."
>>>>
>>>> Can this be an error instead? The coresight tests were added in 5.10
>>>> and I don't think new kselftest needs to support such old kernels.
>>>> This skip risks turning an error with installing the tests into a
>>>> silent failure.
>>>>
>>>> Also as far as I know a lot of distros will refuse to open Perf
>>>> unless it matches the kernel version if it was installed from the
>>>> package manager, so we don't need to worry about old versions.
>>>>
>>>
>>> The idea to skip it here is because I thought either a distro
>>> (custom) kernel doesn't enable the kconfig or the distro built the
>>> perf with CORESIGHT=0.
>>>
>>> I could make it as an error and put it after "is_coresight_support"
>>> if it makes more sense.
>>>
>>
>> But the Coresight test already checks these things, which is my point.
>> You can grep for "SKIP" which it will print for any case where the
>> coresight test can't be run due to some missing config.
>>
>
> Oh, I guess you mean to rely on something like the `perf list cs_etm |
> grep -q cs_etm || exit 2`. Yes, that makes more sense.
>
If we're leaning towards running all the Perf tests then none of this is
required anymore.
This is another example of why it's better to run them all. We can't
realistically hard code a kself test with loads of logic for each Perf
subtest when Perf will happily run all the tests on it's own without any
extra work.
>>>>>> + is_coresight_supported || \
>>>>>> + skip "coresight is not supported."
>>>>>> +
>>>>>> + cmd_output=$(perf test -vv 'coresight' 2>&1)
>>>>>> + perf_ret=$?
>>>>>> +
>>>>>> + if [ $perf_ret -ne 0 ]; then
>>>>>> + fail "perf command returns non-zero."
>>>>>> + elif [[ $cmd_output == *"FAILED!"* ]]; then
>>>>>> + echo $cmd_output
>>>>
>>>> It's probably helpful to print cmd_output in both failure cases.
>>>>
>>>
>>> ok, will do.
>>>
>>>>>> + fail "perf test 'arm coresight' test failed!"
>
> I think I should remove the `is_coresight_supported` there and checks
> the output as a "else" to see if the test is PASS or SKIP. Does it make
> sense to you?
>
Same as above applies, the only thing that's really required is parsing
for "FAILED".
>>>>>> + fi
>>>>>> +fi
>>>>>> diff --git a/tools/testing/selftests/arm64/coresight/settings
>>>>>> b/tools/testing/selftests/arm64/coresight/settings
>>>>>> new file mode 100644
>>>>>> index 0000000000000..a953c96aa16e1
>>>>>> --- /dev/null
>>>>>> +++ b/tools/testing/selftests/arm64/coresight/settings
>>>>>> @@ -0,0 +1 @@
>>>>>> +timeout=180
>>>>
>>>> I timed 331 seconds on n1sdp, and probably even longer on Juno.
>>>>
>>>> It doesn't need to run for this long and it's an issue with the
>>>> tests, but currently that's how long it takes so the timeout needs
>>>> to be longer.
>>>>
>>>
>>> ok, will extend it to 600.
>>>
>>>>>> --
>>>>>> 2.34.1
>>>>>
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v3:
- patch 2:
- clear errno before connect_to_fd_opts.
- print err logs in run_test.
- set err to -1 when fd >= 0.
- patch 3:
- drop "int err".
v2:
- update patch 2 as Martin suggested.
This is the 9th part of series "use network helpers" all BPF selftests
wide.
Patches 1-2 update network helpers interfaces suggested by Martin.
Patch 3 adds a new helper connect_to_addr_str() as Martin suggested
instead of adding connect_fd_to_addr_str().
Patch 4 uses this newly added helper in make_client().
Patch 5 uses make_client() in sk_lookup and drop make_socket().
Geliang Tang (5):
selftests/bpf: Drop type of connect_to_fd_opts
selftests/bpf: Drop must_fail from network_helper_opts
selftests/bpf: Add connect_to_addr_str helper
selftests/bpf: Use connect_to_addr_str in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 65 ++++++--------
tools/testing/selftests/bpf/network_helpers.h | 5 +-
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 2 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 16 ++--
.../selftests/bpf/prog_tests/sk_lookup.c | 84 ++++---------------
5 files changed, 56 insertions(+), 116 deletions(-)
--
2.43.0
The first patch fixes unstable naming of tests due to probe ordering not
being stable, the second just provides a bit more information.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Switch to using ID rather than longame.
- Log the PCM ID too.
- Link to v1: https://lore.kernel.org/r/20240711-alsa-kselftest-board-name-v1-1-ab5cf2dbb…
---
Mark Brown (2):
kselftest/alsa: Use card name rather than number in test names
kselftest/alsa: Log the PCM ID in pcm-test
tools/testing/selftests/alsa/mixer-test.c | 98 +++++++++++++++++++------------
tools/testing/selftests/alsa/pcm-test.c | 68 +++++++++++++++------
2 files changed, 108 insertions(+), 58 deletions(-)
---
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
change-id: 20240711-alsa-kselftest-board-name-e4a1add4cfa0
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On Tue, Jul 16, 2024 at 01:10:41PM -0700, Linus Torvalds wrote:
> On Mon, 15 Jul 2024 at 09:21, Kees Cook <kees(a)kernel.org> wrote:
> >
> > fs/exec.c | 49 ++++++++--
> > fs/exec_test.c | 141 ++++++++++++++++++++++++++++
>
> I've pulled this, but *PLEASE* don't do this.
>
> This screws up my workflow of just using tab-completion for filenames.
> As a result, I absolutely abhor anybody who uses the same base-name
> for different things.
>
> No, this is not the first time it happens, and it won't be the last.
> And we had that same horrific pattern for fs/binfmt_elf_test.c from
> before, and I didn't notice because it's not a core file to me, and I
> seldom actually edit it.
>
> I would suggest that people use the patterns from lib/, which is
> admittedly a bit schizophrenic in that you can either use
> "lib/kunit/*.c" (probably preferred) or "lib/test_xyz.c".
>
> (Other subsystems use a "tests" subdirectory, so we do have a lot of
> different ways to deal with this).
>
> Any of those models will keep the unit testing parts clearly separate,
> and not mess up basic command line workflows.
>
> But do *not* use this "*_test.c" naming model. It's the worst of all
> possible worlds.
>
> Please?
Oh, sure, no problem! I have no attachment to this convention at all;
I was trying to follow the Kunit docs:
https://docs.kernel.org/dev-tools/kunit/style.html#test-file-and-module-nam…
If I look at the existing naming, it's pretty scattered:
$ git grep '^static struct kunit_suite\b' | cut -d: -f1 | sort -u
/test/* 7
/tests/* 47
*-test.[ch] 27
*_test.[ch] 27
test-*.c 1
test_*.c 10
*-kunit.c 1
*_kunit.c 17
kunit-*.c 2
kunit_*.c 1
Should we go with "put it all under a 'tests' subdirectory" ?
So for fs/exec_test.c and fs/binfmt_elf_test.c, perhaps fs/tests/exec.c
and fs/tests/binfmt_elf.c respectively?
And for the lib/*_kunit.c files, use lib/tests/*.c ?
Then we can update the docs, etc.
--
Kees Cook
Based on feedback from Linus[1], change the suggested file naming for
KUnit tests.
Link: https://lore.kernel.org/lkml/CAHk-=wgim6pNiGTBMhP8Kd3tsB7_JTAuvNJ=XYd3wPvvk… [1]
Signed-off-by: Kees Cook <kees(a)kernel.org>
---
Cc: David Gow <davidgow(a)google.com>
Cc: Brendan Higgins <brendan.higgins(a)linux.dev>
Cc: Rae Moar <rmoar(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: kunit-dev(a)googlegroups.com
Cc: linux-doc(a)vger.kernel.org
---
Documentation/dev-tools/kunit/style.rst | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/Documentation/dev-tools/kunit/style.rst b/Documentation/dev-tools/kunit/style.rst
index b6d0d7359f00..761dee3f89ca 100644
--- a/Documentation/dev-tools/kunit/style.rst
+++ b/Documentation/dev-tools/kunit/style.rst
@@ -188,15 +188,20 @@ For example, a Kconfig entry might look like:
Test File and Module Names
==========================
-KUnit tests can often be compiled as a module. These modules should be named
-after the test suite, followed by ``_test``. If this is likely to conflict with
-non-KUnit tests, the suffix ``_kunit`` can also be used.
-
-The easiest way of achieving this is to name the file containing the test suite
-``<suite>_test.c`` (or, as above, ``<suite>_kunit.c``). This file should be
-placed next to the code under test.
+Whether a KUnit test is compiled as a separate module or via an
+``#include`` in a core kernel source file, the files should be named
+after the test suite, followed by ``_test``, and live in a ``tests``
+subdirectory to avoid conflicting with regular modules or the core kernel
+source file names (e.g. for tab-completion). If this would conflict with
+non-KUnit tests, the suffix ``_kunit`` can be used instead.
+
+So for the common case, name the file containing the test suite
+``tests/<suite>_test.c`` (or, if needed, ``tests/<suite>_kunit.c``). The
+``tests`` directory should be placed at the same level as the
+code under test. For example, tests for ``lib/string.c`` live in
+``lib/tests/string_test.c``.
If the suite name contains some or all of the name of the test's parent
directory, it may make sense to modify the source filename to reduce redundancy.
-For example, a ``foo_firmware`` suite could be in the ``foo/firmware_test.c``
+For example, a ``foo_firmware`` suite could be in the ``tests/foo/firmware_test.c``
file.
--
2.34.1