For cgroup v1, if turned on, and there's any cgroup in the "cpu" hierarchy it
needs an RT budget assigned, otherwise the processes in it will not be able to
get RT at all. The problem with RT group scheduling is that it requires the
budget assigned but there's no way we could assign a default budget, since the
values to assign are both upper and lower time limits, are absolute, and need to
be sum up to < 1 for each individal cgroup. That means we cannot really come up
with values that would work by default in the general case.[1]
For cgroup v2, it's almost unusable as well. If it turned on, the cpu controller
can only be enabled when all RT processes are in the root cgroup. But it will
lose the benefits of cgroup v2 if all RT process were placed in the same cgroup.
Red Hat, Gentoo, Arch Linux and Debian all disable it. systemd also doesn't
support it.[2]
I leave tools/testing/selftests/bpf/config.{s390x,aarch64} untouched because
I don't whether bpf testing requires it.
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=1229700
[2]: https://github.com/systemd/systemd/issues/13781#issuecomment-549164383
Celeste Liu (6):
riscv: defconfig: drop RT_GROUP_SCHED=y
loongarch: defconfig: drop RT_GROUP_SCHED=y
mips: defconfig: drop RT_GROUP_SCHED=y from generic/db1xxx/eyeq5
powerpc: defconfig: drop RT_GROUP_SCHED=y from ppc6xx_defconfig
sh: defconfig: drop RT_GROUP_SCHED=y from sdk7786/urquell
arm: defconfig: drop RT_GROUP_SCHED=y from bcm2855/tegra/omap2plus
arch/arm/configs/bcm2835_defconfig | 1 -
arch/arm/configs/omap2plus_defconfig | 1 -
arch/arm/configs/tegra_defconfig | 1 -
arch/loongarch/configs/loongson3_defconfig | 1 -
arch/mips/configs/db1xxx_defconfig | 1 -
arch/mips/configs/eyeq5_defconfig | 1 -
arch/mips/configs/generic_defconfig | 1 -
arch/powerpc/configs/ppc6xx_defconfig | 1 -
arch/riscv/configs/defconfig | 1 -
arch/sh/configs/sdk7786_defconfig | 1 -
arch/sh/configs/urquell_defconfig | 1 -
11 files changed, 11 deletions(-)
--
2.45.1
in randomize function, there is a open function, but there is no
close function in the randomize, which is easy to cause memory leaks.
Signed-off-by: Liu Jing <liujing(a)cmss.chinamobile.com>
---
tools/testing/selftests/net/tcp_mmap.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/tcp_mmap.c b/tools/testing/selftests/net/tcp_mmap.c
index 4fcce5150850..ab305e262d0a 100644
--- a/tools/testing/selftests/net/tcp_mmap.c
+++ b/tools/testing/selftests/net/tcp_mmap.c
@@ -438,6 +438,7 @@ static void randomize(void *target, size_t count)
perror("read /dev/urandom");
exit(1);
}
+ close(urandom);
}
int main(int argc, char *argv[])
--
2.33.0
+ Dave Miller, Jakub Kicinski, Paolo Abeni, Shuah Khan,
linux-kselftest(a)vger.kernel.org
On Mon, Jul 08, 2024 at 09:04:05PM +0000, zijianzhang(a)bytedance.com wrote:
> From: Zijian Zhang <zijianzhang(a)bytedance.com>
>
> We update selftests/net/msg_zerocopy.c to accommodate the new mechanism,
> cfg_notification_limit has the same semantics for both methods. Test
> results are as follows, we update skb_orphan_frags_rx to the same as
> skb_orphan_frags to support zerocopy in the localhost test.
>
> cfg_notification_limit = 1, both method get notifications after 1 calling
> of sendmsg. In this case, the new method has around 17% cpu savings in TCP
> and 23% cpu savings in UDP.
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB) | 7523 | 7706 | 7489 | 7304 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB) | 8834 | 8993 | 9053 | 9228 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy | 117.42% | 116.70% | 120.88% | 126.34% |
> +---------------------+---------+---------+---------+---------+
>
> cfg_notification_limit = 32, both get notifications after 32 calling of
> sendmsg, which means more chances to coalesce notifications, and less
> overhead of poll + recvmsg for the original method. In this case, the new
> method has around 7% cpu savings in TCP and slightly better cpu usage in
> UDP. In the context of selftest, notifications of TCP are more likely to
> out of order than UDP, it's easier to coalesce more notifications in UDP.
> The original method can get one notification with range of 32 in a recvmsg
> most of the time. In TCP, most notifications' range is around 2, so the
> original method needs around 16 recvmsgs to get notified in one round.
> That's the reason for the "New ZCopy / ZCopy" diff in TCP and UDP here.
> +---------------------+---------+---------+---------+---------+
> | Test Type / Protocol| TCP v4 | TCP v6 | UDP v4 | UDP v6 |
> +---------------------+---------+---------+---------+---------+
> | ZCopy (MB) | 8842 | 8735 | 10072 | 9380 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy (MB) | 9366 | 9477 | 10108 | 9385 |
> +---------------------+---------+---------+---------+---------+
> | New ZCopy / ZCopy | 106.00% | 108.28% | 100.31% | 100.01% |
> +---------------------+---------+---------+---------+---------+
>
> In conclusion, when notification interval is small or notifications are
> hard to be coalesced, the new mechanism is highly recommended. Otherwise,
> the performance gain from the new mechanism is very limited.
>
> Signed-off-by: Zijian Zhang <zijianzhang(a)bytedance.com>
> Signed-off-by: Xiaochun Lu <xiaochun.lu(a)bytedance.com>
> ---
> tools/testing/selftests/net/msg_zerocopy.c | 111 ++++++++++++++++++--
> tools/testing/selftests/net/msg_zerocopy.sh | 1 +
> 2 files changed, 105 insertions(+), 7 deletions(-)
>
> diff --git a/tools/testing/selftests/net/msg_zerocopy.c b/tools/testing/selftests/net/msg_zerocopy.c
...
> @@ -466,6 +504,44 @@ static void do_recv_completions(int fd, int domain)
> sends_since_notify = 0;
> }
>
> +static void do_recv_completions2(void)
> +{
> + struct cmsghdr *cm = (struct cmsghdr *)zc_ckbuf;
> + struct zc_info *zc_info;
> + __u32 hi, lo, range;
> + __u8 zerocopy;
> + int i;
> +
> + zc_info = (struct zc_info *)CMSG_DATA(cm);
> + for (i = 0; i < zc_info->size; i++) {
> + hi = zc_info->arr[i].hi;
> + lo = zc_info->arr[i].lo;
> + zerocopy = zc_info->arr[i].zerocopy;
> + range = hi - lo + 1;
> +
> + if (cfg_verbose && lo != next_completion)
> + fprintf(stderr, "gap: %u..%u does not append to %u\n",
> + lo, hi, next_completion);
> + next_completion = hi + 1;
> +
> + if (zerocopied == -1)
> + zerocopied = zerocopy;
> + else if (zerocopied != zerocopy) {
> + fprintf(stderr, "serr: inconsistent\n");
> + zerocopied = zerocopy;
> + }
nit: If any arms of a conditional have {}, then all arms should have them
> +
> + completions += range;
> +
> + if (cfg_verbose >= 2)
> + fprintf(stderr, "completed: %u (h=%u l=%u)\n",
> + range, hi, lo);
> + }
> +
> + sends_since_notify = 0;
> + added_zcopy_info = false;
> +}
...
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Run this BPF selftests (./test_progs -t sockmap_basic) on a Loongarch
platform, a kernel panic occurs:
'''
Oops[#1]:
CPU: 22 PID: 2824 Comm: test_progs Tainted: G OE 6.10.0-rc2+ #18
Hardware name: LOONGSON Dabieshan/Loongson-TC542F0, BIOS Loongson-UDK2018
... ...
ra: 90000000048bf6c0 sk_msg_recvmsg+0x120/0x560
ERA: 9000000004162774 copy_page_to_iter+0x74/0x1c0
CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
PRMD: 0000000c (PPLV0 +PIE +PWE)
EUEN: 00000007 (+FPE +SXE +ASXE -BTE)
ECFG: 00071c1d (LIE=0,2-4,10-12 VS=7)
ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
BADV: 0000000000000040
PRID: 0014c011 (Loongson-64bit, Loongson-3C5000)
Modules linked in: bpf_testmod(OE) xt_CHECKSUM xt_MASQUERADE xt_conntrack
Process test_progs (pid: 2824, threadinfo=0000000000863a31, task=...)
Stack : ...
...
Call Trace:
[<9000000004162774>] copy_page_to_iter+0x74/0x1c0
[<90000000048bf6c0>] sk_msg_recvmsg+0x120/0x560
[<90000000049f2b90>] tcp_bpf_recvmsg_parser+0x170/0x4e0
[<90000000049aae34>] inet_recvmsg+0x54/0x100
[<900000000481ad5c>] sock_recvmsg+0x7c/0xe0
[<900000000481e1a8>] __sys_recvfrom+0x108/0x1c0
[<900000000481e27c>] sys_recvfrom+0x1c/0x40
[<9000000004c076ec>] do_syscall+0x8c/0xc0
[<9000000003731da4>] handle_syscall+0xc4/0x160
Code: ...
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Fatal exception
Kernel relocated by 0x3510000
.text @ 0x9000000003710000
.data @ 0x9000000004d70000
.bss @ 0x9000000006469400
---[ end Kernel panic - not syncing: Fatal exception ]---
'''
This crash happens every time when running sockmap_skb_verdict_shutdown
subtest in sockmap_basic.
This crash is because a NULL pointer is passed to page_address() in
sk_msg_recvmsg(). Due to the difference implementations depending on the
architecture, page_address(NULL) will trigger a panic on Loongarch
platform but not on X86 platform. So this bug was hidden on X86 platform
for a while, but now it is exposed on Loongarch platform.
The root cause is a zero length skb (skb->len == 0) is put on the queue.
This zero length skb is a TCP FIN packet, which is sent by shutdown(),
invoked in test_sockmap_skb_verdict_shutdown():
shutdown(p1, SHUT_WR);
In this case, in sk_psock_skb_ingress_enqueue(), num_sge is zero, and no
page is put to this sge (see sg_set_page in sg_set_page), but this empty
sge is queued into ingress_msg list.
And in sk_msg_recvmsg(), this empty sge is used, and a NULL page is got by
sg_page(sge). Pass this NULL page to copy_page_to_iter(), which passes it
to kmap_local_page() and to page_address(), then kernel panics.
To solve this, we should skip this zero length skb. So in sk_msg_recvmsg(),
if copy is zero, that means it's a zero length skb, skip invoking
copy_page_to_iter(). We are using the EFAULT return triggered by
copy_page_to_iter to check for is_fin in tcp_bpf.c.
Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
Suggested-by: John Fastabend <john.fastabend(a)gmail.com>
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
---
v5:
- update v5 as John suggested.
- skmsg: skip zero length skb in sk_msg_recvmsg
v4:
- skmsg: skip empty sge in sk_msg_recvmsg
v3:
- skmsg: prevent empty ingress skb from enqueuing
v2:
- skmsg: null check for sg_page in sk_msg_recvmsg
---
net/core/skmsg.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index fd20aae30be2..bbf40b999713 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -434,7 +434,8 @@ int sk_msg_recvmsg(struct sock *sk, struct sk_psock *psock, struct msghdr *msg,
page = sg_page(sge);
if (copied + copy > len)
copy = len - copied;
- copy = copy_page_to_iter(page, sge->offset, copy, iter);
+ if (copy)
+ copy = copy_page_to_iter(page, sge->offset, copy, iter);
if (!copy) {
copied = copied ? copied : -EFAULT;
goto out;
--
2.43.0
Changes v3:
- Reworked patch 2.
- Changed minor things in patch 1 like function name and made
corrections to the patch message.
Changes v2:
- Removed patches 2 and 3 since now this part will be supported by the
kernel.
Sub-Numa Clustering (SNC) allows splitting CPU cores, caches and memory
into multiple NUMA nodes. When enabled, NUMA-aware applications can
achieve better performance on bigger server platforms.
SNC support in the kernel is currently in review [1]. With SNC enabled
and kernel support in place all the tests will function normally (aside
from effective cache size). There might be a problem when SNC is enabled
but the system is still using an older kernel version without SNC
support. Currently the only message displayed in that situation is a
guess that SNC might be enabled and is causing issues. That message also
is displayed whenever the test fails on an Intel platform.
Add a mechanism to discover kernel support for SNC which will add more
meaning and certainty to the error message.
Add runtime SNC mode detection and verify how reliable that information
is.
Series was tested on Ice Lake server platforms with SNC disabled, SNC-2
and SNC-4. The tests were also ran with and without kernel support for
SNC.
Series applies cleanly on kselftest/next.
[1] https://lore.kernel.org/all/20240628215619.76401-1-tony.luck@intel.com/
Previous versions of this series:
[v1] https://lore.kernel.org/all/cover.1709721159.git.maciej.wieczor-retman@inte…
[v2] https://lore.kernel.org/all/cover.1715769576.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (2):
selftests/resctrl: Adjust effective L3 cache size with SNC enabled
selftests/resctrl: Adjust SNC support messages
tools/testing/selftests/resctrl/cache.c | 3 +
tools/testing/selftests/resctrl/cmt_test.c | 4 +-
tools/testing/selftests/resctrl/mba_test.c | 4 +
tools/testing/selftests/resctrl/mbm_test.c | 6 +-
tools/testing/selftests/resctrl/resctrl.h | 8 +
.../testing/selftests/resctrl/resctrl_tests.c | 7 +
tools/testing/selftests/resctrl/resctrlfs.c | 138 ++++++++++++++++++
7 files changed, 166 insertions(+), 4 deletions(-)
--
2.45.2
Even if a vgem device is configured in, we will skip the import_vgem_fd()
test almost every time.
TAP version 13
1..11
# Testing heap: system
# =======================================
# Testing allocation and importing:
ok 1 # SKIP Could not open vgem -1
The problem is that we use the DRM_IOCTL_VERSION ioctl to query the driver
version information but leave the name field a non-null-terminated string.
Terminate it properly to actually test against the vgem device.
Signed-off-by: Zenghui Yu <yuzenghui(a)huawei.com>
---
tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
index 5f541522364f..2fcc74998fa9 100644
--- a/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
+++ b/tools/testing/selftests/dmabuf-heaps/dmabuf-heap.c
@@ -32,6 +32,8 @@ static int check_vgem(int fd)
if (ret)
return 0;
+ name[4] = '\0';
+
return !strcmp(name, "vgem");
}
--
2.33.0
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v10:
- a new patch 10 is added.
- patches 1-6, 8-9 unchanged, only commit logs updated.
- "err = -errno" is used in patches 7, 11, 12 to get the real error
number before checking value of "err".
v9:
- new patches 5-7, new struct member expect_errno for network_helper_opts.
- patches 1-4, 8-9 unchanged.
- update patches 10-11 to make sure all tests pass.
v8:
- only patch 8 updated, to fix errors reported by CI.
v7:
- address Martin's comments in v6. (thanks)
- use MAX(opts->backlog, 0) instead of opts->backlog.
- use connect_to_fd_opts instead connect_to_fd.
- more ASSERT_* to check errors.
v6:
- update patch 6 as Daniel suggested. (thanks)
v5:
- keep make_server and make_client as Eduard suggested.
v4:
- a new patch to use make_sockaddr in sockmap_ktls.
- a new patch to close fd in error path in drop_on_reuseport.
- drop make_server() in patch 7.
- drop make_client() too in patch 9.
v3:
- a new patch to add backlog for network_helper_opts.
- use start_server_str in sockmap_ktls now, not start_server.
v2:
- address Eduard's comments in v1. (thanks)
- fix errors reported by CI.
This patch set uses network helpers in sockmap_ktls and sk_lookup, and
drop three local helpers tcp_server(), inetaddr_len() and make_socket()
in them.
Geliang Tang (12):
selftests/bpf: Add backlog for network_helper_opts
selftests/bpf: Use start_server_str in sockmap_ktls
selftests/bpf: Use connect_to_fd_opts in sockmap_ktls
selftests/bpf: Use make_sockaddr in sockmap_ktls
selftests/bpf: Add network_helper_opts for connect_fd_to_fd
selftests/bpf: Add expect_errno for network_helper_opts
selftests/bpf: Set expect_errno for cgroup_skb_sk_lookup
selftests/bpf: Close fd in error path in drop_on_reuseport
selftests/bpf: Use start_server_str in sk_lookup
selftests/bpf: Use connect_fd_to_fd in sk_lookup
selftests/bpf: Use connect_to_addr in sk_lookup
selftests/bpf: Drop make_socket in sk_lookup
tools/testing/selftests/bpf/network_helpers.c | 23 ++-
tools/testing/selftests/bpf/network_helpers.h | 8 +-
.../testing/selftests/bpf/prog_tests/bpf_nf.c | 5 +-
.../bpf/prog_tests/cgroup_skb_sk_lookup.c | 8 +-
.../selftests/bpf/prog_tests/cgroup_tcp_skb.c | 4 +-
.../selftests/bpf/prog_tests/cgroup_v1v2.c | 1 +
.../selftests/bpf/prog_tests/sk_lookup.c | 162 +++++++-----------
.../selftests/bpf/prog_tests/sockmap_ktls.c | 53 ++----
8 files changed, 108 insertions(+), 156 deletions(-)
--
2.43.0
I realized this while having a map containing both a struct bpf_timer and
a struct bpf_wq: the third argument provided to the bpf_wq callback is
not the struct bpf_wq pointer itself, but the pointer to the value in
the map.
Which means that the users need to double cast the provided "value" as
this is not a struct bpf_wq *.
This is a change of API, but there doesn't seem to be much users of bpf_wq
right now, so we should be able to go with this right now.
Signed-off-by: Benjamin Tissoires <bentiss(a)kernel.org>
---
Changes in v2:
- amended the selftests to retrieve something from the third argument of
the callback
- Link to v1: https://lore.kernel.org/r/20240705-fix-wq-v1-0-91b4d82cd825@kernel.org
---
Benjamin Tissoires (2):
bpf: helpers: fix bpf_wq_set_callback_impl signature
selftests/bpf: amend for wrong bpf_wq_set_callback_impl signature
kernel/bpf/helpers.c | 2 +-
tools/testing/selftests/bpf/bpf_experimental.h | 2 +-
tools/testing/selftests/bpf/progs/wq.c | 19 ++++++++++++++-----
tools/testing/selftests/bpf/progs/wq_failures.c | 4 ++--
4 files changed, 18 insertions(+), 9 deletions(-)
---
base-commit: fd8db07705c55a995c42b1e71afc42faad675b0b
change-id: 20240705-fix-wq-f069c7fb36c3
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>