As Guillaume pointed out, many selftests create namespaces with very
common names (like "client" or "server"), or even (partially) run
directly in init_net. This makes these tests prone to failure if another
namespace with the same name already exists, and it makes it impossible
to run several instances of these tests in parallel.
This patch set converts all the net selftests to run in unique
namespaces, so that the selftest framework can later be updated to run
every test in its own namespace in parallel. After that update, the
total run time is bounded by the single test that takes the longest.
]# per_test_logging=1 time ./run_kselftest.sh -n -c net
TAP version 13
# selftests: net: reuseport_bpf_numa
not ok 3 selftests: net: reuseport_bpf_numa # exit=1
# selftests: net: reuseport_bpf_cpu
not ok 2 selftests: net: reuseport_bpf_cpu # exit=1
# selftests: net: reuseport_dualstack
not ok 4 selftests: net: reuseport_dualstack # exit=1
# selftests: net: reuseaddr_conflict
ok 5 selftests: net: reuseaddr_conflict
...
# selftests: net: test_vxlan_mdb.sh
ok 90 selftests: net: test_vxlan_mdb.sh
# selftests: net: fib_nexthops.sh
not ok 41 selftests: net: fib_nexthops.sh # exit=1
# selftests: net: fcnal-test.sh
not ok 36 selftests: net: fcnal-test.sh # exit=1
real 55m1.238s
user 12m10.350s
sys 22m17.432s
Hangbin Liu (38):
selftests/net: add lib.sh
selftests/net: arp_ndisc_evict_nocarrier.sh convert to run test in
unique namespace
selftest: arp_ndisc_untracked_subnets.sh convert to run test in unique
namespace
selftests/net: convert cmsg tests to make them run in unique namespace
selftests/net: convert drop_monitor_tests.sh to run it in unique
namespace
selftests/net: convert fcnal-test.sh to run it in unique namespace
selftests/net: convert fib_nexthop_multiprefix to run it in unique
namespace
selftests/net: convert fib_nexthop_nongw.sh to run it in unique
namespace
selftests/net: convert fib_nexthops.sh to run it in unique namespace
selftests/net: convert fib-onlink-tests.sh to run it in unique
namespace
selftests/net: convert fib_rule_tests.sh to run it in unique namespace
selftests/net: convert fib_tests.sh to run it in unique namespace
selftests/net: convert gre_gso.sh to run it in unique namespace
selftests/net: convert icmp_redirect.sh to run it in unique namespace
selftests/net: convert icmp.sh to run it in unique namespace
selftests/net: convert ioam6.sh to run it in unique namespace
selftests/net: convert l2tp.sh to run it in unique namespace
selftests/net: convert ndisc_unsolicited_na_test.sh to run it in
unique namespace
selftests/net: convert netns-name.sh to run it in unique namespace
selftests/net: convert fdb_flush.sh to run it in unique namespace
selftests/net: convert rtnetlink.sh to run it in unique namespace
selftests/net: convert sctp_vrf.sh to run it in unique namespace
selftests/net: use unique netns name for setup_loopback.sh
setup_veth.sh
selftests/net: convert stress_reuseport_listen.sh to run it in unique
namespace
selftests/net: convert test_bridge_backup_port.sh to run it in unique
namespace
selftests/net: convert test_bridge_neigh_suppress.sh to run it in
unique namespace
selftests/net: convert test_vxlan_mdb.sh to run it in unique namespace
selftests/net: convert test_vxlan_nolocalbypass.sh to run it in unique
namespace
selftests/net: convert test_vxlan_under_vrf.sh to run it in unique
namespace
selftests/net: convert test_vxlan_vnifiltering.sh to run it in unique
namespace
selftests/net: convert toeplitz.sh to run it in unique namespace
selftests/net: convert unicast_extensions.sh to run it in unique
namespace
selftests/net: convert vrf_route_leaking.sh to run it in unique
namespace
selftests/net: convert vrf_strict_mode_test.sh to run it in unique
namespace
selftests/net: convert vrf-xfrm-tests.sh to run it in unique namespace
selftests/net: convert traceroute.sh to run it in unique namespace
selftests/net: convert xfrm_policy.sh to run it in unique namespace
kselftest/runner.sh: add netns support
tools/testing/selftests/kselftest/runner.sh | 26 +-
tools/testing/selftests/net/Makefile | 2 +-
.../net/arp_ndisc_evict_nocarrier.sh | 46 +--
.../net/arp_ndisc_untracked_subnets.sh | 18 +-
tools/testing/selftests/net/cmsg_ipv6.sh | 10 +-
tools/testing/selftests/net/cmsg_so_mark.sh | 7 +-
tools/testing/selftests/net/cmsg_time.sh | 7 +-
.../selftests/net/drop_monitor_tests.sh | 21 +-
tools/testing/selftests/net/fcnal-test.sh | 30 +-
tools/testing/selftests/net/fdb_flush.sh | 11 +-
.../testing/selftests/net/fib-onlink-tests.sh | 7 +-
.../selftests/net/fib_nexthop_multiprefix.sh | 104 +++--
.../selftests/net/fib_nexthop_nongw.sh | 34 +-
tools/testing/selftests/net/fib_nexthops.sh | 142 ++++---
tools/testing/selftests/net/fib_rule_tests.sh | 36 +-
tools/testing/selftests/net/fib_tests.sh | 184 +++++----
tools/testing/selftests/net/gre_gso.sh | 18 +-
tools/testing/selftests/net/icmp.sh | 10 +-
tools/testing/selftests/net/icmp_redirect.sh | 182 +++++----
tools/testing/selftests/net/ioam6.sh | 247 ++++++------
tools/testing/selftests/net/l2tp.sh | 130 +++----
tools/testing/selftests/net/lib.sh | 98 +++++
.../net/ndisc_unsolicited_na_test.sh | 19 +-
tools/testing/selftests/net/netns-name.sh | 44 +--
tools/testing/selftests/net/rtnetlink.sh | 21 +-
tools/testing/selftests/net/sctp_vrf.sh | 12 +-
tools/testing/selftests/net/settings | 2 +-
tools/testing/selftests/net/setup_loopback.sh | 8 +-
tools/testing/selftests/net/setup_veth.sh | 9 +-
.../selftests/net/stress_reuseport_listen.sh | 6 +-
.../selftests/net/test_bridge_backup_port.sh | 368 +++++++++---------
.../net/test_bridge_neigh_suppress.sh | 333 ++++++++--------
tools/testing/selftests/net/test_vxlan_mdb.sh | 202 +++++-----
.../selftests/net/test_vxlan_nolocalbypass.sh | 48 ++-
.../selftests/net/test_vxlan_under_vrf.sh | 70 ++--
.../selftests/net/test_vxlan_vnifiltering.sh | 154 +++++---
tools/testing/selftests/net/toeplitz.sh | 16 +-
tools/testing/selftests/net/traceroute.sh | 82 ++--
.../selftests/net/unicast_extensions.sh | 99 +++--
tools/testing/selftests/net/vrf-xfrm-tests.sh | 77 ++--
.../selftests/net/vrf_route_leaking.sh | 201 +++++-----
.../selftests/net/vrf_strict_mode_test.sh | 47 ++-
tools/testing/selftests/net/xfrm_policy.sh | 138 +++----
tools/testing/selftests/run_kselftest.sh | 4 +
44 files changed, 1676 insertions(+), 1654 deletions(-)
create mode 100644 tools/testing/selftests/net/lib.sh
--
2.41.0
Hi,
On Mon, Nov 27, 2023 at 11:49:16AM +0000, Felix Huettner wrote:
> conntrack zones are heavily used by tools like openvswitch to run
> multiple virtual "routers" on a single machine. In this context each
> conntrack zone matches to a single router, thereby preventing
> overlapping IPs from becoming issues.
> In these systems it is common to operate on all conntrack entries of a
> given zone, e.g. to delete them when a router is deleted. Previously this
> required these tools to dump the full conntrack table and filter out the
> relevant entries in userspace potentially causing performance issues.
>
> To do this we reuse the existing CTA_ZONE attribute. This was previously
> parsed but not used during dump and flush requests. Now if CTA_ZONE is
> set we filter these operations based on the provided zone.
> However this means that users that previously passed CTA_ZONE will
> experience a difference in functionality.
>
> Alternatively CTA_FILTER could have been used for the same
> functionality. However it is not yet supported during flush requests and
> is only available when using AF_INET or AF_INET6.
You mean, AF_UNSPEC cannot be specified in CTA_FILTER?
Please, extend libnetfilter_conntrack to support for this feature,
there is a filter API that can be used for this purpose.
Thanks.
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
> > I think ARM64 approached this problem by adding the
> > load-acquire/store-release instructions and for TSO based code,
> > translate into those (eg. x86 -> arm64 transpilers).
>
>
> Although those instructions have a bit more ordering constraints.
>
> I have heard rumors that the apple chips also have a register that can be
> set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies.
But to be fair, I haven't been paying *that* much attention to the apple
stuff.
I did read about how they fudged some of the x86 flags thing.
> And there are some IBM machines that have a setting, but not sure how it is
> controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I
wasn't aware they had this.
> > IIRC Risc-V actually has such instructions as well, so *why* are you
> > doing this?!?!
>
>
> Unfortunately, at least last time I checked RISC-V still hadn't gotten such
> instructions.
> What they have is the *semantics* of the instructions, but no actual opcodes
> to encode them.
Well, that sucks..
> I argued for them in the RISC-V memory group, but it was considered to be
> outside the scope of that group.
>
> Transpiling with sufficient DMB ISH to get the desired ordering is really
> bad for performance.
Ha!, quite dreadful I would imagine.
> That is not to say that linux should support this. Perhaps linux should
> pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like
a plan :-)
The series adds support for setrlimit/getrlimit.
Mainly to avoid spurious coredumps when running the tests under
qemu-user.
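For reference, disabling coredumps with these calls boils down to the
standard rlimit pattern; a minimal userspace sketch (plain libc usage,
not the actual nolibc-test code):

  #include <stdio.h>
  #include <sys/resource.h>

  int main(void)
  {
          struct rlimit rlim = { .rlim_cur = 0, .rlim_max = 0 };

          /* RLIMIT_CORE == 0: no core file is written when the process crashes */
          if (setrlimit(RLIMIT_CORE, &rlim))
                  perror("setrlimit(RLIMIT_CORE)");

          /* read the limit back to confirm it took effect */
          if (getrlimit(RLIMIT_CORE, &rlim) == 0)
                  printf("core limit: cur=%llu max=%llu\n",
                         (unsigned long long)rlim.rlim_cur,
                         (unsigned long long)rlim.rlim_max);
          return 0;
  }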
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: drop custom definition of struct rusage
tools/nolibc: add support for getrlimit/setrlimit
selftests/nolibc: disable coredump via setrlimit
tools/include/nolibc/sys.h | 38 ++++++++++++++++++++++++++++
tools/include/nolibc/types.h | 21 +--------------
tools/testing/selftests/nolibc/nolibc-test.c | 31 +++++++++++++++++++++++
3 files changed, 70 insertions(+), 20 deletions(-)
---
base-commit: 0dbd4651f3f80151910a36416fa0df28a10c3b0a
change-id: 20231122-nolibc-rlimit-bb5b1f264fc4
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
In public cloud scenarios, if the kdump service misbehaves, users
cannot get a vmcore. Without a vmcore, users have no idea why the
kernel crashed, and there is no additional information to help find
out why the kdump service failed.
One way is to obtain console messages through VNC. The drawback is
that VNC is real-time: if the user misses the moment of the crash, the
crash needs to be retriggered to capture the output.
Another way is to enable the console frontend of pstore and record the
console messages to the pstore backend. However, the console logs only
contain kernel printk output and do not cover user-mode messages.
Although user-mode logs can be redirected to the pmsg frontend provided
by pstore, the user-mode information related to booting and the kdump
service comes from several sources (systemd, kdump.sh, and so on),
which makes such redirection troublesome. So we add a tty frontend and
save all logs that go through the tty driver to the pstore backend.
Another problem is that pstore currently only supports a single backend.
For debugging kdump problems, we want to save the console and tty logs
to the ramoops backend of pstore, as they are not lost across a reboot.
If the user has enabled another backend, the ramoops backend cannot be
registered. To this end, we add multi-backend support so that multiple
backends can be registered simultaneously.
Based on the above changes, we can enable pstore in the crashdump kernel
and save the console logs and tty logs to the ramoops backend of pstore.
After rebooting, we can view the relevant logs by mounting the pstore
file system.
Furthermore, we also modified kexec-tools, following crash-utils'
approach to reading memory, so that pstore ramoops information can be
read without enabling pstore in the first kernel. Since we set the
address and size of the ramoops region, as well as the sizes of the
console and tty records, we can infer the physical addresses of the
console and tty logs in memory. Using the same read method as
crash-utils, the console and tty logs are read directly from memory,
so users can get pstore debug information without affecting the first
kernel at all.
kexec-tools modification can be seen at
https://github.com/shuyuanmen/kexec-tools/blob/main/Add-pstore-segment.patch
Yuanhe Shu (5):
pstore: add tty frontend
pstore: add multi-backends support
pstore: add subdirs for multi-backends
pstore: remove the module parameter "backend"
tools/pstore: update pstore selftests
drivers/tty/n_tty.c | 1 +
fs/pstore/Kconfig | 23 ++
fs/pstore/Makefile | 2 +
fs/pstore/blk.c | 10 +
fs/pstore/ftrace.c | 22 +-
fs/pstore/inode.c | 86 ++++++-
fs/pstore/internal.h | 16 +-
fs/pstore/platform.c | 238 ++++++++++++--------
fs/pstore/pmsg.c | 23 +-
fs/pstore/ram.c | 40 +++-
fs/pstore/tty.c | 56 +++++
fs/pstore/zone.c | 42 +++-
include/linux/pstore.h | 33 +++
include/linux/pstore_blk.h | 3 +
include/linux/pstore_ram.h | 1 +
include/linux/pstore_zone.h | 2 +
include/linux/tty.h | 14 ++
tools/testing/selftests/pstore/common_tests | 4 -
18 files changed, 500 insertions(+), 116 deletions(-)
create mode 100644 fs/pstore/tty.c
--
2.39.3
Regressions that prevent a driver from probing a device can significantly
affect the functionality of a platform.
A kselftest to verify if devices on a DT-based platform are probed
correctly was recently introduced [1], but no such generic test is
available for ACPI platforms yet. bootrr [2] provides device probe
testing, but relies on a pre-defined list of the peripherals present on
each DUT.
On ACPI based hardware, a complete description of the platform is
provided to the OS by the system firmware. ACPI namespace objects are
mapped by the Linux ACPI subsystem into a device tree in
/sys/devices/LNXSYSTEM:00; the information in this subtree can be parsed
to build a list of the hw peripherals present on the DUT dynamically.
This series adds a test to verify if the devices declared in the ACPI
namespace and supported by the kernel are probed correctly.
This work follows a similar approach to [1], adapted for the ACPI use
case.
The first patch introduces a script that builds a list of all ACPI device
IDs supported by the kernel, by inspecting the acpi_device_id structs in
the sources. This list can be used to avoid testing ACPI-enumerated
devices that don't have a matching driver in the kernel. This script was
highly inspired by the dt-extract-compatibles script [3].
In the second patch, a new kselftest is added. It parses the
/sys/devices/LNXSYSTEM:00 tree to obtain a list of all platform
peripherals and verifies which of those, if supported, are correctly
bound to a driver.
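The probe check itself boils down to testing whether a device's sysfs
node has a "driver" symlink. A minimal C illustration of that idea (the
actual selftest is a shell script; the program below is just a sketch):

  #include <stdio.h>
  #include <unistd.h>

  /* A device is bound to a driver iff <sysfs device dir>/driver exists. */
  static int device_is_bound(const char *sysfs_path)
  {
          char link[4096];

          snprintf(link, sizeof(link), "%s/driver", sysfs_path);
          return access(link, F_OK) == 0;
  }

  int main(int argc, char **argv)
  {
          if (argc != 2) {
                  fprintf(stderr, "usage: %s <sysfs device directory>\n", argv[0]);
                  return 1;
          }
          printf("%s: %s\n", argv[1],
                 device_is_bound(argv[1]) ? "probed" : "not probed");
          return 0;
  }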
Feedback is much appreciated,
Thank you,
Laura
[1] https://lore.kernel.org/all/20230828211424.2964562-1-nfraprado@collabora.co…
[2] https://github.com/kernelci/bootr
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scr…
Laura Nao (2):
acpi: Add script to extract ACPI device ids in the kernel
kselftest: Add test to detect unprobed devices on ACPI platforms
MAINTAINERS | 2 +
scripts/acpi/acpi-extract-ids | 60 +++++++++++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/acpi/.gitignore | 2 +
tools/testing/selftests/acpi/Makefile | 23 ++++++
.../selftests/acpi/test_unprobed_devices.sh | 75 +++++++++++++++++++
6 files changed, 163 insertions(+)
create mode 100755 scripts/acpi/acpi-extract-ids
create mode 100644 tools/testing/selftests/acpi/.gitignore
create mode 100644 tools/testing/selftests/acpi/Makefile
create mode 100755 tools/testing/selftests/acpi/test_unprobed_devices.sh
--
2.30.2
The za-fork test does not output a newline when reporting the result of
the one test it runs, causing the counts printed by kselftest to be
included in the test name. Add the newline.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/fp/za-fork.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/fp/za-fork.c b/tools/testing/selftests/arm64/fp/za-fork.c
index b86cb1049497..587b94648222 100644
--- a/tools/testing/selftests/arm64/fp/za-fork.c
+++ b/tools/testing/selftests/arm64/fp/za-fork.c
@@ -85,7 +85,7 @@ int main(int argc, char **argv)
*/
ret = open("/proc/sys/abi/sme_default_vector_length", O_RDONLY, 0);
if (ret >= 0) {
- ksft_test_result(fork_test(), "fork_test");
+ ksft_test_result(fork_test(), "fork_test\n");
} else {
ksft_print_msg("SME not supported\n");
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231115-arm64-fix-za-fork-output-21cdd7a7195c
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Commit 05f1edac8009 ("selftests/mm: run all tests from run_vmtests.sh")
fixed the inconsistency caused by tests being defined as TEST_GEN_PROGS.
That issue led to tests not being executed via run_vmtests.sh and,
furthermore, to some tests running twice because the kselftest wrapper
also executed them.
Fix the definition of two tests (soft-dirty and pagemap_ioctl)
that are still incorrectly defined.
Signed-off-by: Nico Pache <npache(a)redhat.com>
---
tools/testing/selftests/mm/Makefile | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 78dfec8bc676..dede0bcf97a3 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -60,7 +60,7 @@ TEST_GEN_FILES += mrelease_test
TEST_GEN_FILES += mremap_dontunmap
TEST_GEN_FILES += mremap_test
TEST_GEN_FILES += on-fault-limit
-TEST_GEN_PROGS += pagemap_ioctl
+TEST_GEN_FILES += pagemap_ioctl
TEST_GEN_FILES += thuge-gen
TEST_GEN_FILES += transhuge-stress
TEST_GEN_FILES += uffd-stress
@@ -72,7 +72,7 @@ TEST_GEN_FILES += mdwe_test
TEST_GEN_FILES += hugetlb_fault_after_madv
ifneq ($(ARCH),arm64)
-TEST_GEN_PROGS += soft-dirty
+TEST_GEN_FILES += soft-dirty
endif
ifeq ($(ARCH),x86_64)
--
2.41.0
Intel SIOV allows creating virtual devices whose vRID is represented by
a PASID of a physical device. Such a device is called an SIOV virtual
device in this series. It can be bound to an iommufd just like a
physical device and then later be attached to an IOAS/hwpt using that
PASID. Such a PASID is called the default PASID.
iommufd already supports PASID attach[1], so a simple way to support
SIOV virtual device attachment is to let the device driver call
iommufd_device_pasid_attach() and pass in the default PASID of the
virtual device. This should work for now, but it may become a problem
if the iommufd core wants to differentiate default PASIDs from other
kinds of PASIDs (e.g. PASIDs given by userspace). When later forwarding
page requests to userspace, the default PASIDs are not supposed to be
sent to userspace, as they are mainly used by the SIOV device driver.
For the above reason, this series chooses to add a new API to bind the
default PASID to iommufd, and extends iommufd_device_attach() to
convert the attachment into a PASID attach with the default PASID.
Device drivers (e.g. VFIO) that support SIOV shall call the APIs below
to interact with iommufd:
- iommufd_device_bind_pasid(): Bind virtual device (a pasid of a device)
to iommufd;
- iommufd_device_attach(): Attach a SIOV virtual device to IOAS/HWPT;
- iommufd_device_replace(): Replace IOAS/HWPT of a SIOV virtual device;
- iommufd_device_detach(): Detach IOAS/HWPT of a SIOV virtual device;
- iommufd_device_unbind(): Unbind virtual device from iommufd;
For vfio devices, the device drivers that support SIOV should:
- use below API to register vdev for SIOV virtual device
vfio_register_pasid_iommu_dev()
- use below API to bind vdev to iommufd in .bind_iommufd() callback
iommufd_device_bind_pasid()
- allocate pasid by itself before calling iommufd_device_bind_pasid()
Complete code can be found at[2]
[1] https://lore.kernel.org/linux-iommu/20230926092651.17041-1-yi.l.liu@intel.c…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid_siov
Regards,
Yi Liu
Kevin Tian (5):
iommufd: Handle unsafe interrupts in a separate function
iommufd: Introduce iommufd_alloc_device()
iommufd: Add iommufd_device_bind_pasid()
iommufd: Support attach/replace for SIOV virtual device {dev, pasid}
vfio: Add vfio_register_pasid_iommu_dev()
Yi Liu (2):
iommufd/selftest: Extend IOMMU_TEST_OP_MOCK_DOMAIN to pass in pasid
iommufd/selftest: Add test coverage for SIOV virtual device
drivers/iommu/iommufd/device.c | 163 ++++++++++++++----
drivers/iommu/iommufd/iommufd_private.h | 7 +
drivers/iommu/iommufd/iommufd_test.h | 2 +
drivers/iommu/iommufd/selftest.c | 10 +-
drivers/vfio/group.c | 18 ++
drivers/vfio/vfio.h | 8 +
drivers/vfio/vfio_main.c | 10 ++
include/linux/iommufd.h | 3 +
include/linux/vfio.h | 1 +
tools/testing/selftests/iommu/iommufd.c | 75 ++++++--
.../selftests/iommu/iommufd_fail_nth.c | 42 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 21 ++-
12 files changed, 296 insertions(+), 64 deletions(-)
--
2.34.1
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As Yonghong Song already pointed out [1], in the bpf selftests the use
of the ASSERT_* family of macros is preferred over the CHECK macro.
This patchset replaces the usage of the `CHECK()` macro with the
equivalent `ASSERT_` family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
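To illustrate the kind of change involved, here is a hypothetical
before/after fragment (not a hunk from these patches); CHECK() treats a
true condition as failure and needs a local duration variable plus
hand-written formatting, while the ASSERT_*() macros carry their own
reporting:

  /* before: open-coded failure message, __u32 duration must be in scope */
  err = bpf_map_lookup_elem(map_fd, &key, &val);
  if (CHECK(err, "bpf_map_lookup_elem", "err %d errno %d\n", err, errno))
          goto cleanup;

  /* after: ASSERT_OK() treats non-zero as failure and reports errno itself */
  err = bpf_map_lookup_elem(map_fd, &key, &val);
  if (!ASSERT_OK(err, "bpf_map_lookup_elem"))
          goto cleanup;

  /* value comparisons map naturally onto ASSERT_EQ() */
  if (!ASSERT_EQ(val, expected_val, "lookup value"))
          goto cleanup;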
Changes in v3:
- Addressed the following points mentioned by Yonghong Song
- Improved `bpf_map_lookup_elem` assertion in bpf_tcp_ca.
- Replaced assertion introduced in v2 with one that checks `thread_ret`
instead of `pthread_join`. This ensures that `server`'s return value
(thread_ret) is the one being checked, as opposed to `pthread_join`'s
return value, since the latter one is less likely to fail.
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v2 - https://lore.kernel.org/lkml/GV1PR10MB6563AECF8E94798A1E5B36A4E8B6A@GV1PR10…
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
Changes from v1:
* Dropped some changes that were independently fixed[1]
* No longer separate the f strings to their own patch
* Use r strings when the value is a regular expression
* Updated verification script
In retrospect a script to find the instances and apply fixes isn't that
useful for review, so the attached script this time just looks for
differences in the AST. Apply the series and run the script, with
the two references to compare as arguments.
There are some intentional changes to the AST now though, as the r strings
turn '\t' from a single character tab into a backslash and 't' character
pair (similar for '\n'). This does not affect the correctness of the
regular expression though.
v1: https://lore.kernel.org/all/20230814060704.79655-1-bgray@linux.ibm.com/
[1]: https://lore.kernel.org/all/20230816122133.1231599-1-vishalc@linux.ibm.com/
---
#!/usr/bin/env python3
"""
Verify Python syntax trees are equivalent between two references
"""
import argparse
import ast
from pathlib import Path
import subprocess as sp
def read_file(path: Path, ref: str) -> str:
return sp.run(f"git show {ref}:{path}", stdout=sp.PIPE, shell=True, encoding="utf-8", check=True).stdout
parser = argparse.ArgumentParser("Compare Python ASTs between revisions")
parser.add_argument("ref1", type=str, help="First revision to use")
parser.add_argument("ref2", type=str, help="Second revision to use")
args = parser.parse_args()
for pyfile in Path(".").glob("**/*.py"):
try:
ref1_content = read_file(pyfile, args.ref1)
ref2_content = read_file(pyfile, args.ref2)
except Exception as e:
print(f"ERROR:{pyfile}: Failed to read ({e})")
continue
try:
ref1_syntax = ast.parse(ref1_content, filename=pyfile)
ref2_syntax = ast.parse(ref2_content, filename=pyfile)
except SyntaxError as e:
print(f"ERROR:{pyfile}: Failed to parse, is it Python3? ({e})")
continue
if ast.dump(ref1_syntax) != ast.dump(ref2_syntax):
print(f"ERROR:{pyfile}: Revisions have different AST")
cmd = f"diff <(git show {args.ref1}:{pyfile} | python -m ast) <(git show {args.ref2}:{pyfile} | python -m ast)"
print(cmd)
sp.run(cmd, shell=True)
continue
Benjamin Gray (7):
ia64: fix Python string escapes
Documentation/sphinx: fix Python string escapes
drivers/comedi: fix Python string escapes
scripts: fix Python string escapes
tools/perf: fix Python string escapes
tools/power: fix Python string escapes
selftests/bpf: fix Python string escapes
Documentation/sphinx/cdomain.py | 2 +-
Documentation/sphinx/kernel_abi.py | 2 +-
Documentation/sphinx/kernel_feat.py | 2 +-
Documentation/sphinx/kerneldoc.py | 2 +-
Documentation/sphinx/maintainers_include.py | 8 +++---
arch/ia64/scripts/unwcheck.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
scripts/clang-tools/gen_compile_commands.py | 2 +-
scripts/gdb/linux/symbols.py | 2 +-
tools/perf/pmu-events/jevents.py | 2 +-
.../scripts/python/arm-cs-trace-disasm.py | 4 +--
tools/perf/scripts/python/compaction-times.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 4 +--
tools/power/pm-graph/bootgraph.py | 12 ++++-----
.../selftests/bpf/test_bpftool_synctypes.py | 26 +++++++++----------
tools/testing/selftests/bpf/test_offload.py | 2 +-
16 files changed, 38 insertions(+), 38 deletions(-)
--
2.41.0
From: Willem de Bruijn <willemb(a)google.com>
Commit 29f834aa326e ("net_sched: sch_fq: add 3 bands and WRR
scheduling") introduces multiple traffic bands, and per-band maximum
packet count.
Per-band limits ensures that packets in one class cannot fill the
entire qdisc and so cause DoS to the traffic in the other classes.
Verify this behavior:
1. set the limit to 10 per band
2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
Packets must remain queued for a period to trigger this behavior.
Use SO_TXTIME to store packets for 100 msec.
The test reuses existing upstream test infra. The script is a fork of
cmsg_time.sh. The scripts call cmsg_sender.
The test extends cmsg_sender with two arguments:
* '-P' SO_PRIORITY
There is a subtle difference between IPv4 and IPv6 stack behavior:
PF_INET/IP_TOS sets IP header bits and sk_priority
PF_INET6/IPV6_TCLASS sets IP header bits BUT NOT sk_priority
* '-n' num pkts
Send multiple packets in quick succession.
I first attempted a for loop in the script, but this is too slow in
virtualized environments, causing flakiness as the 100ms timeout is
reached and packets are dequeued.
Also do not wait for timestamps to be queued unless timestamps are
requested.
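For reference on the IPv4/IPv6 difference mentioned above: on an IPv4
socket IP_TOS updates both the header bits and sk_priority, while on
IPv6 IPV6_TCLASS leaves sk_priority untouched, so a sender that wants a
specific priority band on IPv6 has to set SO_PRIORITY explicitly. A
small sketch (not code from this patch):

  #include <netinet/in.h>
  #include <sys/socket.h>

  /* Give an IPv6 socket traffic class 'tclass' and priority 'prio',
   * mirroring what IP_TOS does implicitly for IPv4 sockets. */
  static int set_prio_v6(int fd, int tclass, int prio)
  {
          /* IPV6_TCLASS only sets the traffic class header bits... */
          if (setsockopt(fd, IPPROTO_IPV6, IPV6_TCLASS, &tclass, sizeof(tclass)))
                  return -1;
          /* ...so sk_priority has to be set explicitly. */
          return setsockopt(fd, SOL_SOCKET, SO_PRIORITY, &prio, sizeof(prio));
  }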
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
tools/testing/selftests/net/Makefile | 1 +
tools/testing/selftests/net/cmsg_sender.c | 50 ++++++++++------
.../testing/selftests/net/fq_band_pktlimit.sh | 57 +++++++++++++++++++
3 files changed, 91 insertions(+), 17 deletions(-)
create mode 100755 tools/testing/selftests/net/fq_band_pktlimit.sh
diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 5b2aca4c5f10..9274edfb76ff 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -91,6 +91,7 @@ TEST_PROGS += test_bridge_neigh_suppress.sh
TEST_PROGS += test_vxlan_nolocalbypass.sh
TEST_PROGS += test_bridge_backup_port.sh
TEST_PROGS += fdb_flush.sh
+TEST_PROGS += fq_band_pktlimit.sh
TEST_FILES := settings
diff --git a/tools/testing/selftests/net/cmsg_sender.c b/tools/testing/selftests/net/cmsg_sender.c
index 24b21b15ed3f..8d7575389f58 100644
--- a/tools/testing/selftests/net/cmsg_sender.c
+++ b/tools/testing/selftests/net/cmsg_sender.c
@@ -45,11 +45,13 @@ struct options {
const char *host;
const char *service;
unsigned int size;
+ unsigned int num_pkt;
struct {
unsigned int mark;
unsigned int dontfrag;
unsigned int tclass;
unsigned int hlimit;
+ unsigned int priority;
} sockopt;
struct {
unsigned int family;
@@ -72,6 +74,7 @@ struct options {
} v6;
} opt = {
.size = 13,
+ .num_pkt = 1,
.sock = {
.family = AF_UNSPEC,
.type = SOCK_DGRAM,
@@ -112,7 +115,7 @@ static void cs_parse_args(int argc, char *argv[])
{
int o;
- while ((o = getopt(argc, argv, "46sS:p:m:M:d:tf:F:c:C:l:L:H:")) != -1) {
+ while ((o = getopt(argc, argv, "46sS:p:P:m:M:n:d:tf:F:c:C:l:L:H:")) != -1) {
switch (o) {
case 's':
opt.silent_send = true;
@@ -138,7 +141,9 @@ static void cs_parse_args(int argc, char *argv[])
cs_usage(argv[0]);
}
break;
-
+ case 'P':
+ opt.sockopt.priority = atoi(optarg);
+ break;
case 'm':
opt.mark.ena = true;
opt.mark.val = atoi(optarg);
@@ -146,6 +151,9 @@ static void cs_parse_args(int argc, char *argv[])
case 'M':
opt.sockopt.mark = atoi(optarg);
break;
+ case 'n':
+ opt.num_pkt = atoi(optarg);
+ break;
case 'd':
opt.txtime.ena = true;
opt.txtime.delay = atoi(optarg);
@@ -410,6 +418,10 @@ static void ca_set_sockopts(int fd)
setsockopt(fd, SOL_IPV6, IPV6_UNICAST_HOPS,
&opt.sockopt.hlimit, sizeof(opt.sockopt.hlimit)))
error(ERN_SOCKOPT, errno, "setsockopt IPV6_HOPLIMIT");
+ if (opt.sockopt.priority &&
+ setsockopt(fd, SOL_SOCKET, SO_PRIORITY,
+ &opt.sockopt.priority, sizeof(opt.sockopt.priority)))
+ error(ERN_SOCKOPT, errno, "setsockopt SO_PRIORITY");
}
int main(int argc, char *argv[])
@@ -421,6 +433,7 @@ int main(int argc, char *argv[])
char *buf;
int err;
int fd;
+ int i;
cs_parse_args(argc, argv);
@@ -480,24 +493,27 @@ int main(int argc, char *argv[])
cs_write_cmsg(fd, &msg, cbuf, sizeof(cbuf));
- err = sendmsg(fd, &msg, 0);
- if (err < 0) {
- if (!opt.silent_send)
- fprintf(stderr, "send failed: %s\n", strerror(errno));
- err = ERN_SEND;
- goto err_out;
- } else if (err != (int)opt.size) {
- fprintf(stderr, "short send\n");
- err = ERN_SEND_SHORT;
- goto err_out;
- } else {
- err = ERN_SUCCESS;
+ for (i = 0; i < opt.num_pkt; i++) {
+ err = sendmsg(fd, &msg, 0);
+ if (err < 0) {
+ if (!opt.silent_send)
+ fprintf(stderr, "send failed: %s\n", strerror(errno));
+ err = ERN_SEND;
+ goto err_out;
+ } else if (err != (int)opt.size) {
+ fprintf(stderr, "short send\n");
+ err = ERN_SEND_SHORT;
+ goto err_out;
+ }
}
+ err = ERN_SUCCESS;
- /* Make sure all timestamps have time to loop back */
- usleep(opt.txtime.delay);
+ if (opt.ts.ena) {
+ /* Make sure all timestamps have time to loop back */
+ usleep(opt.txtime.delay);
- cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ cs_read_cmsg(fd, &msg, cbuf, sizeof(cbuf));
+ }
err_out:
close(fd);
diff --git a/tools/testing/selftests/net/fq_band_pktlimit.sh b/tools/testing/selftests/net/fq_band_pktlimit.sh
new file mode 100755
index 000000000000..24b77bdf41ff
--- /dev/null
+++ b/tools/testing/selftests/net/fq_band_pktlimit.sh
@@ -0,0 +1,57 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify that FQ has a packet limit per band:
+#
+# 1. set the limit to 10 per band
+# 2. send 20 pkts on band A: verify that 10 are queued, 10 dropped
+# 3. send 20 pkts on band A: verify that 0 are queued, 20 dropped
+# 4. send 20 pkts on band B: verify that 10 are queued, 10 dropped
+#
+# Send packets with a 100ms delay to ensure that previously sent
+# packets are still queued when later ones are sent.
+# Use SO_TXTIME for this.
+
+die() {
+ echo "$1"
+ exit 1
+}
+
+# run inside private netns
+if [[ $# -eq 0 ]]; then
+ ./in_netns.sh "$0" __subprocess
+ exit
+fi
+
+ip link add type dummy
+ip link set dev dummy0 up
+ip -6 addr add fdaa::1/128 dev dummy0
+ip -6 route add fdaa::/64 dev dummy0
+tc qdisc replace dev dummy0 root handle 1: fq quantum 1514 initial_quantum 1514 limit 10
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT1="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 fdaa::2 8000
+OUT2="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+./cmsg_sender -6 -p u -d 100000 -n 20 -P 7 fdaa::2 8000
+OUT3="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Initial stats will report zero sent, as all packets are still
+# queued in FQ. Sleep for the delay period (100ms) and see that
+# twenty are now sent.
+sleep 0.1
+OUT4="$(tc -s qdisc show dev dummy0 | grep '^\ Sent')"
+
+# Log the output after the test
+echo "${OUT1}"
+echo "${OUT2}"
+echo "${OUT3}"
+echo "${OUT4}"
+
+# Test the output for expected values
+echo "${OUT1}" | grep -q '0\ pkt\ (dropped\ 10' || die "unexpected drop count at 1"
+echo "${OUT2}" | grep -q '0\ pkt\ (dropped\ 30' || die "unexpected drop count at 2"
+echo "${OUT3}" | grep -q '0\ pkt\ (dropped\ 40' || die "unexpected drop count at 3"
+echo "${OUT4}" | grep -q '20\ pkt\ (dropped\ 40' || die "unexpected accept count at 4"
--
2.43.0.rc1.413.gea7ed67945-goog
On Wed, Nov 15, 2023 at 01:17:06PM +0800, Liu, Jing2 wrote:
> This is the right way to approach it,
>
> I learned that there was discussion about using io_uring to get the
> page fault without eventfd notification in [1]. I am new at io_uring
> and studying the man page of liburing, but I have questions on how
> QEMU can get the incoming page faults with good performance.
>
> Since neither QEMU nor the kernel knows when faults come, after QEMU
> submits one read task to io_uring we want the kernel to wait until a
> fault arrives. But based on hwpt_fault_fops_read() in [patch v2 4/6],
> it just returns 0 since there is currently no fault, so this round of
> read completes to the CQ, which is not what we want. So I'm wondering
> how the kernel can block the read until a fault comes. Does the fops
> callback need special work to
Implement a fops with poll support that triggers when a new event is
pushed and everything will be fine. There are many examples in the
kernel. The ones in the mlx5 vfio driver spring to mind as a scheme I
recently looked at.
Jason
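For reference, the poll-based scheme suggested above is the usual kernel
pattern: the fault producer queues an event and wakes a waitqueue, and
the fops poll handler reports readability, which poll, epoll and
io_uring can all wait on. A rough sketch with hypothetical names (not
code from this series):

  #include <linux/fs.h>
  #include <linux/list.h>
  #include <linux/module.h>
  #include <linux/poll.h>
  #include <linux/spinlock.h>
  #include <linux/wait.h>

  struct my_fault_queue {
          spinlock_t lock;
          struct list_head events;        /* queued page-fault messages */
          wait_queue_head_t wait;
  };

  /* producer side: called when a new fault is reported */
  static void my_fault_push(struct my_fault_queue *q, struct list_head *ev)
  {
          spin_lock(&q->lock);
          list_add_tail(ev, &q->events);
          spin_unlock(&q->lock);
          wake_up_interruptible(&q->wait);  /* wakes poll/epoll/io_uring waiters */
  }

  static __poll_t my_fault_poll(struct file *filep, struct poll_table_struct *wait)
  {
          struct my_fault_queue *q = filep->private_data;
          __poll_t mask = 0;

          poll_wait(filep, &q->wait, wait);
          spin_lock(&q->lock);
          if (!list_empty(&q->events))
                  mask = EPOLLIN | EPOLLRDNORM;
          spin_unlock(&q->lock);
          return mask;
  }

  static const struct file_operations my_fault_fops = {
          .owner = THIS_MODULE,
          .poll = my_fault_poll,
          /* .read dequeues events; it only returns 0/-EAGAIN when the
           * queue is empty */
  };

With that in place, a read submitted through io_uring on a non-blocking
fd completes only once the poll handler signals EPOLLIN, so userspace
does not need a separate eventfd.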
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As Yonghong Song already pointed out [1], in the bpf selftests the use
of the ASSERT_* family of macros is preferred over the CHECK macro.
This patchset replaces the usage of the `CHECK()` macro with the
equivalent `ASSERT_` family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Changes in v2:
- Fixed pthread_join assertion that broke the previous test
Previous version:
v1 - https://lore.kernel.org/lkml/GV1PR10MB6563FCFF1C5DEBE84FEA985FE8B0A@GV1PR10…
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 50 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 106 insertions(+), 170 deletions(-)
--
2.25.1
v4:
- Update patch 1 to move apply_wqattrs_lock() and apply_wqattrs_unlock()
down into CONFIG_SYSFS block to avoid compilation warnings.
v3:
- Break out a separate patch to make workqueue_set_unbound_cpumask()
static and move it down to the CONFIG_SYSFS section.
- Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
new root only cpuset.cpus.isolated control files and update the
test accordingly.
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that caller of the new workqueue_unbound_exclude_cpumask()
hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
An isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups and with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve
similar level of CPU isolation. However, this manual process can be
error prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
Waiman Long (5):
workqueue: Make workqueue_set_unbound_cpumask() static
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 165 +++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 475 insertions(+), 204 deletions(-)
--
2.39.3
The kernel has recently added support for shadow stacks, currently x86
only using its CET feature, but both arm64 and RISC-V have equivalent
features (GCS and Zisslpcfi respectively), and I am actively working on
GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and makes it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexibility for managing the allocation of shadow stacks for newly
created threads; instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or when shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal: in the vast
majority of cases the shadow stack will be over-allocated, and the
implicit allocation and deallocation are not consistent with other
interfaces. As far as I can tell the interface was done in this manner
mainly because the shadow stack patches had been in development since
before clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process in a similar manner
to how the normal stack is specified, keeping the current implicit
allocation behaviour if one is not specified either with clone3() or
through the use of clone(). Unlike normal stacks, only the shadow stack
size is specified; similar issues to those that led to the creation of
map_shadow_stack() apply.
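To make the shape of the proposed interface concrete, a caller might end
up doing something roughly like the sketch below. This is illustrative
only: the struct is a local mirror of the extended clone_args, the
shadow_stack_size field name is my assumption based on this cover letter
rather than settled UAPI, and the usual clone3() thread-setup details
(flags, TLS, child stack handling) are omitted:

  #include <stdint.h>
  #include <string.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  /* Local mirror of the extended clone_args layout (field name assumed). */
  struct my_clone_args {
          uint64_t flags;
          uint64_t pidfd;
          uint64_t child_tid;
          uint64_t parent_tid;
          uint64_t exit_signal;
          uint64_t stack;
          uint64_t stack_size;
          uint64_t tls;
          uint64_t set_tid;
          uint64_t set_tid_size;
          uint64_t cgroup;
          uint64_t shadow_stack_size;     /* new in this series, size only */
  };

  static long my_clone3(uint64_t stack, uint64_t stack_size)
  {
          struct my_clone_args args;

          memset(&args, 0, sizeof(args));
          args.stack = stack;
          args.stack_size = stack_size;
          /* Ask the kernel to allocate and place a 32kB shadow stack. */
          args.shadow_stack_size = 32 * 1024;

          return syscall(__NR_clone3, &args, sizeof(args));
  }

Leaving the new field at zero would keep the current behaviour, with the
kernel sizing the shadow stack after the normal stack.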
Please note that the x86 portions of this code are build tested only; I
don't have access to a system that can run CET, but I have done testing
with an integration into my pending work for GCS. There is
some possibility that the arm64 implementation may require the use of
clone3() and explicit userspace allocation of shadow stacks, this is
still under discussion.
A new architecture feature Kconfig option for shadow stacks is added
here; this was suggested as part of the review comments for the arm64
GCS series, and since we need to detect whether shadow stacks are
supported it seemed sensible to roll it in here.
[1] https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org/
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (5):
mm: Introduce ARCH_HAS_USER_SHADOW_STACK
fork: Add shadow stack support to clone3()
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
kselftest/clone3: Test shadow stack support
arch/x86/Kconfig | 1 +
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 30 ++++-
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 2 +-
include/linux/sched/task.h | 2 +
include/uapi/linux/sched.h | 4 +
kernel/fork.c | 24 +++-
mm/Kconfig | 6 +
tools/testing/selftests/clone3/clone3.c | 151 ++++++++++++++++------
tools/testing/selftests/clone3/clone3_selftests.h | 7 +
12 files changed, 188 insertions(+), 54 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Changelog:
v5:
* Replace reference getting with an rcu_read_lock() section for
zswap lru modifications (suggested by Yosry)
* Add a new prep patch that allows mem_cgroup_iter() to return
online cgroup.
* Add a callback that updates pool->next_shrink when the cgroup is
offlined (suggested by Yosry Ahmed, Johannes Weiner)
v4:
* Rename list_lru_add to list_lru_add_obj and __list_lru_add to
list_lru_add (patch 1) (suggested by Johannes Weiner and
Yosry Ahmed)
* Some cleanups on the memcg aware LRU patch (patch 2)
(suggested by Yosry Ahmed)
* Use event interface for the new per-cgroup writeback counters.
(patch 3) (suggested by Yosry Ahmed)
* Abstract zswap's lruvec states and handling into
zswap_lruvec_state (patch 5) (suggested by Yosry Ahmed)
v3:
* Add a patch to export per-cgroup zswap writeback counters
* Add a patch to update zswap's kselftest
* Separate the new list_lru functions into its own prep patch
* Do not start from the top of the hierarchy when encountering a memcg
that is not online for the global limit zswap writeback (patch 2)
(suggested by Yosry Ahmed)
* Do not remove the swap entry from list_lru in
__read_swapcache_async() (patch 2) (suggested by Yosry Ahmed)
* Removed a redundant zswap pool getting (patch 2)
(reported by Ryan Roberts)
* Use atomic for the nr_zswap_protected (instead of lruvec's lock)
(patch 5) (suggested by Yosry Ahmed)
* Remove the per-cgroup zswap shrinker knob (patch 5)
(suggested by Yosry Ahmed)
v2:
* Fix loongarch compiler errors
* Use pool stats instead of memcg stats when !CONFIG_MEMCG_KMEM
There are currently several issues with zswap writeback:
1. There is only a single global LRU for zswap, making it impossible to
perform workload-specific shrinking - a memcg under memory pressure
cannot determine which pages in the pool it owns, and often ends up
writing pages from other memcgs. This issue has been previously
observed in practice and mitigated by simply disabling
memcg-initiated shrinking:
https://lore.kernel.org/all/20230530232435.3097106-1-nphamcs@gmail.com/T/#u
But this solution leaves a lot to be desired, as we still do not
have an avenue for an memcg to free up its own memory locked up in
the zswap pool.
2. We only shrink the zswap pool when the user-defined limit is hit.
This means that if we set the limit too high, cold data that are
unlikely to be used again will reside in the pool, wasting precious
memory. It is hard to predict how much zswap space will be needed
ahead of time, as this depends on the workload (specifically, on
factors such as memory access patterns and compressibility of the
memory pages).
This patch series solves these issues by separating the global zswap
LRU into per-memcg and per-NUMA LRUs, and performs workload-specific
(i.e memcg- and NUMA-aware) zswap writeback under memory pressure. The
new shrinker does not have any parameter that must be tuned by the
user, and can be opted in or out on a per-memcg basis.
As a proof of concept, we ran the following synthetic benchmark:
build the linux kernel in a memory-limited cgroup, and allocate some
cold data in tmpfs to see if the shrinker could write it out and
improve the overall performance. Depending on the amount of cold data
generated, we observe from 14% to 35% reduction in kernel CPU time used
in the kernel builds.
Domenico Cerasuolo (3):
zswap: make shrinking memcg-aware
mm: memcg: add per-memcg zswap writeback stat
selftests: cgroup: update per-memcg zswap writeback selftest
Nhat Pham (3):
list_lru: allows explicit memcg and NUMA node selection
memcontrol: allows mem_cgroup_iter() to check for onlineness
zswap: shrinks zswap pool based on memory pressure
Documentation/admin-guide/mm/zswap.rst | 7 +
drivers/android/binder_alloc.c | 5 +-
fs/dcache.c | 8 +-
fs/gfs2/quota.c | 6 +-
fs/inode.c | 4 +-
fs/nfs/nfs42xattr.c | 8 +-
fs/nfsd/filecache.c | 4 +-
fs/xfs/xfs_buf.c | 6 +-
fs/xfs/xfs_dquot.c | 2 +-
fs/xfs/xfs_qm.c | 2 +-
include/linux/list_lru.h | 46 ++-
include/linux/memcontrol.h | 9 +-
include/linux/mmzone.h | 2 +
include/linux/vm_event_item.h | 1 +
include/linux/zswap.h | 27 +-
mm/list_lru.c | 48 ++-
mm/memcontrol.c | 20 +-
mm/mmzone.c | 1 +
mm/shrinker.c | 4 +-
mm/swap.h | 3 +-
mm/swap_state.c | 26 +-
mm/vmscan.c | 26 +-
mm/vmstat.c | 1 +
mm/workingset.c | 4 +-
mm/zswap.c | 430 +++++++++++++++++---
tools/testing/selftests/cgroup/test_zswap.c | 74 ++--
26 files changed, 625 insertions(+), 149 deletions(-)
--
2.34.1
From: angquan yu <angquan21(a)gmail.com>
In tools/testing/selftests/proc/proc-empty-vm.c, a compiler warning was
emitted because the return value of a write() call was being ignored.
The call is part of a conditional debugging block (if (0) { ... }), so
it never actually executes.
This patch checks the return value of that debug write() call, which
resolves the compiler warning about ignoring the result of write()
declared with the warn_unused_result attribute, while keeping the
debugging block available for anyone who enables it.
This is the original warning:
proc-empty-vm.c: In function ‘test_proc_pid_statm’:
proc-empty-vm.c:385:17: warning: ignoring return value of ‘write’ declared with attribute ‘warn_unused_result’ [-Wunused-result]
  385 |                 write(1, buf, rv);
      |
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/proc/proc-empty-vm.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/proc/proc-empty-vm.c b/tools/testing/selftests/proc/proc-empty-vm.c
index 56198d4ca..74ef8627f 100644
--- a/tools/testing/selftests/proc/proc-empty-vm.c
+++ b/tools/testing/selftests/proc/proc-empty-vm.c
@@ -382,7 +382,10 @@ static int test_proc_pid_statm(pid_t pid)
assert(rv >= 0);
assert(rv <= sizeof(buf));
if (0) {
- write(1, buf, rv);
+ ssize_t written = write(1, buf, rv);
+ if (written == -1) {
+ perror("write failed /proc/${pid}");
+ }
}
const char *p = buf;
--
2.39.2
From: angquan yu <angquan21(a)gmail.com>
In the run_test() function of tools/testing/selftests/breakpoints/step_after_suspend_test.c, the ksft_print_msg() call incorrectly used '$s' as a format specifier. This commit corrects the typo to the proper '%s' format specifier, ensuring the error message from waitpid() is correctly displayed.
The issue manifested as a compilation warning (too many arguments for format [-Wformat-extra-args]), potentially obscuring actual runtime errors and complicating debugging processes.
This fix enhances the clarity of error messages during test failures and ensures compliance with standard C format string conventions.
Signed-off-by: angquan yu <angquan21(a)gmail.com>
---
tools/testing/selftests/breakpoints/step_after_suspend_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 2cf6f10ab..b8703c499 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -89,7 +89,7 @@ int run_test(int cpu)
wpid = waitpid(pid, &status, __WALL);
if (wpid != pid) {
- ksft_print_msg("waitpid() failed: $s\n", strerror(errno));
+ ksft_print_msg("waitpid() failed: %s\n", strerror(errno));
return KSFT_FAIL;
}
if (WIFEXITED(status)) {
--
2.39.2
Multiple files/programs in `tools/testing/selftests/bpf/prog_tests/` still
heavily use the `CHECK` macro, even when better `ASSERT_` alternatives are
available.
As Yonghong Song already pointed out [1], in the bpf selftests the use
of the ASSERT_* family of macros is preferred over the CHECK macro.
This patchset replaces the usage of the `CHECK()` macro with the
equivalent `ASSERT_` family of macros in the following prog_tests:
- bind_perm.c
- bpf_obj_id.c
- bpf_tcp_ca.c
- vmlinux.c
[1] https://lore.kernel.org/lkml/0a142924-633c-44e6-9a92-2dc019656bf2@linux.dev
Yuran Pereira (4):
Replaces the usage of CHECK calls for ASSERTs in bpf_tcp_ca
Replaces the usage of CHECK calls for ASSERTs in bind_perm
Replaces the usage of CHECK calls for ASSERTs in bpf_obj_id
selftests/bpf: Replaces the usage of CHECK calls for ASSERTs in
vmlinux
.../selftests/bpf/prog_tests/bind_perm.c | 6 +-
.../selftests/bpf/prog_tests/bpf_obj_id.c | 204 +++++++-----------
.../selftests/bpf/prog_tests/bpf_tcp_ca.c | 48 ++---
.../selftests/bpf/prog_tests/vmlinux.c | 16 +-
4 files changed, 105 insertions(+), 169 deletions(-)
--
2.25.1
On 2023/11/16 13:12, Cao, Yahui wrote:
>
> On 10/9/2023 4:51 PM, Yi Liu wrote:
>> From: Kevin Tian<kevin.tian(a)intel.com>
>>
>> This adds vfio_register_pasid_iommu_dev() for device driver to register
>> virtual devices which are isolated per PASID in physical IOMMU. The major
>> usage is for the SIOV devices which allows device driver to tag the DMAs
>> out of virtual devices within it with different PASIDs.
>>
>> For a given vfio device, VFIO core creates both group user interface and
>> device user interface (device cdev) if configured. However, for the virtual
>> devices backed by PASID of the device, VFIO core shall only create device
>> user interface as there is no plan to support such devices in the legacy
>> vfio_iommu drivers which is a must if creating group user interface for
>> such virtual devices. This introduces a VFIO_PASID_IOMMU group type for
>> the device driver to register PASID virtual devices, and provides a wrapper
>> API for it. In particular no iommu group (neither fake group or real group)
>> exists per PASID, hence no group interface for this type.
>>
>> Signed-off-by: Kevin Tian<kevin.tian(a)intel.com>
>> Signed-off-by: Yi Liu<yi.l.liu(a)intel.com>
>> ---
>> drivers/vfio/group.c | 18 ++++++++++++++++++
>> drivers/vfio/vfio.h | 8 ++++++++
>> drivers/vfio/vfio_main.c | 10 ++++++++++
>> include/linux/vfio.h | 1 +
>> 4 files changed, 37 insertions(+)
>>
>> ...
>> ...
>> +/*
>> + * Register a virtual device with IOMMU pasid protection. The user of
>> + * this device can trigger DMA as long as all of its outgoing DMAs are
>> + * always tagged with a pasid.
>> + */
>> +int vfio_register_pasid_iommu_dev(struct vfio_device *device)
>> +{
>> + return __vfio_register_dev(device, VFIO_PASID_IOMMU);
>> +}
>> +
>
>
> Missing symbol export here.
>
fixed.
--
Regards,
Yi Liu
When executing the dirty_log_test on some aarch64 machines, it
sometimes triggers the ASSERT:
==== Test Assertion Failure ====
dirty_log_test.c:384: dirty_ring_vcpu_ring_full
pid=14854 tid=14854 errno=22 - Invalid argument
1 0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
2 0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
3 (inlined by) run_test at dirty_log_test.c:802
4 0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
5 0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
6 0x0000ffff9be173c7: ?? ??:0
7 0x0000ffff9be1749f: ?? ??:0
8 0x000000000040206f: _start at ??:?
Didn't continue vcpu even without ring full
The dirty_log_test fails when execute the dirty-ring test, this is
because the sem_vcpu_cont and the sem_vcpu_stop is non-zero value when
execute the dirty_ring_collect_dirty_pages() function. When those two
sem_t variables are non-zero, the dirty_ring_wait_vcpu() at the
beginning of the dirty_ring_collect_dirty_pages() will not wait for the
vcpu to stop, but continue to execute the following code. In this case,
before vcpu stop, if the dirty_ring_vcpu_ring_full is true, and the
dirty_ring_collect_dirty_pages() has passed the check for the
dirty_ring_vcpu_ring_full but hasn't execute the check for the
continued_vcpu, the vcpu stop, and set the dirty_ring_vcpu_ring_full to
false. Then dirty_ring_collect_dirty_pages() will trigger the ASSERT.
Why can sem_vcpu_cont and sem_vcpu_stop have non-zero values? It is
because dirty_ring_before_vcpu_join() executes sem_post(&sem_vcpu_cont)
at the end of each dirty-ring test. This can leave a semaphore with a
leftover count in two cases:
1. sem_vcpu_cont becomes non-zero. When we set host_quit to true, the
   vcpu_worker sees host_quit directly and quits. The
   log_mode_before_vcpu_join() function then posts sem_vcpu_cont to 1,
   but since the vcpu_worker has already quit, nothing consumes it.
2. sem_vcpu_stop becomes non-zero. When we set host_quit to true, the
   vcpu_worker has already entered the guest; the next time it exits
   from the guest it posts sem_vcpu_stop to 1 and then sees host_quit,
   so nothing consumes sem_vcpu_stop.
As more and more dirty-ring tests are executed, sem_vcpu_cont and
sem_vcpu_stop can grow larger and larger, which makes many code paths
skip waiting on the semaphores and finally causes the problem.
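In plain POSIX semaphore terms, the effect is simply (a simplified sketch of
the behaviour involved, not code from the test):

	sem_post(&sem);   /* leftover count from a previous test iteration */
	...
	sem_wait(&sem);   /* returns immediately instead of blocking until the
	                   * event the current test is actually waiting for */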
Fixing this problem is easy: simply initialize the sem_t variables before
every test. Then, whatever state the previous test left behind, it will not
interfere with the next test.
Signed-off-by: Shaoqin Huang <shahuang(a)redhat.com>
---
tools/testing/selftests/kvm/dirty_log_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index 936f3a8d1b83..23b179534c0b 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -726,6 +726,9 @@ static void run_test(enum vm_guest_mode mode, void *arg)
return;
}
+ sem_init(&sem_vcpu_stop, 0, 0);
+ sem_init(&sem_vcpu_cont, 0, 0);
+
/*
* We reserve page table for 2 times of extra dirty mem which
* will definitely cover the original (1G+) test range. Here
@@ -871,9 +874,6 @@ int main(int argc, char *argv[])
int opt, i;
sigset_t sigset;
- sem_init(&sem_vcpu_stop, 0, 0);
- sem_init(&sem_vcpu_cont, 0, 0);
-
guest_modes_append_default();
while ((opt = getopt(argc, argv, "c:hi:I:p:m:M:")) != -1) {
--
2.40.1
This series introduces a new test harness for nolibc.
It is similar to kselftest-harness and google test.
More information in patch 1.
This is an RFC to gather feedback, especially if it can be integrated
with the original kselftest-harness somehow.
Note:
When run under qemu-loongarch64 8.1.2, the test "mmap_munmap_good" fails.
This is a bug in qemu-user that is already fixed on master.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
selftests/nolibc: add custom test harness
selftests/nolibc: migrate startup tests to new harness
selftests/nolibc: migrate vfprintf tests to new harness
tools/testing/selftests/nolibc/nolibc-harness.h | 269 ++++++++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 180 ++++++++--------
2 files changed, 353 insertions(+), 96 deletions(-)
---
base-commit: d38d5366cb1c51f687b4720277adee97074b22e9
change-id: 20231105-nolibc-harness-28c59029d7a5
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi Christian,
Can you take this through the filesystem tree?
These patches make some changes to the kunit tests previously added for
iov_iter testing, in particular adding testing of UBUF/IOVEC iterators and
some benchmarking:
(1) Clean up a couple of checkpatch style complaints.
(2) Consolidate some repeated bits of code into helper functions and use
the same struct to represent straight offset/address ranges and
partial page lists.
(3) Add a function to set up a userspace VM, attach the VM to the kunit
testing thread, create an anonymous file, stuff some pages into the
file and map the file into the VM to act as a buffer that can be used
with UBUF/IOVEC iterators.
I map an anonymous file with pages attached rather than using MAP_ANON
so that I can check the pages obtained from iov_iter_extract_pages()
without worrying about them changing due to swap, migrate, etc.
[?] Is this the best way to do things? Mirroring execve, it requires
a number of extra core symbols to be exported. Should this be done in
the core code?
(4) Add tests for copying into and out of UBUF and IOVEC iterators.
(5) Add tests for extracting pages from UBUF and IOVEC iterators.
(6) Add tests to benchmark copying 256MiB to UBUF, IOVEC, KVEC, BVEC and
XARRAY iterators.
(7) Add a test to benchmark copying 256MiB from an xarray that gets decanted
into 256-page BVEC iterators to model batching from the pagecache.
(8) Add a test to benchmark copying 256MiB through dynamically allocated
256-page bvecs to simulate bio construction.
Example benchmarks output:
iov_kunit_benchmark_ubuf: avg 4474 uS, stddev 1340 uS
iov_kunit_benchmark_iovec: avg 6619 uS, stddev 23 uS
iov_kunit_benchmark_kvec: avg 2672 uS, stddev 14 uS
iov_kunit_benchmark_bvec: avg 3189 uS, stddev 19 uS
iov_kunit_benchmark_bvec_split: avg 3403 uS, stddev 8 uS
iov_kunit_benchmark_xarray: avg 3709 uS, stddev 7 uS
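
For orientation, each UBUF benchmark iteration conceptually boils down to
something like this (a simplified sketch against the generic iov_iter API,
not code lifted from the series; 'user_buffer' and 'scratch' are hypothetical
names for the mapped user page and a kernel scratch page):

	struct iov_iter iter;
	size_t copied;

	/* Describe one page of the userspace buffer as an ITER_UBUF source. */
	iov_iter_ubuf(&iter, ITER_SOURCE, user_buffer, PAGE_SIZE);

	/* Copy it into the kernel scratch page; the iterator advances as it goes. */
	copied = copy_from_iter(scratch, PAGE_SIZE, &iter);
	KUNIT_EXPECT_EQ(test, copied, PAGE_SIZE);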
I've pushed the patches here also:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?…
David
Changes
=======
ver #3)
- #include <linux/personality.h> to get READ_IMPLIES_EXEC.
- Add a test to benchmark decanting an xarray into bio_vecs.
ver #2)
- Use MAP_ANON to make the user buffer if we don't want a list of pages.
- KUNIT_ASSERT_NOT_ERR_OR_NULL() doesn't like __user pointers as the
condition, so cast.
- Make the UBUF benchmark loop, doing an iterator per page so that the
overhead from the iterator code is not negligible.
- Make the KVEC benchmark use an iovec per page so that the iteration is
  not negligible.
- Switch the benchmarking to use copy_from_iter() so that only a single
page is needed in the userspace buffer (as it can be shared R/O), not
256MiB's worth.
Link: https://lore.kernel.org/r/20230914221526.3153402-1-dhowells@redhat.com/ # v1
Link: https://lore.kernel.org/r/20230920130400.203330-1-dhowells@redhat.com/ # v2
Link: https://lore.kernel.org/r/20230922113038.1135236-1-dhowells@redhat.com/ # v3
David Howells (10):
iov_iter: Fix some checkpatch complaints in kunit tests
iov_iter: Consolidate some of the repeated code into helpers
iov_iter: Consolidate the test vector struct in the kunit tests
iov_iter: Consolidate bvec pattern checking
iov_iter: Create a function to prepare userspace VM for UBUF/IOVEC
tests
iov_iter: Add copy kunit tests for ITER_UBUF and ITER_IOVEC
iov_iter: Add extract kunit tests for ITER_UBUF and ITER_IOVEC
iov_iter: Add benchmarking kunit tests
iov_iter: Add kunit to benchmark decanting of xarray to bvec
iov_iter: Add benchmarking kunit tests for UBUF/IOVEC
arch/s390/kernel/vdso.c | 1 +
fs/anon_inodes.c | 1 +
kernel/fork.c | 2 +
lib/kunit_iov_iter.c | 1317 +++++++++++++++++++++++++++++++++------
mm/mmap.c | 1 +
mm/util.c | 3 +
6 files changed, 1139 insertions(+), 186 deletions(-)
v3:
- Break out a separate patch to make workqueue_set_unbound_cpumask()
static and move it down to the CONFIG_SYSFS section.
- Remove the "__DEBUG__." prefix and the CFTYPE_DEBUG flag from the
new root only cpuset.cpus.isolated control files and update the
test accordingly.
v2:
- Add 2 read-only workqueue sysfs files to expose the user requested
cpumask as well as the isolated CPUs to be excluded from
wq_unbound_cpumask.
- Ensure that callers of the new workqueue_unbound_exclude_cpumask()
  hold cpus_read_lock.
- Update the cpuset code to make sure the cpus_read_lock is held
whenever workqueue_unbound_exclude_cpumask() may be called.
An isolated cpuset partition can currently be created to contain an
exclusive set of CPUs not used in other cgroups, with load balancing
disabled to reduce interference from the scheduler.
The main purpose of this isolated partition type is to dynamically
emulate what can be done via the "isolcpus" boot command line option,
specifically the default domain flag. One effect of the "isolcpus" option
is to remove the isolated CPUs from the cpumasks of unbound workqueues
since running work functions in an isolated CPU can be a major source
of interference. Changing the unbound workqueue cpumasks can be done at
run time by writing an appropriate cpumask without the isolated CPUs to
/sys/devices/virtual/workqueue/cpumask. So one can set up an isolated
cpuset partition and then write to the cpumask sysfs file to achieve a
similar level of CPU isolation. However, this manual process can be
error-prone.
This patch series implements automatic exclusion of isolated CPUs from
unbound workqueue cpumasks when an isolated cpuset partition is created
and then adds those CPUs back when the isolated partition is destroyed.
There are also other places in the kernel that look at the HK_FLAG_DOMAIN
cpumask or other HK_FLAG_* cpumasks and exclude the isolated CPUs from
certain actions to further reduce interference. CPUs in an isolated
cpuset partition will not be able to avoid those interferences yet. That
may change in the future as the need arises.
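The expected call pattern from the cpuset side is roughly the following (a
sketch under the assumption that the new helper takes the cpumask of CPUs to
exclude and, as noted above, that callers hold cpus_read_lock()):

	cpus_read_lock();
	/* Exclude the isolated partition's CPUs from unbound workqueues. */
	ret = workqueue_unbound_exclude_cpumask(isolated_cpus);
	cpus_read_unlock();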
Waiman Long (5):
workqueue: Make workqueue_set_unbound_cpumask() static
workqueue: Add workqueue_unbound_exclude_cpumask() to exclude CPUs
from wq_unbound_cpumask
selftests/cgroup: Minor code cleanup and reorganization of
test_cpuset_prs.sh
cgroup/cpuset: Keep track of CPUs in isolated partitions
cgroup/cpuset: Take isolated CPUs out of workqueue unbound cpumask
Documentation/admin-guide/cgroup-v2.rst | 10 +-
include/linux/workqueue.h | 2 +-
kernel/cgroup/cpuset.c | 286 +++++++++++++-----
kernel/workqueue.c | 139 +++++++--
.../selftests/cgroup/test_cpuset_prs.sh | 216 ++++++++-----
5 files changed, 462 insertions(+), 191 deletions(-)
--
2.39.3
KUnit's deferred action API accepts a void(*)(void *) function pointer
which is called when the test is exited. However, we very frequently
want to use existing functions which accept a single pointer, but which
may not be of type void*. While this is probably dodgy enough to be on
the wrong side of the C standard, it's been often used for similar
callbacks, and gcc's -Wcast-function-type seems to ignore cases where
the only difference is the type of the argument, assuming it's
compatible (i.e., they're both pointers to data).
However, clang 16 has introduced -Wcast-function-type-strict, which no
longer permits any deviation in function pointer type. This seems to be
because it'd break CFI, which validates the type of function calls.
This rather ruins our attempts to cast functions to defer them, and
leaves us with a few options. The one we've chosen is to implement a
macro which will generate a wrapper function which accepts a void*, and
casts the argument to the appropriate type.
For example, if you were trying to wrap:
void foo_close(struct foo *handle);
you could use:
KUNIT_DEFINE_ACTION_WRAPPER(kunit_action_foo_close,
foo_close,
struct foo *);
This would create a new kunit_action_foo_close() function, of type
kunit_action_t, which could be passed into kunit_add_action() and
similar functions.
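For example (a hypothetical usage snippet, with 'handle' standing in for
whatever struct foo * the test owns):

	kunit_add_action(test, kunit_action_foo_close, handle);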
In addition to defining this macro, update KUnit and its tests to use
it.
Link: https://github.com/ClangBuiltLinux/linux/issues/1750
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is a follow-up to the RFC here:
https://lore.kernel.org/linux-kselftest/20230915050125.3609689-1-davidgow@g…
There's no difference in the macro implementation, just an update to the
KUnit tests to use it. This version is intended to complement:
https://lore.kernel.org/all/20231106172557.2963-1-rf@opensource.cirrus.com/
There are also two follow-up patches in the series to use this macro in
various DRM tests.
Hopefully this will solve any CFI issues that show up with KUnit.
Thanks,
-- David
---
include/kunit/resource.h | 9 +++++++++
lib/kunit/kunit-test.c | 5 +----
lib/kunit/test.c | 6 ++++--
3 files changed, 14 insertions(+), 6 deletions(-)
diff --git a/include/kunit/resource.h b/include/kunit/resource.h
index c7383e90f5c9..4110e13970dc 100644
--- a/include/kunit/resource.h
+++ b/include/kunit/resource.h
@@ -390,6 +390,15 @@ void kunit_remove_resource(struct kunit *test, struct kunit_resource *res);
/* A 'deferred action' function to be used with kunit_add_action. */
typedef void (kunit_action_t)(void *);
+/* We can't cast function pointers to kunit_action_t if CFI is enabled. */
+#define KUNIT_DEFINE_ACTION_WRAPPER(wrapper, orig, arg_type) \
+ static void wrapper(void *in) \
+ { \
+ arg_type arg = (arg_type)in; \
+ orig(arg); \
+ }
+
+
/**
* kunit_add_action() - Call a function when the test ends.
* @test: Test case to associate the action with.
diff --git a/lib/kunit/kunit-test.c b/lib/kunit/kunit-test.c
index de2113a58fa0..ee6927c60979 100644
--- a/lib/kunit/kunit-test.c
+++ b/lib/kunit/kunit-test.c
@@ -538,10 +538,7 @@ static struct kunit_suite kunit_resource_test_suite = {
#if IS_BUILTIN(CONFIG_KUNIT_TEST)
/* This avoids a cast warning if kfree() is passed direct to kunit_add_action(). */
-static void kfree_wrapper(void *p)
-{
- kfree(p);
-}
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_wrapper, kfree, const void *);
static void kunit_log_test(struct kunit *test)
{
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index f2eb71f1a66c..0308865194bb 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -772,6 +772,8 @@ static struct notifier_block kunit_mod_nb = {
};
#endif
+KUNIT_DEFINE_ACTION_WRAPPER(kfree_action_wrapper, kfree, const void *)
+
void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
{
void *data;
@@ -781,7 +783,7 @@ void *kunit_kmalloc_array(struct kunit *test, size_t n, size_t size, gfp_t gfp)
if (!data)
return NULL;
- if (kunit_add_action_or_reset(test, (kunit_action_t *)kfree, data) != 0)
+ if (kunit_add_action_or_reset(test, kfree_action_wrapper, data) != 0)
return NULL;
return data;
@@ -793,7 +795,7 @@ void kunit_kfree(struct kunit *test, const void *ptr)
if (!ptr)
return;
- kunit_release_action(test, (kunit_action_t *)kfree, (void *)ptr);
+ kunit_release_action(test, kfree_action_wrapper, (void *)ptr);
}
EXPORT_SYMBOL_GPL(kunit_kfree);
--
2.42.0.869.gea05f2083d-goog
Hi,
Good day! Following Guillaume's suggestion, I've been working on updating all
net self-tests to run in their respective netns. This modification allows us
to execute all tests in parallel, potentially saving a significant amount of
test time.
However, I've encountered a challenge while making these modifications. The
net selftest folder contains around 80 tests (excluding the forwarding test),
with some tests using common netns names and others using self-defined names.
I've considered two methods to address this issue:
One approach is to retain the original names but append a unique suffix using
$(mktemp -u XXXXXX). While this is a straightforward solution, it may not
prevent future tests from using common names.
Another option is to establish a general netns lib. Similar to the NUM_NETIFS
variable in the forwarding test, we could introduce a variable like NUM_NS.
This variable would define the number of netns instances, and all tests would
use the netns lib to set up and clean up netns accordingly. However, this
approach might complicate test debugging, especially for tests like
fib_nexthops.sh, which rely on clear and visually meaningful netns names
(e.g., me/peer/remote).
I'm reaching out to gather your insights on this matter. Do you have any
suggestions or preferences regarding the two proposed methods, or do you have
an alternative solution in mind?
Your expertise in this area would be greatly appreciated.
Best Regards
Hangbin
Here are a few fixes related to MPTCP:
- Patch 1 limits GSO max size to ~64K when MPTCP is being used due to a
spec limit. 'gso_max_size' can exceed the max value supported by MPTCP
since v5.19.
- Patch 2 fixes a possible NULL pointer dereference on close that can
happen since v6.7-rc1.
- Patch 3 avoids sending a RM_ADDR when the corresponding address is no
longer tracked locally. A regression for a fix backported to v5.19.
- Patch 4 adds a missing lock when changing the IP TOS with setsockopt().
A fix for v5.17.
- Patch 5 fixes an expectation when running MPTCP Join selftest with the
checksum option (-C). An issue present since v6.1.
Signed-off-by: Matthieu Baerts <matttbe(a)kernel.org>
---
Geliang Tang (1):
mptcp: add validity check for sending RM_ADDR
Paolo Abeni (4):
mptcp: deal with large GSO size
mptcp: fix possible NULL pointer dereference on close
mptcp: fix setsockopt(IP_TOS) subflow locking
selftests: mptcp: fix fastclose with csum failure
net/mptcp/pm_netlink.c | 5 +++--
net/mptcp/protocol.c | 11 ++++++++---
net/mptcp/sockopt.c | 3 +++
tools/testing/selftests/net/mptcp/mptcp_join.sh | 2 +-
4 files changed, 15 insertions(+), 6 deletions(-)
---
base-commit: 2bd5b559a1f391f05927bbb0b31381fa71c61e26
change-id: 20231113-upstream-net-20231113-mptcp-misc-fixes-6-7-rc2-d15df60b0a3f
Best regards,
--
Matthieu Baerts <matttbe(a)kernel.org>
This series enables support for the data processing extensions in the
newly released 2023 architecture; this is mainly support for 8 bit
floating point formats. Most of the extensions only introduce new
instructions and therefore only require hwcaps, but there is a new EL0
visible control register, FPMR, used to control the 8 bit floating point
formats; we need to manage traps for this and context switch it.
The sharing of floating point save code between the host and guest
kernels slightly complicates the introduction of KVM support: we first
introduce host support with some placeholders for KVM, then replace those
with the actual KVM support.
I've not added test coverage for ptrace. I've got a not quite finished
test program which exercises all the FP ptrace interfaces and their
interactions together; my plan is to cover it there rather than add
another tiny test program that duplicates the boilerplate for tracing a
target and doesn't actually run the traced program.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.7-rc1.
- Link to v1: https://lore.kernel.org/r/20231026-arm64-2023-dpisa-v1-0-8470dd989bb2@kerne…
---
Mark Brown (21):
arm64/sysreg: Add definition for ID_AA64PFR2_EL1
arm64/sysreg: Update ID_AA64ISAR2_EL1 defintion for DDI0601 2023-09
arm64/sysreg: Add definition for ID_AA64ISAR3_EL1
arm64/sysreg: Add definition for ID_AA64FPFR0_EL1
arm64/sysreg: Update ID_AA64SMFR0_EL1 definition for DDI0601 2023-09
arm64/sysreg: Update SCTLR_EL1 for DDI0601 2023-09
arm64/sysreg: Update HCRX_EL2 definition for DDI0601 2023-09
arm64/sysreg: Add definition for FPMR
arm64/cpufeature: Hook new identification registers up to cpufeature
arm64/fpsimd: Enable host kernel access to FPMR
arm64/fpsimd: Support FEAT_FPMR
arm64/signal: Add FPMR signal handling
arm64/ptrace: Expose FPMR via ptrace
KVM: arm64: Add newly allocated ID registers to register descriptions
KVM: arm64: Support FEAT_FPMR for guests
arm64/hwcap: Define hwcaps for 2023 DPISA features
kselftest/arm64: Handle FPMR context in generic signal frame parser
kselftest/arm64: Add basic FPMR test
kselftest/arm64: Add 2023 DPISA hwcap test coverage
KVM: arm64: selftests: Document feature registers added in 2023 extensions
KVM: arm64: selftests: Teach get-reg-list about FPMR
Documentation/arch/arm64/elf_hwcaps.rst | 49 +++++
arch/arm64/include/asm/cpu.h | 3 +
arch/arm64/include/asm/cpufeature.h | 5 +
arch/arm64/include/asm/fpsimd.h | 2 +
arch/arm64/include/asm/hwcap.h | 15 ++
arch/arm64/include/asm/kvm_arm.h | 4 +-
arch/arm64/include/asm/kvm_host.h | 3 +
arch/arm64/include/asm/processor.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 15 ++
arch/arm64/include/uapi/asm/sigcontext.h | 8 +
arch/arm64/kernel/cpufeature.c | 72 +++++++
arch/arm64/kernel/cpuinfo.c | 18 ++
arch/arm64/kernel/fpsimd.c | 13 ++
arch/arm64/kernel/ptrace.c | 42 ++++
arch/arm64/kernel/signal.c | 59 ++++++
arch/arm64/kvm/fpsimd.c | 19 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 7 +-
arch/arm64/kvm/sys_regs.c | 17 +-
arch/arm64/tools/cpucaps | 1 +
arch/arm64/tools/sysreg | 153 ++++++++++++++-
include/uapi/linux/elf.h | 1 +
tools/testing/selftests/arm64/abi/hwcap.c | 217 +++++++++++++++++++++
tools/testing/selftests/arm64/signal/.gitignore | 1 +
.../arm64/signal/testcases/fpmr_siginfo.c | 82 ++++++++
.../selftests/arm64/signal/testcases/testcases.c | 8 +
.../selftests/arm64/signal/testcases/testcases.h | 1 +
tools/testing/selftests/kvm/aarch64/get-reg-list.c | 11 +-
27 files changed, 810 insertions(+), 18 deletions(-)
---
base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
change-id: 20231003-arm64-2023-dpisa-2f3d25746474
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is a follow up to:
commit b8e3a87a627b ("bpf: Add crosstask check to __bpf_get_stack").
This test ensures that the task iterator only gets a single
user stack (for the current task).
Signed-off-by: Jordan Rome <linux(a)jordanrome.com>
---
tools/testing/selftests/bpf/prog_tests/bpf_iter.c | 2 ++
tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c | 5 +++++
2 files changed, 7 insertions(+)
diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
index 4e02093c2cbe..618af9dfae9b 100644
--- a/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
+++ b/tools/testing/selftests/bpf/prog_tests/bpf_iter.c
@@ -332,6 +332,8 @@ static void test_task_stack(void)
do_dummy_read(skel->progs.dump_task_stack);
do_dummy_read(skel->progs.get_task_user_stacks);
+ ASSERT_EQ(skel->bss->num_user_stacks, 1, "num_user_stacks");
+
bpf_iter_task_stack__destroy(skel);
}
diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
index f2b8167b72a8..442f4ca39fd7 100644
--- a/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
+++ b/tools/testing/selftests/bpf/progs/bpf_iter_task_stack.c
@@ -35,6 +35,8 @@ int dump_task_stack(struct bpf_iter__task *ctx)
return 0;
}
+int num_user_stacks = 0;
+
SEC("iter/task")
int get_task_user_stacks(struct bpf_iter__task *ctx)
{
@@ -51,6 +53,9 @@ int get_task_user_stacks(struct bpf_iter__task *ctx)
if (res <= 0)
return 0;
+ /* Only one task, the current one, should succeed */
+ ++num_user_stacks;
+
buf_sz += res;
/* If the verifier doesn't refine bpf_get_task_stack res, and instead
--
2.39.3
Hi all:
The core frequency is subject to process variation in semiconductors.
Not all cores are able to reach the maximum frequency while respecting the
infrastructure limits. Consequently, AMD has redefined the concept of the
maximum frequency of a part. This means that only a fraction of cores can
reach the maximum frequency. To find the best process scheduling policy for
a given scenario, the OS needs to know the core ordering reported by the
platform through the highest performance capability register of the CPPC
interface.
Earlier implementations of amd-pstate preferred core only supported a
static core ranking and targeted performance. Now it has the ability to
dynamically change the preferred core based on workload and platform
conditions, accounting for thermals and aging.
The amd-pstate driver utilizes the functions and data structures provided
by the ITMT architecture to enable the scheduler to favor scheduling on
cores which can reach a higher frequency at lower voltage.
We call this amd-pstate preferred core.
Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature.
The amd-pstate driver uses the highest performance value to indicate
the priority of a CPU; a higher value means a higher priority.
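Roughly, the registration looks like this (a sketch only, with
amd_pstate_highest_perf_of() being a hypothetical helper standing in for
however the driver reads the per-CPU highest performance value):

	/* Feed the per-CPU ranking into ITMT, then let the scheduler use it. */
	for_each_online_cpu(cpu)
		sched_set_itmt_core_prio(amd_pstate_highest_perf_of(cpu), cpu);
	sched_set_itmt_support();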
The amd-pstate driver provides an initial core ordering at boot time.
It relies on the CPPC interface to communicate the core ranking to the
operating system and scheduler, to make sure that the OS chooses the cores
with the highest performance first when scheduling processes. When the
amd-pstate driver receives a message that the highest performance has
changed, it updates the core ranking.
Changes from V9->V10:
- cpufreq: amd-pstate:
- - add a judgement for highest_perf: when it is less than 255, the
  preferred core feature is enabled and the priority is set accordingly.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
  preferred core does not require special processing for hotplug.
Changes from V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add a attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.
Changes from V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embeds into cpudata structure.
- - delete preferred core init from cpu online/off.
Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.
Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.
Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``
Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.
Meng Li (7):
x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
acpi: cppc: Add get the highest performance cppc control
cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
cpufreq: Add a notification message that the highest perf has changed
cpufreq: amd-pstate: Update amd-pstate preferred core ranking
dynamically
Documentation: amd-pstate: introduce amd-pstate preferred core
Documentation: introduce amd-pstate preferrd core mode kernel command
line options
.../admin-guide/kernel-parameters.txt | 5 +
Documentation/admin-guide/pm/amd-pstate.rst | 59 +++++-
arch/x86/Kconfig | 5 +-
drivers/acpi/cppc_acpi.c | 13 ++
drivers/acpi/processor_driver.c | 6 +
drivers/cpufreq/amd-pstate.c | 187 ++++++++++++++++--
drivers/cpufreq/cpufreq.c | 13 ++
include/acpi/cppc_acpi.h | 5 +
include/linux/amd-pstate.h | 10 +
include/linux/cpufreq.h | 5 +
10 files changed, 288 insertions(+), 20 deletions(-)
--
2.34.1