December 2023 - Linux-kselftest-mirror

[PATCH v2 0/6] IOMMUFD: Deliver IO page faults to user space

by Lu Baolu

Hi folks,

This series implements the functionality of delivering IO page faults to
user space through the IOMMUFD framework for nested translation. Nested
translation is a hardware feature that supports two-stage translation
tables for IOMMU. The second-stage translation table is managed by the
host VMM, while the first-stage translation table is owned by user space.
This allows user space to control the IOMMU mappings for its devices.

When an IO page fault occurs on the first-stage translation table, the
IOMMU hardware can deliver the page fault to user space through the
IOMMUFD framework. User space can then handle the page fault and respond
to the device top-down through the IOMMUFD. This allows user space to
implement its own IO page fault handling policies.

User space indicates its capability of handling IO page faults by setting
the IOMMU_HWPT_ALLOC_IOPF_CAPABLE flag when allocating a hardware page
table (HWPT). IOMMUFD will then set up its infrastructure for page fault
delivery. On a successful return of HWPT allocation, the user can retrieve
and respond to page faults by reading and writing to the file descriptor
(FD) returned in out_fault_fd.

The iommu selftest framework has been updated to test the IO page fault
delivery and response functionality.

This series is based on the latest implementation of nested translation
under discussion [1] and the page fault handling framework refactoring in
the IOMMU core [2].

The series and related patches are available on GitHub: [3]

[1] https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
[2] https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
[3] https://github.com/LuBaolu/intel-iommu/commits/iommufd-io-pgfault-delivery-…

Best regards,
baolu

Change log:
v2:
 - Move all iommu refactoring patches into a separate series and discuss
   it in a different thread. The latest patch series [v6] is available at
   https://lore.kernel.org/linux-iommu/20230928042734.16134-1-baolu.lu@linux.i…
 - We discussed the timeout of the pending page fault messages. We agreed
   that we shouldn't apply any timeout policy for the page fault handling
   in user space.
   https://lore.kernel.org/linux-iommu/20230616113232.GA84678@myrica/
 - Jason suggested that we adopt a simple file descriptor interface for
   reading and responding to I/O page requests, so that user space
   applications can improve performance using io_uring.
   https://lore.kernel.org/linux-iommu/ZJWjD1ajeem6pK3I@ziepe.ca/

v1: https://lore.kernel.org/linux-iommu/20230530053724.232765-1-baolu.lu@linux.…

Lu Baolu (6):
  iommu: Add iommu page fault cookie helpers
  iommufd: Add iommu page fault uapi data
  iommufd: Initializing and releasing IO page fault data
  iommufd: Deliver fault messages to user space
  iommufd/selftest: Add IOMMU_TEST_OP_TRIGGER_IOPF test support
  iommufd/selftest: Add coverage for IOMMU_TEST_OP_TRIGGER_IOPF

 include/linux/iommu.h                         |   9 +
 drivers/iommu/iommu-priv.h                    |  15 +
 drivers/iommu/iommufd/iommufd_private.h       |  12 +
 drivers/iommu/iommufd/iommufd_test.h          |   8 +
 include/uapi/linux/iommufd.h                  |  65 +++++
 tools/testing/selftests/iommu/iommufd_utils.h |  66 ++++-
 drivers/iommu/io-pgfault.c                    |  50 ++++
 drivers/iommu/iommufd/device.c                |  69 ++++-
 drivers/iommu/iommufd/hw_pagetable.c          | 260 +++++++++++++++++-
 drivers/iommu/iommufd/selftest.c              |  56 ++++
 tools/testing/selftests/iommu/iommufd.c       |  24 +-
 .../selftests/iommu/iommufd_fail_nth.c        |   2 +-
 12 files changed, 620 insertions(+), 16 deletions(-)

-- 
2.34.1

[PATCH 0/3] vfio-pci support pasid attach/detach

by Yi Liu

This adds the pasid attach/detach uAPIs for userspace to attach/detach a
PASID of a device to/from a given ioas/hwpt. Only the vfio-pci driver is
enabled in this series. After this series, PASID-capable devices bound
with vfio-pci can report PASID capability to userspace and VM to enable
PASID usages like Shared Virtual Addressing (SVA).

This series first adds the helpers for pasid attach in vfio core, then
adds the device cdev ioctls for pasid attach/detach, and finally exposes
the device PASID capability to the user. It depends on the iommufd pasid
attach/detach series [1].

Complete code can be found at [2], tested with a draft QEMU branch [3].

[1] https://lore.kernel.org/linux-iommu/20231127063428.127436-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_pasid
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1%…

Change log:
v1:
 - Report PASID capability via VFIO_DEVICE_FEATURE (Alex)

rfc: https://lore.kernel.org/linux-iommu/20230926093121.18676-1-yi.l.liu@intel.c…

Regards,
Yi Liu

Kevin Tian (1):
  vfio-iommufd: Support pasid [at|de]tach for physical VFIO devices

Yi Liu (2):
  vfio: Add VFIO_DEVICE_PASID_[AT|DE]TACH_IOMMUFD_PT
  vfio: Report PASID capability via VFIO_DEVICE_FEATURE ioctl

 drivers/vfio/device_cdev.c       | 45 +++++++++++++++++++++
 drivers/vfio/iommufd.c           | 48 ++++++++++++++++++++++
 drivers/vfio/pci/vfio_pci.c      |  2 +
 drivers/vfio/pci/vfio_pci_core.c | 47 ++++++++++++++++++++++
 drivers/vfio/vfio.h              |  4 ++
 drivers/vfio/vfio_main.c         |  8 ++++
 include/linux/vfio.h             | 11 ++++++
 include/uapi/linux/vfio.h        | 68 ++++++++++++++++++++++++++++++++
 8 files changed, 233 insertions(+)

-- 
2.34.1

[PATCH] selftests/livepatch: fix and refactor new dmesg message code

by Joe Lawrence

The livepatching kselftests rely on comparing expected vs. observed
dmesg output. After each test, new dmesg entries are determined by the
'comm' utility comparing a saved, pre-test copy of dmesg to post-test
dmesg output.

Alexander reports that the 'comm --nocheck-order -13' invocation used by
the tests can be confused when dmesg entry timestamps vary in magnitude
(ie, "[ 98.820331]" vs. "[ 100.031067]"), in which case, additional
messages are reported as new. The unexpected entries then spoil the
test results.

Instead of relying on 'comm' or 'diff' to determine new testing dmesg
entries, refactor the code:

  - pre-test  : log a unique canary dmesg entry
  - test      : run tests, log messages
  - post-test : filter dmesg starting from pre-test message

Reported-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
Closes: https://lore.kernel.org/live-patching/ZYAimyPYhxVA9wKg@li-008a6a4c-3549-11b…
Signed-off-by: Joe Lawrence <joe.lawrence(a)redhat.com>
---
 .../testing/selftests/livepatch/functions.sh | 37 +++++++++----------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/tools/testing/selftests/livepatch/functions.sh b/tools/testing/selftests/livepatch/functions.sh
index c8416c54b463..b1fd7362c2fe 100644
--- a/tools/testing/selftests/livepatch/functions.sh
+++ b/tools/testing/selftests/livepatch/functions.sh
@@ -42,17 +42,6 @@ function die() {
 	exit 1
 }
 
-# save existing dmesg so we can detect new content
-function save_dmesg() {
-	SAVED_DMESG=$(mktemp --tmpdir -t klp-dmesg-XXXXXX)
-	dmesg > "$SAVED_DMESG"
-}
-
-# cleanup temporary dmesg file from save_dmesg()
-function cleanup_dmesg_file() {
-	rm -f "$SAVED_DMESG"
-}
-
 function push_config() {
 	DYNAMIC_DEBUG=$(grep '^kernel/livepatch' /sys/kernel/debug/dynamic_debug/control | \
 			awk -F'[: ]' '{print "file " $1 " line " $2 " " $4}')
@@ -99,7 +88,6 @@ function set_ftrace_enabled() {
 
 function cleanup() {
 	pop_config
-	cleanup_dmesg_file
 }
 
 # setup_config - save the current config and set a script exit trap that
@@ -280,7 +268,15 @@ function set_pre_patch_ret {
 function start_test {
 	local test="$1"
 
-	save_dmesg
+	# Dump something unique into the dmesg log, then stash the entry
+	# in LAST_DMESG. The check_result() function will use it to
+	# find new kernel messages since the test started.
+	local last_dmesg_msg="livepatch kselftest timestamp: $(date --rfc-3339=ns)"
+	log "$last_dmesg_msg"
+	loop_until 'dmesg | grep -q "$last_dmesg_msg"' ||
+		die "buffer busy? can't find canary dmesg message: $last_dmesg_msg"
+	LAST_DMESG=$(dmesg | grep "$last_dmesg_msg")
+
 	echo -n "TEST: $test ... "
 	log "===== TEST: $test ====="
 }
@@ -291,23 +287,24 @@ function check_result {
 	local expect="$*"
 	local result
 
-	# Note: when comparing dmesg output, the kernel log timestamps
-	# help differentiate repeated testing runs. Remove them with a
-	# post-comparison sed filter.
-
-	result=$(dmesg | comm --nocheck-order -13 "$SAVED_DMESG" - | \
+	# Test results include any new dmesg entry since LAST_DMESG, then:
+	# - include lines matching keywords
+	# - exclude lines matching keywords
+	# - filter out dmesg timestamp prefixes
+	result=$(dmesg | awk -v last_dmesg="$LAST_DMESG" 'p; $0 == last_dmesg { p=1 }' | \
 		 grep -e 'livepatch:' -e 'test_klp' | \
 		 grep -v '\(tainting\|taints\) kernel' | \
 		 sed 's/^\[[ 0-9.]*\] //')
 
 	if [[ "$expect" == "$result" ]] ; then
 		echo "ok"
+	elif [[ "$result" == "" ]] ; then
+		echo -e "not ok\n\nbuffer overrun? can't find canary dmesg entry: $LAST_DMESG\n"
+		die "livepatch kselftest(s) failed"
 	else
 		echo -e "not ok\n\n$(diff -upr --label expected --label result <(echo "$expect") <(echo "$result"))\n"
 		die "livepatch kselftest(s) failed"
 	fi
-
-	cleanup_dmesg_file
 }
 
 # check_sysfs_rights(modname, rel_path, expected_rights) - check sysfs
-- 
2.41.0
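
As a rough illustration of the canary approach this patch describes, the sketch below runs the awk, grep and sed stages against a canned log rather than the real dmesg buffer, so it needs no privileges. The sample_log and new_entries helpers and the log contents are made up for this example, and the taint-message filter from the patch is left out for brevity; only the pipeline stages shown are taken from the patch.

```shell
#!/bin/sh
# Hypothetical stand-in for "dmesg"; the real tests read the kernel log.
sample_log() {
	cat <<'EOF'
[   98.820331] livepatch: old entry from a previous test
[   99.500000] livepatch kselftest timestamp: canary
[  100.031067] livepatch: enabling patch 'test_klp_livepatch'
[  100.040000] unrelated driver message
[  100.050000] test_klp_livepatch: this has been live patched
EOF
}

# The full canary line as it would appear in the log (assumed sample value).
canary='[   99.500000] livepatch kselftest timestamp: canary'

# Print only the lines that follow the exact canary line, keep the
# livepatch-related ones, then strip the timestamp prefix.
new_entries() {
	sample_log |
		awk -v last_dmesg="$canary" 'p; $0 == last_dmesg { p=1 }' |
		grep -e 'livepatch:' -e 'test_klp' |
		sed 's/^\[[ 0-9.]*\] //'
}

new_entries
```

Because the awk rule prints a line only when p was already set by an earlier line, the canary itself and everything before it are dropped, regardless of how wide the timestamps are.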

[PATCH v2] selftests/ftrace: Add test to exercize function tracer across cpu hotplug

by Naveen N Rao

Add a test to exercise cpu hotplug with the function tracer active to
ensure that sensitive functions in idle path are excluded from being
traced. This helps catch issues such as the one fixed by commit
4b3338aaa74d ("powerpc/ftrace: Fix stack teardown in ftrace_no_trace").

Signed-off-by: Naveen N Rao <naveen(a)kernel.org>
---
v2: Add a check for next available online cpu, as suggested by Masami.

 .../ftrace/test.d/ftrace/func_hotplug.tc | 42 +++++++++++++++++++
 1 file changed, 42 insertions(+)
 create mode 100644 tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc

diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
new file mode 100644
index 000000000000..ccfbfde3d942
--- /dev/null
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_hotplug.tc
@@ -0,0 +1,42 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0-or-later
+# description: ftrace - function trace across cpu hotplug
+# requires: function:tracer
+
+if ! which nproc ; then
+  nproc() {
+    ls -d /sys/devices/system/cpu/cpu[0-9]* | wc -l
+  }
+fi
+
+NP=`nproc`
+
+if [ $NP -eq 1 ] ;then
+  echo "We cannot test cpu hotplug in UP environment"
+  exit_unresolved
+fi
+
+# Find online cpu
+for i in /sys/devices/system/cpu/cpu[1-9]*; do
+	if [ -f $i/online ] && [ "$(cat $i/online)" = "1" ]; then
+		cpu=$i
+		break
+	fi
+done
+
+if [ -z "$cpu" ]; then
+	echo "We cannot test cpu hotplug with a single cpu online"
+	exit_unresolved
+fi
+
+echo 0 > tracing_on
+echo > trace
+
+: "Set $(basename $cpu) offline/online with function tracer enabled"
+echo function > current_tracer
+echo 1 > tracing_on
+(echo 0 > $cpu/online)
+(echo "forked"; sleep 1)
+(echo 1 > $cpu/online)
+echo 0 > tracing_on
+echo nop > current_tracer

base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86
-- 
2.43.0
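
The online-cpu lookup in this test can be tried without root privileges by pointing the same loop at a throwaway directory tree. The sketch below is only an illustration under that assumption: the find_online_cpu helper, its optional root argument and the fake tree are hypothetical and not part of the patch.

```shell
#!/bin/sh
# find_online_cpu [ROOT]: print the first cpuN directory (N >= 1) under ROOT
# whose "online" file contains 1. ROOT defaults to the real sysfs path.
# The helper name and the ROOT parameter are assumptions for this sketch.
find_online_cpu() {
	root=${1:-/sys/devices/system/cpu}
	for dir in "$root"/cpu[1-9]*; do
		if [ -f "$dir/online" ] && [ "$(cat "$dir/online")" = "1" ]; then
			echo "$dir"
			return 0
		fi
	done
	return 1
}

# Build a fake tree: cpu0 has no "online" file, cpu1 is offline, cpu2 is online.
fake=$(mktemp -d)
mkdir "$fake/cpu0" "$fake/cpu1" "$fake/cpu2"
echo 0 > "$fake/cpu1/online"
echo 1 > "$fake/cpu2/online"

cpu=$(find_online_cpu "$fake") && echo "would toggle $(basename "$cpu")"
rm -rf "$fake"
```

As in the patch, the glob starts at cpu1, so cpu0 is never picked as the hotplug target, and the helper fails when no other cpu is online.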

[PATCH 0/5] tools: selftests: riscv: Fix compiler warnings

by Christoph Muellner

From: Christoph Müllner <christoph.muellner(a)vrull.eu>

When building the RISC-V selftests with a riscv32 compiler I ran into
a couple of compiler warnings. While riscv32 support for these tests is
questionable, the fixes are so trivial that it is probably best to simply
apply them.

Note that the missing-include patch and some format string warnings
are also relevant for riscv64.

Christoph Müllner (5):
  tools: selftests: riscv: Fix compile warnings in hwprobe
  tools: selftests: riscv: Fix compile warnings in cbo
  tools: selftests: riscv: Add missing include for vector test
  tools: selftests: riscv: Fix compile warnings in vector tests
  tools: selftests: riscv: Fix compile warnings in mm tests

 tools/testing/selftests/riscv/hwprobe/cbo.c               | 6 +++---
 tools/testing/selftests/riscv/hwprobe/hwprobe.c           | 4 ++--
 tools/testing/selftests/riscv/mm/mmap_test.h              | 3 +++
 tools/testing/selftests/riscv/vector/v_initval_nolibc.c   | 2 +-
 tools/testing/selftests/riscv/vector/vstate_exec_nolibc.c | 3 +++
 tools/testing/selftests/riscv/vector/vstate_prctl.c       | 4 ++--
 6 files changed, 14 insertions(+), 8 deletions(-)

-- 
2.41.0

Re: [PATCH v14 10/12] selftests/landlock: Add network tests

by Muhammad Usama Anjum

Hi Konstantin,

There are some errors being reported in KernelCI:
https://linux.kernelci.org/test/plan/id/657ab2240c761c0bd1e134ee/

The following sub-tests are failing:
landlock_net_test_protocol_no_sandbox_with_ipv6_tcp_bind_unspec
landlock_net_test_protocol_no_sandbox_with_ipv6_udp_bind_unspec
landlock_net_test_protocol_tcp_sandbox_with_ipv6_udp_bind_unspec

From my initial investigation, I can see that these failures are coming
from just finding the wrong return error code (-97 instead of -22). It may
be test's issue or the kernel's, not sure yet.

Thanks,
Usama

On 10/26/23 6:47 AM, Konstantin Meskhidze wrote:
> Add 82 test suites to check edge cases related to bind() and connect()
> actions. They are defined with 6 fixtures and their variants:
>
> The "protocol" fixture is extended with 12 variants defined as a matrix
> of: sandboxed/not-sandboxed, IPv4/IPv6/unix network domain, and
> stream/datagram socket. 4 related tests suites are defined:
> * bind: Tests with non-landlocked/landlocked ipv4, ipv6 and unix sockets.
> * connect: Tests with non-landlocked/landlocked ipv4, ipv6 and unix
>   sockets.
> * bind_unspec: Tests with non-landlocked/landlocked restrictions
>   for bind action with AF_UNSPEC socket family.
> * connect_unspec: Tests with non-landlocked/landlocked restrictions
>   for connect action with AF_UNSPEC socket family.
>
> The "ipv4" fixture is extended with 4 variants defined as a matrix
> of: sandboxed/not-sandboxed, IPv4/unix network domain, and
> stream/datagram socket. 1 related test suite is defined:
> * from_unix_to_inet: Tests to make sure unix sockets' actions are not
>   restricted by Landlock rules applied to TCP ones.
>
> The "tcp_layers" fixture is extended with 8 variants defined as a matrix
> of: IPv4/IPv6 network domain, and different number of landlock rule layers.
> 2 related tests suites are defined:
> * ruleset_overlap.
> * ruleset_expand.
> > In the "mini" fixture 4 tests suites are defined: > * network_access_rights: Tests with legitimate access values. > * unknown_access_rights: Tests with invalid attributes, out of access > range. > * inval: > - unhandled allowed access. > - zero access value. > * tcp_port_overflow: Tests with wrong port values more than U16_MAX. > > In the "ipv4_tcp" fixture supports IPv4 network domain, stream socket. > 2 tests suites are defined: > * port_endianness: Tests with big/little endian port formats. > * with_fs: Tests with network bind() socket action within > filesystem directory access test. > > The "port_specific" fixture is extended with 4 variants defined > as a matrix of: sandboxed/not-sandboxed, IPv4/IPv6 network domain, > and stream socket. 2 related tests suites are defined: > * bind_connect_zero: Tests with port 0 value. > * bind_connect_1023: Tests with port 1023 value. > > Test coverage for security/landlock is 94.5% of 932 lines according to > gcc/gcov-9. > > Signed-off-by: Konstantin Meskhidze <konstantin.meskhidze(a)huawei.com> > Co-developed-by: Mickaël Salaün <mic(a)digikod.net> > Signed-off-by: Mickaël Salaün <mic(a)digikod.net> > --- > > Changes since v13: > * Refactors "port_specific" test fixture: > - Deletes useless if .. else. > - Deletes repeating bind to port 0. > - Deletes useless lines. > - Adds 2 file descriptors per socket. > - Updates get_binded helper. > - Split test suite to bind_connect_zero > and bind_connect_1023. > * Adds CAP_NET_BIND_SERVICE to set_cap(); it helps > in bind_connect_1023 test. > * Moves with_net test from fs_test.c. > * Renames with_net test to with_fs. > * Refactors with_fs test by adding different > rule types per one ruleset layer. > * Minor fixes. > * Refactors commit message. > > Changes since v12: > * Renames port_zero to port_specific fixture. > * Refactors port_specific test: > - Adds set_port() and get_binded_port() helpers. > - Adds checks for port 0, allowed by Landlock in this version. 
> - Adds checks for port 1023. > * Refactors commit message. > > Changes since v11: > * Adds ipv4.from_unix_to_tcp test suite to check that socket family is > the same between a socket and a sockaddr by trying to connect/bind on > a unix socket (stream or dgram) using an inet family. Landlock should > not change the error code. This found a bug (which needs to be fixed) > with the TCP restriction. > * Revamps the inet.{bind,connect} tests into protocol.{bind,connect}: > - Merge bind_connect_unix_dgram_socket, bind_connect_unix_dgram_socket > and bind_connect_inval_addrlen into it: add a full test matrix of > IPv4/TCP, IPv6/TCP, IPv4/UDP, IPv6/UDP, unix/stream, unix/dgram, all > of them with or without sandboxing. This improve coverage and it > enables to check that a TCP restriction work as expected but doesn't > restrict other stream or datagram protocols. This also enables to > check consistency of the network stack with or without Landlock. > We now have 76 test suites for the network. > - Add full send/recv checks. > - Make a generic framework that will be ready for future > protocol supports. > * Replaces most ASSERT with EXPECT according to the criticity of an > action: if we can get more meaningful information with following > checks. For instance, failure to create a kernel object (e.g. > socket(), accept() or fork() call) is critical if it is used by > following checks. For Landlock ruleset building, the following checks > don't make sense if the sandbox is not complete. However, it doesn't > make sense to continue a FIXTURE_SETUP() if any check failed. > * Adds a new unspec fixture to replace inet.bind_afunspec with > unspec.bind and inet.connect_afunspec with unspec.connect, factoring > and simplifying code. > * Replaces inet.bind_afunspec with protocol.bind_unspec, and > inet.connect_afunspec with protocol.connect_unspec. Extend these > tests with the matrix of all "protocol" variants. 
Don't test connect > with the same socket which is already binded/listening (I guess this > was an copy-paste error). The protocol.bind_unspec tests found a bug > (which needs to be fixed). > * Add* and use set_service() and setup_loopback() helpers to configure > network services. Add and use and test_bind_and_connect() to factor > out a lot of checks. > * Adds new types (protocol_variant, service_fixture) and update related > helpers to get more generic test code. > * Replaces static (port) arrays with service_fixture variables. > * Adds new helpers: {bind,connect}_variant_addrlen() and get_addrlen() to > cover all protocols with previous bind_connect_inval_addrlen tests. > Make them return -errno in case of error. > * Switchs from a unix socket path address to an abstract one. This > enables to avoid file cleanup in test teardowns. > * Closes all rulesets after enforcement. > * Removes the duplicate "empty access" test. > * Replaces inet.ruleset_overlay with tcp_layers.ruleset_overlap and > simplify test: > - Always run sandbox tests because test were always run sandboxed and > it doesn't give more guarantees to do it not sandboxed. > - Rewrite test with variant->num_layers to make it simpler and > configurable. > - Add another test layer to tcp_layers used for ruleset_overlap and > test without sandbox. > - Leverage test_bind_and_connect() and avoid using SO_REUSEADDR > because the socket was not listened to, and don't use the same > socket/FD for server and client. > - Replace inet.ruleset_expanding with tcp_layers.ruleset_expand. > * Drops capabilities in all FIXTURE_SETUP(). > * Changes test ports to cover more ranges. > * Adds "mini" tests: > - Replace the invalid ruleset attribute test from port.inval with > mini.unknow_access_rights. > - Simplify port.inval and move some code to other mini.* tests. > - Add new mini.network_access_rights test. > * Rewrites inet.inval_port_format into mini.tcp_port_overflow: > - Remove useless is_sandbox checks. 
> - Extend tests with bind/connect checks. > - Interleave valid requests with invalid ones. > * Adds two_srv.port_endianness test, extracted and extended from > inet.inval_port_format . > * Adds Microsoft copyright. > * Rename some variables to make them easier to read. > * Constifies variables. > * Adds minimal logs to help debug test failures. > * Renames inet test to ipv4 and deletes is_sandboxed and prot vars from > FIXTURE_VARIANT. > * Adds port_zero tests. > * Renames all "net_service" to "net_port". > > Changes since v10: > * Replaces FIXTURE_VARIANT() with struct _fixture_variant_ . > * Changes tests names socket -> inet, standalone -> port. > * Gets rid of some DEFINEs. > * Changes names and groups tests' variables. > * Changes create_socket_variant() helper name to socket_variant(). > * Refactors FIXTURE_SETUP(port) logic. > * Changes TEST_F_FORK -> TEST_F since there no teardown. > * Refactors some tests' logic. > * Minor fixes. > * Refactors commit message. > > Changes since v9: > * Fixes mixing code declaration and code. > * Refactors FIXTURE_TEARDOWN() with clang-format. > * Replaces struct _fixture_variant_socket with > FIXTURE_VARIANT(socket). > * Deletes useless condition if (variant->is_sandboxed) > in multiple locations. > * Deletes zero_size argument in bind_variant() and > connect_variant(). > * Adds tests for port values exceeding U16_MAX. > > Changes since v8: > * Adds is_sandboxed const for FIXTURE_VARIANT(socket). > * Refactors AF_UNSPEC tests. > * Adds address length checking tests. > * Convert ports in all tests to __be16. > * Adds invalid port values tests. > * Minor fixes. > > Changes since v7: > * Squashes all selftest commits. > * Adds fs test with network bind() socket action. > * Minor fixes. 
> > --- > tools/testing/selftests/landlock/common.h | 3 + > tools/testing/selftests/landlock/config | 4 + > tools/testing/selftests/landlock/net_test.c | 1744 +++++++++++++++++++ > 3 files changed, 1751 insertions(+) > create mode 100644 tools/testing/selftests/landlock/net_test.c > > diff --git a/tools/testing/selftests/landlock/common.h b/tools/testing/selftests/landlock/common.h > index 0fd6c4cf5e6f..5b79758cae62 100644 > --- a/tools/testing/selftests/landlock/common.h > +++ b/tools/testing/selftests/landlock/common.h > @@ -112,10 +112,13 @@ static void _init_caps(struct __test_metadata *const _metadata, bool drop_all) > cap_t cap_p; > /* Only these three capabilities are useful for the tests. */ > const cap_value_t caps[] = { > + /* clang-format off */ > CAP_DAC_OVERRIDE, > CAP_MKNOD, > CAP_SYS_ADMIN, > CAP_SYS_CHROOT, > + CAP_NET_BIND_SERVICE, > + /* clang-format on */ > }; > > cap_p = cap_get_proc(); > diff --git a/tools/testing/selftests/landlock/config b/tools/testing/selftests/landlock/config > index 3dc9e438eab1..0086efaa7b68 100644 > --- a/tools/testing/selftests/landlock/config > +++ b/tools/testing/selftests/landlock/config > @@ -1,5 +1,9 @@ > CONFIG_CGROUPS=y > CONFIG_CGROUP_SCHED=y > +CONFIG_INET=y > +CONFIG_IPV6=y > +CONFIG_NET=y > +CONFIG_NET_NS=y > CONFIG_OVERLAY_FS=y > CONFIG_PROC_FS=y > CONFIG_SECURITY=y > diff --git a/tools/testing/selftests/landlock/net_test.c b/tools/testing/selftests/landlock/net_test.c > new file mode 100644 > index 000000000000..3c0a10f9811a > --- /dev/null > +++ b/tools/testing/selftests/landlock/net_test.c > @@ -0,0 +1,1744 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* > + * Landlock tests - Network > + * > + * Copyright © 2022-2023 Huawei Tech. Co., Ltd. 
> + * Copyright © 2023 Microsoft Corporation > + */ > + > +#define _GNU_SOURCE > +#include <arpa/inet.h> > +#include <errno.h> > +#include <fcntl.h> > +#include <linux/landlock.h> > +#include <linux/in.h> > +#include <sched.h> > +#include <stdint.h> > +#include <string.h> > +#include <sys/prctl.h> > +#include <sys/socket.h> > +#include <sys/un.h> > + > +#include "common.h" > + > +const short sock_port_start = (1 << 10); > + > +static const char loopback_ipv4[] = "127.0.0.1"; > +static const char loopback_ipv6[] = "::1"; > + > +/* Number pending connections queue to be hold. */ > +const short backlog = 10; > + > +enum sandbox_type { > + NO_SANDBOX, > + /* This may be used to test rules that allow *and* deny accesses. */ > + TCP_SANDBOX, > +}; > + > +struct protocol_variant { > + int domain; > + int type; > +}; > + > +struct service_fixture { > + struct protocol_variant protocol; > + /* port is also stored in ipv4_addr.sin_port or ipv6_addr.sin6_port */ > + unsigned short port; > + union { > + struct sockaddr_in ipv4_addr; > + struct sockaddr_in6 ipv6_addr; > + struct { > + struct sockaddr_un unix_addr; > + socklen_t unix_addr_len; > + }; > + }; > +}; > + > +static int set_service(struct service_fixture *const srv, > + const struct protocol_variant prot, > + const unsigned short index) > +{ > + memset(srv, 0, sizeof(*srv)); > + > + /* > + * Copies all protocol properties in case of the variant only contains > + * a subset of them. > + */ > + srv->protocol = prot; > + > + /* Checks for port overflow. 
*/ > + if (index > 2) > + return 1; > + srv->port = sock_port_start << (2 * index); > + > + switch (prot.domain) { > + case AF_UNSPEC: > + case AF_INET: > + srv->ipv4_addr.sin_family = prot.domain; > + srv->ipv4_addr.sin_port = htons(srv->port); > + srv->ipv4_addr.sin_addr.s_addr = inet_addr(loopback_ipv4); > + return 0; > + > + case AF_INET6: > + srv->ipv6_addr.sin6_family = prot.domain; > + srv->ipv6_addr.sin6_port = htons(srv->port); > + inet_pton(AF_INET6, loopback_ipv6, &srv->ipv6_addr.sin6_addr); > + return 0; > + > + case AF_UNIX: > + srv->unix_addr.sun_family = prot.domain; > + sprintf(srv->unix_addr.sun_path, > + "_selftests-landlock-net-tid%d-index%d", gettid(), > + index); > + srv->unix_addr_len = SUN_LEN(&srv->unix_addr); > + srv->unix_addr.sun_path[0] = '\0'; > + return 0; > + } > + return 1; > +} > + > +static void setup_loopback(struct __test_metadata *const _metadata) > +{ > + set_cap(_metadata, CAP_SYS_ADMIN); > + ASSERT_EQ(0, unshare(CLONE_NEWNET)); > + ASSERT_EQ(0, system("ip link set dev lo up")); > + clear_cap(_metadata, CAP_SYS_ADMIN); > +} > + > +static bool is_restricted(const struct protocol_variant *const prot, > + const enum sandbox_type sandbox) > +{ > + switch (prot->domain) { > + case AF_INET: > + case AF_INET6: > + switch (prot->type) { > + case SOCK_STREAM: > + return sandbox == TCP_SANDBOX; > + } > + break; > + } > + return false; > +} > + > +static int socket_variant(const struct service_fixture *const srv) > +{ > + int ret; > + > + ret = socket(srv->protocol.domain, srv->protocol.type | SOCK_CLOEXEC, > + 0); > + if (ret < 0) > + return -errno; > + return ret; > +} > + > +#ifndef SIN6_LEN_RFC2133 > +#define SIN6_LEN_RFC2133 24 > +#endif > + > +static socklen_t get_addrlen(const struct service_fixture *const srv, > + const bool minimal) > +{ > + switch (srv->protocol.domain) { > + case AF_UNSPEC: > + case AF_INET: > + return sizeof(srv->ipv4_addr); > + > + case AF_INET6: > + if (minimal) > + return SIN6_LEN_RFC2133; > + return 
sizeof(srv->ipv6_addr); > + > + case AF_UNIX: > + if (minimal) > + return sizeof(srv->unix_addr) - > + sizeof(srv->unix_addr.sun_path); > + return srv->unix_addr_len; > + > + default: > + return 0; > + } > +} > + > +static void set_port(struct service_fixture *const srv, uint16_t port) > +{ > + switch (srv->protocol.domain) { > + case AF_UNSPEC: > + case AF_INET: > + srv->ipv4_addr.sin_port = htons(port); > + return; > + > + case AF_INET6: > + srv->ipv6_addr.sin6_port = htons(port); > + return; > + > + default: > + return; > + } > +} > + > +static uint16_t get_binded_port(int socket_fd, > + const struct protocol_variant *const prot) > +{ > + struct sockaddr_in ipv4_addr; > + struct sockaddr_in6 ipv6_addr; > + socklen_t ipv4_addr_len, ipv6_addr_len; > + > + /* Gets binded port. */ > + switch (prot->domain) { > + case AF_UNSPEC: > + case AF_INET: > + ipv4_addr_len = sizeof(ipv4_addr); > + getsockname(socket_fd, &ipv4_addr, &ipv4_addr_len); > + return ntohs(ipv4_addr.sin_port); > + > + case AF_INET6: > + ipv6_addr_len = sizeof(ipv6_addr); > + getsockname(socket_fd, &ipv6_addr, &ipv6_addr_len); > + return ntohs(ipv6_addr.sin6_port); > + > + default: > + return 0; > + } > +} > + > +static int bind_variant_addrlen(const int sock_fd, > + const struct service_fixture *const srv, > + const socklen_t addrlen) > +{ > + int ret; > + > + switch (srv->protocol.domain) { > + case AF_UNSPEC: > + case AF_INET: > + ret = bind(sock_fd, &srv->ipv4_addr, addrlen); > + break; > + > + case AF_INET6: > + ret = bind(sock_fd, &srv->ipv6_addr, addrlen); > + break; > + > + case AF_UNIX: > + ret = bind(sock_fd, &srv->unix_addr, addrlen); > + break; > + > + default: > + errno = EAFNOSUPPORT; > + return -errno; > + } > + > + if (ret < 0) > + return -errno; > + return ret; > +} > + > +static int bind_variant(const int sock_fd, > + const struct service_fixture *const srv) > +{ > + return bind_variant_addrlen(sock_fd, srv, get_addrlen(srv, false)); > +} > + > +static int 
connect_variant_addrlen(const int sock_fd, > + const struct service_fixture *const srv, > + const socklen_t addrlen) > +{ > + int ret; > + > + switch (srv->protocol.domain) { > + case AF_UNSPEC: > + case AF_INET: > + ret = connect(sock_fd, &srv->ipv4_addr, addrlen); > + break; > + > + case AF_INET6: > + ret = connect(sock_fd, &srv->ipv6_addr, addrlen); > + break; > + > + case AF_UNIX: > + ret = connect(sock_fd, &srv->unix_addr, addrlen); > + break; > + > + default: > + errno = -EAFNOSUPPORT; > + return -errno; > + } > + > + if (ret < 0) > + return -errno; > + return ret; > +} > + > +static int connect_variant(const int sock_fd, > + const struct service_fixture *const srv) > +{ > + return connect_variant_addrlen(sock_fd, srv, get_addrlen(srv, false)); > +} > + > +FIXTURE(protocol) > +{ > + struct service_fixture srv0, srv1, srv2, unspec_any0, unspec_srv0; > +}; > + > +FIXTURE_VARIANT(protocol) > +{ > + const enum sandbox_type sandbox; > + const struct protocol_variant prot; > +}; > + > +FIXTURE_SETUP(protocol) > +{ > + const struct protocol_variant prot_unspec = { > + .domain = AF_UNSPEC, > + .type = SOCK_STREAM, > + }; > + > + disable_caps(_metadata); > + > + ASSERT_EQ(0, set_service(&self->srv0, variant->prot, 0)); > + ASSERT_EQ(0, set_service(&self->srv1, variant->prot, 1)); > + ASSERT_EQ(0, set_service(&self->srv2, variant->prot, 2)); > + > + ASSERT_EQ(0, set_service(&self->unspec_srv0, prot_unspec, 0)); > + > + ASSERT_EQ(0, set_service(&self->unspec_any0, prot_unspec, 0)); > + self->unspec_any0.ipv4_addr.sin_addr.s_addr = htonl(INADDR_ANY); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(protocol) > +{ > +} > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_tcp) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_tcp) { > + /* clang-format on */ > + 
.sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv4_udp) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_DGRAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_ipv6_udp) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_DGRAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_unix_stream) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_UNIX, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, no_sandbox_with_unix_datagram) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_UNIX, > + .type = SOCK_DGRAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_tcp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_tcp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv4_udp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_DGRAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_ipv6_udp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_DGRAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_unix_stream) { > + /* clang-format on */ > + .sandbox 
= TCP_SANDBOX, > + .prot = { > + .domain = AF_UNIX, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(protocol, tcp_sandbox_with_unix_datagram) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_UNIX, > + .type = SOCK_DGRAM, > + }, > +}; > + > +static void test_bind_and_connect(struct __test_metadata *const _metadata, > + const struct service_fixture *const srv, > + const bool deny_bind, const bool deny_connect) > +{ > + char buf = '\0'; > + int inval_fd, bind_fd, client_fd, status, ret; > + pid_t child; > + > + /* Starts invalid addrlen tests with bind. */ > + inval_fd = socket_variant(srv); > + ASSERT_LE(0, inval_fd) > + { > + TH_LOG("Failed to create socket: %s", strerror(errno)); > + } > + > + /* Tries to bind with zero as addrlen. */ > + EXPECT_EQ(-EINVAL, bind_variant_addrlen(inval_fd, srv, 0)); > + > + /* Tries to bind with too small addrlen. */ > + EXPECT_EQ(-EINVAL, bind_variant_addrlen(inval_fd, srv, > + get_addrlen(srv, true) - 1)); > + > + /* Tries to bind with minimal addrlen. */ > + ret = bind_variant_addrlen(inval_fd, srv, get_addrlen(srv, true)); > + if (deny_bind) { > + EXPECT_EQ(-EACCES, ret); > + } else { > + EXPECT_EQ(0, ret) > + { > + TH_LOG("Failed to bind to socket: %s", strerror(errno)); > + } > + } > + EXPECT_EQ(0, close(inval_fd)); > + > + /* Starts invalid addrlen tests with connect. */ > + inval_fd = socket_variant(srv); > + ASSERT_LE(0, inval_fd); > + > + /* Tries to connect with zero as addrlen. */ > + EXPECT_EQ(-EINVAL, connect_variant_addrlen(inval_fd, srv, 0)); > + > + /* Tries to connect with too small addrlen. */ > + EXPECT_EQ(-EINVAL, connect_variant_addrlen(inval_fd, srv, > + get_addrlen(srv, true) - 1)); > + > + /* Tries to connect with minimal addrlen. 
*/ > + ret = connect_variant_addrlen(inval_fd, srv, get_addrlen(srv, true)); > + if (srv->protocol.domain == AF_UNIX) { > + EXPECT_EQ(-EINVAL, ret); > + } else if (deny_connect) { > + EXPECT_EQ(-EACCES, ret); > + } else if (srv->protocol.type == SOCK_STREAM) { > + /* No listening server, whatever the value of deny_bind. */ > + EXPECT_EQ(-ECONNREFUSED, ret); > + } else { > + EXPECT_EQ(0, ret) > + { > + TH_LOG("Failed to connect to socket: %s", > + strerror(errno)); > + } > + } > + EXPECT_EQ(0, close(inval_fd)); > + > + /* Starts connection tests. */ > + bind_fd = socket_variant(srv); > + ASSERT_LE(0, bind_fd); > + > + ret = bind_variant(bind_fd, srv); > + if (deny_bind) { > + EXPECT_EQ(-EACCES, ret); > + } else { > + EXPECT_EQ(0, ret); > + > + /* Creates a listening socket. */ > + if (srv->protocol.type == SOCK_STREAM) > + EXPECT_EQ(0, listen(bind_fd, backlog)); > + } > + > + child = fork(); > + ASSERT_LE(0, child); > + if (child == 0) { > + int connect_fd, ret; > + > + /* Closes listening socket for the child. */ > + EXPECT_EQ(0, close(bind_fd)); > + > + /* Starts connection tests. */ > + connect_fd = socket_variant(srv); > + ASSERT_LE(0, connect_fd); > + ret = connect_variant(connect_fd, srv); > + if (deny_connect) { > + EXPECT_EQ(-EACCES, ret); > + } else if (deny_bind) { > + /* No listening server. */ > + EXPECT_EQ(-ECONNREFUSED, ret); > + } else { > + EXPECT_EQ(0, ret); > + EXPECT_EQ(1, write(connect_fd, ".", 1)); > + } > + > + EXPECT_EQ(0, close(connect_fd)); > + _exit(_metadata->passed ? EXIT_SUCCESS : EXIT_FAILURE); > + return; > + } > + > + /* Accepts connection from the child. 
*/ > + client_fd = bind_fd; > + if (!deny_bind && !deny_connect) { > + if (srv->protocol.type == SOCK_STREAM) { > + client_fd = accept(bind_fd, NULL, 0); > + ASSERT_LE(0, client_fd); > + } > + > + EXPECT_EQ(1, read(client_fd, &buf, 1)); > + EXPECT_EQ('.', buf); > + } > + > + EXPECT_EQ(child, waitpid(child, &status, 0)); > + EXPECT_EQ(1, WIFEXITED(status)); > + EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status)); > + > + /* Closes connection, if any. */ > + if (client_fd != bind_fd) > + EXPECT_LE(0, close(client_fd)); > + > + /* Closes listening socket. */ > + EXPECT_EQ(0, close(bind_fd)); > +} > + > +TEST_F(protocol, bind) > +{ > + if (variant->sandbox == TCP_SANDBOX) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr tcp_bind_connect_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + const struct landlock_net_port_attr tcp_connect_p1 = { > + .allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv1.port, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows connect and bind for the first port. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_p0, 0)); > + > + /* Allows connect and denies bind for the second port. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_connect_p1, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + /* Binds a socket to the first port. */ > + test_bind_and_connect(_metadata, &self->srv0, false, false); > + > + /* Binds a socket to the second port. 
*/ > + test_bind_and_connect(_metadata, &self->srv1, > + is_restricted(&variant->prot, variant->sandbox), > + false); > + > + /* Binds a socket to the third port. */ > + test_bind_and_connect(_metadata, &self->srv2, > + is_restricted(&variant->prot, variant->sandbox), > + is_restricted(&variant->prot, variant->sandbox)); > +} > + > +TEST_F(protocol, connect) > +{ > + if (variant->sandbox == TCP_SANDBOX) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr tcp_bind_connect_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + const struct landlock_net_port_attr tcp_bind_p1 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv1.port, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows connect and bind for the first port. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_p0, 0)); > + > + /* Allows bind and denies connect for the second port. 
*/ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_p1, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + test_bind_and_connect(_metadata, &self->srv0, false, false); > + > + test_bind_and_connect(_metadata, &self->srv1, false, > + is_restricted(&variant->prot, variant->sandbox)); > + > + test_bind_and_connect(_metadata, &self->srv2, > + is_restricted(&variant->prot, variant->sandbox), > + is_restricted(&variant->prot, variant->sandbox)); > +} > + > +TEST_F(protocol, bind_unspec) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP, > + }; > + const struct landlock_net_port_attr tcp_bind = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv0.port, > + }; > + int bind_fd, ret; > + > + if (variant->sandbox == TCP_SANDBOX) { > + const int ruleset_fd = landlock_create_ruleset( > + &ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows bind. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + > + /* Allowed bind on AF_UNSPEC/INADDR_ANY. */ > + ret = bind_variant(bind_fd, &self->unspec_any0); > + if (variant->prot.domain == AF_INET) { > + EXPECT_EQ(0, ret) > + { > + TH_LOG("Failed to bind to unspec/any socket: %s", > + strerror(errno)); > + } > + } else { > + EXPECT_EQ(-EINVAL, ret); > + } > + EXPECT_EQ(0, close(bind_fd)); > + > + if (variant->sandbox == TCP_SANDBOX) { > + const int ruleset_fd = landlock_create_ruleset( > + &ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Denies bind. 
*/ > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + > + /* Denied bind on AF_UNSPEC/INADDR_ANY. */ > + ret = bind_variant(bind_fd, &self->unspec_any0); > + if (variant->prot.domain == AF_INET) { > + if (is_restricted(&variant->prot, variant->sandbox)) { > + EXPECT_EQ(-EACCES, ret); > + } else { > + EXPECT_EQ(0, ret); > + } > + } else { > + EXPECT_EQ(-EINVAL, ret); > + } > + EXPECT_EQ(0, close(bind_fd)); > + > + /* Checks bind with AF_UNSPEC and the loopback address. */ > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + ret = bind_variant(bind_fd, &self->unspec_srv0); > + if (variant->prot.domain == AF_INET) { > + EXPECT_EQ(-EAFNOSUPPORT, ret); > + } else { > + EXPECT_EQ(-EINVAL, ret) > + { > + TH_LOG("Wrong bind error: %s", strerror(errno)); > + } > + } > + EXPECT_EQ(0, close(bind_fd)); > +} > + > +TEST_F(protocol, connect_unspec) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr tcp_connect = { > + .allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + int bind_fd, client_fd, status; > + pid_t child; > + > + /* Specific connection tests. */ > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + EXPECT_EQ(0, bind_variant(bind_fd, &self->srv0)); > + if (self->srv0.protocol.type == SOCK_STREAM) > + EXPECT_EQ(0, listen(bind_fd, backlog)); > + > + child = fork(); > + ASSERT_LE(0, child); > + if (child == 0) { > + int connect_fd, ret; > + > + /* Closes listening socket for the child. */ > + EXPECT_EQ(0, close(bind_fd)); > + > + connect_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, connect_fd); > + EXPECT_EQ(0, connect_variant(connect_fd, &self->srv0)); > + > + /* Tries to connect again, or set peer. 
*/ > + ret = connect_variant(connect_fd, &self->srv0); > + if (self->srv0.protocol.type == SOCK_STREAM) { > + EXPECT_EQ(-EISCONN, ret); > + } else { > + EXPECT_EQ(0, ret); > + } > + > + if (variant->sandbox == TCP_SANDBOX) { > + const int ruleset_fd = landlock_create_ruleset( > + &ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows connect. */ > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, > + LANDLOCK_RULE_NET_PORT, > + &tcp_connect, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + /* Disconnects already connected socket, or set peer. */ > + ret = connect_variant(connect_fd, &self->unspec_any0); > + if (self->srv0.protocol.domain == AF_UNIX && > + self->srv0.protocol.type == SOCK_STREAM) { > + EXPECT_EQ(-EINVAL, ret); > + } else { > + EXPECT_EQ(0, ret); > + } > + > + /* Tries to reconnect, or set peer. */ > + ret = connect_variant(connect_fd, &self->srv0); > + if (self->srv0.protocol.domain == AF_UNIX && > + self->srv0.protocol.type == SOCK_STREAM) { > + EXPECT_EQ(-EISCONN, ret); > + } else { > + EXPECT_EQ(0, ret); > + } > + > + if (variant->sandbox == TCP_SANDBOX) { > + const int ruleset_fd = landlock_create_ruleset( > + &ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Denies connect. */ > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + ret = connect_variant(connect_fd, &self->unspec_any0); > + if (self->srv0.protocol.domain == AF_UNIX && > + self->srv0.protocol.type == SOCK_STREAM) { > + EXPECT_EQ(-EINVAL, ret); > + } else { > + /* Always allowed to disconnect. */ > + EXPECT_EQ(0, ret); > + } > + > + EXPECT_EQ(0, close(connect_fd)); > + _exit(_metadata->passed ? 
EXIT_SUCCESS : EXIT_FAILURE); > + return; > + } > + > + client_fd = bind_fd; > + if (self->srv0.protocol.type == SOCK_STREAM) { > + client_fd = accept(bind_fd, NULL, 0); > + ASSERT_LE(0, client_fd); > + } > + > + EXPECT_EQ(child, waitpid(child, &status, 0)); > + EXPECT_EQ(1, WIFEXITED(status)); > + EXPECT_EQ(EXIT_SUCCESS, WEXITSTATUS(status)); > + > + /* Closes connection, if any. */ > + if (client_fd != bind_fd) > + EXPECT_LE(0, close(client_fd)); > + > + /* Closes listening socket. */ > + EXPECT_EQ(0, close(bind_fd)); > +} > + > +FIXTURE(ipv4) > +{ > + struct service_fixture srv0, srv1; > +}; > + > +FIXTURE_VARIANT(ipv4) > +{ > + const enum sandbox_type sandbox; > + const int type; > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(ipv4, no_sandbox_with_tcp) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .type = SOCK_STREAM, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(ipv4, tcp_sandbox_with_tcp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .type = SOCK_STREAM, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(ipv4, no_sandbox_with_udp) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .type = SOCK_DGRAM, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(ipv4, tcp_sandbox_with_udp) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .type = SOCK_DGRAM, > +}; > + > +FIXTURE_SETUP(ipv4) > +{ > + const struct protocol_variant prot = { > + .domain = AF_INET, > + .type = variant->type, > + }; > + > + disable_caps(_metadata); > + > + set_service(&self->srv0, prot, 0); > + set_service(&self->srv1, prot, 1); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(ipv4) > +{ > +} > + > +TEST_F(ipv4, from_unix_to_inet) > +{ > + int unix_stream_fd, unix_dgram_fd; > + > + if (variant->sandbox == TCP_SANDBOX) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct 
landlock_net_port_attr tcp_bind_connect_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + int ruleset_fd; > + > + /* Denies connect and bind to check errno value. */ > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows connect and bind for srv0. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_p0, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + unix_stream_fd = socket(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0); > + ASSERT_LE(0, unix_stream_fd); > + > + unix_dgram_fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0); > + ASSERT_LE(0, unix_dgram_fd); > + > + /* Checks unix stream bind and connect for srv0. */ > + EXPECT_EQ(-EINVAL, bind_variant(unix_stream_fd, &self->srv0)); > + EXPECT_EQ(-EINVAL, connect_variant(unix_stream_fd, &self->srv0)); > + > + /* Checks unix stream bind and connect for srv1. */ > + EXPECT_EQ(-EINVAL, bind_variant(unix_stream_fd, &self->srv1)) > + { > + TH_LOG("Wrong bind error: %s", strerror(errno)); > + } > + EXPECT_EQ(-EINVAL, connect_variant(unix_stream_fd, &self->srv1)); > + > + /* Checks unix datagram bind and connect for srv0. */ > + EXPECT_EQ(-EINVAL, bind_variant(unix_dgram_fd, &self->srv0)); > + EXPECT_EQ(-EINVAL, connect_variant(unix_dgram_fd, &self->srv0)); > + > + /* Checks unix datagram bind and connect for srv1. 
*/ > + EXPECT_EQ(-EINVAL, bind_variant(unix_dgram_fd, &self->srv1)); > + EXPECT_EQ(-EINVAL, connect_variant(unix_dgram_fd, &self->srv1)); > +} > + > +FIXTURE(tcp_layers) > +{ > + struct service_fixture srv0, srv1; > +}; > + > +FIXTURE_VARIANT(tcp_layers) > +{ > + const size_t num_layers; > + const int domain; > +}; > + > +FIXTURE_SETUP(tcp_layers) > +{ > + const struct protocol_variant prot = { > + .domain = variant->domain, > + .type = SOCK_STREAM, > + }; > + > + disable_caps(_metadata); > + > + ASSERT_EQ(0, set_service(&self->srv0, prot, 0)); > + ASSERT_EQ(0, set_service(&self->srv1, prot, 1)); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(tcp_layers) > +{ > +} > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, no_sandbox_with_ipv4) { > + /* clang-format on */ > + .domain = AF_INET, > + .num_layers = 0, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, one_sandbox_with_ipv4) { > + /* clang-format on */ > + .domain = AF_INET, > + .num_layers = 1, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, two_sandboxes_with_ipv4) { > + /* clang-format on */ > + .domain = AF_INET, > + .num_layers = 2, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, three_sandboxes_with_ipv4) { > + /* clang-format on */ > + .domain = AF_INET, > + .num_layers = 3, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, no_sandbox_with_ipv6) { > + /* clang-format on */ > + .domain = AF_INET6, > + .num_layers = 0, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, one_sandbox_with_ipv6) { > + /* clang-format on */ > + .domain = AF_INET6, > + .num_layers = 1, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, two_sandboxes_with_ipv6) { > + /* clang-format on */ > + .domain = AF_INET6, > + .num_layers = 2, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(tcp_layers, three_sandboxes_with_ipv6) { > + /* clang-format on */ > + .domain = 
AF_INET6, > + .num_layers = 3, > +}; > + > +TEST_F(tcp_layers, ruleset_overlap) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr tcp_bind = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv0.port, > + }; > + const struct landlock_net_port_attr tcp_bind_connect = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + > + if (variant->num_layers >= 1) { > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Allows bind. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind, 0)); > + /* Also allows bind, but allows connect too. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + if (variant->num_layers >= 2) { > + int ruleset_fd; > + > + /* Creates another ruleset layer. */ > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Only allows bind. */ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + if (variant->num_layers >= 3) { > + int ruleset_fd; > + > + /* Creates another ruleset layer. */ > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Try to allow bind and connect. 
*/ > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + /* > + * Forbids connecting to the socket as soon as one ruleset layer > + * does not allow connect. > + */ > + test_bind_and_connect(_metadata, &self->srv0, false, > + variant->num_layers >= 2); > +} > + > +TEST_F(tcp_layers, ruleset_expand) > +{ > + if (variant->num_layers >= 1) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP, > + }; > + /* Allows bind for srv0. */ > + const struct landlock_net_port_attr bind_srv0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv0.port, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &bind_srv0, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + if (variant->num_layers >= 2) { > + /* Expands network mask with connect action. */ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + /* Allows bind for srv0 and connect to srv0. */ > + const struct landlock_net_port_attr tcp_bind_connect_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = self->srv0.port, > + }; > + /* Try to allow bind for srv1. 
*/ > + const struct landlock_net_port_attr tcp_bind_p1 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv1.port, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_p0, 0)); > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_p1, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + if (variant->num_layers >= 3) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + /* Allows bind for srv0, without connect rule. */ > + const struct landlock_net_port_attr tcp_bind_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = self->srv0.port, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_p0, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + test_bind_and_connect(_metadata, &self->srv0, false, > + variant->num_layers >= 3); > + > + test_bind_and_connect(_metadata, &self->srv1, variant->num_layers >= 1, > + variant->num_layers >= 2); > +} > + > +/* clang-format off */ > +FIXTURE(mini) {}; > +/* clang-format on */ > + > +FIXTURE_SETUP(mini) > +{ > + disable_caps(_metadata); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(mini) > +{ > +} > + > +/* clang-format off */ > + > +#define ACCESS_LAST LANDLOCK_ACCESS_NET_CONNECT_TCP > + > +#define ACCESS_ALL ( \ > + LANDLOCK_ACCESS_NET_BIND_TCP | \ > + LANDLOCK_ACCESS_NET_CONNECT_TCP) > + > +/* clang-format on */ > + > +TEST_F(mini, network_access_rights) > +{ > + 
const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = ACCESS_ALL, > + }; > + struct landlock_net_port_attr net_port = { > + .port = sock_port_start, > + }; > + int ruleset_fd; > + __u64 access; > + > + ruleset_fd = > + landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + for (access = 1; access <= ACCESS_LAST; access <<= 1) { > + net_port.allowed_access = access; > + EXPECT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &net_port, 0)) > + { > + TH_LOG("Failed to add rule with access 0x%llx: %s", > + access, strerror(errno)); > + } > + } > + EXPECT_EQ(0, close(ruleset_fd)); > +} > + > +/* Checks invalid attribute, out of landlock network access range. */ > +TEST_F(mini, unknown_access_rights) > +{ > + __u64 access_mask; > + > + for (access_mask = 1ULL << 63; access_mask != ACCESS_LAST; > + access_mask >>= 1) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = access_mask, > + }; > + > + EXPECT_EQ(-1, landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0)); > + EXPECT_EQ(EINVAL, errno); > + } > +} > + > +TEST_F(mini, inval) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP > + }; > + const struct landlock_net_port_attr tcp_bind_connect = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = sock_port_start, > + }; > + const struct landlock_net_port_attr tcp_denied = { > + .allowed_access = 0, > + .port = sock_port_start, > + }; > + const struct landlock_net_port_attr tcp_bind = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = sock_port_start, > + }; > + int ruleset_fd; > + > + ruleset_fd = > + landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Checks unhandled allowed_access. 
*/ > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect, 0)); > + EXPECT_EQ(EINVAL, errno); > + > + /* Checks zero access value. */ > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_denied, 0)); > + EXPECT_EQ(ENOMSG, errno); > + > + /* Adds with legitimate values. */ > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind, 0)); > +} > + > +TEST_F(mini, tcp_port_overflow) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr port_max_bind = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = UINT16_MAX, > + }; > + const struct landlock_net_port_attr port_max_connect = { > + .allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = UINT16_MAX, > + }; > + const struct landlock_net_port_attr port_overflow1 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = UINT16_MAX + 1, > + }; > + const struct landlock_net_port_attr port_overflow2 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = UINT16_MAX + 2, > + }; > + const struct landlock_net_port_attr port_overflow3 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = UINT32_MAX + 1UL, > + }; > + const struct landlock_net_port_attr port_overflow4 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = UINT32_MAX + 2UL, > + }; > + const struct protocol_variant ipv4_tcp = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }; > + struct service_fixture srv_denied, srv_max_allowed; > + int ruleset_fd; > + > + ASSERT_EQ(0, set_service(&srv_denied, ipv4_tcp, 0)); > + > + /* Be careful to avoid port inconsistencies. 
*/ > + srv_max_allowed = srv_denied; > + srv_max_allowed.port = port_max_bind.port; > + srv_max_allowed.ipv4_addr.sin_port = htons(port_max_bind.port); > + > + ruleset_fd = > + landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_max_bind, 0)); > + > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_overflow1, 0)); > + EXPECT_EQ(EINVAL, errno); > + > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_overflow2, 0)); > + EXPECT_EQ(EINVAL, errno); > + > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_overflow3, 0)); > + EXPECT_EQ(EINVAL, errno); > + > + /* Interleaves with invalid rule additions. */ > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_max_connect, 0)); > + > + EXPECT_EQ(-1, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &port_overflow4, 0)); > + EXPECT_EQ(EINVAL, errno); > + > + enforce_ruleset(_metadata, ruleset_fd); > + > + test_bind_and_connect(_metadata, &srv_denied, true, true); > + test_bind_and_connect(_metadata, &srv_max_allowed, false, false); > +} > + > +FIXTURE(ipv4_tcp) > +{ > + struct service_fixture srv0, srv1; > +}; > + > +FIXTURE_SETUP(ipv4_tcp) > +{ > + const struct protocol_variant ipv4_tcp = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }; > + > + disable_caps(_metadata); > + > + ASSERT_EQ(0, set_service(&self->srv0, ipv4_tcp, 0)); > + ASSERT_EQ(0, set_service(&self->srv1, ipv4_tcp, 1)); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(ipv4_tcp) > +{ > +} > + > +TEST_F(ipv4_tcp, port_endianness) > +{ > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + }; > + const struct landlock_net_port_attr bind_host_endian_p0 = { > + .allowed_access = 
LANDLOCK_ACCESS_NET_BIND_TCP, > + /* Host port format. */ > + .port = self->srv0.port, > + }; > + const struct landlock_net_port_attr connect_big_endian_p0 = { > + .allowed_access = LANDLOCK_ACCESS_NET_CONNECT_TCP, > + /* Big endian port format. */ > + .port = htons(self->srv0.port), > + }; > + const struct landlock_net_port_attr bind_connect_host_endian_p1 = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + /* Host port format. */ > + .port = self->srv1.port, > + }; > + const unsigned int one = 1; > + const char little_endian = *(const char *)&one; > + int ruleset_fd; > + > + ruleset_fd = > + landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &bind_host_endian_p0, 0)); > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &connect_big_endian_p0, 0)); > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &bind_connect_host_endian_p1, 0)); > + enforce_ruleset(_metadata, ruleset_fd); > + > + /* No restriction for big endian CPU. */ > + test_bind_and_connect(_metadata, &self->srv0, false, little_endian); > + > + /* No restriction for any CPU. 
*/ > + test_bind_and_connect(_metadata, &self->srv1, false, false); > +} > + > +TEST_F_FORK(ipv4_tcp, with_fs) > +{ > + const struct landlock_ruleset_attr ruleset_attr_fs_net = { > + .handled_access_fs = LANDLOCK_ACCESS_FS_READ_DIR, > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP, > + }; > + struct landlock_path_beneath_attr path_beneath = { > + .allowed_access = LANDLOCK_ACCESS_FS_READ_DIR, > + .parent_fd = -1, > + }; > + struct landlock_net_port_attr tcp_bind = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP, > + .port = sock_port_start, > + }; > + int sockfd, ruleset_fd, dirfd, open_dir1, open_dir2; > + struct sockaddr_in addr4; > + > + dirfd = open("/dev", O_PATH | O_DIRECTORY | O_CLOEXEC); > + ASSERT_LE(0, dirfd); > + path_beneath.parent_fd = dirfd; > + > + addr4.sin_family = AF_INET; > + addr4.sin_port = htons(sock_port_start); > + addr4.sin_addr.s_addr = inet_addr(loopback_ipv4); > + memset(&addr4.sin_zero, '\0', 8); > + > + /* Creates ruleset both for filesystem and network access. */ > + ruleset_fd = landlock_create_ruleset(&ruleset_attr_fs_net, > + sizeof(ruleset_attr_fs_net), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Adds a filesystem rule. */ > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH, > + &path_beneath, 0)); > + /* Adds a network rule. */ > + ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + ASSERT_EQ(0, close(ruleset_fd)); > + > + /* Tests on directories with the network rule loaded. */ > + open_dir1 = open("/dev", O_RDONLY); > + ASSERT_LE(0, open_dir1); > + ASSERT_EQ(0, close(open_dir1)); > + > + open_dir2 = open("/", O_RDONLY); > + /* Denied by Landlock. */ > + ASSERT_EQ(-1, open_dir2); > + EXPECT_EQ(EACCES, errno); > + > + sockfd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0); > + ASSERT_LE(0, sockfd); > + /* Binds a socket to port 1024. 
*/ > + ASSERT_EQ(0, bind(sockfd, &addr4, sizeof(addr4))); > + > + /* Closes bounded socket. */ > + ASSERT_EQ(0, close(sockfd)); > +} > + > +FIXTURE(port_specific) > +{ > + struct service_fixture srv0; > +}; > + > +FIXTURE_VARIANT(port_specific) > +{ > + const enum sandbox_type sandbox; > + const struct protocol_variant prot; > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(port_specific, no_sandbox_with_ipv4) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(port_specific, sandbox_with_ipv4) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(port_specific, no_sandbox_with_ipv6) { > + /* clang-format on */ > + .sandbox = NO_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_STREAM, > + }, > +}; > + > +/* clang-format off */ > +FIXTURE_VARIANT_ADD(port_specific, sandbox_with_ipv6) { > + /* clang-format on */ > + .sandbox = TCP_SANDBOX, > + .prot = { > + .domain = AF_INET6, > + .type = SOCK_STREAM, > + }, > +}; > + > +FIXTURE_SETUP(port_specific) > +{ > + disable_caps(_metadata); > + > + ASSERT_EQ(0, set_service(&self->srv0, variant->prot, 0)); > + > + setup_loopback(_metadata); > +}; > + > +FIXTURE_TEARDOWN(port_specific) > +{ > +} > + > +TEST_F(port_specific, bind_connect_zero) > +{ > + int bind_fd, connect_fd, ret; > + uint16_t port; > + > + /* Adds a rule layer with bind and connect actions. 
*/ > + if (variant->sandbox == TCP_SANDBOX) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP > + }; > + const struct landlock_net_port_attr tcp_bind_connect_zero = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = 0, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + /* Checks zero port value on bind and connect actions. */ > + EXPECT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_zero, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + > + connect_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, connect_fd); > + > + /* Sets address port to 0 for both protocol families. */ > + set_port(&self->srv0, 0); > + /* > + * Binds on port 0, which selects a random port within > + * ip_local_port_range. > + */ > + ret = bind_variant(bind_fd, &self->srv0); > + EXPECT_EQ(0, ret); > + > + EXPECT_EQ(0, listen(bind_fd, backlog)); > + > + /* Connects on port 0. */ > + ret = connect_variant(connect_fd, &self->srv0); > + EXPECT_EQ(-ECONNREFUSED, ret); > + > + /* Sets binded port for both protocol families. */ > + port = get_binded_port(bind_fd, &variant->prot); > + EXPECT_NE(0, port); > + set_port(&self->srv0, port); > + /* Connects on the binded port. */ > + ret = connect_variant(connect_fd, &self->srv0); > + if (is_restricted(&variant->prot, variant->sandbox)) { > + /* Denied by Landlock. 
*/ > + EXPECT_EQ(-EACCES, ret); > + } else { > + EXPECT_EQ(0, ret); > + } > + > + EXPECT_EQ(0, close(connect_fd)); > + EXPECT_EQ(0, close(bind_fd)); > +} > + > +TEST_F(port_specific, bind_connect_1023) > +{ > + int bind_fd, connect_fd, ret; > + > + /* Adds a rule layer with bind and connect actions. */ > + if (variant->sandbox == TCP_SANDBOX) { > + const struct landlock_ruleset_attr ruleset_attr = { > + .handled_access_net = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP > + }; > + /* A rule with port value less than 1024. */ > + const struct landlock_net_port_attr tcp_bind_connect_low_range = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = 1023, > + }; > + /* A rule with 1024 port. */ > + const struct landlock_net_port_attr tcp_bind_connect = { > + .allowed_access = LANDLOCK_ACCESS_NET_BIND_TCP | > + LANDLOCK_ACCESS_NET_CONNECT_TCP, > + .port = 1024, > + }; > + int ruleset_fd; > + > + ruleset_fd = landlock_create_ruleset(&ruleset_attr, > + sizeof(ruleset_attr), 0); > + ASSERT_LE(0, ruleset_fd); > + > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect_low_range, 0)); > + ASSERT_EQ(0, > + landlock_add_rule(ruleset_fd, LANDLOCK_RULE_NET_PORT, > + &tcp_bind_connect, 0)); > + > + enforce_ruleset(_metadata, ruleset_fd); > + EXPECT_EQ(0, close(ruleset_fd)); > + } > + > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + > + connect_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, connect_fd); > + > + /* Sets address port to 1023 for both protocol families. */ > + set_port(&self->srv0, 1023); > + /* Binds on port 1023. */ > + ret = bind_variant(bind_fd, &self->srv0); > + /* Denied by the system. */ > + EXPECT_EQ(-EACCES, ret); > + > + set_cap(_metadata, CAP_NET_BIND_SERVICE); > + /* Binds on port 1023. 
*/ > + ret = bind_variant(bind_fd, &self->srv0); > + EXPECT_EQ(0, ret); > + EXPECT_EQ(0, listen(bind_fd, backlog)); > + clear_cap(_metadata, CAP_NET_BIND_SERVICE); > + > + /* Connects on the binded port 1023. */ > + ret = connect_variant(connect_fd, &self->srv0); > + EXPECT_EQ(0, ret); > + > + EXPECT_EQ(0, close(connect_fd)); > + EXPECT_EQ(0, close(bind_fd)); > + > + bind_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, bind_fd); > + > + connect_fd = socket_variant(&self->srv0); > + ASSERT_LE(0, connect_fd); > + > + /* Sets address port to 1024 for both protocol families. */ > + set_port(&self->srv0, 1024); > + /* Binds on port 1024. */ > + ret = bind_variant(bind_fd, &self->srv0); > + EXPECT_EQ(0, ret); > + EXPECT_EQ(0, listen(bind_fd, backlog)); > + clear_cap(_metadata, CAP_NET_BIND_SERVICE); > + > + /* Connects on the binded port 1024. */ > + ret = connect_variant(connect_fd, &self->srv0); > + EXPECT_EQ(0, ret); > + > + EXPECT_EQ(0, close(connect_fd)); > + EXPECT_EQ(0, close(bind_fd)); > +} > + > +TEST_HARNESS_MAIN > -- > 2.25.1 > > -- BR, Muhammad Usama Anjum
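
The byte-order reasoning behind the quoted host-endian vs. big-endian rules can be sketched on its own. This is only an illustration, assuming (as the quoted comments state) that the port in struct landlock_net_port_attr is given in host byte order; the two helper names are hypothetical and are not part of the selftest.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the CPU stores the least significant byte first. */
static bool cpu_is_little_endian(void)
{
	const unsigned int one = 1;

	return *(const unsigned char *)&one == 1;
}

/*
 * Hypothetical helper: returns true when a rule filled with htons(port)
 * names a different port number than a rule filled with the host-endian
 * value, i.e. when the big-endian rule would not cover the real port.
 */
static bool big_endian_rule_misses_port(uint16_t port)
{
	return htons(port) != port;
}
```

On a big-endian CPU htons() is the identity, so both rules name the same port, which is why the quoted test expects the connect restriction only when little_endian is set. Ports whose two bytes are equal (0x0101, for instance) are unaffected by the swap on any CPU.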

4 months, 3 weeks

2
3

[PATCH V12 0/7] amd-pstate preferred core

by Meng Li

Hi all:

The core frequency is subjected to the process variation in
semiconductors. Not all cores are able to reach the maximum frequency
respecting the infrastructure limits. Consequently, AMD has redefined
the concept of maximum frequency of a part. This means that only a
fraction of cores can reach maximum frequency. To find the best process
scheduling policy for a given scenario, the OS needs to know the core
ordering informed by the platform through the highest performance
capability register of the CPPC interface.

Earlier implementations of amd-pstate preferred core only supported a
static core ranking and targeted performance. Now it has the ability to
dynamically change the preferred core based on the workload and
platform conditions, accounting for thermals and aging.

The amd-pstate driver utilizes the functions and data structures
provided by the ITMT architecture to enable the scheduler to favor
scheduling on cores which can reach a higher frequency with lower
voltage. We call it amd-pstate preferred core.

Here sched_set_itmt_core_prio() is called to set priorities and
sched_set_itmt_support() is called to enable the ITMT feature. The
amd-pstate driver uses the highest performance value to indicate the
priority of a CPU. A higher value means a higher priority.

The amd-pstate driver will provide an initial core ordering at boot
time. It relies on the CPPC interface to communicate the core ranking
to the operating system and scheduler, to make sure that the OS chooses
the cores with the highest performance first when scheduling a process.
When the amd-pstate driver receives a message that the highest
performance has changed, it will update the core ranking.

Changes from V11->V12:
- all:
- - pick up Reviewed-By flag added by Perry.
- cpufreq: amd-pstate:
- - rebase onto the latest linux-next and fix conflicts.
- - fix the issue about cpudata without init in
    amd_pstate_update_highest_perf().

Changes from V10->V11:
- cpufreq: amd-pstate:
- - according to Perry's comments, replace the string with
    str_enabled_disable().

Changes from V9->V10:
- cpufreq: amd-pstate:
- - add a check for highest_perf. When it is less than 255, the
    preferred core feature is enabled and the priority is set.
- - delete "static u32 max_highest_perf" etc., because amd-pstate
    preferred core does not require special processing for hotplug.

Changes from V8->V9:
- all:
- - pick up Tested-By flag added by Oleksandr.
- cpufreq: amd-pstate:
- - pick up Review-By flag added by Wyes.
- - ignore modification of bug.
- - add an attribute of prefcore_ranking.
- - modify data type conversion from u32 to int.
- Documentation: amd-pstate:
- - pick up Review-By flag added by Wyes.

Changes from V7->V8:
- all:
- - pick up Review-By flag added by Mario and Ray.
- cpufreq: amd-pstate:
- - use hw_prefcore embedded into the cpudata structure.
- - delete preferred core init from cpu online/off.

Changes from V6->V7:
- x86:
- - Modify kconfig about X86_AMD_PSTATE.
- cpufreq: amd-pstate:
- - modify incorrect comments about scheduler_work().
- - convert highest_perf data type.
- - modify preferred core init when cpu init and online.
- acpi: cppc:
- - modify link of CPPC highest performance.
- cpufreq:
- - modify link of CPPC highest performance changed.

Changes from V5->V6:
- cpufreq: amd-pstate:
- - modify the wrong tag order.
- - modify warning about hw_prefcore sysfs attribute.
- - delete duplicate comments.
- - modify the variable name cppc_highest_perf to prefcore_ranking.
- - modify judgment conditions for setting highest_perf.
- - modify sysfs attribute for CPPC highest perf to pr_debug message.
- Documentation: amd-pstate:
- - modify warning: title underline too short.

Changes from V4->V5:
- cpufreq: amd-pstate:
- - modify sysfs attribute for CPPC highest perf.
- - modify warning about comments
- - rebase linux-next
- cpufreq:
- - Modify warning about function declarations.
- Documentation: amd-pstate:
- - align with ``amd-pstat``

Changes from V3->V4:
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V2->V3:
- x86:
- - Modify kconfig and description.
- cpufreq: amd-pstate:
- - Add Co-developed-by tag in commit message.
- cpufreq:
- - Modify commit message.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.

Changes from V1->V2:
- acpi: cppc:
- - Add reference link.
- cpufreq:
- - Modify link error.
- cpufreq: amd-pstate:
- - Init the priorities of all online CPUs
- - Use a single variable to represent the status of preferred core.
- Documentation:
- - Default enabled preferred core.
- Documentation: amd-pstate:
- - Modify inappropriate descriptions.
- - Default enabled preferred core.
- - Use a single variable to represent the status of preferred core.

Meng Li (7):
  x86: Drop CPU_SUP_INTEL from SCHED_MC_PRIO for the expansion.
  acpi: cppc: Add get the highest performance cppc control
  cpufreq: amd-pstate: Enable amd-pstate preferred core supporting.
  cpufreq: Add a notification message that the highest perf has changed
  cpufreq: amd-pstate: Update amd-pstate preferred core ranking dynamically
  Documentation: amd-pstate: introduce amd-pstate preferred core
  Documentation: introduce amd-pstate preferrd core mode kernel command line options

 .../admin-guide/kernel-parameters.txt         |   5 +
 Documentation/admin-guide/pm/amd-pstate.rst   |  59 +++++-
 arch/x86/Kconfig                              |   5 +-
 drivers/acpi/cppc_acpi.c                      |  13 ++
 drivers/acpi/processor_driver.c               |   6 +
 drivers/cpufreq/amd-pstate.c                  | 175 +++++++++++++++++-
 drivers/cpufreq/cpufreq.c                     |  13 ++
 include/acpi/cppc_acpi.h                      |   5 +
 include/linux/amd-pstate.h                    |  10 +
 include/linux/cpufreq.h                       |   5 +
 10 files changed, 284 insertions(+), 12 deletions(-)

--
2.34.1
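
The cover letter says that the highest performance value is used as the CPU priority, with a higher value meaning a higher priority. A minimal user-space sketch of that ordering rule follows. It is not the driver's code: struct cpu_rank, cmp_rank() and rank_cores() are hypothetical names, and the kernel hands priorities to the scheduler via sched_set_itmt_core_prio() rather than sorting an array.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-CPU record; not the driver's cpudata structure. */
struct cpu_rank {
	int cpu;
	unsigned int highest_perf;	/* CPPC highest performance value */
};

/* Orders by descending highest_perf, then by ascending CPU number. */
static int cmp_rank(const void *a, const void *b)
{
	const struct cpu_rank *x = a;
	const struct cpu_rank *y = b;

	if (x->highest_perf != y->highest_perf)
		return x->highest_perf > y->highest_perf ? -1 : 1;
	return (x->cpu > y->cpu) - (x->cpu < y->cpu);
}

/* Sorts the array in place so that the most preferred core comes first. */
static void rank_cores(struct cpu_rank *cpus, size_t n)
{
	qsort(cpus, n, sizeof(*cpus), cmp_rank);
}
```

Re-running rank_cores() after a highest_perf value changes gives the updated ordering, which mirrors the dynamic ranking update described above. The tie-break on CPU number is an assumption made only to keep the sketch deterministic.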

4 months, 3 weeks

4
21

[PATCH v8 00/10] Add iommufd nesting (part 2/2)

by Yi Liu

Nested translation is a hardware feature that is supported by many
modern IOMMU hardwares. It has two stages (stage-1, stage-2) of address
translation to get access to the physical address. The stage-1
translation table is owned by userspace (e.g. by a guest OS), while
stage-2 is owned by the kernel. Changes to the stage-1 translation
table should be followed by an IOTLB invalidation.

Take Intel VT-d as an example: the stage-1 translation table is the I/O
page table. As the below diagram shows, the guest I/O page table pointer
in GPA (guest physical address) is passed to the host and used to
perform the stage-1 address translation. Along with it, modifications
to present mappings in the guest I/O page table should be followed by
an IOTLB invalidation.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |---------------------------|--------
      v        v                           v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
 - FS = First stage page tables
 - SS = Second stage page tables
<Intel VT-d Nested translation>

This series is based on the first part, which was merged [1]. This
series adds the cache invalidation interface for the userspace to
invalidate the cache after modifying the stage-1 page table. This
includes both the iommufd changes and the VT-d driver changes.

Complete code can be found in [2], QEMU code can be found in [3].

Finally, this is team work together with Nicolin Chen and Lu Baolu.
Thanks to them for the help. ^_^. Looking forward to your feedback.

[1] https://lore.kernel.org/linux-iommu/20231026044216.64964-1-yi.l.liu@intel.c… - merged
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/zhenzhong/wip/iommufd_nesting_rfcv1

Change log:

v8:
 - Pass invalidation hint to the cache invalidation helper in the
   cache_invalidate_user op path (Kevin)
 - Move the devTLB invalidation out of info->iommu loop (Kevin, Weijiang)
 - Clear *fault per restart in qi_submit_sync() to avoid across
   submission error accumulation. (Kevin)
 - Define the vtd cache invalidation uapi structure in separate patch
   (Kevin)
 - Rename inv_error to be hw_error (Kevin)
 - Rename 'reqs_uptr', 'req_type', 'req_len' and 'req_num' to be
   'data_uptr', 'data_type', 'entry_len' and 'entry_num' (Kevin)
 - Allow user to set IOMMU_TEST_INVALIDATE_FLAG_ALL and
   IOMMU_TEST_INVALIDATE_FLAG_TRIGGER_ERROR at the same time (Kevin)

v7: https://lore.kernel.org/linux-iommu/20231221153948.119007-1-yi.l.liu@intel.…
 - Remove domain->ops->cache_invalidate_user check in hwpt alloc path
   due to failure in bisect (Baolu)
 - Remove out_driver_error_code from struct iommu_hwpt_invalidate after
   discussion in v6. Should expect per-entry error code.
 - Rework the selftest cache invalidation part to report a per-entry
   error
 - Allow user to pass in an empty array to have a try-and-fail mechanism
   for user to check if a given req_type is supported by the kernel
   (Jason)
 - Define a separate enum type for cache invalidation data (Jason)
 - Fix the IOMMU_HWPT_INVALIDATE to always update the req_num field
   before returning (Nicolin)
 - Merge the VT-d nesting part 2/2
   https://lore.kernel.org/linux-iommu/20231117131816.24359-1-yi.l.liu@intel.c…
   into this series to avoid defining empty enum in the middle of the
   series. The major difference is adding the VT-d related invalidation
   uapi structures together with the generic data structures in patch 02
   of this series.
 - VT-d driver was refined to report ICE/ITE error from the bottom cache
   invalidation submit helpers, hence the cache_invalidate_user op could
   report such errors via the per-entry error field to user. VT-d driver
   will not stop the invalidation array walking due to the ICE/ITE
   errors as such errors are defined by VT-d spec, userspace should be
   able to handle it and let the real user (say Virtual Machine) know
   about it. But for other errors like invalid uapi data structure
   configuration or memory copy failure, such errors should stop the
   array walking as there may be more issues if it goes on.
 - Minor fixes per Jason and Kevin's review comments

v6: https://lore.kernel.org/linux-iommu/20231117130717.19875-1-yi.l.liu@intel.c…
 - Not much change, just rebase on top of 6.7-rc1 as part 1/2 is merged

v5: https://lore.kernel.org/linux-iommu/20231020092426.13907-1-yi.l.liu@intel.c…
 - Split the iommufd nesting series into two parts of alloc_user and
   invalidation (Jason)
 - Split IOMMUFD_OBJ_HW_PAGETABLE to IOMMUFD_OBJ_HWPT_PAGING/_NESTED,
   and do the same with the structures/alloc()/abort()/destroy().
   Reworked the selftest accordingly too. (Jason)
 - Move hwpt/data_type into struct iommu_user_data from standalone op
   arguments. (Jason)
 - Rename hwpt_type to be data_type, the HWPT_TYPE to be
   HWPT_ALLOC_DATA, _TYPE_DEFAULT to be _ALLOC_DATA_NONE (Jason, Kevin)
 - Rename iommu_copy_user_data() to iommu_copy_struct_from_user()
   (Kevin)
 - Add macro to the iommu_copy_struct_from_user() to calculate min_size
   (Jason)
 - Fix two bugs spotted by ZhaoYan

v4: https://lore.kernel.org/linux-iommu/20230921075138.124099-1-yi.l.liu@intel.…
 - Separate HWPT alloc/destroy/abort functions between user-managed
   HWPTs and kernel-managed HWPTs
 - Rework invalidate uAPI to be a multi-request array-based design
 - Add a struct iommu_user_data_array and a helper for driver to
   sanitize and copy the entry data from user space invalidation array
 - Add a patch fixing TEST_LENGTH() in selftest program
 - Drop IOMMU_RESV_IOVA_RANGES patches
 - Update kdoc and inline comments
 - Drop the code to add IOMMU_RESV_SW_MSI to kernel-managed HWPT in
   nested translation, this does not change the rule that resv regions
   should only be added to the kernel-managed HWPT. The
   IOMMU_RESV_SW_MSI stuff will be added in later series as it is needed
   only by SMMU so far.

v3: https://lore.kernel.org/linux-iommu/20230724110406.107212-1-yi.l.liu@intel.…
 - Add new uAPI things in alphabetical order
 - Pass in "enum iommu_hwpt_type hwpt_type" to op->domain_alloc_user
   for sanity, replacing the previous op->domain_alloc_user_data_len
   solution
 - Return ERR_PTR from domain_alloc_user instead of NULL
 - Only add IOMMU_RESV_SW_MSI to kernel-managed HWPT in nested
   translation (Kevin)
 - Add IOMMU_RESV_IOVA_RANGES to report resv iova ranges to userspace
   hence userspace is able to exclude the ranges in the stage-1 HWPT
   (e.g. guest I/O page table). (Kevin)
 - Add selftest coverage for the new IOMMU_RESV_IOVA_RANGES ioctl
 - Minor changes per Kevin's inputs

v2: https://lore.kernel.org/linux-iommu/20230511143844.22693-1-yi.l.liu@intel.c…
 - Add union iommu_domain_user_data to include all user data structures
   to avoid passing void * in kernel APIs.
 - Add iommu op to return user data length for user domain allocation
 - Rename struct iommu_hwpt_alloc::data_type to be hwpt_type
 - Store the invalidation data length in
   iommu_domain_ops::cache_invalidate_user_data_len
 - Convert cache_invalidate_user op to be int instead of void
 - Remove @data_type in struct iommu_hwpt_invalidate
 - Remove out_hwpt_type_bitmap in struct iommu_hw_info hence drop patch
   08 of v1

v1: https://lore.kernel.org/linux-iommu/20230309080910.607396-1-yi.l.liu@intel.…

Thanks,
Yi Liu

Lu Baolu (4):
  iommu: Add cache_invalidate_user op
  iommu/vt-d: Allow qi_submit_sync() to return the QI faults
  iommu/vt-d: Convert stage-1 cache invalidation to return QI fault
  iommu/vt-d: Add iotlb flush for nested domain

Nicolin Chen (4):
  iommu: Add iommu_copy_struct_from_user_array helper
  iommufd/selftest: Add mock_domain_cache_invalidate_user support
  iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
  iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl

Yi Liu (2):
  iommufd: Add IOMMU_HWPT_INVALIDATE
  iommufd: Add data structure for Intel VT-d stage-1 cache invalidation

 drivers/iommu/intel/dmar.c                    |  38 ++--
 drivers/iommu/intel/iommu.c                   |  12 +-
 drivers/iommu/intel/iommu.h                   |   8 +-
 drivers/iommu/intel/irq_remapping.c           |   2 +-
 drivers/iommu/intel/nested.c                  | 118 ++++++++++++
 drivers/iommu/intel/pasid.c                   |  14 +-
 drivers/iommu/intel/svm.c                     |  14 +-
 drivers/iommu/iommufd/hw_pagetable.c          |  41 ++++
 drivers/iommu/iommufd/iommufd_private.h       |  10 +
 drivers/iommu/iommufd/iommufd_test.h          |  39 ++++
 drivers/iommu/iommufd/main.c                  |   3 +
 drivers/iommu/iommufd/selftest.c              |  86 +++++++++
 include/linux/iommu.h                         | 100 ++++++++++
 include/uapi/linux/iommufd.h                  |  98 ++++++++++
 tools/testing/selftests/iommu/iommufd.c       | 179 ++++++++++++++++++
 tools/testing/selftests/iommu/iommufd_utils.h |  57 ++++++
 16 files changed, 781 insertions(+), 38 deletions(-)

--
2.34.1
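
The array-walking policy described in the change log (hardware errors are recorded per entry and the walk continues, other errors stop the walk, and the handled entry count is always written back) can be sketched in plain C. Everything here is hypothetical: struct inv_entry, the HW_ERR_* values and submit_one() are illustration-only stand-ins and do not reflect the actual uAPI layout in include/uapi/linux/iommufd.h or the VT-d driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HW_ERR_NONE 0u
#define HW_ERR_ICE  1u	/* hypothetical encoding of a hardware error */

/* Hypothetical invalidation entry; not the real uAPI structure. */
struct inv_entry {
	uint64_t addr;
	uint64_t npages;
	uint32_t hw_error;	/* output: per-entry hardware error */
};

/*
 * Hypothetical stand-in for submitting one request to hardware.
 * A zero-length entry models malformed user data (fatal); an unaligned
 * address models an error reported by the hardware (non-fatal).
 */
static int submit_one(struct inv_entry *e)
{
	if (e->npages == 0)
		return -EINVAL;
	e->hw_error = (e->addr & 0xfff) ? HW_ERR_ICE : HW_ERR_NONE;
	return 0;
}

/*
 * Walks the array. Hardware errors are stored in the entry and do not
 * stop the walk; any other error does. *entry_num is always updated to
 * the number of entries handled before returning.
 */
static int invalidate_array(struct inv_entry *entries, size_t *entry_num)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < *entry_num; i++) {
		ret = submit_one(&entries[i]);
		if (ret)
			break;
	}
	*entry_num = i;
	return ret;
}
```

A caller can then inspect hw_error on each of the first *entry_num entries and forward it to the real user (for example a virtual machine), while a negative return value indicates that the remaining entries were not processed.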

4 months, 3 weeks

5
25

[PATCH RESEND v4 0/3] livepatch: Move modules to selftests and add a new test

by Marcos Paulo de Souza

Changes in v4:
* Documented how to compile the livepatch selftests without running the
  tests (Joe)
* Removed the mention of lib/livepatch from the MAINTAINERS file,
  reported by checkpatch.

Changes in v3:
* Rebased on top of v6.6-rc5
* The commit messages were improved (Thanks Petr!)
* Created TEST_GEN_MODS_DIR variable to point to a directory that
  contains kernel modules, and adapted selftests to build it before
  running the test.
* Moved test_klp-call_getpid out of test_programs, since gen_tar would
  just copy the generated test programs to the livepatches dir, and so
  scripts relying on test_programs/test_klp-call_getpid would fail.
* Added a module_param for klp_pids, describing its usage.
* Simplified the call_getpid program to ignore the return of the getpid
  syscall, since we only want to make sure the process transitions
  correctly to the patched state
* The test-syscall.sh now prints a log message showing the number of
  remaining processes to transition into the livepatched state, and
  check_output expects it to be 0.
* Added MODULE_AUTHOR and MODULE_DESCRIPTION to test_klp_syscall.c

- Link to v3: https://lore.kernel.org/r/20231031-send-lp-kselftests-v3-0-2b1655c2605f@sus…
- Link to v2: https://lore.kernel.org/linux-kselftest/20220630141226.2802-1-mpdesouza@sus…

This patchset moves the current kernel testing livepatch modules from
lib/livepatch to tools/testing/selftests/livepatch/test_modules, and
compiles them as out-of-tree modules before testing.

There is also a new test being added. This new test exercises multiple
processes calling a syscall, while a livepatch patches the syscall.

Why this move is an improvement:
* The modules are now compiled as out-of-tree modules against the
  current running kernel, making them capable of being tested on
  different systems with newer or older kernels.
* Such an approach now needs the kernel-devel package to be installed,
  since they are out-of-tree modules. These can be generated by running
  "make rpm-pkg" in the kernel source.

What needs to be solved:
* Currently gen_tar only packages the resulting binaries of the tests,
  and not the sources. For the current approach, the newly added modules
  would be compiled and then packaged. It works when testing on a system
  with the same kernel version. But it will fail when running on a
  machine with a different kernel version, since the module was compiled
  against the kernel currently running.

  This is not a new problem, just aligning the expectations. For the
  current approach to be truly system agnostic, gen_tar would need to
  include the module and program sources to be compiled on the target
  systems.

Thanks in advance!
  Marcos

Signed-off-by: Marcos Paulo de Souza <mpdesouza(a)suse.com>
---
Marcos Paulo de Souza (3):
      kselftests: lib.mk: Add TEST_GEN_MODS_DIR variable
      livepatch: Move tests from lib/livepatch to selftests/livepatch
      selftests: livepatch: Test livepatching a heavily called syscall

 Documentation/dev-tools/kselftest.rst              |   4 +
 MAINTAINERS                                        |   1 -
 arch/s390/configs/debug_defconfig                  |   1 -
 arch/s390/configs/defconfig                        |   1 -
 lib/Kconfig.debug                                  |  22 ----
 lib/Makefile                                       |   2 -
 lib/livepatch/Makefile                             |  14 ---
 tools/testing/selftests/lib.mk                     |  20 +++-
 tools/testing/selftests/livepatch/Makefile         |   5 +-
 tools/testing/selftests/livepatch/README           |  25 +++--
 tools/testing/selftests/livepatch/config           |   1 -
 tools/testing/selftests/livepatch/functions.sh     |  34 +++---
 .../testing/selftests/livepatch/test-callbacks.sh  |  50 ++++-----
 tools/testing/selftests/livepatch/test-ftrace.sh   |   6 +-
 .../testing/selftests/livepatch/test-livepatch.sh  |  10 +-
 .../selftests/livepatch/test-shadow-vars.sh        |   2 +-
 tools/testing/selftests/livepatch/test-state.sh    |  18 ++--
 tools/testing/selftests/livepatch/test-syscall.sh  |  53 ++++++++++
 tools/testing/selftests/livepatch/test-sysfs.sh    |   6 +-
 .../selftests/livepatch/test_klp-call_getpid.c     |  44 ++++++++
 .../selftests/livepatch/test_modules/Makefile      |  20 ++++
 .../test_modules}/test_klp_atomic_replace.c        |   0
 .../test_modules}/test_klp_callbacks_busy.c        |   0
 .../test_modules}/test_klp_callbacks_demo.c        |   0
 .../test_modules}/test_klp_callbacks_demo2.c       |   0
 .../test_modules}/test_klp_callbacks_mod.c         |   0
 .../livepatch/test_modules}/test_klp_livepatch.c   |   0
 .../livepatch/test_modules}/test_klp_shadow_vars.c |   0
 .../livepatch/test_modules}/test_klp_state.c       |   0
 .../livepatch/test_modules}/test_klp_state2.c      |   0
 .../livepatch/test_modules}/test_klp_state3.c      |   0
 .../livepatch/test_modules/test_klp_syscall.c      | 116 +++++++++++++++++++++
 32 files changed, 334 insertions(+), 121 deletions(-)
---
base-commit: 206ed72d6b33f53b2a8bf043f54ed6734121d26b
change-id: 20231031-send-lp-kselftests-4c917dcd4565

Best regards,
--
Marcos Paulo de Souza <mpdesouza(a)suse.com>

4 months, 3 weeks

4
13

[PATCH v3] kunit: run test suites only after module initialization completes

by Marco Pagani

Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed a wild-memory-access bug that could have
happened during the loading phase of test suites built and executed as
loadable modules. However, it also introduced a problematic side effect
that causes test suite modules to crash when they attempt to register
fake devices.

When a module is loaded, it traverses the MODULE_STATE_UNFORMED and
MODULE_STATE_COMING states before reaching the normal operating state
MODULE_STATE_LIVE. Finally, when the module is removed, it moves to
MODULE_STATE_GOING before being released. However, if the loading
function load_module() fails between complete_formation() and
do_init_module(), the module goes directly from MODULE_STATE_COMING to
MODULE_STATE_GOING without passing through MODULE_STATE_LIVE.

This behavior was causing kunit_module_exit() to be called without
having first executed kunit_module_init(). Since kunit_module_exit() is
responsible for freeing the memory allocated by kunit_module_init()
through kunit_filter_suites(), this behavior was resulting in a
wild-memory-access bug.

Commit 2810c1e99867 ("kunit: Fix wild-memory-access bug in
kunit_free_suite_set()") fixed this issue by running the tests when the
module is still in MODULE_STATE_COMING. However, modules in that state
are not fully initialized, lacking sysfs kobjects. Therefore, if a test
module attempts to register a fake device, it will inevitably crash.

This patch proposes a different approach to fix the original
wild-memory-access bug while restoring the normal module execution flow
by making kunit_module_exit() able to detect if kunit_module_init() has
previously initialized the test suite set. In this way, test modules
can once again register fake devices without crashing.

This behavior is achieved by checking whether mod->kunit_suites is a
virtual or direct mapping address. If it is a virtual address, then
kunit_module_init() has allocated the suite_set in
kunit_filter_suites() using kmalloc_array(). On the contrary, if
mod->kunit_suites is still pointing to the original address that was
set when looking up the .kunit_test_suites section of the module, then
the loading phase has failed and there's no memory to be freed.

v3:
- add a comment to clarify why the start address is checked
v2:
- add include <linux/mm.h>

Fixes: 2810c1e99867 ("kunit: Fix wild-memory-access bug in kunit_free_suite_set()")
Tested-by: Richard Fitzgerald <rf(a)opensource.cirrus.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Signed-off-by: Marco Pagani <marpagan(a)redhat.com>
---
 lib/kunit/test.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 7aceb07a1af9..3263e0d5e0f6 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -16,6 +16,7 @@
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/mm.h>
 
 #include "debugfs.h"
 #include "hooks-impl.h"
@@ -775,12 +776,19 @@ static void kunit_module_exit(struct module *mod)
 	};
 	const char *action = kunit_action();
 
+	/*
+	 * Check if the start address is a valid virtual address to detect
+	 * if the module load sequence has failed and the suite set has not
+	 * been initialized and filtered.
+	 */
+	if (!suite_set.start || !virt_addr_valid(suite_set.start))
+		return;
+
 	if (!action)
 		__kunit_test_suites_exit(mod->kunit_suites,
 					 mod->num_kunit_suites);
 
-	if (suite_set.start)
-		kunit_free_suite_set(suite_set);
+	kunit_free_suite_set(suite_set);
 }
 
 static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
@@ -790,12 +798,12 @@ static int kunit_module_notify(struct notifier_block *nb, unsigned long val,
 
 	switch (val) {
 	case MODULE_STATE_LIVE:
+		kunit_module_init(mod);
 		break;
 	case MODULE_STATE_GOING:
 		kunit_module_exit(mod);
 		break;
 	case MODULE_STATE_COMING:
-		kunit_module_init(mod);
 		break;
 	case MODULE_STATE_UNFORMED:
 		break;

base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
--
2.43.0
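
The state handling described in the commit message can be modeled in user space. The following is a simplified sketch, not kernel code: struct fake_module and the fake_* functions are hypothetical, and a boolean flag stands in for the virt_addr_valid() check that the patch uses to tell an allocated suite set from the original section pointer.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Simplified mirror of the module states named in the commit message. */
enum mod_state { ST_UNFORMED, ST_COMING, ST_LIVE, ST_GOING };

/* Hypothetical stand-in for struct module; not a kernel type. */
struct fake_module {
	int section_suites[2];	/* plays the .kunit_test_suites section */
	int *suites;		/* plays mod->kunit_suites */
	bool allocated;		/* assumption: replaces virt_addr_valid() */
	int frees;		/* counts how often exit released memory */
};

/* Points suites at the section data, as after the section lookup. */
static void fake_module_load(struct fake_module *m)
{
	memset(m, 0, sizeof(*m));
	m->suites = m->section_suites;
}

/* Replaces the section pointer with a heap copy, like the filter step. */
static void fake_kunit_init(struct fake_module *m)
{
	int *copy = malloc(sizeof(m->section_suites));

	if (!copy)
		return;
	memcpy(copy, m->section_suites, sizeof(m->section_suites));
	m->suites = copy;
	m->allocated = true;
}

/* Frees only what init allocated; a failed load leaves nothing to free. */
static void fake_kunit_exit(struct fake_module *m)
{
	if (!m->suites || !m->allocated)
		return;
	free(m->suites);
	m->suites = m->section_suites;
	m->allocated = false;
	m->frees++;
}

/* Mirrors the patched notifier: init on LIVE, exit on GOING. */
static void fake_notify(struct fake_module *m, enum mod_state state)
{
	switch (state) {
	case ST_LIVE:
		fake_kunit_init(m);
		break;
	case ST_GOING:
		fake_kunit_exit(m);
		break;
	case ST_COMING:
	case ST_UNFORMED:
		break;
	}
}
```

Feeding the sequence COMING, LIVE, GOING through fake_notify() frees the copy exactly once, while the failed-load sequence COMING, GOING returns early from fake_kunit_exit() without touching the section data.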

4 months, 3 weeks

4
4
