This is the v7 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:
b9b71f43683a (kvm/queue) KVM: x86/mmu: Buffer nested MMU
split_desc_cache only by default capacity
Introduction
------------
In general this patch series introduce fd-based memslot which provides
guest memory through memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc. which we refer as memory backing store. KVM
and the the memory backing store exchange callbacks when such memslot
gets created. At runtime KVM will call into callbacks provided by the
backing store to get the pfn with the fd+offset. Memory backing store
will also call into KVM callbacks when userspace punch hole on the fd
to notify KVM to unmap secondary MMU page table entries.
Comparing to existing hva-based memslot, this new type of memslot allows
guest memory unmapped from host userspace like QEMU and even the kernel
itself, therefore reduce attack surface and prevent bugs.
Based on this fd-based memslot, we can build guest private memory that
is going to be used in confidential computing environments such as Intel
TDX and AMD SEV. When supported, the memory backing store can provide
more enforcement on the fd and KVM can use a single memslot to hold both
the private and shared part of the guest memory.
mm extension
---------------------
Introduces new MFD_INACCESSIBLE flag for memfd_create(), the file
created with these flags cannot read(), write() or mmap() etc via normal
MMU operations. The file content can only be used with the newly
introduced memfile_notifier extension.
The memfile_notifier extension provides two sets of callbacks for KVM to
interact with the memory backing store:
- memfile_notifier_ops: callbacks for memory backing store to notify
KVM when memory gets invalidated.
- backing store callbacks: callbacks for KVM to call into memory
backing store to request memory pages for guest private memory.
The memfile_notifier extension also provides APIs for memory backing
store to register/unregister itself and to trigger the notifier when the
bookmarked memory gets invalidated.
The patchset also introduces a new memfd seal F_SEAL_AUTO_ALLOCATE to
prevent double allocation caused by unintentional guest when we only
have a single side of the shared/private memfds effective.
memslot extension
-----------------
Add the private fd and the fd offset to existing 'shared' memslot so
that both private/shared guest memory can live in one single memslot.
A page in the memslot is either private or shared. Whether a guest page
is private or shared is maintained through reusing existing SEV ioctls
KVM_MEMORY_ENCRYPT_{UN,}REG_REGION.
Test
----
To test the new functionalities of this patch TDX patchset is needed.
Since TDX patchset has not been merged so I did two kinds of test:
- Regresion test on kvm/queue (this patchset)
Most new code are not covered. Code also in below repo:
https://github.com/chao-p/linux/tree/privmem-v7
- New Funational test on latest TDX code
The patch is rebased to latest TDX code and tested the new
funcationalities. See below repos:
Linux: https://github.com/chao-p/linux/tree/privmem-v7-tdx
QEMU: https://github.com/chao-p/qemu/tree/privmem-v7
An example QEMU command line for TDX test:
-object tdx-guest,id=tdx,debug=off,sept-ve-disable=off \
-machine confidential-guest-support=tdx \
-object memory-backend-memfd-private,id=ram1,size=${mem} \
-machine memory-backend=ram1
Changelog
----------
v7:
- Move the private/shared info from backing store to KVM.
- Introduce F_SEAL_AUTO_ALLOCATE to avoid double allocation.
- Rework on the sync mechanism between zap/page fault paths.
- Addressed other comments in v6.
v6:
- Re-organzied patch for both mm/KVM parts.
- Added flags for memfile_notifier so its consumers can state their
features and memory backing store can check against these flags.
- Put a backing store reference in the memfile_notifier and move pfn_ops
into backing store.
- Only support boot time backing store register.
- Overall KVM part improvement suggested by Sean and some others.
v5:
- Removed userspace visible F_SEAL_INACCESSIBLE, instead using an
in-kernel flag (SHM_F_INACCESSIBLE for shmem). Private fd can only
be created by MFD_INACCESSIBLE.
- Introduced new APIs for backing store to register itself to
memfile_notifier instead of direct function call.
- Added the accounting and restriction for MFD_INACCESSIBLE memory.
- Added KVM API doc for new memslot extensions and man page for the new
MFD_INACCESSIBLE flag.
- Removed the overlap check for mapping the same file+offset into
multiple gfns due to perf consideration, warned in document.
- Addressed other comments in v4.
v4:
- Decoupled the callbacks between KVM/mm from memfd and use new
name 'memfile_notifier'.
- Supported register multiple memslots to the same backing store.
- Added per-memslot pfn_ops instead of per-system.
- Reworked the invalidation part.
- Improved new KVM uAPIs (private memslot extension and memory
error) per Sean's suggestions.
- Addressed many other minor fixes for comments from v3.
v3:
- Added locking protection when calling
invalidate_page_range/fallocate callbacks.
- Changed memslot structure to keep use useraddr for shared memory.
- Re-organized F_SEAL_INACCESSIBLE and MEMFD_OPS.
- Added MFD_INACCESSIBLE flag to force F_SEAL_INACCESSIBLE.
- Commit message improvement.
- Many small fixes for comments from the last version.
Links to previous discussions
-----------------------------
[1] Original design proposal:
https://lkml.kernel.org/kvm/20210824005248.200037-1-seanjc@google.com/
[2] Updated proposal and RFC patch v1:
https://lkml.kernel.org/linux-fsdevel/20211111141352.26311-1-chao.p.peng@li…
[3] Patch v5: https://lkml.org/lkml/2022/5/19/861
Chao Peng (12):
mm: Add F_SEAL_AUTO_ALLOCATE seal to memfd
selftests/memfd: Add tests for F_SEAL_AUTO_ALLOCATE
mm: Introduce memfile_notifier
mm/memfd: Introduce MFD_INACCESSIBLE flag
KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS
KVM: Use gfn instead of hva for mmu_notifier_retry
KVM: Rename mmu_notifier_*
KVM: Extend the memslot to support fd-based private memory
KVM: Add KVM_EXIT_MEMORY_FAULT exit
KVM: Register/unregister the guest private memory regions
KVM: Handle page fault for private memory
KVM: Enable and expose KVM_MEM_PRIVATE
Kirill A. Shutemov (1):
mm/shmem: Support memfile_notifier
Documentation/virt/kvm/api.rst | 77 +++++-
arch/arm64/kvm/mmu.c | 8 +-
arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/mmu.c | 10 +-
arch/powerpc/include/asm/kvm_book3s_64.h | 2 +-
arch/powerpc/kvm/book3s_64_mmu_host.c | 4 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 6 +-
arch/powerpc/kvm/book3s_hv_nested.c | 2 +-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 8 +-
arch/powerpc/kvm/e500_mmu_host.c | 4 +-
arch/riscv/kvm/mmu.c | 4 +-
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kvm/Kconfig | 3 +
arch/x86/kvm/mmu.h | 2 -
arch/x86/kvm/mmu/mmu.c | 74 +++++-
arch/x86/kvm/mmu/mmu_internal.h | 18 ++
arch/x86/kvm/mmu/mmutrace.h | 1 +
arch/x86/kvm/mmu/paging_tmpl.h | 4 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 105 +++++---
include/linux/memfile_notifier.h | 91 +++++++
include/linux/shmem_fs.h | 2 +
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/kvm.h | 37 +++
include/uapi/linux/memfd.h | 1 +
mm/Kconfig | 4 +
mm/Makefile | 1 +
mm/memfd.c | 18 +-
mm/memfile_notifier.c | 123 ++++++++++
mm/shmem.c | 125 +++++++++-
tools/testing/selftests/memfd/memfd_test.c | 166 +++++++++++++
virt/kvm/Kconfig | 3 +
virt/kvm/kvm_main.c | 272 ++++++++++++++++++---
virt/kvm/pfncache.c | 14 +-
35 files changed, 1074 insertions(+), 127 deletions(-)
create mode 100644 include/linux/memfile_notifier.h
create mode 100644 mm/memfile_notifier.c
--
2.25.1
From: SeongJae Park <sjpark(a)amazon.de>
When running a test program, 'run_one()' checks if the program has the
execution permission and fails if it doesn't. However, it's easy to
mistakenly missing the permission, as some common tools like 'diff'
don't support the permission change well[1]. Compared to that, making
mistakes in the test program's path would only rare, as those are
explicitly listed in 'TEST_PROGS'. Therefore, it might make more sense
to resolve the situation on our own and run the program.
For the reason, this commit makes the test program runner function to
still print the warning message but try parsing the interpreter of the
program and explicitly run it with the interpreter, in the case.
[1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
Changes from v1
(https://lore.kernel.org/linux-kselftest/20210810140459.23990-1-sj38.park@gm…)
- Parse and use the interpreter instead of changing the file
tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++--------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index cc9c846585f0..a9ba782d8ca0 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -33,9 +33,9 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" "$1"
+ /usr/bin/timeout --foreground "$kselftest_timeout" $1
else
- "$1"
+ $1
fi
}
@@ -65,17 +65,25 @@ run_one()
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
- if [ ! -x "$TEST" ]; then
- echo -n "# Warning: file $TEST is "
- if [ ! -e "$TEST" ]; then
- echo "missing!"
- else
- echo "not executable, correct this."
- fi
+ if [ ! -e "$TEST" ]; then
+ echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
+ cmd="./$BASENAME_TEST"
+ if [ ! -x "$TEST" ]; then
+ echo "# Warning: file $TEST is not executable"
+
+ if [ $(head -n 1 "$TEST" | cut -c -2) = "#!" ]
+ then
+ interpreter=$(head -n 1 "$TEST" | cut -c 3-)
+ cmd="$interpreter ./$BASENAME_TEST"
+ else
+ echo "not ok $test_num $TEST_HDR_MSG"
+ return
+ fi
+ fi
cd `dirname $TEST` > /dev/null
- ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.17.1
Sorry for the delay in this update.
Changes from v1:
* Improve the skip message along with the changelog massage (Suah Khan).
* Simplify the feature support check (Suah Khan).
=== Cover Letter ===
A couple of test updates are included:
* With the STRICT_SIGALTSTACK_SIZE option [1,2], the kernel's altstack
check becomes stringent. The x86 sigaltstack test is ignorant about this.
Adjust the test now. This check was established [3] to ensure every AMX
task's altstack is sufficient (regardless of that option) [4].
* The AMX test wrongly fails on non-AMX machines. Fix the code to skip the
test instead.
The series is available in this repository:
git://github.com/intel/amx-linux.git selftest
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arc…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Doc…
[3] 3aac3ebea08f ("x86/signal: Implement sigaltstack size validation")
[4] 4b7ca609a33d ("x86/signal: Use fpu::__state_user_size for sigalt stack validation")
Chang S. Bae (2):
selftests/x86/signal: Adjust the test to the kernel's altstack check
selftests/x86/amx: Fix the test to avoid failure when AMX is
unavailable
tools/testing/selftests/x86/amx.c | 31 ++++++++---------------
tools/testing/selftests/x86/sigaltstack.c | 14 +++++++++-
2 files changed, 23 insertions(+), 22 deletions(-)
base-commit: 32346491ddf24599decca06190ebca03ff9de7f8
--
2.17.1
The Felix VSC9959 switch in NXP LS1028A supports the tc-gate action
which enforced time-based access control per stream. A stream as seen by
this switch is identified by {MAC DA, VID}.
We use the standard forwarding selftest topology with 2 host interfaces
and 2 switch interfaces. The host ports must require timestamping non-IP
packets and supporting tc-etf offload, for isochron to work. The
isochron program monitors network sync status (ptp4l, phc2sys) and
deterministically transmits packets to the switch such that the tc-gate
action either (a) always accepts them based on its schedule, or
(b) always drops them.
I tried to keep as much of the logic that isn't specific to the NXP
LS1028A in a new tsn_lib.sh, for future reuse. This covers
synchronization using ptp4l and phc2sys, and isochron.
The cycle-time chosen for this selftest isn't particularly impressive
(and the focus is the functionality of the switch), but I didn't really
know what to do better, considering that it will mostly be run during
debugging sessions, various kernel bloatware would be enabled, like
lockdep, KASAN, etc, and we certainly can't run any races with those on.
I tried to look through the kselftest framework for other real time
applications and didn't really find any, so I'm not sure how better to
prepare the environment in case we want to go for a lower cycle time.
At the moment, the only thing the selftest is ensuring is that dynamic
frequency scaling is disabled on the CPU that isochron runs on. It would
probably be useful to have a blacklist of kernel config options (checked
through zcat /proc/config.gz) and some cyclictest scripts to run
beforehand, but I saw none of those.
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
v1->v2:
- fix an off-by-one bug introduced at the last minute regarding which
tc-mqprio queue was used for tc-etf and SO_TXTIME
- introduce debugging for packets incorrectly received / incorrectly
dropped based on "isochron report"
- make the tsn_lib.sh dependency on isochron and linuxptp optional via
REQUIRE_ISOCHRON and REQUIRE_LINUXPTP
- avoid errors when CONFIG_CPU_FREQ is disabled
- consistently use SCHED_FIFO instead of SCHED_RR for the isochron
receiver
.../selftests/drivers/net/ocelot/psfp.sh | 327 ++++++++++++++++++
.../selftests/net/forwarding/tsn_lib.sh | 235 +++++++++++++
2 files changed, 562 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/ocelot/psfp.sh
create mode 100644 tools/testing/selftests/net/forwarding/tsn_lib.sh
diff --git a/tools/testing/selftests/drivers/net/ocelot/psfp.sh b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
new file mode 100755
index 000000000000..5a5cee92c665
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/ocelot/psfp.sh
@@ -0,0 +1,327 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+# Note: On LS1028A, in lack of enough user ports, this setup requires patching
+# the device tree to use the second CPU port as a user port
+
+WAIT_TIME=1
+NUM_NETIFS=4
+STABLE_MAC_ADDRS=yes
+NETIF_CREATE=no
+lib_dir=$(dirname $0)/../../../net/forwarding
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/tsn_lib.sh
+
+UDS_ADDRESS_H1="/var/run/ptp4l_h1"
+UDS_ADDRESS_SWP1="/var/run/ptp4l_swp1"
+
+# Tunables
+NUM_PKTS=1000
+STREAM_VID=100
+STREAM_PRIO=6
+# Use a conservative cycle of 10 ms to allow the test to still pass when the
+# kernel has some extra overhead like lockdep etc
+CYCLE_TIME_NS=10000000
+# Create two Gate Control List entries, one OPEN and one CLOSE, of equal
+# durations
+GATE_DURATION_NS=$((${CYCLE_TIME_NS} / 2))
+# Give 2/3 of the cycle time to user space and 1/3 to the kernel
+FUDGE_FACTOR=$((${CYCLE_TIME_NS} / 3))
+# Shift the isochron base time by half the gate time, so that packets are
+# always received by swp1 close to the middle of the time slot, to minimize
+# inaccuracies due to network sync
+SHIFT_TIME_NS=$((${GATE_DURATION_NS} / 2))
+
+h1=${NETIFS[p1]}
+swp1=${NETIFS[p2]}
+swp2=${NETIFS[p3]}
+h2=${NETIFS[p4]}
+
+H1_IPV4="192.0.2.1"
+H2_IPV4="192.0.2.2"
+H1_IPV6="2001:db8:1::1"
+H2_IPV6="2001:db8:1::2"
+
+# Chain number exported by the ocelot driver for
+# Per-Stream Filtering and Policing filters
+PSFP()
+{
+ echo 30000
+}
+
+psfp_chain_create()
+{
+ local if_name=$1
+
+ tc qdisc add dev $if_name clsact
+
+ tc filter add dev $if_name ingress chain 0 pref 49152 flower \
+ skip_sw action goto chain $(PSFP)
+}
+
+psfp_chain_destroy()
+{
+ local if_name=$1
+
+ tc qdisc del dev $if_name clsact
+}
+
+psfp_filter_check()
+{
+ local expected=$1
+ local packets=""
+ local drops=""
+ local stats=""
+
+ stats=$(tc -j -s filter show dev ${swp1} ingress chain $(PSFP) pref 1)
+ packets=$(echo ${stats} | jq ".[1].options.actions[].stats.packets")
+ drops=$(echo ${stats} | jq ".[1].options.actions[].stats.drops")
+
+ if ! [ "${packets}" = "${expected}" ]; then
+ printf "Expected filter to match on %d packets but matched on %d instead\n" \
+ "${expected}" "${packets}"
+ fi
+
+ echo "Hardware filter reports ${drops} drops"
+}
+
+h1_create()
+{
+ simple_if_init $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h1_destroy()
+{
+ simple_if_fini $h1 $H1_IPV4/24 $H1_IPV6/64
+}
+
+h2_create()
+{
+ simple_if_init $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+h2_destroy()
+{
+ simple_if_fini $h2 $H2_IPV4/24 $H2_IPV6/64
+}
+
+switch_create()
+{
+ local h2_mac_addr=$(mac_get $h2)
+
+ ip link set ${swp1} up
+ ip link set ${swp2} up
+
+ ip link add br0 type bridge vlan_filtering 1
+ ip link set ${swp1} master br0
+ ip link set ${swp2} master br0
+ ip link set br0 up
+
+ bridge vlan add dev ${swp2} vid ${STREAM_VID}
+ bridge vlan add dev ${swp1} vid ${STREAM_VID}
+ # PSFP on Ocelot requires the filter to also be added to the bridge
+ # FDB, and not be removed
+ bridge fdb add dev ${swp2} \
+ ${h2_mac_addr} vlan ${STREAM_VID} static master
+
+ psfp_chain_create ${swp1}
+
+ tc filter add dev ${swp1} ingress chain $(PSFP) pref 1 \
+ protocol 802.1Q flower skip_sw \
+ dst_mac ${h2_mac_addr} vlan_id ${STREAM_VID} \
+ action gate base-time 0.000000000 \
+ sched-entry OPEN ${GATE_DURATION_NS} -1 -1 \
+ sched-entry CLOSE ${GATE_DURATION_NS} -1 -1
+}
+
+switch_destroy()
+{
+ psfp_chain_destroy ${swp1}
+ ip link del br0
+}
+
+txtime_setup()
+{
+ local if_name=$1
+
+ tc qdisc add dev ${if_name} clsact
+ # Classify PTP on TC 7 and isochron on TC 6
+ tc filter add dev ${if_name} egress protocol 0x88f7 \
+ flower action skbedit priority 7
+ tc filter add dev ${if_name} egress protocol 802.1Q \
+ flower vlan_ethtype 0xdead action skbedit priority 6
+ tc qdisc add dev ${if_name} handle 100: parent root mqprio num_tc 8 \
+ queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
+ map 0 1 2 3 4 5 6 7 \
+ hw 1
+ # Set up TC 6 for SO_TXTIME. tc-mqprio queues count from 1.
+ tc qdisc replace dev ${if_name} parent 100:$((${STREAM_PRIO} + 1)) etf \
+ clockid CLOCK_TAI offload delta ${FUDGE_FACTOR}
+}
+
+txtime_cleanup()
+{
+ local if_name=$1
+
+ tc qdisc del dev ${if_name} root
+ tc qdisc del dev ${if_name} clsact
+}
+
+setup_prepare()
+{
+ vrf_prepare
+
+ h1_create
+ h2_create
+ switch_create
+
+ txtime_setup ${h1}
+
+ # Set up swp1 as a master PHC for h1, synchronized to the local
+ # CLOCK_REALTIME.
+ phc2sys_start ${swp1} ${UDS_ADDRESS_SWP1}
+
+ # Assumption true for LS1028A: h1 and h2 use the same PHC. So by
+ # synchronizing h1 to swp1 via PTP, h2 is also implicitly synchronized
+ # to swp1 (and both to CLOCK_REALTIME).
+ ptp4l_start ${h1} true ${UDS_ADDRESS_H1}
+ ptp4l_start ${swp1} false ${UDS_ADDRESS_SWP1}
+
+ # Make sure there are no filter matches at the beginning of the test
+ psfp_filter_check 0
+}
+
+cleanup()
+{
+ pre_cleanup
+
+ ptp4l_stop ${swp1}
+ ptp4l_stop ${h1}
+ phc2sys_stop
+ isochron_recv_stop
+
+ txtime_cleanup ${h1}
+
+ h2_destroy
+ h1_destroy
+ switch_destroy
+
+ vrf_cleanup
+}
+
+debug_incorrectly_dropped_packets()
+{
+ local isochron_dat=$1
+ local dropped_seqids
+ local seqid
+
+ echo "Packets incorrectly dropped:"
+
+ dropped_seqids=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u RX hw %T\n" \
+ --printf-args "qR" | \
+ grep 'RX hw 0.000000000' | \
+ awk '{print $1}')
+
+ for seqid in ${dropped_seqids}; do
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --start ${seqid} --stop ${seqid} \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T\n" \
+ --printf-args "qST"
+ done
+}
+
+debug_incorrectly_received_packets()
+{
+ local isochron_dat=$1
+
+ echo "Packets incorrectly received:"
+
+ isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "seqid %u scheduled for %T, HW TX timestamp %T, HW RX timestamp %T\n" \
+ --printf-args "qSTR" |
+ grep -v 'HW RX timestamp 0.000000000'
+}
+
+run_test()
+{
+ local base_time=$1
+ local expected=$2
+ local test_name=$3
+ local debug=$4
+ local isochron_dat="$(mktemp)"
+ local extra_args=""
+ local received
+
+ isochron_do \
+ "${h1}" \
+ "${h2}" \
+ "${UDS_ADDRESS_H1}" \
+ "" \
+ "${base_time}" \
+ "${CYCLE_TIME_NS}" \
+ "${SHIFT_TIME_NS}" \
+ "${NUM_PKTS}" \
+ "${STREAM_VID}" \
+ "${STREAM_PRIO}" \
+ "" \
+ "${isochron_dat}"
+
+ # Count all received packets by looking at the non-zero RX timestamps
+ received=$(isochron report \
+ --input-file "${isochron_dat}" \
+ --printf-format "%u\n" --printf-args "R" | \
+ grep -w -v '0' | wc -l)
+
+ if [ "${received}" = "${expected}" ]; then
+ RET=0
+ else
+ RET=1
+ echo "Expected isochron to receive ${expected} packets but received ${received}"
+ fi
+
+ log_test "${test_name}"
+
+ if [ "$RET" = "1" ]; then
+ ${debug} "${isochron_dat}"
+ fi
+
+ rm ${isochron_dat} 2> /dev/null
+}
+
+test_gate_in_band()
+{
+ # Send packets in-band with the OPEN gate entry
+ run_test 0.000000000 ${NUM_PKTS} "In band" \
+ debug_incorrectly_dropped_packets
+
+ psfp_filter_check ${NUM_PKTS}
+}
+
+test_gate_out_of_band()
+{
+ # Send packets in-band with the CLOSE gate entry
+ run_test 0.005000000 0 "Out of band" \
+ debug_incorrectly_received_packets
+
+ psfp_filter_check $((2 * ${NUM_PKTS}))
+}
+
+trap cleanup EXIT
+
+ALL_TESTS="
+ test_gate_in_band
+ test_gate_out_of_band
+"
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
diff --git a/tools/testing/selftests/net/forwarding/tsn_lib.sh b/tools/testing/selftests/net/forwarding/tsn_lib.sh
new file mode 100644
index 000000000000..60a1423e8116
--- /dev/null
+++ b/tools/testing/selftests/net/forwarding/tsn_lib.sh
@@ -0,0 +1,235 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright 2021-2022 NXP
+
+REQUIRE_ISOCHRON=${REQUIRE_ISOCHRON:=yes}
+REQUIRE_LINUXPTP=${REQUIRE_LINUXPTP:=yes}
+
+# Tunables
+UTC_TAI_OFFSET=37
+ISOCHRON_CPU=1
+
+if [[ "$REQUIRE_ISOCHRON" = "yes" ]]; then
+ # https://github.com/vladimiroltean/tsn-scripts
+ # WARNING: isochron versions pre-1.0 are unstable,
+ # always use the latest version
+ require_command isochron
+fi
+if [[ "$REQUIRE_LINUXPTP" = "yes" ]]; then
+ require_command phc2sys
+ require_command ptp4l
+fi
+
+phc2sys_start()
+{
+ local if_name=$1
+ local uds_address=$2
+ local extra_args=""
+
+ if ! [ -z "${uds_address}" ]; then
+ extra_args="${extra_args} -z ${uds_address}"
+ fi
+
+ phc2sys_log="$(mktemp)"
+
+ chrt -f 10 phc2sys -m \
+ -c ${if_name} \
+ -s CLOCK_REALTIME \
+ -O ${UTC_TAI_OFFSET} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ ${extra_args} \
+ > "${phc2sys_log}" 2>&1 &
+ phc2sys_pid=$!
+
+ echo "phc2sys logs to ${phc2sys_log} and has pid ${phc2sys_pid}"
+
+ sleep 1
+}
+
+phc2sys_stop()
+{
+ { kill ${phc2sys_pid} && wait ${phc2sys_pid}; } 2> /dev/null
+ rm "${phc2sys_log}" 2> /dev/null
+}
+
+ptp4l_start()
+{
+ local if_name=$1
+ local slave_only=$2
+ local uds_address=$3
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+ local extra_args=""
+
+ if [ "${slave_only}" = true ]; then
+ extra_args="${extra_args} -s"
+ fi
+
+ # declare dynamic variables ptp4l_log_${if_name} and ptp4l_pid_${if_name}
+ # as global, so that they can be referenced later
+ declare -g "${log}=$(mktemp)"
+
+ chrt -f 10 ptp4l -m -2 -P \
+ -i ${if_name} \
+ --step_threshold 0.00002 \
+ --first_step_threshold 0.00002 \
+ --tx_timestamp_timeout 100 \
+ --uds_address="${uds_address}" \
+ ${extra_args} \
+ > "${!log}" 2>&1 &
+ declare -g "${pid}=$!"
+
+ echo "ptp4l for interface ${if_name} logs to ${!log} and has pid ${!pid}"
+
+ sleep 1
+}
+
+ptp4l_stop()
+{
+ local if_name=$1
+ local log="ptp4l_log_${if_name}"
+ local pid="ptp4l_pid_${if_name}"
+
+ { kill ${!pid} && wait ${!pid}; } 2> /dev/null
+ rm "${!log}" 2> /dev/null
+}
+
+cpufreq_max()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ # Kernel may be compiled with CONFIG_CPU_FREQ disabled
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ # declare dynamic variables cpu${cpu}_freq and cpu${cpu}_governor as
+ # global, so they can be referenced later
+ declare -g "${freq}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq)"
+ declare -g "${governor}=$(cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor)"
+
+ cat /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_max_freq > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "performance" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+cpufreq_restore()
+{
+ local cpu=$1
+ local freq="cpu${cpu}_freq"
+ local governor="cpu${cpu}_governor"
+
+ if ! [ -d /sys/bus/cpu/devices/cpu${cpu}/cpufreq ]; then
+ return
+ fi
+
+ echo "${!freq}" > /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_min_freq
+ echo -n "${!governor}" > \
+ /sys/bus/cpu/devices/cpu${cpu}/cpufreq/scaling_governor
+}
+
+isochron_recv_start()
+{
+ local if_name=$1
+ local uds=$2
+ local extra_args=$3
+
+ if ! [ -z "${uds}" ]; then
+ extra_args="--unix-domain-socket ${uds}"
+ fi
+
+ isochron rcv \
+ --interface ${if_name} \
+ --sched-priority 98 \
+ --sched-fifo \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --quiet \
+ ${extra_args} & \
+ isochron_pid=$!
+
+ sleep 1
+}
+
+isochron_recv_stop()
+{
+ { kill ${isochron_pid} && wait ${isochron_pid}; } 2> /dev/null
+}
+
+isochron_do()
+{
+ local sender_if_name=$1; shift
+ local receiver_if_name=$1; shift
+ local sender_uds=$1; shift
+ local receiver_uds=$1; shift
+ local base_time=$1; shift
+ local cycle_time=$1; shift
+ local shift_time=$1; shift
+ local num_pkts=$1; shift
+ local vid=$1; shift
+ local priority=$1; shift
+ local dst_ip=$1; shift
+ local isochron_dat=$1; shift
+ local extra_args=""
+ local receiver_extra_args=""
+ local vrf="$(master_name_get ${sender_if_name})"
+ local use_l2="true"
+
+ if ! [ -z "${dst_ip}" ]; then
+ use_l2="false"
+ fi
+
+ if ! [ -z "${vrf}" ]; then
+ dst_ip="${dst_ip}%${vrf}"
+ fi
+
+ if ! [ -z "${vid}" ]; then
+ vid="--vid=${vid}"
+ fi
+
+ if [ -z "${receiver_uds}" ]; then
+ extra_args="${extra_args} --omit-remote-sync"
+ fi
+
+ if ! [ -z "${shift_time}" ]; then
+ extra_args="${extra_args} --shift-time=${shift_time}"
+ fi
+
+ if [ "${use_l2}" = "true" ]; then
+ extra_args="${extra_args} --l2 --etype=0xdead ${vid}"
+ receiver_extra_args="--l2 --etype=0xdead"
+ else
+ extra_args="${extra_args} --l4 --ip-destination=${dst_ip}"
+ receiver_extra_args="--l4"
+ fi
+
+ cpufreq_max ${ISOCHRON_CPU}
+
+ isochron_recv_start "${h2}" "${receiver_uds}" "${receiver_extra_args}"
+
+ isochron send \
+ --interface ${sender_if_name} \
+ --unix-domain-socket ${sender_uds} \
+ --priority ${priority} \
+ --base-time ${base_time} \
+ --cycle-time ${cycle_time} \
+ --num-frames ${num_pkts} \
+ --frame-size 64 \
+ --txtime \
+ --utc-tai-offset ${UTC_TAI_OFFSET} \
+ --cpu-mask $((1 << ${ISOCHRON_CPU})) \
+ --sched-fifo \
+ --sched-priority 98 \
+ --client 127.0.0.1 \
+ --sync-threshold 5000 \
+ --output-file ${isochron_dat} \
+ ${extra_args} \
+ --quiet
+
+ isochron_recv_stop
+
+ cpufreq_restore ${ISOCHRON_CPU}
+}
--
2.25.1
This patch series was first developed as part of the LKCamp hackathon
that happened last year[1], mainly focusing on refactoring DRM tests to
use KUnit.
KUnit[2][3] is a unified test framework that provides helper tools,
simplifying their development and execution. Using an x86-64 machine
it's possible to run tests in the host's kernel natively using user-mode
Linux[4] (aka UML), which simplifies usage in a wide variety of
scenarios, including integration to CI.
As the tool's adoption widens into graphics testing territory, I and
LKCamp members figured it would be important to support it in IGT, as
it's a core tool for GPU drivers maintainers.
I have then added KUnit support into IGT mainly following the KTAP
specs, and it can be tested using patch 4/4 in this series together with
a DRM selftests patch series available at [5].
Changes since v1:
- Major rework of parsing function structure:
- It is not longer recursive
- Adapt kselftests functions and structs to be used with KUnit
- Switch DRM selftests to KUnit parsing as they're updated in the kernel
- Replace AMD KUnit tests by DRM selftests
[1]: https://groups.google.com/g/kunit-dev/c/YqFR1q2uZvk/m/IbvItSfHBAAJ
[2]: https://kunit.dev
[3]: https://docs.kernel.org/dev-tools/kunit/index.html
[4]: http://user-mode-linux.sourceforge.net
[5]: https://lore.kernel.org/all/20220708203052.236290-1-maira.canal@usp.br/
Isabella Basso (4):
lib/igt_kmod: rename kselftest functions to ktest
lib/igt_kmod.c: check if module is builtin before attempting to unload
it
lib/igt_kmod: add compatibility for KUnit
tests: DRM selftests: switch to KUnit
lib/igt_kmod.c | 315 +++++++++++++++++++++++++++++++++++++++++--
lib/igt_kmod.h | 14 +-
tests/drm_buddy.c | 7 +-
tests/drm_mm.c | 7 +-
tests/kms_selftest.c | 12 +-
5 files changed, 329 insertions(+), 26 deletions(-)
--
2.37.2