On Tue, Sep 30, 2025 at 10:58:46AM +0200, Stefano Garzarella wrote:
On Tue, Sep 16, 2025 at 04:43:53PM -0700, Bobby Eshleman wrote:
From: Bobby Eshleman bobbyeshleman@meta.com
Add tests for namespace support in vsock. Use socat for basic connection failure tests and vsock_test for full functionality tests when communication is expected to succeed. vsock_test is not used for failure cases because in theory vsock_test could allow connection and some traffic flow but fail on some other case (e.g., fail on MSG_ZEROCOPY).
Tests cover all cases of clients and servers being in all variants of local ns, global ns, host process, and VM process.
Legacy tests are retained and executed in the init ns.
Signed-off-by: Bobby Eshleman bobbyeshleman@meta.com
Changes in v6:
- check for namespace support in vmtest.sh
Changes in v5:
- use /proc/sys/net/vsock/ns_mode
- clarify logic of tests that reuse the same VM and tests that require
netns setup
- fix unassigned BUILD bug
tools/testing/selftests/vsock/vmtest.sh | 954 ++++++++++++++++++++++++++++---- 1 file changed, 849 insertions(+), 105 deletions(-)
diff --git a/tools/testing/selftests/vsock/vmtest.sh b/tools/testing/selftests/vsock/vmtest.sh index 5e36d1068f6f..59621b32cf1a 100755 --- a/tools/testing/selftests/vsock/vmtest.sh +++ b/tools/testing/selftests/vsock/vmtest.sh @@ -7,6 +7,7 @@ # * virtme-ng # * busybox-static (used by virtme-ng) # * qemu (used by virtme-ng) +# * socat
readonly SCRIPT_DIR="$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P)" readonly KERNEL_CHECKOUT=$(realpath "${SCRIPT_DIR}"/../../../../) @@ -23,7 +24,7 @@ readonly VSOCK_CID=1234 readonly WAIT_PERIOD=3 readonly WAIT_PERIOD_MAX=60 readonly WAIT_TOTAL=$(( WAIT_PERIOD * WAIT_PERIOD_MAX )) -readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) +readonly WAIT_QEMU=5
# virtme-ng offers a netdev for ssh when using "--ssh", but we also need a # control port forwarded for vsock_test. Because virtme-ng doesn't support @@ -33,23 +34,146 @@ readonly QEMU_PIDFILE=$(mktemp /tmp/qemu_vsock_vmtest_XXXX.pid) # add the kernel cmdline options that virtme-init uses to setup the interface. readonly QEMU_TEST_PORT_FWD="hostfwd=tcp::${TEST_HOST_PORT}-:${TEST_GUEST_PORT}" readonly QEMU_SSH_PORT_FWD="hostfwd=tcp::${SSH_HOST_PORT}-:${SSH_GUEST_PORT}" -readonly QEMU_OPTS="\
-netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
-device virtio-net-pci,netdev=n0 \
-device vhost-vsock-pci,guest-cid=${VSOCK_CID} \
--pidfile ${QEMU_PIDFILE} \
-"
I expected this patch to only add new tests, but we are changing a few things. Are they related, or can they be moved to another preparation patch?
Unfortunately they are related. For example, this pidfile change is because this patch introduces testing two VMs up at once (to test if their CIDs collide), so they each need unique pidfiles.
Definitely can be moved to a preparation patch, will do next rev.
readonly KERNEL_CMDLINE="\ virtme.dhcp net.ifnames=0 biosdevname=0 \ virtme.ssh virtme_ssh_channel=tcp virtme_ssh_user=$USER \ " readonly LOG=$(mktemp /tmp/vsock_vmtest_XXXX.log) -readonly TEST_NAMES=(vm_server_host_client vm_client_host_server vm_loopback) +readonly TEST_NAMES=(
- vm_server_host_client
- vm_client_host_server
- vm_loopback
- host_vsock_ns_mode_ok
can we use a `ns_` prefix for all ns related tests?
Yep, np.
- host_vsock_ns_mode_write_once_ok
- global_same_cid_fails
- local_same_cid_ok
- global_local_same_cid_ok
- local_global_same_cid_ok
- diff_ns_global_host_connect_to_global_vm_ok
- diff_ns_global_host_connect_to_local_vm_fails
- diff_ns_global_vm_connect_to_global_host_ok
- diff_ns_global_vm_connect_to_local_host_fails
- diff_ns_local_host_connect_to_local_vm_fails
- diff_ns_local_vm_connect_to_local_host_fails
- diff_ns_global_to_local_loopback_local_fails
- diff_ns_local_to_global_loopback_fails
- diff_ns_local_to_local_loopback_fails
- diff_ns_global_to_global_loopback_ok
- same_ns_local_loopback_ok
- same_ns_local_host_connect_to_local_vm_ok
- same_ns_local_vm_connect_to_local_host_ok
+)
readonly TEST_DESCS=(
- # vm_server_host_client "Run vsock_test in server mode on the VM and in client mode on the host."
- # vm_client_host_server "Run vsock_test in client mode on the VM and in server mode on the host."
- # vm_loopback "Run vsock_test using the loopback transport in the VM."
- # host_vsock_ns_mode_ok
- "Check /proc/sys/net/vsock/ns_mode strings on the host."
- # host_vsock_ns_mode_write_once_ok
- "Check /proc/sys/net/vsock/ns_mode is write-once on the host."
- # global_same_cid_fails
- "Check QEMU fails to start two VMs with same CID in two different global namespaces."
- # local_same_cid_ok
- "Check QEMU successfully starts two VMs with same CID in two different local namespaces."
- # global_local_same_cid_ok
- "Check QEMU successfully starts one VM in a global ns and then another VM in a local ns with the same CID."
- # local_global_same_cid_ok
- "Check QEMU successfully starts one VM in a local ns and then another VM in a global ns with the same CID."
- # diff_ns_global_host_connect_to_global_vm_ok
- "Run vsock_test client in global ns with server in VM in another global ns."
- # diff_ns_global_host_connect_to_local_vm_fails
- "Run socat to test a process in a global ns fails to connect to a VM in a local ns."
- # diff_ns_global_vm_connect_to_global_host_ok
- "Run vsock_test client in VM in a global ns with server in another global ns."
- # diff_ns_global_vm_connect_to_local_host_fails
- "Run socat to test a VM in a global ns fails to connect to a host process in a local ns."
- # diff_ns_local_host_connect_to_local_vm_fails
- "Run socat to test a host process in a local ns fails to connect to a VM in another local ns."
- # diff_ns_local_vm_connect_to_local_host_fails
- "Run socat to test a VM in a local ns fails to connect to a host process in another local ns."
- # diff_ns_global_to_local_loopback_local_fails
- "Run socat to test a loopback vsock in a global ns fails to connect to a vsock in a local ns."
- # diff_ns_local_to_global_loopback_fails
- "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in a global ns."
- # diff_ns_local_to_local_loopback_fails
- "Run socat to test a loopback vsock in a local ns fails to connect to a vsock in another local ns."
- # diff_ns_global_to_global_loopback_ok
- "Run socat to test a loopback vsock in a global ns successfully connects to a vsock in another global ns."
- # same_ns_local_loopback_ok
- "Run socat to test a loopback vsock in a local ns successfully connects to a vsock in the same ns."
- # same_ns_local_host_connect_to_local_vm_ok
- "Run vsock_test client in a local ns with server in VM in same ns."
- # same_ns_local_vm_connect_to_local_host_ok
- "Run vsock_test client in VM in a local ns with server in same ns."
Should we run some test to check edge cases like namespace deletion during active connections or changing ns mode from global to local while running.
Sgtm!
+)
+readonly USE_SHARED_VM=(vm_server_host_client vm_client_host_server vm_loopback) +readonly USE_INIT_NETNS=(
- global_same_cid_fails
- local_same_cid_ok
- global_local_same_cid_ok
- local_global_same_cid_ok
- diff_ns_global_host_connect_to_global_vm_ok
- diff_ns_global_host_connect_to_local_vm_fails
- diff_ns_global_vm_connect_to_global_host_ok
- diff_ns_global_vm_connect_to_local_host_fails
- diff_ns_local_host_connect_to_local_vm_fails
- diff_ns_local_vm_connect_to_local_host_fails
- diff_ns_global_to_local_loopback_local_fails
- diff_ns_local_to_global_loopback_fails
- diff_ns_local_to_local_loopback_fails
- diff_ns_global_to_global_loopback_ok
- same_ns_local_loopback_ok
- same_ns_local_host_connect_to_local_vm_ok
- same_ns_local_vm_connect_to_local_host_ok
+) +readonly REQUIRES_NETNS=(
- host_vsock_ns_mode_ok
- host_vsock_ns_mode_write_once_ok
- global_same_cid_fails
- local_same_cid_ok
- global_local_same_cid_ok
- local_global_same_cid_ok
- diff_ns_global_host_connect_to_global_vm_ok
- diff_ns_global_host_connect_to_local_vm_fails
- diff_ns_global_vm_connect_to_global_host_ok
- diff_ns_global_vm_connect_to_local_host_fails
- diff_ns_local_host_connect_to_local_vm_fails
- diff_ns_local_vm_connect_to_local_host_fails
- diff_ns_global_to_local_loopback_local_fails
- diff_ns_local_to_global_loopback_fails
- diff_ns_local_to_local_loopback_fails
- diff_ns_global_to_global_loopback_ok
- same_ns_local_loopback_ok
- same_ns_local_host_connect_to_local_vm_ok
- same_ns_local_vm_connect_to_local_host_ok
) +readonly MODES=("local" "global")
What about NS_MODES ?
Roger that.
readonly LOG_LEVEL_DEBUG=0 readonly LOG_LEVEL_INFO=1 @@ -58,6 +182,12 @@ readonly LOG_LEVEL_ERROR=3
VERBOSE="${LOG_LEVEL_WARN}"
+# Test pass/fail counters +cnt_pass=0 +cnt_fail=0 +cnt_skip=0 +cnt_total=0
usage() { local name local desc @@ -77,7 +207,7 @@ usage() { for ((i = 0; i < ${#TEST_NAMES[@]}; i++)); do name=${TEST_NAMES[${i}]} desc=${TEST_DESCS[${i}]}
printf "\t%-35s%-35s\n" "${name}" "${desc}"
done echoprintf "\t%-55s%-35s\n" "${name}" "${desc}"
@@ -89,21 +219,87 @@ die() { exit "${KSFT_FAIL}" }
+add_namespaces() {
- # add namespaces local0, local1, global0, and global1
- for mode in "${MODES[@]}"; do
ip netns add "${mode}0" 2>/dev/null
ip netns add "${mode}1" 2>/dev/null
- done
+}
+init_namespaces() {
- for mode in "${MODES[@]}"; do
ns_set_mode "${mode}0" "${mode}"
ns_set_mode "${mode}1" "${mode}"
log_host "set ns ${mode}0 to mode ${mode}"
log_host "set ns ${mode}1 to mode ${mode}"
# we need lo for qemu port forwarding
ip netns exec "${mode}0" ip link set dev lo up
ip netns exec "${mode}1" ip link set dev lo up
- done
+}
+del_namespaces() {
- for mode in "${MODES[@]}"; do
ip netns del "${mode}0"
ip netns del "${mode}1"
log_host "removed ns ${mode}0"
log_host "removed ns ${mode}1"
- done &>/dev/null
+}
+ns_set_mode() {
- local ns=$1
- local mode=$2
- echo "${mode}" | ip netns exec "${ns}" \
tee /proc/sys/net/vsock/ns_mode &>/dev/null
+}
vm_ssh() {
- ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost "$@"
- local ns_exec
- if [[ "${1}" == none ]]; then
local ns_exec=""
- else
local ns_exec="ip netns exec ${1}"
- fi
- shift
- ${ns_exec} ssh -q -o UserKnownHostsFile=/dev/null -p ${SSH_HOST_PORT} localhost $*
- return $?
}
cleanup() {
- if [[ -s "${QEMU_PIDFILE}" ]]; then
pkill -SIGTERM -F "${QEMU_PIDFILE}" > /dev/null 2>&1
- fi
- del_namespaces
+}
- # If failure occurred during or before qemu start up, then we need
- # to clean this up ourselves.
- if [[ -e "${QEMU_PIDFILE}" ]]; then
rm "${QEMU_PIDFILE}"
- fi
+terminate_pidfiles() {
- local pidfile
- for pidfile in "$@"; do
if [[ -s "${pidfile}" ]]; then
pkill -SIGTERM -F "${pidfile}" 2>&1 > /dev/null
fi
# If failure occurred during or before qemu start up, then we need
# to clean this up ourselves.
if [[ -e "${pidfile}" ]]; then
rm -f "${pidfile}"
fi
- done
+}
+terminate_pids() {
- local pid
- for pid in "$@"; do
kill -SIGTERM "${pid}" &>/dev/null || :
- done
}
check_args() { @@ -133,7 +329,7 @@ check_args() { }
check_deps() {
- for dep in vng ${QEMU} busybox pkill ssh; do
- for dep in vng ${QEMU} busybox pkill ssh socat; do if [[ ! -x $(command -v "${dep}") ]]; then echo -e "skip: dependency ${dep} not found!\n" exit "${KSFT_SKIP}"
@@ -147,6 +343,20 @@ check_deps() { fi }
+check_test_deps() {
- local tname=$1
- # If the test requires NS support, check if NS support exists
- # using /proc/self/ns
- if [[ "${tname}" =~ "${REQUIRES_NETNS[@]}" ]] &&
[[ ! -e /proc/self/ns ]]; then
log_host "No NS support detected for test ${tname}"
return 1
- fi
- return 0
+}
check_vng() { local tested_versions local version @@ -170,6 +380,20 @@ check_vng() { fi }
+check_socat() {
- local support_string
- support_string="$(socat -V)"
- if [[ "${support_string}" != *"WITH_VSOCK 1"* ]]; then
die "err: socat is missing vsock support"
- fi
- if [[ "${support_string}" != *"WITH_UNIX 1"* ]]; then
die "err: socat is missing unix support"
- fi
+}
handle_build() { if [[ ! "${BUILD}" -eq 1 ]]; then return @@ -194,9 +418,14 @@ handle_build() { }
vm_start() {
local cid=$1
local ns=$2
local pidfile=$3 local logfile=/dev/null local verbose_opt=""
local qemu_opts="" local kernel_opt=""
local ns_exec="" local qemu
qemu=$(command -v "${QEMU}")
@@ -206,27 +435,37 @@ vm_start() { logfile=/dev/stdout fi
- qemu_opts="\
-netdev user,id=n0,${QEMU_TEST_PORT_FWD},${QEMU_SSH_PORT_FWD} \
-device virtio-net-pci,netdev=n0 \
${QEMU_OPTS} -device vhost-vsock-pci,guest-cid=${cid} \
Have we removed QEMU_OPTS, right?
(I still prefer to have it defined on top, but maybe there is a reason to remove it)
--pidfile ${pidfile}
- "
- if [[ "${BUILD}" -eq 1 ]]; then kernel_opt="${KERNEL_CHECKOUT}" fi
- vng \
- if [[ "${ns}" != "none" ]]; then
ns_exec="ip netns exec ${ns}"
- fi
- ${ns_exec} vng \ --run \ ${kernel_opt} \ ${verbose_opt} \
--qemu-opts="${QEMU_OPTS}" \
--qemu="${qemu}" \ --user root \ --append "${KERNEL_CMDLINE}" \ --rw &> ${logfile} &--qemu-opts="${qemu_opts}" \
- if ! timeout ${WAIT_TOTAL} \
So WAIT_TOTAL is now unused, right?
Yes, its definition can be dropped.
Can you explain better this change?
It turned out that WAIT_TOTAL (three minutes) was very long wait time to just see if qemu places the pidfile or not. It's a suitable wait time for VM boot up, but qemu creates the pidfile far before that. Waiting three minutes made the cases where we expect the VM to fail bootup to take a a very long time. Waiting only 5 seconds to check the pidfile offers a lot of savings.
Sounds like this should also be a different patch.
[..]
Sorry, there are too many changes and reviewing it is complicated. Can you at least divide it into patches to fix pre-existing bugs, patches to support namespaces (and use init_ns for the ones we already have), and patches to add tests?
Thanks, Stefano
No problem, totally understandable looking at it in hindsight. I'll break it up and send that out after the window opens.
Thanks again for your patient review!
Best, Bobby