November 2025 - Linux-kselftest-mirror

[PATCH net-next] selftests: net: add a hint about MACAddressPolicy=persistent

by Jakub Kicinski

A new NIPA installation has been reporting a few flaky tests.
arp_ndisc_evict_nocarrier is the most flaky of them all. I suspect that
the flakiness is due to udev swapping the MAC addresses on the
interfaces. Extend the message in arp_ndisc_evict_nocarrier to hint
at this potential issue. Having the neigh get fail right after ping
is rather unusual, unless udev changes the MAC addr, causing a flush
in the meantime.

Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
---
CC: shuah(a)kernel.org
CC: linux-kselftest(a)vger.kernel.org
---
 tools/testing/selftests/net/arp_ndisc_evict_nocarrier.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/arp_ndisc_evict_nocarrier.sh b/tools/testing/selftests/net/arp_ndisc_evict_nocarrier.sh
index 92eb880c52f2..00758f00efbf 100755
--- a/tools/testing/selftests/net/arp_ndisc_evict_nocarrier.sh
+++ b/tools/testing/selftests/net/arp_ndisc_evict_nocarrier.sh
@@ -75,7 +75,7 @@ setup_v4() {
 	ip neigh get $V4_ADDR1 dev veth0 >/dev/null 2>&1
 	if [ $? -ne 0 ]; then
 		cleanup_v4
-		echo "failed"
+		echo "failed; is the system using MACAddressPolicy=persistent ?"
 		exit 1
 	fi
-- 
2.51.1


[PATCH v3 0/5] mm, kvm: add guest_memfd support for uffd minor faults

by Mike Rapoport

From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>

Hi,

These patches allow guest_memfd to notify userspace about minor page
faults using userfaultfd and let userspace resolve these page faults
using UFFDIO_CONTINUE.

To allow UFFDIO_CONTINUE outside of the core mm I added a
get_folio_noalloc() callback to vm_ops that allows an address space
backing a VMA to return a folio that exists in its page cache (patch 2).

In order for guest_memfd to notify userspace about page faults, there is a
new VM_FAULT_UFFD_MINOR that a ->fault() handler can return to inform the
page fault handler that it needs to call handle_userfault() to complete the
fault (patch 3).

Patch 4 plumbs these new goodies into guest_memfd.

This series is the minimal change I've been able to come up with to allow
integration of guest_memfd with uffd. Refactoring uffd and making the
mfill_atomic() flow more linear would have been a nice improvement, but it's
way out of the scope of enabling uffd with guest_memfd.

v3 changes:
* rename ->get_folio() to ->get_folio_noalloc()
* fix build errors reported by kbuild
* pull handling of UFFD_MINOR out of hotpath in __do_fault()
* update guest_memfd changes so its ->fault() and ->get_folio_noalloc()
  follow the same semantics as shmem and hugetlb.
* s/MISSING/MINOR/g in changelogs
* added review tags

v2: https://lore.kernel.org/all/20251125183840.2368510-1-rppt@kernel.org
* rename ->get_shared_folio() to ->get_folio()
* hardwire VM_FAULT_UFFD_MINOR to 0 when CONFIG_USERFAULTFD=n

v1: https://patch.msgid.link/20251123102707.559422-1-rppt@kernel.org
* Introduce VM_FAULT_UFFD_MINOR to avoid exporting handle_userfault()
* Simplify vma_can_mfill_atomic()
* Rename get_pagecache_folio() to get_shared_folio() and use inode instead
  of vma as its argument

rfc: https://patch.msgid.link/20251117114631.2029447-1-rppt@kernel.org

Mike Rapoport (Microsoft) (4):
  userfaultfd: move vma_can_userfault out of line
  userfaultfd, shmem: use a VMA callback to handle UFFDIO_CONTINUE
  mm: introduce VM_FAULT_UFFD_MINOR fault reason
  guest_memfd: add support for userfaultfd minor mode

Nikita Kalyazin (1):
  KVM: selftests: test userfaultfd minor for guest_memfd

 include/linux/mm.h                            |  9 ++
 include/linux/mm_types.h                      | 10 +-
 include/linux/userfaultfd_k.h                 | 36 +------
 mm/memory.c                                   |  5 +-
 mm/shmem.c                                    | 20 +++-
 mm/userfaultfd.c                              | 80 ++++++++++++---
 .../testing/selftests/kvm/guest_memfd_test.c  | 97 +++++++++++++++++++
 virt/kvm/guest_memfd.c                        | 33 ++++++-
 8 files changed, 236 insertions(+), 54 deletions(-)

base-commit: ac3fd01e4c1efce8f2c054cdeb2ddd2fc0fb150d
-- 
2.51.0


[PATCH v12 00/23] TDX KVM selftests

by Sagi Shahar

This is v12 of the TDX selftests.

This series is based on v6.18-rc3.

Changes from v11 [1]:
- Rebased on top of v6.18-rc3.
- Hook vm_tdx_finalize() into kvm_arch_vm_finalize_vcpus instead of
  calling it as part of vm_tdx_create_with_one_vcpu. See "KVM: selftests:
  Finalize TD memory as part of kvm_arch_vm_finalize_vcpus" which was added
  to this series.
- Replaced vm_tdx_create_with_one_vcpu with vm_create_shape_with_one_vcpu
  following Sean's patch to simplify creating VM shapes.

[1] https://lore.kernel.org/lkml/20250925172851.606193-1-sagis@google.com/

Ackerley Tng (2):
  KVM: selftests: Add helpers to init TDX memory and finalize VM
  KVM: selftests: Add ucall support for TDX

Erdem Aktas (2):
  KVM: selftests: Add TDX boot code
  KVM: selftests: Add support for TDX TDCALL from guest

Isaku Yamahata (2):
  KVM: selftests: Update kvm_init_vm_address_properties() for TDX
  KVM: selftests: TDX: Use KVM_TDX_CAPABILITIES to validate TDs'
    attribute configuration

Sagi Shahar (16):
  KVM: selftests: Allocate pgd in virt_map() as necessary
  KVM: selftests: Expose functions to get default sregs values
  KVM: selftests: Expose function to allocate guest vCPU stack
  KVM: selftests: Expose segment definitons to assembly files
  KVM: selftests: Add kbuild definitons
  KVM: selftests: Define structs to pass parameters to TDX boot code
  KVM: selftests: Set up TDX boot code region
  KVM: selftests: Set up TDX boot parameters region
  KVM: selftests: Add helper to initialize TDX VM
  KVM: selftests: Call TDX init when creating a new TDX vm
  KVM: selftests: Setup memory regions for TDX on vm creation
  KVM: selftests: Call KVM_TDX_INIT_VCPU when creating a new TDX vcpu
  KVM: selftests: Set entry point for TDX guest code
  KVM: selftests: Finalize TD memory as part of
    kvm_arch_vm_finalize_vcpus
  KVM: selftests: Add wrapper for TDX MMIO from guest
  KVM: selftests: Add TDX lifecycle test

Sean Christopherson (1):
  KVM: selftests: Add macros so simplify creating VM shapes for
    non-default types

 tools/include/linux/kbuild.h                  |  18 +
 tools/testing/selftests/kvm/Makefile.kvm      |  32 ++
 .../testing/selftests/kvm/include/kvm_util.h  |  14 +
 .../selftests/kvm/include/ucall_common.h      |   1 +
 .../selftests/kvm/include/x86/processor.h     |  40 +++
 .../selftests/kvm/include/x86/processor_asm.h |  12 +
 tools/testing/selftests/kvm/include/x86/sev.h |   2 -
 .../selftests/kvm/include/x86/tdx/td_boot.h   |  74 ++++
 .../kvm/include/x86/tdx/td_boot_asm.h         |  16 +
 .../selftests/kvm/include/x86/tdx/tdcall.h    |  34 ++
 .../selftests/kvm/include/x86/tdx/tdx.h       |  14 +
 .../selftests/kvm/include/x86/tdx/tdx_util.h  |  84 +++++
 .../testing/selftests/kvm/include/x86/ucall.h |   6 -
 tools/testing/selftests/kvm/lib/kvm_util.c    |  10 +-
 .../testing/selftests/kvm/lib/x86/processor.c |  99 ++++--
 tools/testing/selftests/kvm/lib/x86/sev.c     |  16 -
 .../selftests/kvm/lib/x86/tdx/td_boot.S       |  60 ++++
 .../kvm/lib/x86/tdx/td_boot_offsets.c         |  21 ++
 .../selftests/kvm/lib/x86/tdx/tdcall.S        |  93 +++++
 .../kvm/lib/x86/tdx/tdcall_offsets.c          |  16 +
 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c |  23 ++
 .../selftests/kvm/lib/x86/tdx/tdx_util.c      | 330 ++++++++++++++++++
 tools/testing/selftests/kvm/lib/x86/ucall.c   |  46 ++-
 .../selftests/kvm/x86/sev_smoke_test.c        |  40 +--
 tools/testing/selftests/kvm/x86/tdx_vm_test.c |  33 ++
 25 files changed, 1056 insertions(+), 78 deletions(-)
 create mode 100644 tools/include/linux/kbuild.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/processor_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/td_boot_asm.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdcall.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx.h
 create mode 100644 tools/testing/selftests/kvm/include/x86/tdx/tdx_util.h
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot.S
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/td_boot_offsets.c
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall.S
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdcall_offsets.c
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx.c
 create mode 100644 tools/testing/selftests/kvm/lib/x86/tdx/tdx_util.c
 create mode 100644 tools/testing/selftests/kvm/x86/tdx_vm_test.c

-- 
2.51.1.851.g4ebd6896fd-goog


[PATCH v2 00/13] tools/nolibc: always use 64-bit time-related types

by Thomas Weißschuh

nolibc currently uses 32-bit types for various APIs. These are
problematic as their reduced value range can lead to truncated values.

Intended for 6.19.

Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Drop already applied ino_t and off_t patches.
- Also handle 'struct timeval'.
- Make the progression of the series a bit clearer.
- Add compatibility assertions.
- Link to v1: https://lore.kernel.org/r/20251029-nolibc-uapi-types-v1-0-e79de3b215d8@weis…

---
Thomas Weißschuh (13):
      tools/nolibc/poll: use kernel types for system call invocations
      tools/nolibc/poll: drop __NR_poll fallback
      tools/nolibc/select: drop non-pselect based implementations
      tools/nolibc/time: drop invocation of gettimeofday system call
      tools/nolibc: prefer explicit 64-bit time-related system calls
      tools/nolibc/gettimeofday: avoid libgcc 64-bit divisions
      tools/nolibc/select: avoid libgcc 64-bit multiplications
      tools/nolibc: use custom structs timespec and timeval
      tools/nolibc: always use 64-bit time types
      selftests/nolibc: test compatibility of nolibc and kernel time types
      tools/nolibc: remove time conversions
      tools/nolibc: add __nolibc_static_assert()
      selftests/nolibc: add static assertions around time types handling

 tools/include/nolibc/arch-s390.h             |   3 +
 tools/include/nolibc/compiler.h              |   2 +
 tools/include/nolibc/poll.h                  |  14 ++--
 tools/include/nolibc/std.h                   |   2 +-
 tools/include/nolibc/sys/select.h            |  25 ++-----
 tools/include/nolibc/sys/time.h              |   6 +-
 tools/include/nolibc/sys/timerfd.h           |  32 +++------
 tools/include/nolibc/time.h                  | 102 +++++++++------------------
 tools/include/nolibc/types.h                 |  17 ++++-
 tools/testing/selftests/nolibc/nolibc-test.c |  27 +++++++
 10 files changed, 107 insertions(+), 123 deletions(-)
---
base-commit: 586e8d5137dfcddfccca44c3b992b92d2be79347
change-id: 20251001-nolibc-uapi-types-1c072d10fcc7

Best regards,
-- 
Thomas Weißschuh <linux(a)weissschuh.net>
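The truncation concern in the cover letter can be illustrated outside of nolibc. The sketch below is not code from the series; `fits_in_time32()` and `wrap_to_time32()` are hypothetical helpers that model what happens when a 64-bit seconds count is stored in a signed 32-bit field, assuming the usual two's-complement wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of nolibc): does a 64-bit seconds value
 * survive being stored in a signed 32-bit time field?
 */
static bool fits_in_time32(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

/*
 * Hypothetical model of the truncation: keep only the low 32 bits and
 * reinterpret them as a signed value, without relying on
 * implementation-defined narrowing conversions.
 */
static int32_t wrap_to_time32(int64_t secs)
{
	uint32_t low = (uint32_t)secs;	/* well-defined: reduction modulo 2^32 */

	if (low <= (uint32_t)INT32_MAX)
		return (int32_t)low;
	return (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```

With these helpers, 2147483647 (the largest 32-bit value, reached in January 2038) still fits, while 2147483648 does not and wraps to the most negative 32-bit value.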


[PATCH v8 00/12] arm64: entry: Convert to Generic Entry

by Jinjie Ruan

Currently, x86, RISC-V and LoongArch use the Generic Entry, which makes
maintainers' work easier and the code more elegant. arm64 has already
successfully switched to the Generic IRQ Entry in commit
b3cf07851b6c ("arm64: entry: Switch to generic IRQ entry"), so it is
time to completely convert arm64 to Generic Entry.

The goal is to bring arm64 in line with other architectures that already
use the generic entry infrastructure, reducing duplicated code and
making it easier to share future changes in entry/exit paths, such as
"Syscall User Dispatch".

This patch set is rebased on v6.18-rc6.

The performance benchmarks from perf bench basic syscall on
real hardware are below:

| Metric     | W/O Generic Framework | With Generic Framework | Change |
| ---------- | --------------------- | ---------------------- | ------ |
| Total time | 2.813 [sec]           | 2.930 [sec]            | ↑4%    |
| usecs/op   | 0.281349              | 0.293006               | ↑4%    |
| ops/sec    | 3,554,299             | 3,412,894              | ↓4%    |

Compared to the earlier arch-specific handling, performance decreased
by approximately 4%.

It was tested OK with the following test cases on the QEMU virt platform:
- Perf tests.
- Different `dynamic preempt` mode switch.
- Pseudo NMI tests.
- Stress-ng CPU stress test.
- Hackbench stress test.
- MTE test case in Documentation/arch/arm64/memory-tagging-extension.rst
  and all test cases in tools/testing/selftests/arm64/mte/*.
- "sud" selftest testcase.
- get_set_sud, get_syscall_info, set_syscall_info, peeksiginfo
  in tools/testing/selftests/ptrace.
- breakpoint_test_arm64 in selftests/breakpoints.
- syscall-abi and ptrace in tools/testing/selftests/arm64/abi
- fp-ptrace, sve-ptrace, za-ptrace in selftests/arm64/fp.
- vdso_test_getrandom in tools/testing/selftests/vDSO
- Strace tests.

The test QEMU configuration is as follows:

	qemu-system-aarch64 \
		-M virt,gic-version=3,virtualization=on,mte=on \
		-cpu max,pauth-impdef=on \
		-kernel Image \
		-smp 8,sockets=1,cores=4,threads=2 \
		-m 512m \
		-nographic \
		-no-reboot \
		-device virtio-rng-pci \
		-append "root=/dev/vda rw console=ttyAMA0 kgdboc=ttyAMA0,115200 \
			earlycon preempt=voluntary irqchip.gicv3_pseudo_nmi=1" \
		-drive if=none,file=images/rootfs.ext4,format=raw,id=hd0 \
		-device virtio-blk-device,drive=hd0

Changes in v8:
- Rename "report_syscall_enter()" to "report_syscall_entry()".
- Add ptrace_save_reg() to avoid duplication.
- Remove unused _TIF_WORK_MASK in a standalone patch.
- Align syscall_trace_enter() return value with the generic version.
- Use "scno" instead of regs->syscallno in el0_svc_common().
- Move rseq_syscall() ahead in a standalone patch to make the change clearer.
- Rename "syscall_trace_exit()" to "syscall_exit_work()".
- Keep the goto in el0_svc_common().
- No argument was passed to __secure_computing() and check -1 not -1L.
- Remove "Add has_syscall_work() helper" patch.
- Move "Add syscall_exit_to_user_mode_prepare() helper" patch later.
- Add missing header for asm/entry-common.h.
- Update the implementation of arch_syscall_is_vdso_sigreturn().
- Add "ARCH_SYSCALL_WORK_EXIT" to be defined as "SECCOMP | SYSCALL_EMU"
  to keep the behaviour unchanged.
- Test more test cases.
- Add Reviewed-by.
- Update the commit message.
- Link to v7: https://lore.kernel.org/all/20251117133048.53182-1-ruanjinjie@huawei.com/

Changes in v7:
- Support "Syscall User Dispatch" by implementing
  arch_syscall_is_vdso_sigreturn() as kemal suggested.
- Add aarch64 support for "sud" selftest testcase, which tested OK with
  the patch series.
- Fix the kernel test robot warning for arch_ptrace_report_syscall_entry()
  and arch_ptrace_report_syscall_exit() in asm/entry-common.h.
- Add perf syscall performance test.
- Link to v6: https://lore.kernel.org/all/20250916082611.2972008-1-ruanjinjie@huawei.com/

Changes in v6:
- Rebased on v6.17-rc5-next as arm64 generic irq entry has merged.
- Update the commit message.
- Link to v5: https://lore.kernel.org/all/20241206101744.4161990-1-ruanjinjie@huawei.com/

Changes in v5:
- Do not change arm32 and keep the interrupts_enabled() macro for the gicv3 driver.
- Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
- Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
- Update "irqentry_state_t ret/irq_state" to "state"
  to keep it consistent.
- Use generic irq entry header for PREEMPT_DYNAMIC after splitting
  the generic entry.
- Also refactor the ARM64 syscall code.
- Introduce arch_ptrace_report_syscall_entry/exit(), instead of
  arch_pre/post_report_syscall_entry/exit() to simplify code.
- Make the syscall patches clearly separated.
- Update the commit message.
- Link to v4: https://lore.kernel.org/all/20241025100700.3714552-1-ruanjinjie@huawei.com/

Changes in v4:
- Rework/cleanup split into a few patches as Mark suggested.
- Replace interrupts_enabled() macro with regs_irqs_disabled(), instead
  of leaving it in place.
- Remove rcu and lockdep state in pt_regs by using temporary
  irqentry_state_t as Mark suggested.
- Remove some unnecessary intermediate functions to make it clear.
- Rework preempt irq and PREEMPT_DYNAMIC code
  to make the switch clearer.
- arch_prepare_*_entry/exit() -> arch_pre_*_entry/exit().
- Expand the arch functions comment.
- Make arch functions closer to their callers.
- Declare saved_reg in for block.
- Remove arch_exit_to_kernel_mode_prepare(), arch_enter_from_kernel_mode().
- Adjust "Add few arch functions to use generic entry" patch to be
  the penultimate.
- Update the commit message.
- Add suggested-by.
- Link to v3: https://lore.kernel.org/all/20240629085601.470241-1-ruanjinjie@huawei.com/

Changes in v3:
- Test the MTE test cases.
- Handle forget_syscall() in arch_post_report_syscall_entry()
- Make the arch funcs not use __weak as Thomas suggested, so move
  the arch funcs to entry-common.h, and make arch_forget_syscall() folded
  in arch_post_report_syscall_entry() as suggested.
- Move report_single_step() to thread_info.h for arm64
- Change __always_inline() to inline, add inline for the other arch funcs.
- Remove unused signal.h for entry-common.h.
- Add Suggested-by.
- Update the commit message.

Changes in v2:
- Add tested-by.
- Fix a bug where arch_post_report_syscall_entry() was not called in
  syscall_trace_enter() if ptrace_report_syscall_entry() returned non-zero.
- Refactor report_syscall().
- Add comment for arch_prepare_report_syscall_exit().
- Adjust entry-common.h header file inclusion to alphabetical order.
- Update the commit message.

Jinjie Ruan (11):
  arm64: Remove unused _TIF_WORK_MASK
  arm64/ptrace: Split report_syscall()
  arm64/ptrace: Refactor syscall_trace_enter/exit()
  arm64: ptrace: Move rseq_syscall() before audit_syscall_exit()
  arm64: syscall: Rework el0_svc_common()
  arm64/ptrace: Return early for ptrace_report_syscall_entry() error
  arm64/ptrace: Expand secure_computing() in place
  arm64/ptrace: Use syscall_get_arguments() heleper
  entry: Split syscall_exit_to_user_mode_work() for arch reuse
  entry: Add arch_ptrace_report_syscall_entry/exit()
  arm64: entry: Convert to generic entry

kemal (1):
  selftests: sud_test: Support aarch64

 arch/arm64/Kconfig                            |  2 +-
 arch/arm64/include/asm/entry-common.h         | 76 ++++++++++++++++
 arch/arm64/include/asm/syscall.h              | 21 ++++-
 arch/arm64/include/asm/thread_info.h          | 22 +----
 arch/arm64/kernel/debug-monitors.c            |  7 ++
 arch/arm64/kernel/ptrace.c                    | 90 -------------------
 arch/arm64/kernel/signal.c                    |  2 +-
 arch/arm64/kernel/syscall.c                   | 25 ++----
 include/linux/entry-common.h                  | 35 +++++---
 kernel/entry/syscall-common.c                 | 43 ++++++++-
 .../syscall_user_dispatch/sud_test.c          |  4 +
 11 files changed, 179 insertions(+), 148 deletions(-)

-- 
2.34.1
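The "approximately 4%" figure in the benchmark table can be checked with a few lines of arithmetic. The snippet below is only an illustrative sketch; `percent_change()` is a hypothetical helper and the inputs are the numbers quoted in the cover letter.

```c
/*
 * Hypothetical helper: relative change from 'before' to 'after',
 * expressed in percent. Positive means the value went up.
 */
static double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For example, `percent_change(2.813, 2.930)` (total time) and `percent_change(0.281349, 0.293006)` (usecs/op) both come out slightly above +4, while `percent_change(3554299.0, 3412894.0)` (ops/sec) comes out slightly below -4 in magnitude, which matches the rounded values in the table.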


[PATCH v3 0/1] cpuset: relax the overlap check for cgroup-v2

by Sun Shaojie

In cgroup v2, a mutual overlap check is required when at least one of two
cpusets is exclusive. However, this check should be relaxed and limited to
cases where both cpusets are exclusive.

This patch ensures that for sibling cpusets A1 (exclusive) and B1
(non-exclusive), changing B1 cannot affect A1's exclusivity.

For example, assume a machine has 4 CPUs (0-3).

   root cgroup
      /    \
    A1      B1

Case 1:
 Table 1.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root invalid | member       |

After step #3, A1 changes from "root" to "root invalid" because its CPUs
(0-1) overlap with those requested by B1 (0-3). However, B1 can actually
use CPUs 2-3 (from B1's parent), so it would be more reasonable for A1 to
remain as "root".

 Table 1.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root         | member       |
 #3> echo "0" > B1/cpuset.cpus              | root         | member       |

Case 2: (This situation remains unchanged from before)
 Table 2.1: Before applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |

 Table 2.2: After applying the patch
 Step                                       | A1's prstate | B1's prstate |
 #1> echo "0-1" > A1/cpuset.cpus            | member       | member       |
 #3> echo "1-2" > B1/cpuset.cpus            | member       | member       |
 #2> echo "root" > A1/cpuset.cpus.partition | root invalid | member       |

All other cases remain unaffected: for example, cgroup v1, or the cases
where A1 and B1 are both exclusive or both non-exclusive.

---
v3 -> v4:
  - Adjust the test_cpuset_prs.sh test file to align with the current
    behavior.

v2 -> v3:
  - Ensure compliance with constraints such as cpuset.cpus.exclusive.
  - Link: https://lore.kernel.org/cgroups/20251113131434.606961-1-sunshaojie@kylinos.…

v1 -> v2:
  - Keeps the current cgroup v1 behavior unchanged
  - Link: https://lore.kernel.org/cgroups/c8e234f4-2c27-4753-8f39-8ae83197efd3@redhat…

---
 kernel/cgroup/cpuset-internal.h               |  3 ++
 kernel/cgroup/cpuset-v1.c                     | 20 +++++++++
 kernel/cgroup/cpuset.c                        | 43 ++++++++++++++-----
 .../selftests/cgroup/test_cpuset_prs.sh       |  5 ++-
 4 files changed, 58 insertions(+), 13 deletions(-)

-- 
2.25.1
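The rule change described in the first paragraph of the cover letter can be written down as two predicates. This is a simplified model only, not the kernel implementation: `struct cpuset_model`, `conflicts_before()` and `conflicts_after()` are hypothetical names, CPUs are represented as a plain bitmask, and the model covers only the sibling overlap check from Case 1, not the partition validation in Case 2.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a cpuset: requested CPUs plus an exclusive flag. */
struct cpuset_model {
	uint64_t cpus;		/* bit n set means CPU n is requested */
	bool exclusive;		/* true if the cpuset is exclusive */
};

/* Rule as described before the patch: overlap matters if either is exclusive. */
static bool conflicts_before(const struct cpuset_model *a,
			     const struct cpuset_model *b)
{
	return (a->cpus & b->cpus) != 0 && (a->exclusive || b->exclusive);
}

/* Relaxed rule as described after the patch: both must be exclusive. */
static bool conflicts_after(const struct cpuset_model *a,
			    const struct cpuset_model *b)
{
	return (a->cpus & b->cpus) != 0 && (a->exclusive && b->exclusive);
}
```

In terms of Case 1, A1 would be `{ .cpus = 0x3, .exclusive = true }` and B1 `{ .cpus = 0x1, .exclusive = false }`: the old predicate reports a conflict, the relaxed one does not.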


[PATCH 1/2 net-next] ipv6: use the right ifindex when replying to icmpv6 from localhost

by Fernando Fernandez Mancera

When replying to an ICMPv6 echo request that comes from the localhost
address, the right output ifindex is 1 (lo) and not the rt6i_idev dev
index. Use the skb device ifindex instead.

This fixes pinging a local address from the localhost source address.

$ ping6 -I ::1 2001:1:1::2 -c 3
PING 2001:1:1::2 (2001:1:1::2) from ::1 : 56 data bytes
64 bytes from 2001:1:1::2: icmp_seq=1 ttl=64 time=0.037 ms
64 bytes from 2001:1:1::2: icmp_seq=2 ttl=64 time=0.069 ms
64 bytes from 2001:1:1::2: icmp_seq=3 ttl=64 time=0.122 ms

--- 2001:1:1::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2032ms
rtt min/avg/max/mdev = 0.037/0.076/0.122/0.035 ms

Fixes: 1b70d792cf67 ("ipv6: Use rt6i_idev index for echo replies to a local address")
Signed-off-by: Fernando Fernandez Mancera <fmancera(a)suse.de>
---
 net/ipv6/icmp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 5d2f90babaa5..5de254043133 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -965,7 +965,9 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
 	fl6.daddr = ipv6_hdr(skb)->saddr;
 	if (saddr)
 		fl6.saddr = *saddr;
-	fl6.flowi6_oif = icmp6_iif(skb);
+	fl6.flowi6_oif = ipv6_addr_type(&fl6.daddr) & IPV6_ADDR_LOOPBACK ?
+			 skb->dev->ifindex :
+			 icmp6_iif(skb);
 	fl6.fl6_icmp_type = type;
 	fl6.flowi6_mark = mark;
 	fl6.flowi6_uid = sock_net_uid(net, NULL);
-- 
2.51.1
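The selection logic added by the patch can be modelled in userspace to make the decision explicit. This is a sketch under assumptions, not kernel code: `pick_reply_oif()` is a hypothetical function, the two interface indexes are passed in as plain integers instead of being read from an skb, and the standard `IN6_IS_ADDR_LOOPBACK()` macro stands in for the kernel's `ipv6_addr_type() & IPV6_ADDR_LOOPBACK` test.

```c
#include <netinet/in.h>

/*
 * Hypothetical userspace model of the patched choice of output interface
 * for an echo reply. 'daddr' is the reply destination, that is, the source
 * address of the echo request.
 */
static int pick_reply_oif(const struct in6_addr *daddr,
			  int skb_dev_ifindex, int icmp6_iif)
{
	/* Replies to ::1 go out via the device the request arrived on. */
	return IN6_IS_ADDR_LOOPBACK(daddr) ? skb_dev_ifindex : icmp6_iif;
}
```

For a request sent from ::1, the function returns the first index (1 for lo in the commit message's example); for any other source address the previous behaviour is kept.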


[PATCH] selftests: breakpoints: check RTC wakeup alarm support before test

by Xinyu Zheng

If the RTC wakeup alarm feature is unsupported, this testcase may cause an
infinite suspend if there is no other wakeup source. To solve this
problem, set the wakeup alarm up before we trigger suspend. This way we
can test whether the RTC supports RTC_FEATURE_ALARM and the efi_set_alarm
function.

Signed-off-by: Xinyu Zheng <zhengxinyu6(a)huawei.com>
---
 .../breakpoints/step_after_suspend_test.c     | 23 +++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index 8d233ac95696..e738af896ce1 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -13,6 +13,8 @@
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
+#include <linux/rtc.h>
+#include <sys/ioctl.h>
 #include <sys/ptrace.h>
 #include <sys/stat.h>
 #include <sys/timerfd.h>
@@ -159,10 +161,30 @@ void suspend(void)
 	int count_before;
 	int count_after;
 	struct itimerspec spec = {};
+	char *rtc_file = "/dev/rtc0";
+	int rtc_fd;
+	struct rtc_wkalrm alarm = { 0 };
+	time_t secs;
 
 	if (getuid() != 0)
 		ksft_exit_skip("Please run the test as root - Exiting.\n");
 
+	rtc_fd = open(rtc_file, O_RDONLY);
+	if (rtc_fd < 0)
+		ksft_exit_fail_msg("open rtc0 failed\n");
+
+	err = ioctl(rtc_fd, RTC_RD_TIME, &alarm.time);
+	if (err < 0)
+		ksft_exit_fail_msg("get rtc time failed\n");
+
+	secs = timegm((struct tm *)&alarm.time) + 3;
+	gmtime_r(&secs, (struct tm *)&alarm.time);
+	alarm.enabled = 1;
+
+	err = ioctl(rtc_fd, RTC_WKALM_SET, &alarm);
+	if (err < 0)
+		ksft_exit_fail_msg("set wake alarm test failed, errno %d\n", errno);
+
 	timerfd = timerfd_create(CLOCK_BOOTTIME_ALARM, 0);
 	if (timerfd < 0)
 		ksft_exit_fail_msg("timerfd_create() failed\n");
@@ -180,6 +202,7 @@ void suspend(void)
 	if (count_after <= count_before)
 		ksft_exit_fail_msg("Failed to enter Suspend state\n");
 
+	close(rtc_fd);
 	close(timerfd);
 }
 
-- 
2.34.1


[PATCH net-next v2] selftests: mptcp: Mark xerror __noreturn

by Ankit Khushwaha

The compiler reports potential uses of uninitialized variables in
mptcp_connect.c when xerror() is called from failure paths.

mptcp_connect.c:1262:11: warning: variable 'raw_addr' is used
      uninitialized whenever 'if' condition is false
      [-Wsometimes-uninitialized]

xerror() terminates execution by calling exit(), but this is not visible
to the compiler, which assumes control flow may continue past the call.

Annotate xerror() with __noreturn so the compiler can correctly reason
about control flow and avoid false-positive uninitialized variable
warnings.

Signed-off-by: Ankit Khushwaha <ankitkhushwaha.linux(a)gmail.com>
---
changelog:
v2:
- annotate 'xerror()' with __noreturn
- remove defining 'raw_addr' to NULL

v1: https://lore.kernel.org/all/20251126163046.58615-1-ankitkhushwaha.linux@gma…
---
 tools/testing/selftests/net/mptcp/Makefile        | 4 ++++
 tools/testing/selftests/net/mptcp/mptcp_connect.c | 3 ++-
 tools/testing/selftests/net/mptcp/mptcp_inq.c     | 3 ++-
 tools/testing/selftests/net/mptcp/mptcp_sockopt.c | 3 ++-
 4 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile
index 15d144a25d82..4c94c01b893a 100644
--- a/tools/testing/selftests/net/mptcp/Makefile
+++ b/tools/testing/selftests/net/mptcp/Makefile
@@ -35,3 +35,7 @@ TEST_INCLUDES := ../lib.sh $(wildcard ../lib/sh/*.sh)
 EXTRA_CLEAN := *.pcap
 
 include ../../lib.mk
+
+$(OUTPUT)/mptcp_connect: CFLAGS += -I$(top_srcdir)/tools/include
+$(OUTPUT)/mptcp_sockopt: CFLAGS += -I$(top_srcdir)/tools/include
+$(OUTPUT)/mptcp_inq: CFLAGS += -I$(top_srcdir)/tools/include
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.c b/tools/testing/selftests/net/mptcp/mptcp_connect.c
index 404a77bf366a..10f6f99cfd4e 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.c
@@ -33,6 +33,7 @@
 #include <linux/tcp.h>
 #include <linux/time_types.h>
 #include <linux/sockios.h>
+#include <linux/compiler.h>
 
 extern int optind;
 
@@ -140,7 +141,7 @@ static void die_usage(void)
 	exit(1);
 }
 
-static void xerror(const char *fmt, ...)
+static void __noreturn xerror(const char *fmt, ...)
 {
 	va_list ap;
 
diff --git a/tools/testing/selftests/net/mptcp/mptcp_inq.c b/tools/testing/selftests/net/mptcp/mptcp_inq.c
index 8e8f6441ad8b..d4a729814662 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_inq.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_inq.c
@@ -28,6 +28,7 @@
 #include <linux/tcp.h>
 
 #include <linux/sockios.h>
+#include <linux/compiler.h>
 
 #ifndef IPPROTO_MPTCP
 #define IPPROTO_MPTCP 262
@@ -52,7 +53,7 @@ static void die_usage(int r)
 	exit(r);
 }
 
-static void xerror(const char *fmt, ...)
+static void __noreturn xerror(const char *fmt, ...)
 {
 	va_list ap;
 
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
index 286164f7246e..15cea5e919b5 100644
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.c
@@ -25,6 +25,7 @@
 #include <netinet/in.h>
 
 #include <linux/tcp.h>
+#include <linux/compiler.h>
 
 static int pf = AF_INET;
 
@@ -139,7 +140,7 @@ static void die_usage(int r)
 	exit(r);
 }
 
-static void xerror(const char *fmt, ...)
+static void __noreturn xerror(const char *fmt, ...)
 {
 	va_list ap;
 
-- 
2.52.0
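The effect described in the commit message can be reproduced in a standalone file. The sketch below is not taken from the patch: `die()` and `parse_port()` are hypothetical, and the GCC/Clang `__attribute__((noreturn))` spelling is used directly, whereas the patch relies on the `__noreturn` macro from tools/include/linux/compiler.h.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for xerror(): prints a message and exits.
 * The attribute tells the compiler that control never comes back.
 */
static void __attribute__((noreturn)) die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	exit(1);
}

/* Hypothetical caller with a variable assigned on only one branch. */
static int parse_port(const char *s)
{
	char *end;
	long v = strtol(s, &end, 10);
	int port;	/* deliberately left uninitialized */

	if (end != s && *end == '\0' && v > 0 && v < 65536)
		port = (int)v;
	else
		die("bad port: %s\n", s);

	/* Reachable only via the branch that assigned 'port'. */
	return port;
}
```

Without the attribute, a compiler has to assume that the `else` branch can fall through to the `return`, which is the situation behind the -Wsometimes-uninitialized warning quoted in the commit message.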


[PATCH net-next v7 0/9] Add support for providers with large rx buffer

by Pavel Begunkov

Note: it's net/ only bits and doesn't include changes, which should be
merged separately and are posted separately. The full branch for
convenience is at [1], and the patch is here:

https://lore.kernel.org/io-uring/7486ab32e99be1f614b3ef8d0e9bc77015b173f7.1…

Many modern NICs support configurable receive buffer lengths, and zcrx and
memory providers can use buffers larger than 4K/PAGE_SIZE on x86 to improve
performance. When paired with hw-gro, larger rx buffer sizes can drastically
reduce the number of buffers traversing the stack and save a lot of
processing time. It also allows giving users larger contiguous chunks of
data. The idea was first floated around by Saeed during netdev conf 2024
and was asked about by a few folks.

Single stream benchmarks showed up to ~30% CPU util improvement.
E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC:

packets=23987040 (MB=2745098), rps=199559 (MB/s=22837)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    1.53    0.00   27.78    2.72    1.31   66.45    0.22
packets=24078368 (MB=2755550), rps=200319 (MB/s=22924)
CPU    %usr   %nice    %sys %iowait    %irq   %soft   %idle
  0    0.69    0.00    8.26   31.65    1.83   57.00    0.57

This series adds net infrastructure for memory providers configuring
the size and implements it for bnxt. It's an opt-in feature for drivers:
they should advertise support for the parameter in the qops and must check
if the hardware supports the given size. It's limited to memory providers
as it drastically simplifies implementation. It doesn't affect the fast
path zcrx uAPI, and the size is defined in zcrx terms, which allows it
to be flexible and adjusted in the future, see Patch 8 for details.

A liburing example can be found at [2]

full branch:
[1] https://github.com/isilence/linux.git zcrx/large-buffers-v7
Liburing example:
[2] https://github.com/isilence/liburing.git zcrx/rx-buf-len

v7: - Add xa_destroy
    - Rebase

v6: - Update docs and add a selftest

v5: https://lore.kernel.org/netdev/cover.1760440268.git.asml.silence@gmail.com/
    - Remove all unnecessary bits like configuration via netlink, and
      multi-stage queue configuration.

v4: https://lore.kernel.org/all/cover.1760364551.git.asml.silence@gmail.com/
    - Update fbnic qops
    - Propagate max buf len for hns3
    - Use configured buf size in __bnxt_alloc_rx_netmem
    - Minor stylistic changes

v3: https://lore.kernel.org/all/cover.1755499375.git.asml.silence@gmail.com/
    - Rebased, excluded zcrx specific patches
    - Set agg_size_fac to 1 on warning

v2: https://lore.kernel.org/all/cover.1754657711.git.asml.silence@gmail.com/
    - Add MAX_PAGE_ORDER check on pp init
    - Applied comments rewording
    - Adjust pp.max_len based on order
    - Patch up mlx5 queue callbacks after rebase
    - Minor ->queue_mgmt_ops refactoring
    - Rebased to account for both fill level and agg_size_fac
    - Pass providers buf length in struct pp_memory_provider_params and
      apply it in __netdev_queue_config().
    - Use ->supported_ring_params to validate drivers support of set
      qcfg parameters.

Jakub Kicinski (1):
  eth: bnxt: adjust the fill level of agg queues with larger buffers

Pavel Begunkov (8):
  net: page pool: xa init with destroy on pp init
  net: page_pool: sanitise allocation order
  net: memzero mp params when closing a queue
  net: let pp memory provider to specify rx buf len
  eth: bnxt: store rx buffer size per queue
  eth: bnxt: allow providers to set rx buf size
  io_uring/zcrx: document area chunking parameter
  selftests: iou-zcrx: test large chunk sizes

 Documentation/networking/iou-zcrx.rst         |  20 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 118 ++++++++++++++----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |   2 +
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |   6 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
 include/net/netdev_queues.h                   |   9 ++
 include/net/page_pool/types.h                 |   1 +
 net/core/netdev_rx_queue.c                    |  14 ++-
 net/core/page_pool.c                          |   4 +
 .../selftests/drivers/net/hw/iou-zcrx.c       |  72 +++++++++--
 .../selftests/drivers/net/hw/iou-zcrx.py      |  37 ++++++
 11 files changed, 236 insertions(+), 49 deletions(-)

-- 
2.52.0
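The claim that larger rx buffers reduce the number of buffers traversing the stack is simple arithmetic, sketched below. `buffers_needed()` is a hypothetical helper, not something from the series, and it ignores headers, alignment and partially filled buffers.

```c
#include <stdint.h>

/*
 * Hypothetical helper: how many fixed-size buffers are needed to hold
 * 'bytes' of payload, rounding up. 'buf_len' must be non-zero.
 */
static uint64_t buffers_needed(uint64_t bytes, uint64_t buf_len)
{
	return (bytes + buf_len - 1) / buf_len;
}
```

Under these assumptions, 1 MiB of received data takes `buffers_needed(1 << 20, 4096)`, that is 256 buffers with 4K buffers, and `buffers_needed(1 << 20, 32768)`, that is 32 buffers with 32K buffers, an eightfold reduction in per-buffer work.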

