test_memcg_sock() currently requires that memory.stat's "sock " counter
is exactly zero immediately after the TCP server exits. On a busy system
this assumption is too strict:
- Socket memory may be freed with a small delay (e.g. RCU callbacks).
- memcg statistics are updated asynchronously via the rstat flushing
worker, so the "sock " value in memory.stat can stay non-zero for a
short period of time even after all socket memory has been uncharged.
As a result, test_memcg_sock() can intermittently fail even though socket
memory accounting is working correctly.
Make the test more robust by polling memory.stat for the "sock "
counter and allowing it some time to drop to zero instead of checking
it only once. The timeout is set to 3 seconds to cover the periodic
rstat flush interval (FLUSH_TIME = 2*HZ by default) plus some
scheduling slack. If the counter does not become zero within the
timeout, the test still fails as before.
On my test system, running test_memcontrol 50 times produced:
- Before this patch: 6/50 runs passed.
- After this patch: 50/50 runs passed.
Suggested-by: Lance Yang <lance.yang(a)linux.dev>
Reviewed-by: Lance Yang <lance.yang(a)linux.dev>
Signed-off-by: Guopeng Zhang <zhangguopeng(a)kylinos.cn>
---
v3:
- Move MEMCG_SOCKSTAT_WAIT_* defines after the #include block as
suggested.
v2:
- Mention the periodic rstat flush interval (FLUSH_TIME = 2*HZ) in
the comment and clarify the rationale for the 3s timeout.
- Replace the hard-coded retry count and wait interval with macros
to avoid magic numbers and make the 3s timeout calculation explicit.
---
.../selftests/cgroup/test_memcontrol.c | 30 ++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 4e1647568c5b..8ff7286fc80b 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -21,6 +21,9 @@
#include "kselftest.h"
#include "cgroup_util.h"
+#define MEMCG_SOCKSTAT_WAIT_RETRIES 30 /* 3s total */
+#define MEMCG_SOCKSTAT_WAIT_INTERVAL_US (100 * 1000) /* 100 ms */
+
static bool has_localevents;
static bool has_recursiveprot;
@@ -1384,6 +1387,8 @@ static int test_memcg_sock(const char *root)
int bind_retries = 5, ret = KSFT_FAIL, pid, err;
unsigned short port;
char *memcg;
+ long sock_post = -1;
+ int i;
memcg = cg_name(root, "memcg_test");
if (!memcg)
@@ -1432,7 +1437,30 @@ static int test_memcg_sock(const char *root)
if (cg_read_long(memcg, "memory.current") < 0)
goto cleanup;
- if (cg_read_key_long(memcg, "memory.stat", "sock "))
+ /*
+ * memory.stat is updated asynchronously via the memcg rstat
+ * flushing worker, which runs periodically (every 2 seconds,
+ * see FLUSH_TIME). On a busy system, the "sock " counter may
+ * stay non-zero for a short period of time after the TCP
+ * connection is closed and all socket memory has been
+ * uncharged.
+ *
+ * Poll memory.stat for up to 3 seconds (~FLUSH_TIME plus some
+ * scheduling slack) and require that the "sock " counter
+ * eventually drops to zero.
+ */
+ for (i = 0; i < MEMCG_SOCKSTAT_WAIT_RETRIES; i++) {
+ sock_post = cg_read_key_long(memcg, "memory.stat", "sock ");
+ if (sock_post < 0)
+ goto cleanup;
+
+ if (!sock_post)
+ break;
+
+ usleep(MEMCG_SOCKSTAT_WAIT_INTERVAL_US);
+ }
+
+ if (sock_post)
goto cleanup;
ret = KSFT_PASS;
--
2.25.1
Currently, x86, Riscv, Loongarch use the Generic Entry which makes
maintainers' work easier and codes more elegant. arm64 has already
successfully switched to the Generic IRQ Entry in commit
b3cf07851b6c ("arm64: entry: Switch to generic IRQ entry"), it is
time to completely convert arm64 to Generic Entry.
The goal is to bring arm64 in line with other architectures that already
use the generic entry infrastructure, reducing duplicated code and
making it easier to share future changes in entry/exit paths, such as
"Syscall User Dispatch".
This patch set is rebased on v6.18-rc6.
The performance benchmarks from perf bench basic syscall on
real hardware are below:
| Metric | W/O Generic Framework | With Generic Framework | Change |
| ---------- | --------------------- | ---------------------- | ------ |
| Total time | 2.813 [sec] | 2.930 [sec] | ↑4% |
| usecs/op | 0.281349 | 0.293006 | ↑4% |
| ops/sec | 3,554,299 | 3,412,894 | ↓4% |
Compared to earlier with arch specific handling, the performance decreased
by approximately 4%.
It was tested ok with following test cases on QEMU virt platform:
- Perf tests.
- Different `dynamic preempt` mode switch.
- Pseudo NMI tests.
- Stress-ng CPU stress test.
- MTE test case in Documentation/arch/arm64/memory-tagging-extension.rst
and all test cases in tools/testing/selftests/arm64/mte/*.
- "sud" selftest testcase.
- get_syscall_info, peeksiginfo in tools/testing/selftests/ptrace.
The test QEMU configuration is as follows:
qemu-system-aarch64 \
-M virt,gic-version=3,virtualization=on,mte=on \
-cpu max,pauth-impdef=on \
-kernel Image \
-smp 8,sockets=1,cores=4,threads=2 \
-m 512m \
-nographic \
-no-reboot \
-device virtio-rng-pci \
-append "root=/dev/vda rw console=ttyAMA0 kgdboc=ttyAMA0,115200 \
earlycon preempt=voluntary irqchip.gicv3_pseudo_nmi=1" \
-drive if=none,file=images/rootfs.ext4,format=raw,id=hd0 \
-device virtio-blk-device,drive=hd0 \
Chanegs in v7:
- Support "Syscall User Dispatch" by implementing
arch_syscall_is_vdso_sigreturn() as kemal suggested.
- Add aarch64 support for "sud" selftest testcase, which tested ok with
the patch series.
- Fix the kernel test robot warning for arch_ptrace_report_syscall_entry()
and arch_ptrace_report_syscall_exit() in asm/entry-common.h.
- Add perf syscall performance test.
- Link to v6: https://lore.kernel.org/all/20250916082611.2972008-1-ruanjinjie@huawei.com/
Changes in v6:
- Rebased on v6.17-rc5-next as arm64 generic irq entry has merged.
- Update the commit message.
- Link to v5: https://lore.kernel.org/all/20241206101744.4161990-1-ruanjinjie@huawei.com/
Changes in v5:
- Not change arm32 and keep inerrupts_enabled() macro for gicv3 driver.
- Move irqentry_state definition into arch/arm64/kernel/entry-common.c.
- Avoid removing the __enter_from_*() and __exit_to_*() wrappers.
- Update "irqentry_state_t ret/irq_state" to "state"
to keep it consistently.
- Use generic irq entry header for PREEMPT_DYNAMIC after split
the generic entry.
- Also refactor the ARM64 syscall code.
- Introduce arch_ptrace_report_syscall_entry/exit(), instead of
arch_pre/post_report_syscall_entry/exit() to simplify code.
- Make the syscall patches clear separation.
- Update the commit message.
- Link to v4: https://lore.kernel.org/all/20241025100700.3714552-1-ruanjinjie@huawei.com/
Changes in v4:
- Rework/cleanup split into a few patches as Mark suggested.
- Replace interrupts_enabled() macro with regs_irqs_disabled(), instead
of left it here.
- Remove rcu and lockdep state in pt_regs by using temporary
irqentry_state_t as Mark suggested.
- Remove some unnecessary intermediate functions to make it clear.
- Rework preempt irq and PREEMPT_DYNAMIC code
to make the switch more clear.
- arch_prepare_*_entry/exit() -> arch_pre_*_entry/exit().
- Expand the arch functions comment.
- Make arch functions closer to its caller.
- Declare saved_reg in for block.
- Remove arch_exit_to_kernel_mode_prepare(), arch_enter_from_kernel_mode().
- Adjust "Add few arch functions to use generic entry" patch to be
the penultimate.
- Update the commit message.
- Add suggested-by.
- Link to v3: https://lore.kernel.org/all/20240629085601.470241-1-ruanjinjie@huawei.com/
Changes in v3:
- Test the MTE test cases.
- Handle forget_syscall() in arch_post_report_syscall_entry()
- Make the arch funcs not use __weak as Thomas suggested, so move
the arch funcs to entry-common.h, and make arch_forget_syscall() folded
in arch_post_report_syscall_entry() as suggested.
- Move report_single_step() to thread_info.h for arm64
- Change __always_inline() to inline, add inline for the other arch funcs.
- Remove unused signal.h for entry-common.h.
- Add Suggested-by.
- Update the commit message.
Changes in v2:
- Add tested-by.
- Fix a bug that not call arch_post_report_syscall_entry() in
syscall_trace_enter() if ptrace_report_syscall_entry() return not zero.
- Refactor report_syscall().
- Add comment for arch_prepare_report_syscall_exit().
- Adjust entry-common.h header file inclusion to alphabetical order.
- Update the commit message.
Jinjie Ruan (10):
arm64/ptrace: Split report_syscall()
arm64/ptrace: Refactor syscall_trace_enter/exit()
arm64/ptrace: Refator el0_svc_common()
entry: Add syscall_exit_to_user_mode_prepare() helper
arm64/ptrace: Handle ptrace_report_syscall_entry() error
arm64/ptrace: Expand secure_computing() in place
arm64/ptrace: Use syscall_get_arguments() heleper
entry: Add arch_ptrace_report_syscall_entry/exit()
entry: Add has_syscall_work() helper
arm64: entry: Convert to generic entry
kemal (1):
selftests: sud_test: Support aarch64
arch/arm64/Kconfig | 2 +-
arch/arm64/include/asm/entry-common.h | 69 ++++++++++++++
arch/arm64/include/asm/syscall.h | 29 +++++-
arch/arm64/include/asm/thread_info.h | 22 +----
arch/arm64/kernel/debug-monitors.c | 7 ++
arch/arm64/kernel/ptrace.c | 90 -------------------
arch/arm64/kernel/signal.c | 2 +-
arch/arm64/kernel/syscall.c | 31 ++-----
include/linux/entry-common.h | 42 ++++++---
kernel/entry/syscall-common.c | 43 ++++++++-
.../syscall_user_dispatch/sud_test.c | 4 +
11 files changed, 188 insertions(+), 153 deletions(-)
--
2.34.1
This series adds namespace support to vhost-vsock and loopback. It does
not add namespaces to any of the other guest transports (virtio-vsock,
hyperv, or vmci).
The current revision supports two modes: local and global. Local
mode is complete isolation of namespaces, while global mode is complete
sharing between namespaces of CIDs (the original behavior).
The mode is set using /proc/sys/net/vsock/ns_mode.
Modes are per-netns and write-once. This allows a system to configure
namespaces independently (some may share CIDs, others are completely
isolated). This also supports future possible mixed use cases, where
there may be namespaces in global mode spinning up VMs while there are
mixed mode namespaces that provide services to the VMs, but are not
allowed to allocate from the global CID pool (this mode is not
implemented in this series).
If a socket or VM is created when a namespace is global but the
namespace changes to local, the socket or VM will continue working
normally. That is, the socket or VM assumes the mode behavior of the
namespace at the time the socket/VM was created. The original mode is
captured in vsock_create() and so occurs at the time of socket(2) and
accept(2) for sockets and open(2) on /dev/vhost-vsock for VMs. This
prevents a socket/VM connection from suddenly breaking due to a
namespace mode change. Any new sockets/VMs created after the mode change
will adopt the new mode's behavior.
Additionally, added tests for the new namespace features:
tools/testing/selftests/vsock/vmtest.sh
1..29
ok 1 vm_server_host_client
ok 2 vm_client_host_server
ok 3 vm_loopback
ok 4 ns_vm_local_mode_rejected
ok 5 ns_host_vsock_ns_mode_ok
ok 6 ns_host_vsock_ns_mode_write_once_ok
ok 7 ns_global_same_cid_fails
ok 8 ns_local_same_cid_ok
ok 9 ns_global_local_same_cid_ok
ok 10 ns_local_global_same_cid_ok
ok 11 ns_diff_global_host_connect_to_global_vm_ok
ok 12 ns_diff_global_host_connect_to_local_vm_fails
ok 13 ns_diff_global_vm_connect_to_global_host_ok
ok 14 ns_diff_global_vm_connect_to_local_host_fails
ok 15 ns_diff_local_host_connect_to_local_vm_fails
ok 16 ns_diff_local_vm_connect_to_local_host_fails
ok 17 ns_diff_global_to_local_loopback_local_fails
ok 18 ns_diff_local_to_global_loopback_fails
ok 19 ns_diff_local_to_local_loopback_fails
ok 20 ns_diff_global_to_global_loopback_ok
ok 21 ns_same_local_loopback_ok
ok 22 ns_same_local_host_connect_to_local_vm_ok
ok 23 ns_same_local_vm_connect_to_local_host_ok
ok 24 ns_mode_change_connection_continue_vm_ok
ok 25 ns_mode_change_connection_continue_host_ok
ok 26 ns_mode_change_connection_continue_both_ok
ok 27 ns_delete_vm_ok
ok 28 ns_delete_host_ok
ok 29 ns_delete_both_ok
SUMMARY: PASS=29 SKIP=0 FAIL=0
Dependent on series:
https://lore.kernel.org/all/20251108-vsock-selftests-fixes-and-improvements…
Thanks again for everyone's help and reviews!
Suggested-by: Sargun Dhillon <sargun(a)sargun.me>
Signed-off-by: Bobby Eshleman <bobbyeshleman(a)gmail.com>
Changes in v11:
- vmtest: add a patch to use ss in wait_for_listener functions and
support vsock, tcp, and unix. Change all patches to use the new
functions.
- vmtest: add a patch to re-use vm dmesg / warn counting functions
- Link to v10: https://lore.kernel.org/r/20251117-vsock-vmtest-v10-0-df08f165bf3e@meta.com
Changes in v10:
- Combine virtio common patches into one (Stefano)
- Resolve vsock_loopback virtio_transport_reset_no_sock() issue
with info->vsk setting. This eliminates the need for skb->cb,
so remove skb->cb patches.
- many line width 80 fixes
- Link to v9: https://lore.kernel.org/all/20251111-vsock-vmtest-v9-0-852787a37bed@meta.com
Changes in v9:
- reorder loopback patch after patch for virtio transport common code
- remove module ordering tests patch because loopback no longer depends
on pernet ops
- major simplifications in vsock_loopback
- added a new patch for blocking local mode for guests, added test case
to check
- add net ref tracking to vsock_loopback patch
- Link to v8: https://lore.kernel.org/r/20251023-vsock-vmtest-v8-0-dea984d02bb0@meta.com
Changes in v8:
- Break generic cleanup/refactoring patches into standalone series,
remove those from this series
- Link to dependency: https://lore.kernel.org/all/20251022-vsock-selftests-fixes-and-improvements…
- Link to v7: https://lore.kernel.org/r/20251021-vsock-vmtest-v7-0-0661b7b6f081@meta.com
Changes in v7:
- fix hv_sock build
- break out vmtest patches into distinct, more well-scoped patches
- change `orig_net_mode` to `net_mode`
- many fixes and style changes in per-patch change sets (see individual
patches for specific changes)
- optimize `virtio_vsock_skb_cb` layout
- update commit messages with more useful descriptions
- vsock_loopback: use orig_net_mode instead of current net mode
- add tests for edge cases (ns deletion, mode changing, loopback module
load ordering)
- Link to v6: https://lore.kernel.org/r/20250916-vsock-vmtest-v6-0-064d2eb0c89d@meta.com
Changes in v6:
- define behavior when mode changes to local while socket/VM is alive
- af_vsock: clarify description of CID behavior
- af_vsock: use stronger langauge around CID rules (dont use "may")
- af_vsock: improve naming of buf/buffer
- af_vsock: improve string length checking on proc writes
- vsock_loopback: add space in struct to clarify lock protection
- vsock_loopback: do proper cleanup/unregister on vsock_loopback_exit()
- vsock_loopback: use virtio_vsock_skb_net() instead of sock_net()
- vsock_loopback: set loopback to NULL after kfree()
- vsock_loopback: use pernet_operations and remove callback mechanism
- vsock_loopback: add macros for "global" and "local"
- vsock_loopback: fix length checking
- vmtest.sh: check for namespace support in vmtest.sh
- Link to v5: https://lore.kernel.org/r/20250827-vsock-vmtest-v5-0-0ba580bede5b@meta.com
Changes in v5:
- /proc/net/vsock_ns_mode -> /proc/sys/net/vsock/ns_mode
- vsock_global_net -> vsock_global_dummy_net
- fix netns lookup in vhost_vsock to respect pid namespaces
- add callbacks for vsock_loopback to avoid circular dependency
- vmtest.sh loads vsock_loopback module
- remove vsock_net_mode_can_set()
- change vsock_net_write_mode() to return true/false based on success
- make vsock_net_mode enum instead of u8
- Link to v4: https://lore.kernel.org/r/20250805-vsock-vmtest-v4-0-059ec51ab111@meta.com
Changes in v4:
- removed RFC tag
- implemented loopback support
- renamed new tests to better reflect behavior
- completed suite of tests with permutations of ns modes and vsock_test
as guest/host
- simplified socat bridging with unix socket instead of tcp + veth
- only use vsock_test for success case, socat for failure case (context
in commit message)
- lots of cleanup
Changes in v3:
- add notion of "modes"
- add procfs /proc/net/vsock_ns_mode
- local and global modes only
- no /dev/vhost-vsock-netns
- vmtest.sh already merged, so new patch just adds new tests for NS
- Link to v2:
https://lore.kernel.org/kvm/20250312-vsock-netns-v2-0-84bffa1aa97a@gmail.com
Changes in v2:
- only support vhost-vsock namespaces
- all g2h namespaces retain old behavior, only common API changes
impacted by vhost-vsock changes
- add /dev/vhost-vsock-netns for "opt-in"
- leave /dev/vhost-vsock to old behavior
- removed netns module param
- Link to v1:
https://lore.kernel.org/r/20200116172428.311437-1-sgarzare@redhat.com
Changes in v1:
- added 'netns' module param to vsock.ko to enable the
network namespace support (disabled by default)
- added 'vsock_net_eq()' to check the "net" assigned to a socket
only when 'netns' support is enabled
- Link to RFC: https://patchwork.ozlabs.org/cover/1202235/
---
Bobby Eshleman (13):
vsock: a per-net vsock NS mode state
vsock: add netns to vsock core
vsock: reject bad VSOCK_NET_MODE_LOCAL configuration for G2H
virtio: set skb owner of virtio_transport_reset_no_sock() reply
vsock: add netns support to virtio transports
selftests/vsock: add namespace helpers to vmtest.sh
selftests/vsock: prepare vm management helpers for namespaces
selftests/vsock: add vm_dmesg_{warn,oops}_count() helpers
selftests/vsock: use ss to wait for listeners instead of /proc/net
selftests/vsock: add tests for proc sys vsock ns_mode
selftests/vsock: add namespace tests for CID collisions
selftests/vsock: add tests for host <-> vm connectivity with namespaces
selftests/vsock: add tests for namespace deletion and mode changes
MAINTAINERS | 1 +
drivers/vhost/vsock.c | 57 +-
include/linux/virtio_vsock.h | 8 +-
include/net/af_vsock.h | 64 +-
include/net/net_namespace.h | 4 +
include/net/netns/vsock.h | 17 +
net/vmw_vsock/af_vsock.c | 290 ++++++++-
net/vmw_vsock/hyperv_transport.c | 6 +
net/vmw_vsock/virtio_transport.c | 29 +-
net/vmw_vsock/virtio_transport_common.c | 69 +-
net/vmw_vsock/vmci_transport.c | 12 +
net/vmw_vsock/vsock_loopback.c | 20 +-
tools/testing/selftests/vsock/vmtest.sh | 1087 +++++++++++++++++++++++++++++--
13 files changed, 1560 insertions(+), 104 deletions(-)
---
base-commit: 962ac5ca99a5c3e7469215bf47572440402dfd59
change-id: 20250325-vsock-vmtest-b3a21d2102c2
prerequisite-message-id: <20251022-vsock-selftests-fixes-and-improvements-v1-0-edeb179d6463(a)meta.com>
prerequisite-patch-id: a2eecc3851f2509ed40009a7cab6990c6d7cfff5
prerequisite-patch-id: 501db2100636b9c8fcb3b64b8b1df797ccbede85
prerequisite-patch-id: ba1a2f07398a035bc48ef72edda41888614be449
prerequisite-patch-id: fd5cc5445aca9355ce678e6d2bfa89fab8a57e61
prerequisite-patch-id: 795ab4432ffb0843e22b580374782e7e0d99b909
prerequisite-patch-id: 1499d263dc933e75366c09e045d2125ca39f7ddd
prerequisite-patch-id: f92d99bb1d35d99b063f818a19dcda999152d74c
prerequisite-patch-id: e3296f38cdba6d903e061cff2bbb3e7615e8e671
prerequisite-patch-id: bc4662b4710d302d4893f58708820fc2a0624325
prerequisite-patch-id: f8991f2e98c2661a706183fde6b35e2b8d9aedcf
prerequisite-patch-id: 44bf9ed69353586d284e5ee63d6fffa30439a698
prerequisite-patch-id: d50621bc630eeaf608bbaf260370c8dabf6326df
Best regards,
--
Bobby Eshleman <bobbyeshleman(a)meta.com>
First patch here tries to auto-disable building the iouring sample.
Our CI will still run the iouring test(s), of course, but it looks
like the liburing updates aren't very quick in distroes and having
to hack around it when developing unrelated tests is a bit annoying.
Remaining 4 patches iron out running the Toeplitz hash test against
real NICs. I tested mlx5, bnxt and fbnic, they all pass now.
I switched to using YNL directly in the C code, can't see a reason
to get the info in Python and pass it to C via argv. The old code
likely did this because it predates YNL.
Jakub Kicinski (5):
selftests: hw-net: auto-disable building the iouring C code
selftests: hw-net: toeplitz: make sure NICs have pure Toeplitz
configured
selftests: hw-net: toeplitz: read the RSS key directly from C
selftests: hw-net: toeplitz: read indirection table from the device
selftests: hw-net: toeplitz: give the test up to 4 seconds
.../testing/selftests/drivers/net/hw/Makefile | 23 ++++++-
.../selftests/drivers/net/hw/toeplitz.c | 65 ++++++++++++++++++-
.../selftests/drivers/net/hw/toeplitz.py | 28 ++++----
3 files changed, 98 insertions(+), 18 deletions(-)
--
2.51.1
The current netconsole implementation allocates a static buffer for
extradata (userdata + sysdata) with a fixed size of
MAX_EXTRADATA_ENTRY_LEN * MAX_EXTRADATA_ITEMS bytes for every target,
regardless of whether userspace actually uses this feature. This forces
us to keep MAX_EXTRADATA_ITEMS small (16), which is restrictive for
users who need to attach more metadata to their log messages.
This patch series enables dynamic allocation of the userdata buffer,
allowing it to grow on-demand based on actual usage. The series:
1. Refactors send_fragmented_body() to simplify handling of separated
userdata and sysdata (patch 1/4)
2. Splits userdata and sysdata into separate buffers (patch 2/4)
3. Implements dynamic allocation for the userdata buffer (patch 3/4)
4. Increases MAX_USERDATA_ITEMS from 16 to 256 now that we can do so
without memory waste (patch 4/4)
Benefits:
- No memory waste when userdata is not used
- Targets that use userdata only consume what they need
- Users can attach significantly more metadata without impacting systems
that don't use this feature
Signed-off-by: Gustavo Luiz Duarte <gustavold(a)gmail.com>
---
Changes in v3:
- Split calculating the lentgh of the formatted userdata string into a
separate function calc_userdata_len().
- Exit update_userdata() immediately if we hit WARN due to too many
userdata entries.
- Use offset instead of len to save userdata_length in update_userdata()
- Link to v2: https://lore.kernel.org/r/20251113-netconsole_dynamic_extradata-v2-0-18cf7f…
Changes in v2:
- Added null pointer checks for userdata and sysdata buffers
- Added MAX_SYSDATA_ITEMS to enum sysdata_feature
- Moved code out of ifdef in send_msg_no_fragmentation()
- Renamed variables in send_fragmented_body() to make it easier to
reason about the code
- Link to v1: https://lore.kernel.org/r/20251105-netconsole_dynamic_extradata-v1-0-142890…
---
Gustavo Luiz Duarte (4):
netconsole: Simplify send_fragmented_body()
netconsole: Split userdata and sysdata
netconsole: Dynamic allocation of userdata buffer
netconsole: Increase MAX_USERDATA_ITEMS
drivers/net/netconsole.c | 386 +++++++++++----------
.../selftests/drivers/net/netcons_overflow.sh | 2 +-
2 files changed, 195 insertions(+), 193 deletions(-)
---
base-commit: 45a1cd8346ca245a1ca475b26eb6ceb9d8b7c6f0
change-id: 20251007-netconsole_dynamic_extradata-21bd9d726568
Best regards,
--
Gustavo Duarte <gustavold(a)meta.com>
Hi,
This series fixes issues in the devlink_rate_tc_bw.py selftest and
introduces a new Iperf3Runner that helps with measurement handling.
Thanks,
Carolina
Carolina Jubran (6):
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in
devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in
devlink_rate_tc_bw.py
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
.../testing/selftests/drivers/net/hw/Makefile | 1 +
.../drivers/net/hw/devlink_rate_tc_bw.py | 174 ++++++++----------
.../drivers/net/hw/lib/py/__init__.py | 5 +-
.../selftests/drivers/net/lib/py/__init__.py | 5 +-
.../selftests/drivers/net/lib/py/load.py | 84 ++++++++-
5 files changed, 157 insertions(+), 112 deletions(-)
--
2.38.1
From: Hui Zhu <zhuhui(a)kylinos.cn>
This series proposes adding eBPF support to the Linux memory
controller, enabling dynamic and extensible memory management
policies at runtime.
Background
The memory controller (memcg) currently provides fixed memory
accounting and reclamation policies through static kernel code.
This limits flexibility for specialized workloads and use cases
that require custom memory management strategies.
By enabling eBPF programs to hook into key memory control
operations, administrators can implement custom policies without
recompiling the kernel, while maintaining the safety guarantees
provided by the BPF verifier.
Use Cases
1. Custom memory reclamation strategies for specialized workloads
2. Dynamic memory pressure monitoring and telemetry
3. Memory accounting adjustments based on runtime conditions
4. Integration with container orchestration systems for
intelligent resource management
5. Research and experimentation with novel memory management
algorithms
Design Overview
This series introduces:
1. A new BPF struct ops type (`memcg_ops`) that allows eBPF
programs to implement custom behavior for memory charging
operations.
2. A hook point in the `try_charge_memcg()` fast path that
invokes registered eBPF programs to determine if custom
memory management should be applied.
3. The eBPF handler can inspect memory cgroup context and
optionally modify certain parameters (e.g., `nr_pages` for
reclamation size).
4. A reference counting mechanism using `percpu_ref` to safely
manage the lifecycle of registered eBPF struct ops instances.
5. Configuration via `CONFIG_MEMCG_BPF` to allow disabling this
feature at build time.
Implementation Details
- Uses BPF struct ops for a cleaner integration model
- Leverages static branch keys for minimal overhead when feature
is unused
- RCU synchronization ensures safe replacement of handlers
- Sample eBPF program demonstrates monitoring capabilities
- Comprehensive selftest suite validates core functionality
Performance Considerations
- Zero overhead when feature is disabled or no eBPF program is
loaded (static branch is disabled)
- Minimal overhead when enabled: one indirect function call per
charge attempt
- eBPF programs run under the restrictions of the BPF verifier
Patch Overview
PATCH 1/3: Core kernel implementation
- Adds eBPF struct ops support to memcg
- Introduces CONFIG_MEMCG_BPF option
- Implements safe registration/unregistration mechanism
PATCH 2/3: Selftest suite
- prog_tests/memcg_ops.c: Test entry points
- progs/memcg_ops.bpf.c: Test eBPF program
- Validates load, attach, and single-handler constraints
PATCH 3/3: Sample userspace program
- samples/bpf/memcg_printk.bpf.c: Monitoring eBPF program
- samples/bpf/memcg_printk.c: Userspace loader
- Demonstrates real-world usage and debugging capabilities
Open Questions & Discussion Points
1. Should the eBPF handler have access to additional memory
cgroup state? Current design exposes minimal context to
reduce attack surface.
2. Are there other memory control operations that would benefit
from eBPF extensibility (e.g., uncharge, reclaim)?
3. Should there be permission checks or restrictions on who can
load memcg eBPF programs? Currently inherits BPF's
CAP_PERFMON/CAP_SYS_ADMIN requirements.
4. How should we handle multiple eBPF programs trying to
register? Current implementation allows only one active
handler.
5. Is the current exposed context in `try_charge_memcg` struct
sufficient, or should additional fields be added?
Testing
The selftests provide comprehensive coverage of the core
functionality. The sample program can be used for manual
testing and as a reference for implementing additional
monitoring tools.
Hui Zhu (3):
memcg: add eBPF struct ops support for memory charging
selftests/bpf: add memcg eBPF struct ops test
samples/bpf: add example memcg eBPF program
MAINTAINERS | 5 +
init/Kconfig | 38 ++++
mm/Makefile | 1 +
mm/memcontrol.c | 26 ++-
mm/memcontrol_bpf.c | 200 ++++++++++++++++++
mm/memcontrol_bpf.h | 103 +++++++++
samples/bpf/Makefile | 2 +
samples/bpf/memcg_printk.bpf.c | 30 +++
samples/bpf/memcg_printk.c | 82 +++++++
.../selftests/bpf/prog_tests/memcg_ops.c | 117 ++++++++++
tools/testing/selftests/bpf/progs/memcg_ops.c | 20 ++
11 files changed, 617 insertions(+), 7 deletions(-)
create mode 100644 mm/memcontrol_bpf.c
create mode 100644 mm/memcontrol_bpf.h
create mode 100644 samples/bpf/memcg_printk.bpf.c
create mode 100644 samples/bpf/memcg_printk.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_ops.c
create mode 100644 tools/testing/selftests/bpf/progs/memcg_ops.c
--
2.43.0
Main objective of this series is to convert the gro.sh and toeplitz.sh
tests to be "NIPA-compatible" - meaning make use of the Python env,
which lets us run the tests against either netdevsim or a real device.
The tests seem to have been written with a different flow in mind.
Namely they source different bash "setup" scripts depending on arguments
passed to the test. While I have nothing against the use of bash and
the overall architecture - the existing code needs quite a bit of work
(don't assume MAC/IP addresses, support remote endpoint over SSH).
If I'm the one fixing it, I'd rather convert them to our "simplistic"
Python.
This series rewrites the tests in Python while addressing their
shortcomings. The functionality of running the test over loopback
on a real device is retained but with a different method of invocation
(see the last patch).
Once again we are dealing with a script which run over a variety of
protocols (combination of [ipv4, ipv6, ipip] x [tcp, udp]). The first
4 patches add support for test variants to our scripts. We use the
term "variant" in the same sense as the C kselftest_harness.h -
variant is just a set of static input arguments.
Note that neither GRO nor the Toeplitz test fully passes for me on
any HW I have access to. But this is unrelated to the conversion.
This series is not making any real functional changes to the tests,
it is limited to improving the "test harness" scripts.
v3:
[patch 1] Exception -> BaseException
[patch 3] use named tuple instead of attaching attrs directly to a func
[patch 9] restore the comment about retries in GRO test
[patch 10] use open() instead of echo
[patch 10] move MTU changes to _setup() to handle all the config related
stuff in that function
v2: https://lore.kernel.org/20251118215126.2225826-1-kuba@kernel.org
[patch 5] fix accidental modification of gitignore
[patch 8] fix typo in "compared"
[patch 9] fix typo I -> It
[patch 10] fix typoe configure -> configured
v1: https://lore.kernel.org/20251117205810.1617533-1-kuba@kernel.org
Jakub Kicinski (12):
selftests: net: py: coding style improvements
selftests: net: py: extract the case generation logic
selftests: net: py: add test variants
selftests: drv-net: xdp: use variants for qstat tests
selftests: net: relocate gro and toeplitz tests to drivers/net
selftests: net: py: support ksft ready without wait
selftests: net: py: read ip link info about remote dev
netdevsim: pass packets thru GRO on Rx
selftests: drv-net: add a Python version of the GRO test
selftests: drv-net: hw: convert the Toeplitz test to Python
netdevsim: add loopback support
selftests: net: remove old setup_* scripts
tools/testing/selftests/drivers/net/Makefile | 2 +
.../testing/selftests/drivers/net/hw/Makefile | 6 +-
tools/testing/selftests/net/Makefile | 7 -
tools/testing/selftests/net/lib/Makefile | 1 +
drivers/net/netdevsim/netdev.c | 26 ++-
.../testing/selftests/{ => drivers}/net/gro.c | 5 +-
.../{net => drivers/net/hw}/toeplitz.c | 7 +-
.../testing/selftests/drivers/net/.gitignore | 1 +
tools/testing/selftests/drivers/net/gro.py | 164 ++++++++++++++
.../selftests/drivers/net/hw/.gitignore | 1 +
.../drivers/net/hw/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/hw/toeplitz.py | 209 ++++++++++++++++++
.../selftests/drivers/net/lib/py/__init__.py | 4 +-
.../selftests/drivers/net/lib/py/env.py | 2 +
tools/testing/selftests/drivers/net/xdp.py | 42 ++--
tools/testing/selftests/net/.gitignore | 2 -
tools/testing/selftests/net/gro.sh | 105 ---------
.../selftests/net/lib/ksft_setup_loopback.sh | 111 ++++++++++
.../testing/selftests/net/lib/py/__init__.py | 5 +-
tools/testing/selftests/net/lib/py/ksft.py | 91 ++++++--
tools/testing/selftests/net/lib/py/nsim.py | 2 +-
tools/testing/selftests/net/lib/py/utils.py | 20 +-
tools/testing/selftests/net/setup_loopback.sh | 120 ----------
tools/testing/selftests/net/setup_veth.sh | 45 ----
tools/testing/selftests/net/toeplitz.sh | 199 -----------------
.../testing/selftests/net/toeplitz_client.sh | 28 ---
26 files changed, 632 insertions(+), 577 deletions(-)
rename tools/testing/selftests/{ => drivers}/net/gro.c (99%)
rename tools/testing/selftests/{net => drivers/net/hw}/toeplitz.c (99%)
create mode 100755 tools/testing/selftests/drivers/net/gro.py
create mode 100755 tools/testing/selftests/drivers/net/hw/toeplitz.py
delete mode 100755 tools/testing/selftests/net/gro.sh
create mode 100755 tools/testing/selftests/net/lib/ksft_setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_loopback.sh
delete mode 100644 tools/testing/selftests/net/setup_veth.sh
delete mode 100755 tools/testing/selftests/net/toeplitz.sh
delete mode 100755 tools/testing/selftests/net/toeplitz_client.sh
--
2.51.1
Note: this requires INPUT_PROP_PRESSUREPAD [1] which is not yet
available in Linus' tree but it is in Dmitry's for-linus tree.
Nicely enough MS defines a button type for a pressurepad touchpad [2]
and it looks like most touchpad vendors fill this in.
The selftests require a bit of prep work (and a hack for the test
itself) - hidtools 0.12 requires python-libevdev 0.13 which in turn
provides constructors for unknown properties.
[1] https://lore.kernel.org/linux-input/20251030011735.GA969565@quokka/T/#m9d9b…
[2] https://learn.microsoft.com/en-us/windows-hardware/design/component-guideli…
Signed-off-by: Peter Hutterer <peter.hutterer(a)who-t.net>
---
Peter Hutterer (3):
selftests/hid: require hidtools 0.12
selftests/hid: use a enum class for the different button types
HID: multitouch: set INPUT_PROP_PRESSUREPAD based on Digitizer/Button Type
drivers/hid/hid-multitouch.c | 12 ++++-
tools/testing/selftests/hid/tests/conftest.py | 14 +++++
.../testing/selftests/hid/tests/test_multitouch.py | 61 +++++++++++++++++-----
3 files changed, 73 insertions(+), 14 deletions(-)
---
base-commit: 2bc4c50a42f8b83f611d0475598dc72740e87640
change-id: 20251111-wip-hid-pressurepad-8a800cdf1813
Best regards,
--
Peter Hutterer <peter.hutterer(a)who-t.net>