I had cause to look at the vfork() support for GCS and realised that we
don't have any direct test coverage, this series does so by adding
vfork() to nolibc and then using that in basic-gcs to provide some
simple vfork() coverage.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v3:
- Stylistic nits in the GCS vfork() test.
- SPARC has a non-standard vfork() ABI which needs handling.
- Link to v2: https://lore.kernel.org/r/20250610-arm64-gcs-vfork-exit-v2-0-929443dfcf82@k…
Changes in v2:
- Add replacement of ifdef with if defined() in nolibc since the code
doesn't reflect the coding style.
- Remove check for arch specific vfork().
- Link to v1: https://lore.kernel.org/r/20250609-arm64-gcs-vfork-exit-v1-0-baad0f085747@k…
---
Mark Brown (4):
tools/nolibc: Replace ifdef with if defined() in sys.h
tools/nolibc: Provide vfork()
kselftest/arm64: Add a test for vfork() with GCS
selftests/nolibc: Add coverage of vfork()
tools/include/nolibc/arch-sparc.h | 16 +++++++
tools/include/nolibc/sys.h | 59 ++++++++++++++++++-------
tools/testing/selftests/arm64/gcs/basic-gcs.c | 63 +++++++++++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 23 ++++++++--
4 files changed, 142 insertions(+), 19 deletions(-)
---
base-commit: 86731a2a651e58953fc949573895f2fa6d456841
change-id: 20250528-arm64-gcs-vfork-exit-4a7daf7652ee
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v12 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v12 (04-Jul-2025)
- Fix compilation issues with some intermediate patches in v11
- Add more comments for AccECN helpers of tcp_ecn.h
v11 (03-Jul-2025)
- Fix compilation issues with some intermediate patches in v10
v10 (02-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (6):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: Add wait_third_ack for ECN negotiation in simultaneous connect
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 13 +
include/linux/tcp.h | 33 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 663 ++++++++++++++++++
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_input.c | 371 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 297 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1444 insertions(+), 187 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
The kunit test that checks the longests symbol length [1], has triggered
warnings in some pilelines when symbol prefixes are used [2][3]. The test
will to depend on !PREFIX_SYMBOLS and !CFI_CLANG as sujested in [4] and
on !GCOV_KERNEL.
[1] https://lore.kernel.org/rust-for-linux/CABVgOSm=5Q0fM6neBhxSbOUHBgNzmwf2V22…
[2] https://lore.kernel.org/all/20250328112156.2614513-1-arnd@kernel.org/T/#u
[3] https://lore.kernel.org/rust-for-linux/bbd03b37-c4d9-4a92-9be2-75aaf8c19815…
[4] https://lore.kernel.org/linux-kselftest/20250427200916.GA1661412@ax162/T/#t
Reviewed-by: Rae Moar <rmoar(a)google.com>
Signed-off-by: Sergio González Collado <sergio.collado(a)gmail.com>
---
v2 -> v3: added dependency on !GCOV_KERNEL (to avoid __gcov_ prefix)
---
v1 -> v2: added dependency on !CFI_CLANG as suggested in [3], removed
CONFIG_ prefix
---
lib/Kconfig.debug | 1 +
lib/tests/longest_symbol_kunit.c | 3 +--
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index f9051ab610d5..e55c761eae20 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2886,6 +2886,7 @@ config FORTIFY_KUNIT_TEST
config LONGEST_SYM_KUNIT_TEST
tristate "Test the longest symbol possible" if !KUNIT_ALL_TESTS
depends on KUNIT && KPROBES
+ depends on !PREFIX_SYMBOLS && !CFI_CLANG && !GCOV_KERNEL
default KUNIT_ALL_TESTS
help
Tests the longest symbol possible
diff --git a/lib/tests/longest_symbol_kunit.c b/lib/tests/longest_symbol_kunit.c
index e3c28ff1807f..9b4de3050ba7 100644
--- a/lib/tests/longest_symbol_kunit.c
+++ b/lib/tests/longest_symbol_kunit.c
@@ -3,8 +3,7 @@
* Test the longest symbol length. Execute with:
* ./tools/testing/kunit/kunit.py run longest-symbol
* --arch=x86_64 --kconfig_add CONFIG_KPROBES=y --kconfig_add CONFIG_MODULES=y
- * --kconfig_add CONFIG_RETPOLINE=n --kconfig_add CONFIG_CFI_CLANG=n
- * --kconfig_add CONFIG_MITIGATION_RETPOLINE=n
+ * --kconfig_add CONFIG_CPU_MITIGATIONS=n --kconfig_add CONFIG_GCOV_KERNEL=n
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
base-commit: 1a80a098c606b285fb0a13aa992af4f86da1ff06
--
2.39.2
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli(a)nabijaczleweli.xyz>
---
v1: https://lore.kernel.org/lkml/h2ieddqja5jfrnuh3mvlxt6njrvp352t5rfzp2cvnrufop…
tools/testing/selftests/powerpc/tm/tm-tmspr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/powerpc/tm/tm-tmspr.c b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
index dd5ddffa28b7..0d64988ffb40 100644
--- a/tools/testing/selftests/powerpc/tm/tm-tmspr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
@@ -14,7 +14,7 @@
* (1) create more threads than cpus
* (2) in each thread:
* (a) set TFIAR and TFHAR a unique value
- * (b) loop for awhile, continually checking to see if
+ * (b) loop for a while, continually checking to see if
* either register has been corrupted.
*
* (3) Loop:
--
2.39.5
I've removed the RFC tag from this version of the series, but the items
that I'm looking for feedback on remains the same:
- The userspace ABI, in particular:
- The vector length used for the SVE registers, access to the SVE
registers and access to ZA and (if available) ZT0 depending on
the current state of PSTATE.{SM,ZA}.
- The use of a single finalisation for both SVE and SME.
- The addition of control for enabling fine grained traps in a similar
manner to FGU but without the UNDEF, I'm not clear if this is desired
at all and at present this requires symmetric read and write traps like
FGU. That seemed like it might be desired from an implementation
point of view but we already have one case where we enable an
asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it
seems generally useful to enable asymmetrically.
This series implements support for SME use in non-protected KVM guests.
Much of this is very similar to SVE, the main additional challenge that
SME presents is that it introduces a new vector length similar to the
SVE vector length and two new controls which change the registers seen
by guests:
- PSTATE.ZA enables the ZA matrix register and, if SME2 is supported,
the ZT0 LUT register.
- PSTATE.SM enables streaming mode, a new floating point mode which
uses the SVE register set with the separately configured SME vector
length. In streaming mode implementation of the FFR register is
optional.
It is also permitted to build systems which support SME without SVE, in
this case when not in streaming mode no SVE registers or instructions
are available. Further, there is no requirement that there be any
overlap in the set of vector lengths supported by SVE and SME in a
system, this is expected to be a common situation in practical systems.
Since there is a new vector length to configure we introduce a new
feature parallel to the existing SVE one with a new pseudo register for
the streaming mode vector length. Due to the overlap with SVE caused by
streaming mode rather than finalising SME as a separate feature we use
the existing SVE finalisation to also finalise SME, a new define
KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising
SVE and SME separately would introduce complication with register access
since finalising SVE makes the SVE registers writeable by userspace and
doing multiple finalisations results in an error being reported.
Dealing with a state where the SVE registers are writeable due to one of
SVE or SME being finalised but may have their VL changed by the other
being finalised seems like needless complexity with minimal practical
utility, it seems clearer to just express directly that only one
finalisation can be done in the ABI.
Access to the floating point registers follows the architecture:
- When both SVE and SME are present:
- If PSTATE.SM == 0 the vector length used for the Z and P registers
is the SVE vector length.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
is the SME vector length.
- If only SME is present:
- If PSTATE.SM == 0 the Z and P registers are inaccessible and the
floating point state accessed via the encodings for the V registers.
- If PSTATE.SM == 1 the vector length used for the Z and P registers
- The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1.
The VMM must understand this, in particular when loading state SVCR
should be configured before other state. It should be noted that while
the architecture refers to PSTATE.SM and PSTATE.ZA these PSTATE bits are
not preserved in SPSR_ELx, they are only accessible via SVCR.
There are a large number of subfeatures for SME, most of which only
offer additional instructions but some of which (SME2 and FA64) add
architectural state. These are configured via the ID registers as per
usual.
Protected KVM supported, with the implementation maintaining the
existing restriction that the hypervisor will refuse to run if streaming
mode or ZA is enabled. This both simplfies the code and avoids the need
to allocate storage for host ZA and ZT0 state, there seems to be little
practical use case for supporting this and the memory usage would be
non-trivial.
The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been
added to the get-reg-list selftest, the idea of supporting additional
features there without restructuring the program to generate all
possible feature combinations has been rejected. I will post a separate
series which does that restructuring.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v6:
- Rebase onto v6.16-rc3.
- Link to v5: https://lore.kernel.org/r/20250417-kvm-arm64-sme-v5-0-f469a2d5f574@kernel.o…
Changes in v5:
- Rebase onto v6.15-rc2.
- Add pKVM guest support.
- Always restore SVCR.
- Link to v4: https://lore.kernel.org/r/20250214-kvm-arm64-sme-v4-0-d64a681adcc2@kernel.o…
Changes in v4:
- Rebase onto v6.14-rc2 and Mark Rutland's fixes.
- Expose SME to nested guests.
- Additional cleanups and test fixes following on from the rebase.
- Flush register state on VMM PSTATE.{SM,ZA}.
- Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.o…
Changes in v3:
- Rebase onto v6.12-rc2.
- Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.o…
Changes in v2:
- Rebase onto v6.7-rc3.
- Configure subfeatures based on host system only.
- Complete nVHE support.
- There was some snafu with sending v1 out, it didn't make it to the
lists but in case it hit people's inboxes I'm sending as v2.
---
Mark Brown (28):
arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state
arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time
arm64/fpsimd: Check enable bit for FA64 when saving EFI state
arm64/fpsimd: Determine maximum virtualisable SME vector length
KVM: arm64: Introduce non-UNDEF FGT control
KVM: arm64: Pay attention to FFR parameter in SVE save and load
KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h
KVM: arm64: Move SVE state access macros after feature test macros
KVM: arm64: Rename SVE finalization constants to be more general
KVM: arm64: Document the KVM ABI for SME
KVM: arm64: Define internal features for SME
KVM: arm64: Rename sve_state_reg_region
KVM: arm64: Store vector lengths in an array
KVM: arm64: Implement SME vector length configuration
KVM: arm64: Support SME control registers
KVM: arm64: Support TPIDR2_EL0
KVM: arm64: Support SME identification registers for guests
KVM: arm64: Support SME priority registers
KVM: arm64: Provide assembly for SME register access
KVM: arm64: Support userspace access to streaming mode Z and P registers
KVM: arm64: Flush register state on writes to SVCR.SM and SVCR.ZA
KVM: arm64: Expose SME specific state to userspace
KVM: arm64: Context switch SME state for guests
KVM: arm64: Handle SME exceptions
KVM: arm64: Expose SME to nested guests
KVM: arm64: Provide interface for configuring and enabling SME for guests
KVM: arm64: selftests: Add SME system registers to get-reg-list
KVM: arm64: selftests: Add SME to set_id_regs test
Documentation/virt/kvm/api.rst | 117 +++++++----
arch/arm64/include/asm/fpsimd.h | 26 +++
arch/arm64/include/asm/kvm_emulate.h | 6 +
arch/arm64/include/asm/kvm_host.h | 168 ++++++++++++---
arch/arm64/include/asm/kvm_hyp.h | 5 +-
arch/arm64/include/asm/kvm_pkvm.h | 2 +-
arch/arm64/include/asm/vncr_mapping.h | 2 +
arch/arm64/include/uapi/asm/kvm.h | 33 +++
arch/arm64/kernel/cpufeature.c | 2 -
arch/arm64/kernel/fpsimd.c | 89 ++++----
arch/arm64/kvm/arm.c | 10 +
arch/arm64/kvm/fpsimd.c | 28 ++-
arch/arm64/kvm/guest.c | 252 ++++++++++++++++++++---
arch/arm64/kvm/handle_exit.c | 14 ++
arch/arm64/kvm/hyp/fpsimd.S | 28 ++-
arch/arm64/kvm/hyp/include/hyp/switch.h | 175 ++++++++++++++--
arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 97 +++++----
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 86 ++++++--
arch/arm64/kvm/hyp/nvhe/pkvm.c | 81 ++++++--
arch/arm64/kvm/hyp/nvhe/switch.c | 4 +-
arch/arm64/kvm/hyp/nvhe/sys_regs.c | 6 +
arch/arm64/kvm/hyp/vhe/switch.c | 17 +-
arch/arm64/kvm/nested.c | 3 +-
arch/arm64/kvm/reset.c | 156 ++++++++++----
arch/arm64/kvm/sys_regs.c | 140 ++++++++++++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/arm64/get-reg-list.c | 32 ++-
tools/testing/selftests/kvm/arm64/set_id_regs.c | 29 ++-
28 files changed, 1315 insertions(+), 294 deletions(-)
---
base-commit: 7204503c922cfdb4fcfce4a4ab61f4558a01a73b
change-id: 20230301-kvm-arm64-sme-06a1246d3636
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Mark Rutland's recent SME fixes updated the SME ABI to reject any
attempt to write FPSIMD register data via the streaming mode SVE
register set but did not update the sve-ptrace test to take account of
this, resulting in spurious failures. Update the test for this, and
also fix another preexisting issue I noticed while looking at this.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.16-rc1.
- Update fixes tag for patch 1.
- Link to v1: https://lore.kernel.org/r/20250523-kselftest-arm64-ssve-fixups-v1-0-65069a2…
---
Mark Brown (3):
kselftest/arm64: Fix check for setting new VLs in sve-ptrace
kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
tools/testing/selftests/arm64/fp/sve-ptrace.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250523-kselftest-arm64-ssve-fixups-b68ae61c1ebf
Best regards,
--
Mark Brown <broonie(a)kernel.org>
FEAT_LSFE is optional from v9.5, it adds new instructions for atomic
memory operations with floating point values. We have no immediate use
for it in kernel, provide a hwcap so userspace can discover it and allow
the ID register field to be exposed to KVM guests.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Fix result of vi dropping in hwcap test.
- Link to v1: https://lore.kernel.org/r/20250627-arm64-lsfe-v1-0-68351c4bf741@kernel.org
---
Mark Brown (3):
arm64/hwcap: Add hwcap for FEAT_LSFE
KVM: arm64: Expose FEAT_LSFE to guests
kselftest/arm64: Add lsfe to the hwcaps test
Documentation/arch/arm64/elf_hwcaps.rst | 4 ++++
arch/arm64/include/asm/hwcap.h | 1 +
arch/arm64/include/uapi/asm/hwcap.h | 1 +
arch/arm64/kernel/cpufeature.c | 2 ++
arch/arm64/kernel/cpuinfo.c | 1 +
arch/arm64/kvm/sys_regs.c | 4 +++-
tools/testing/selftests/arm64/abi/hwcap.c | 21 +++++++++++++++++++++
7 files changed, 33 insertions(+), 1 deletion(-)
---
base-commit: 86731a2a651e58953fc949573895f2fa6d456841
change-id: 20250625-arm64-lsfe-0810cf98adc2
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Hi all,
In the last Fall Reinette mentioned functional tests of resctrl would
be preferred over selftests that are based on performance measurement.
This series tries to address that shortcoming by adding some functional
tests for resctrl FS interface and another that checks MSRs match to
what is written through resctrl FS. The MSR test is only available for
Intel CPUs at the moment.
Why RFC?
The new functional selftest itself works, AFAIK. However, calling
ksft_test_result_skip() in cat.c if MSR reading is found to be
unavailable is problematic because of how kselftest harness is
architected. The kselftest.h header itself defines some variables, so
including it into different .c files results in duplicating the test
framework related variables (duplication of ksft_count matters in this
case).
The duplication problem could be worked around by creating a resctrl
selftest specific wrapper for ksft_test_result_skip() into
resctrl_tests.c so the accounting would occur in the "correct" .c file,
but perhaps that is considered hacky and the selftest framework/build
systems should be reworked to avoid duplicating variables?
Ilpo Järvinen (2):
kselftest/resctrl: CAT L3 resctrl FS function tests
kselftest/resctrl: Add CAT L3 CBM functional test for Intel RDT
tools/testing/selftests/resctrl/cat_test.c | 210 ++++++++++++++++++
tools/testing/selftests/resctrl/msr.c | 55 +++++
tools/testing/selftests/resctrl/resctrl.h | 6 +
.../testing/selftests/resctrl/resctrl_tests.c | 2 +
tools/testing/selftests/resctrl/resctrlfs.c | 48 ++++
5 files changed, 321 insertions(+)
create mode 100644 tools/testing/selftests/resctrl/msr.c
base-commit: c1d7e19c70cbb8a19f63c190cf53e71b5f970514
--
2.39.5
Hi all,
This patch series addresses false positives in the generic mm selftests
and skips tests that cannot run correctly due to missing features or system
limitations.
---
v1: https://lore.kernel.org/all/20250616160632.35250-1-aboorvad@linux.ibm.com/
Changes in v2:
- Rebased onto the mm-new branch, top commit of the base is 3b4a8ad89f7e ("mm: add zblock allocator").
- Split some patches for clarity.
- Updated virtual_address_range test to support testing 4PB VA on PPC64.
- Added proper Fixes: tags.
- Included a patch to skip a failing userfaultfd test when unsupported,
instead of reporting a failure.
---
Please let us know if you have any further comments.
Thanks,
Aboorva
Aboorva Devarajan (3):
selftests/mm: Fix child process exit codes in ksm_functional_tests
selftests/mm: Skip thuge-gen if shmmax is too small or no 1G huge
pages
selftests/mm: Skip hugepage-mremap test if userfaultfd unavailable
Donet Tom (4):
mm/selftests: Fix incorrect pointer being passed to mark_range()
selftests/mm: Add support to test 4PB VA on PPC64
selftest/mm: Fix ksm_funtional_test failures
mm/selftests: Fix split_huge_page_test failure on systems with 64KB
page size
tools/testing/selftests/mm/hugepage-mremap.c | 16 ++++++++++---
.../selftests/mm/ksm_functional_tests.c | 24 +++++++++++++------
.../selftests/mm/split_huge_page_test.c | 23 +++++++++++++-----
tools/testing/selftests/mm/thuge-gen.c | 11 +++++----
.../selftests/mm/virtual_address_range.c | 8 ++++++-
5 files changed, 61 insertions(+), 21 deletions(-)
--
2.43.5
I had cause to look at the vfork() support for GCS and realised that we
don't have any direct test coverage, this series does so by adding
vfork() to nolibc and then using that in basic-gcs to provide some
simple vfork() coverage.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Add replacement of ifdef with if defined() in nolibc since the code
doesn't reflect the coding style.
- Remove check for arch specific vfork().
- Link to v1: https://lore.kernel.org/r/20250609-arm64-gcs-vfork-exit-v1-0-baad0f085747@k…
---
Mark Brown (4):
tools/nolibc: Replace ifdef with if defined() in sys.h
tools/nolibc: Provide vfork()
kselftest/arm64: Add a test for vfork() with GCS
selftests/nolibc: Add coverage of vfork()
tools/include/nolibc/sys.h | 57 +++++++++++++++++-------
tools/testing/selftests/arm64/gcs/basic-gcs.c | 63 +++++++++++++++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 23 ++++++++--
3 files changed, 124 insertions(+), 19 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250528-arm64-gcs-vfork-exit-4a7daf7652ee
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Of these:
7 "for a while" typos
5 "take a while" typos
1 misreading of "once in a while"?
3 awhiles used correctly remain in the tree
Signed-off-by: Ahelenia Ziemiańska <nabijaczleweli(a)nabijaczleweli.xyz>
---
Documentation/trace/histogram.rst | 2 +-
arch/sh/drivers/pci/common.c | 2 +-
arch/sh/drivers/pci/pci-sh7780.c | 2 +-
drivers/atm/lanai.c | 2 +-
drivers/md/bcache/bcache.h | 2 +-
drivers/md/bcache/request.c | 2 +-
drivers/net/ethernet/google/gve/gve_rx_dqo.c | 2 +-
drivers/scsi/hpsa.c | 2 +-
drivers/tty/serial/jsm/jsm_neo.c | 2 +-
fs/ocfs2/dlm/dlmrecovery.c | 2 +-
sound/pci/emu10k1/emu10k1_main.c | 2 +-
sound/pci/emu10k1/emupcm.c | 2 +-
tools/testing/selftests/powerpc/tm/tm-tmspr.c | 2 +-
13 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/Documentation/trace/histogram.rst b/Documentation/trace/histogram.rst
index 0aada18c38c6..2b98c1720a54 100644
--- a/Documentation/trace/histogram.rst
+++ b/Documentation/trace/histogram.rst
@@ -249,7 +249,7 @@ Extended error information
table, it should keep a running total of the number of bytes
requested by that call_site.
- We'll let it run for awhile and then dump the contents of the 'hist'
+ We'll let it run for a while and then dump the contents of the 'hist'
file in the kmalloc event's subdirectory (for readability, a number
of entries have been omitted)::
diff --git a/arch/sh/drivers/pci/common.c b/arch/sh/drivers/pci/common.c
index 9633b6147a05..f95004c67e6c 100644
--- a/arch/sh/drivers/pci/common.c
+++ b/arch/sh/drivers/pci/common.c
@@ -148,7 +148,7 @@ unsigned int pcibios_handle_status_errors(unsigned long addr,
cmd |= PCI_STATUS_PARITY | PCI_STATUS_DETECTED_PARITY;
- /* Now back off of the IRQ for awhile */
+ /* Now back off of the IRQ for a while */
if (hose->err_irq) {
disable_irq_nosync(hose->err_irq);
hose->err_timer.expires = jiffies + HZ;
diff --git a/arch/sh/drivers/pci/pci-sh7780.c b/arch/sh/drivers/pci/pci-sh7780.c
index 9a624a6ee354..f41d6939a3d9 100644
--- a/arch/sh/drivers/pci/pci-sh7780.c
+++ b/arch/sh/drivers/pci/pci-sh7780.c
@@ -153,7 +153,7 @@ static irqreturn_t sh7780_pci_serr_irq(int irq, void *dev_id)
/* Deassert SERR */
__raw_writel(SH4_PCIINTM_SDIM, hose->reg_base + SH4_PCIINTM);
- /* Back off the IRQ for awhile */
+ /* Back off the IRQ for a while */
disable_irq_nosync(irq);
hose->serr_timer.expires = jiffies + HZ;
add_timer(&hose->serr_timer);
diff --git a/drivers/atm/lanai.c b/drivers/atm/lanai.c
index 2a1fe3080712..0dfa2cdc897c 100644
--- a/drivers/atm/lanai.c
+++ b/drivers/atm/lanai.c
@@ -755,7 +755,7 @@ static void lanai_shutdown_rx_vci(const struct lanai_vcc *lvcc)
/* Shutdown transmitting on card.
* Unfortunately the lanai needs us to wait until all the data
* drains out of the buffer before we can dealloc it, so this
- * can take awhile -- up to 370ms for a full 128KB buffer
+ * can take a while -- up to 370ms for a full 128KB buffer
* assuming everone else is quiet. In theory the time is
* boundless if there's a CBR VCC holding things up.
*/
diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index 1d33e40d26ea..7318d9800370 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -499,7 +499,7 @@ struct gc_stat {
* won't automatically reattach).
*
* CACHE_SET_STOPPING always gets set first when we're closing down a cache set;
- * we'll continue to run normally for awhile with CACHE_SET_STOPPING set (i.e.
+ * we'll continue to run normally for a while with CACHE_SET_STOPPING set (i.e.
* flushing dirty data).
*
* CACHE_SET_RUNNING means all cache devices have been registered and journal
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index af345dc6fde1..87b4341cb42c 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -257,7 +257,7 @@ static CLOSURE_CALLBACK(bch_data_insert_start)
/*
* But if it's not a writeback write we'd rather just bail out if
- * there aren't any buckets ready to write to - it might take awhile and
+ * there aren't any buckets ready to write to - it might take a while and
* we might be starving btree writes for gc or something.
*/
diff --git a/drivers/net/ethernet/google/gve/gve_rx_dqo.c b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
index dcb0545baa50..6a0be54f1c81 100644
--- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c
@@ -608,7 +608,7 @@ static int gve_rx_dqo(struct napi_struct *napi, struct gve_rx_ring *rx,
buf_len = compl_desc->packet_len;
hdr_len = compl_desc->header_len;
- /* Page might have not been used for awhile and was likely last written
+ /* Page might have not been used for a while and was likely last written
* by a different thread.
*/
if (rx->dqo.page_pool) {
diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c
index c73a71ac3c29..0066f15153a7 100644
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -7795,7 +7795,7 @@ static int hpsa_wait_for_mode_change_ack(struct ctlr_info *h)
u32 doorbell_value;
unsigned long flags;
- /* under certain very rare conditions, this can take awhile.
+ /* under certain very rare conditions, this can take a while.
* (e.g.: hot replace a failed 144GB drive in a RAID 5 set right
* as we enter this code.)
*/
diff --git a/drivers/tty/serial/jsm/jsm_neo.c b/drivers/tty/serial/jsm/jsm_neo.c
index e8e13bf056e2..2eb9ff26d6e8 100644
--- a/drivers/tty/serial/jsm/jsm_neo.c
+++ b/drivers/tty/serial/jsm/jsm_neo.c
@@ -1189,7 +1189,7 @@ static irqreturn_t neo_intr(int irq, void *voidbrd)
/*
* The UART triggered us with a bogus interrupt type.
* It appears the Exar chip, when REALLY bogged down, will throw
- * these once and awhile.
+ * these periodically.
* Its harmless, just ignore it and move on.
*/
jsm_dbg(INTR, &brd->pci_dev,
diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 67fc62a49a76..00f52812dbb0 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2632,7 +2632,7 @@ static int dlm_pick_recovery_master(struct dlm_ctxt *dlm)
dlm_reco_master_ready(dlm),
msecs_to_jiffies(1000));
if (!dlm_reco_master_ready(dlm)) {
- mlog(0, "%s: reco master taking awhile\n",
+ mlog(0, "%s: reco master taking a while\n",
dlm->name);
goto again;
}
diff --git a/sound/pci/emu10k1/emu10k1_main.c b/sound/pci/emu10k1/emu10k1_main.c
index bbe252b8916c..6050201851b1 100644
--- a/sound/pci/emu10k1/emu10k1_main.c
+++ b/sound/pci/emu10k1/emu10k1_main.c
@@ -606,7 +606,7 @@ static int snd_emu10k1_ecard_init(struct snd_emu10k1 *emu)
/* Step 2: Calibrate the ADC and DAC */
snd_emu10k1_ecard_write(emu, EC_DACCAL | EC_LEDN | EC_TRIM_CSN);
- /* Step 3: Wait for awhile; XXX We can't get away with this
+ /* Step 3: Wait for a while; XXX We can't get away with this
* under a real operating system; we'll need to block and wait that
* way. */
snd_emu10k1_wait(emu, 48000);
diff --git a/sound/pci/emu10k1/emupcm.c b/sound/pci/emu10k1/emupcm.c
index 1bf6e3d652f8..ca4b03317539 100644
--- a/sound/pci/emu10k1/emupcm.c
+++ b/sound/pci/emu10k1/emupcm.c
@@ -991,7 +991,7 @@ static snd_pcm_uframes_t snd_emu10k1_capture_pointer(struct snd_pcm_substream *s
if (!epcm->running)
return 0;
if (epcm->first_ptr) {
- udelay(50); /* hack, it takes awhile until capture is started */
+ udelay(50); /* hack, it takes a while until capture is started */
epcm->first_ptr = 0;
}
ptr = snd_emu10k1_ptr_read(emu, epcm->capture_idx_reg, 0) & 0x0000ffff;
diff --git a/tools/testing/selftests/powerpc/tm/tm-tmspr.c b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
index dd5ddffa28b7..0d64988ffb40 100644
--- a/tools/testing/selftests/powerpc/tm/tm-tmspr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
@@ -14,7 +14,7 @@
* (1) create more threads than cpus
* (2) in each thread:
* (a) set TFIAR and TFHAR a unique value
- * (b) loop for awhile, continually checking to see if
+ * (b) loop for a while, continually checking to see if
* either register has been corrupted.
*
* (3) Loop:
--
2.39.5
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v10 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v11 (04-Jul-2025)
- Fix compilation issue of some intermediate patches in v10
v10 (03-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (6):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: Add wait_third_ack flag for ECN negotiation in simultaneous
connect
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 13 +
include/linux/tcp.h | 33 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 648 ++++++++++++++++++
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_input.c | 371 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 297 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1429 insertions(+), 187 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
V12:
- Fixed YAML indentation in devlink.yaml.
- Removed unused total variable from devlink_nl_rate_tc_bw_set().
- Quoted shell variables in devlink.sh and split declarations to fix
shellcheck warnings.
- Added missing DevlinkFamily imports in selftests to fix pylint
warnings.
- Pulled changes from net-next to enable these adjustments:
Inclusion of DevlinkFamily in YNL test libs.
Introduction of nlmsg_for_each_attr_type() macro and its use in nfsd.
V11:
- Refactored the devlink code to accept relative TC bandwidth share
values instead of percentages.
- Updated documentation to clarify that values are interpreted as
relative shares.
- Refactored the logic in mlx5 to support proportional scaling for
tc-bw values.
- Switched to `nlmsg_for_each_attr_type()` for cleaner attribute
parsing.
- Added a hardware selftest to validate TC bandwidth behavior.
- Refactored esw_qos_is_node_empty for readability.
V10:
- Added netdevsim selftest for tc-bw ops.
- Dropped header: field as it’s unnecessary for local constants in
devlink.yaml.
V9:
- Defined DEVLINK_RATE_TCS_MAX as 8 in uapi/linux/devlink.h.
- Replaced IEEE_8021QAZ_MAX_TCS with DEVLINK_RATE_TCS_MAX throughout
the code.
- Updated devlink-rate-tc-index-max spec to reference the correct UAPI
header.
V8:
- Limit line width to 80 characters in mlx5 changes instead of 100.
- Increase the scheduling node levels to support TC arbitration.
- Ensure parent nodes are set correctly in all code paths that extend
the hierarchy depth for TC arbitration.
- Extended the cover letter with the ongoing discussion on devlink-rate
and net-shapers.
- Extended the cover letter with the Netdev talk link on this series.
V7:
- Fixed disabling tc-bw on leaf nodes that did not have tc-bw
configured.
- Fixed an issue where tc-bw was disabled on a node with assigned
vports, ensuring that vport->qos.sched_node->parent is correctly
updated with the cloned node.
- Declared a constant for the maximum allowed Traffic Class index in
devlink rate.
- Added a range check to validate rate-tc-index.
- Added documentation for the tc-bw argument.
- Add a validation check to ensure that the total bandwidth assigned to
all traffic classes sums to 100.
V6:
- Addressed comments on devlink patch #3.
- Removed first 4 IFC patches, to be pulled from mlx5-next.
V5:
- Fix warning in devlink_nl_rate_tc_bw_set().
- Fix target branch of patch #4.
V4:
- Renamed the nested attribute for traffic class bandwidth to
DEVLINK_ATTR_RATE_TC_BWS.
- Changed the order of the attributes in `devlink.h`.
- Refactored the initialization tc-bw array in
devlink_nl_rate_tc_bw_set().
- Added extack messages to provide clear feedback on issues with tc-bw
arguments.
- Updated `rate-tc-bws` to support a multi-attr set, where each
attribute includes an index and the corresponding bandwidth for that
traffic class.
- Handled the issue where the user could provide
DEVLINK_ATTR_RATE_TC_BWS with duplicate indices.
- Provided ynl exmaples in patch [1/5] commit message.
- Take IFC patches to beginning of the series, targeted for mlx5-next.
V3:
- Dropped rate-tc-index, using tc-bw array index instead.
- Renamed rate-bw to rate-tc-bw.
- Documneted what the rate-tc-bw represents and added a range check for
validation.
- Intorduced devlink_nl_rate_tc_bw_set() to parse and set the TC
bandwidth values.
- Updated the user API in the commit message of patch 1/6 to ensure
bandwidths sum equals 100.
- Fixed missing filling of rate-parent in devlink_nl_rate_fill().
V2:
- Included <linux/dcbnl.h> in devlink.h to resolve missing
IEEE_8021QAZ_MAX_TCS definition.
- Refactored the rate-tc-bw attribute structure to use a separate
rate-tc-index.
- Updated patch 2/6 title.
This patch series extends the devlink-rate API to support traffic class
(TC) bandwidth management, enabling more granular control over traffic
shaping and rate limiting across multiple TCs. The API now allows users
to specify bandwidth proportions for different traffic classes in a
single command. This is particularly useful for managing Enhanced
Transmission Selection (ETS) for groups of Virtual Functions (VFs),
allowing precise bandwidth allocation across traffic classes.
Additionally the series refines the QoS handling in net/mlx5 to support
TC arbitration and bandwidth management on vports and rate nodes.
Discussions on traffic class shaping in net-shapers began in V5 [1],
where we discussed with maintainers whether net-shapers should support
traffic classes and how this could be implemented.
Later, after further conversations with Paolo Abeni and Simon Horman,
Cosmin provided an update [2], confirming that net-shapers' tree-based
hierarchy aligns well with traffic classes when treated as distinct
subsets of netdev queues. Since mlx5 enforces a 1:1 mapping between TX
queues and traffic classes, this approach seems feasible, though some
open questions remain regarding queue reconfiguration and certain mlx5
scheduling behaviors.
Building on that discussion, Cosmin has now shared a concrete
implementation plan on the netdev mailing list [3]. The plan, developed
in collaboration with Paolo and Simon, outlines how net-shapers can be
extended to support the same use cases currently covered by
devlink-rate, with the eventual goal of aligning both and simplifying
the shaping infrastructure in the kernel.
This work was presented at Netdev 0x19 in Zagreb [4].
There we presented how TC scheduling is enforced in mlx5 hardware,
which led to discussions on the mailing list.
A summary of how things work:
Classification means labeling a packet with a traffic class based on
the packet's DSCP or VLAN PCP field, then treating packets with
different traffic classes differently during transmit processing.
In a virtualized setup, VFs are untrusted and do not control
classification or shaping. Classification is done by the hardware using
a prio-to-TC mapping set by the hypervisor. VFs only select which send
queue to use and are expected to respect the classification logic by
sending each traffic class on its dedicated queue. As stated in the
net-shapers plan [3], each transmit queue should carry only a single
traffic class. Mixing classes in a single queue can lead to HOL
blocking.
In the mlx5 implementation, if the queue used does not match the
classified traffic class, the hardware moves the queue to the correct
TC scheduler. This movement is not a reclassification; it’s a necessary
enforcement step to ensure traffic class isolation is maintained.
Extend devlink-rate API to support rate management on TCs:
- devlink: Extend the devlink rate API to support traffic class
bandwidth management
Introduce a no-op implementation:
- net/mlx5: Add no-op implementation for setting tc-bw on rate objects
Add support for enabling and disabling TC QoS on vports and nodes:
- net/mlx5: Add support for setting tc-bw on nodes
- net/mlx5: Add traffic class scheduling support for vport QoS
Support for setting tc-bw on rate objects:
- net/mlx5: Manage TC arbiter nodes and implement full support for
tc-bw
[1]
https://lore.kernel.org/netdev/20241204220931.254964-1-tariqt@nvidia.com/
[2]
https://lore.kernel.org/netdev/67df1a562614b553dcab043f347a0d7c5393ff83.cam…
[3]
https://lore.kernel.org/netdev/d9831d0c940a7b77419abe7c7330e822bbfd1cfb.cam…
[4]
https://netdevconf.info/0x19/sessions/talk/optimizing-bandwidth-allocation-…
Carolina Jubran (8):
netlink: introduce type-checking attribute iteration for nlmsg
devlink: Extend devlink rate API with traffic classes bandwidth management
selftest: netdevsim: Add devlink rate tc-bw test
net/mlx5: Add no-op implementation for setting tc-bw on rate objects
net/mlx5: Add support for setting tc-bw on nodes
net/mlx5: Add traffic class scheduling support for vport QoS
net/mlx5: Manage TC arbiter nodes and implement full support for tc-bw
selftests: drv-net: Add test for devlink-rate traffic class bandwidth distribution
Documentation/netlink/specs/devlink.yaml | 32 +-
.../networking/devlink/devlink-port.rst | 8 +
.../net/ethernet/mellanox/mlx5/core/devlink.c | 2 +
.../net/ethernet/mellanox/mlx5/core/esw/qos.c | 1037 ++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/esw/qos.h | 8 +
.../net/ethernet/mellanox/mlx5/core/eswitch.h | 14 +-
drivers/net/netdevsim/dev.c | 43 +
drivers/net/netdevsim/netdevsim.h | 1 +
drivers/net/vxlan/vxlan_vnifilter.c | 13 +-
fs/nfsd/nfsctl.c | 36 +-
include/net/devlink.h | 8 +
include/net/netlink.h | 14 +
include/uapi/linux/devlink.h | 9 +
net/devlink/netlink_gen.c | 15 +-
net/devlink/netlink_gen.h | 1 +
net/devlink/rate.c | 127 ++
.../drivers/net/hw/devlink_rate_tc_bw.py | 466 ++++++++
.../drivers/net/hw/lib/py/__init__.py | 2 +-
.../selftests/drivers/net/lib/py/__init__.py | 2 +-
.../drivers/net/netdevsim/devlink.sh | 53 +
.../testing/selftests/net/lib/py/__init__.py | 2 +-
tools/testing/selftests/net/lib/py/ynl.py | 5 +
22 files changed, 1825 insertions(+), 73 deletions(-)
create mode 100755 tools/testing/selftests/drivers/net/hw/devlink_rate_tc_bw.py
base-commit: 20a0c20f82acf46d5731a11743e7c7ac4de25db8
--
2.34.1
The test_kexec_jump program builds correctly when invoked from the top-level
selftests/Makefile, which explicitly sets the OUTPUT variable. However,
building directly in tools/testing/selftests/kexec fails with:
make: *** No rule to make target '/test_kexec_jump', needed by 'test_kexec_jump.sh'. Stop.
This failure occurs because the Makefile rule relies on $(OUTPUT), which is
undefined in direct builds.
Fix this by listing test_kexec_jump in TEST_GEN_PROGS, the standard way to
declare generated test binaries in the kselftest framework. This ensures the
binary is built regardless of invocation context and properly removed by
make clean.
Also add the binary to .gitignore to avoid tracking it in version control.
Signed-off-by: Moon Hee Lee <moonhee.lee.ca(a)gmail.com>
---
tools/testing/selftests/kexec/.gitignore | 2 ++
tools/testing/selftests/kexec/Makefile | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kexec/.gitignore
diff --git a/tools/testing/selftests/kexec/.gitignore b/tools/testing/selftests/kexec/.gitignore
new file mode 100644
index 000000000000..5f3d9e089ae8
--- /dev/null
+++ b/tools/testing/selftests/kexec/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+test_kexec_jump
diff --git a/tools/testing/selftests/kexec/Makefile b/tools/testing/selftests/kexec/Makefile
index e3000ccb9a5d..874cfdd3b75b 100644
--- a/tools/testing/selftests/kexec/Makefile
+++ b/tools/testing/selftests/kexec/Makefile
@@ -12,7 +12,7 @@ include ../../../scripts/Makefile.arch
ifeq ($(IS_64_BIT)$(ARCH_PROCESSED),1x86)
TEST_PROGS += test_kexec_jump.sh
-test_kexec_jump.sh: $(OUTPUT)/test_kexec_jump
+TEST_GEN_PROGS := test_kexec_jump
endif
include ../lib.mk
--
2.43.0
The kernel has recently added support for shadow stacks, currently
x86 only using their CET feature but both arm64 and RISC-V have
equivalent features (GCS and Zicfiss respectively), I am actively
working on GCS[1]. With shadow stacks the hardware maintains an
additional stack containing only the return addresses for branch
instructions which is not generally writeable by userspace and ensures
that any returns are to the recorded addresses. This provides some
protection against ROP attacks and making it easier to collect call
stacks. These shadow stacks are allocated in the address space of the
userspace process.
Our API for shadow stacks does not currently offer userspace any
flexiblity for managing the allocation of shadow stacks for newly
created threads, instead the kernel allocates a new shadow stack with
the same size as the normal stack whenever a thread is created with the
feature enabled. The stacks allocated in this way are freed by the
kernel when the thread exits or shadow stacks are disabled for the
thread. This lack of flexibility and control isn't ideal, in the vast
majority of cases the shadow stack will be over allocated and the
implicit allocation and deallocation is not consistent with other
interfaces. As far as I can tell the interface is done in this manner
mainly because the shadow stack patches were in development since before
clone3() was implemented.
Since clone3() is readily extensible let's add support for specifying a
shadow stack when creating a new thread or process, keeping the current
implicit allocation behaviour if one is not specified either with
clone3() or through the use of clone(). The user must provide a shadow
stack pointer, this must point to memory mapped for use as a shadow
stackby map_shadow_stack() with an architecture specified shadow stack
token at the top of the stack.
Yuri Khrustalev has raised questions from the libc side regarding
discoverability of extended clone3() structure sizes[2], this seems like
a general issue with clone3(). There was a suggestion to add a hwcap on
arm64 which isn't ideal but is doable there, though architecture
specific mechanisms would also be needed for x86 (and RISC-V if it's
support gets merged before this does). The idea has, however, had
strong pushback from the architecture maintainers and it is possible to
detect support for this in clone3() by attempting a call with a
misaligned shadow stack pointer specified so no hwcap has been added.
[1] https://lore.kernel.org/linux-arm-kernel/20241001-arm64-gcs-v13-0-222b78d87…
[2] https://lore.kernel.org/r/aCs65ccRQtJBnZ_5@arm.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v18:
- Rebase onto v6.16-rc3.
- Thanks to pointers from Yuri Khrustalev this version has been tested
on x86 so I have removed the RFT tag.
- Clarify clone3_shadow_stack_valid() comment about the Kconfig check.
- Remove redundant GCSB DSYNCs in arm64 code.
- Fix token validation on x86.
- Link to v17: https://lore.kernel.org/r/20250609-clone3-shadow-stack-v17-0-8840ed97ff6f@k…
Changes in v17:
- Rebase onto v6.16-rc1.
- Link to v16: https://lore.kernel.org/r/20250416-clone3-shadow-stack-v16-0-2ffc9ca3917b@k…
Changes in v16:
- Rebase onto v6.15-rc2.
- Roll in fixes from x86 testing from Rick Edgecombe.
- Rework so that the argument is shadow_stack_token.
- Link to v15: https://lore.kernel.org/r/20250408-clone3-shadow-stack-v15-0-3fa245c6e3be@k…
Changes in v15:
- Rebase onto v6.15-rc1.
- Link to v14: https://lore.kernel.org/r/20250206-clone3-shadow-stack-v14-0-805b53af73b9@k…
Changes in v14:
- Rebase onto v6.14-rc1.
- Link to v13: https://lore.kernel.org/r/20241203-clone3-shadow-stack-v13-0-93b89a81a5ed@k…
Changes in v13:
- Rebase onto v6.13-rc1.
- Link to v12: https://lore.kernel.org/r/20241031-clone3-shadow-stack-v12-0-7183eb8bee17@k…
Changes in v12:
- Add the regular prctl() to the userspace API document since arm64
support is queued in -next.
- Link to v11: https://lore.kernel.org/r/20241005-clone3-shadow-stack-v11-0-2a6a2bd6d651@k…
Changes in v11:
- Rebase onto arm64 for-next/gcs, which is based on v6.12-rc1, and
integrate arm64 support.
- Rework the interface to specify a shadow stack pointer rather than a
base and size like we do for the regular stack.
- Link to v10: https://lore.kernel.org/r/20240821-clone3-shadow-stack-v10-0-06e8797b9445@k…
Changes in v10:
- Integrate fixes & improvements for the x86 implementation from Rick
Edgecombe.
- Require that the shadow stack be VM_WRITE.
- Require that the shadow stack base and size be sizeof(void *) aligned.
- Clean up trailing newline.
- Link to v9: https://lore.kernel.org/r/20240819-clone3-shadow-stack-v9-0-962d74f99464@ke…
Changes in v9:
- Pull token validation earlier and report problems with an error return
to parent rather than signal delivery to the child.
- Verify that the top of the supplied shadow stack is VM_SHADOW_STACK.
- Rework token validation to only do the page mapping once.
- Drop no longer needed support for testing for signals in selftest.
- Fix typo in comments.
- Link to v8: https://lore.kernel.org/r/20240808-clone3-shadow-stack-v8-0-0acf37caf14c@ke…
Changes in v8:
- Fix token verification with user specified shadow stack.
- Don't track user managed shadow stacks for child processes.
- Link to v7: https://lore.kernel.org/r/20240731-clone3-shadow-stack-v7-0-a9532eebfb1d@ke…
Changes in v7:
- Rebase onto v6.11-rc1.
- Typo fixes.
- Link to v6: https://lore.kernel.org/r/20240623-clone3-shadow-stack-v6-0-9ee7783b1fb9@ke…
Changes in v6:
- Rebase onto v6.10-rc3.
- Ensure we don't try to free the parent shadow stack in error paths of
x86 arch code.
- Spelling fixes in userspace API document.
- Additional cleanups and improvements to the clone3() tests to support
the shadow stack tests.
- Link to v5: https://lore.kernel.org/r/20240203-clone3-shadow-stack-v5-0-322c69598e4b@ke…
Changes in v5:
- Rebase onto v6.8-rc2.
- Rework ABI to have the user allocate the shadow stack memory with
map_shadow_stack() and a token.
- Force inlining of the x86 shadow stack enablement.
- Move shadow stack enablement out into a shared header for reuse by
other tests.
- Link to v4: https://lore.kernel.org/r/20231128-clone3-shadow-stack-v4-0-8b28ffe4f676@ke…
Changes in v4:
- Formatting changes.
- Use a define for minimum shadow stack size and move some basic
validation to fork.c.
- Link to v3: https://lore.kernel.org/r/20231120-clone3-shadow-stack-v3-0-a7b8ed3e2acc@ke…
Changes in v3:
- Rebase onto v6.7-rc2.
- Remove stale shadow_stack in internal kargs.
- If a shadow stack is specified unconditionally use it regardless of
CLONE_ parameters.
- Force enable shadow stacks in the selftest.
- Update changelogs for RISC-V feature rename.
- Link to v2: https://lore.kernel.org/r/20231114-clone3-shadow-stack-v2-0-b613f8681155@ke…
Changes in v2:
- Rebase onto v6.7-rc1.
- Remove ability to provide preallocated shadow stack, just specify the
desired size.
- Link to v1: https://lore.kernel.org/r/20231023-clone3-shadow-stack-v1-0-d867d0b5d4d0@ke…
---
Mark Brown (8):
arm64/gcs: Return a success value from gcs_alloc_thread_stack()
Documentation: userspace-api: Add shadow stack API documentation
selftests: Provide helper header for shadow stack testing
fork: Add shadow stack support to clone3()
selftests/clone3: Remove redundant flushes of output streams
selftests/clone3: Factor more of main loop into test_clone3()
selftests/clone3: Allow tests to flag if -E2BIG is a valid error code
selftests/clone3: Test shadow stack support
Documentation/userspace-api/index.rst | 1 +
Documentation/userspace-api/shadow_stack.rst | 44 +++++
arch/arm64/include/asm/gcs.h | 8 +-
arch/arm64/kernel/process.c | 8 +-
arch/arm64/mm/gcs.c | 55 +++++-
arch/x86/include/asm/shstk.h | 11 +-
arch/x86/kernel/process.c | 2 +-
arch/x86/kernel/shstk.c | 53 ++++-
include/asm-generic/cacheflush.h | 11 ++
include/linux/sched/task.h | 17 ++
include/uapi/linux/sched.h | 9 +-
kernel/fork.c | 93 +++++++--
tools/testing/selftests/clone3/clone3.c | 226 ++++++++++++++++++----
tools/testing/selftests/clone3/clone3_selftests.h | 65 ++++++-
tools/testing/selftests/ksft_shstk.h | 98 ++++++++++
15 files changed, 620 insertions(+), 81 deletions(-)
---
base-commit: 86731a2a651e58953fc949573895f2fa6d456841
change-id: 20231019-clone3-shadow-stack-15d40d2bf536
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Add basic support to run various MIPS variants via kunit_tool using the
virtualized malta platform.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
---
Changes in v4:
- Rebase on v6.16-rc1
- Pick up reviews from David
- Clarify that GIC page is linked to vDSO
- Link to v3: https://lore.kernel.org/r/20250415-kunit-mips-v3-0-4ec2461b5a7e@linutronix.…
Changes in v3:
- Also skip VDSO_RANDOMIZE_SIZE adjustment for kthreads
- Link to v2: https://lore.kernel.org/r/20250414-kunit-mips-v2-0-4cf01e1a29e6@linutronix.…
Changes in v2:
- Fix usercopy kunit test by handling ABI-less tasks in stack_top()
- Drop change to mm initialization.
The broken test is not built by default anymore.
- Link to v1: https://lore.kernel.org/r/20250212-kunit-mips-v1-0-eb49c9d76615@linutronix.…
---
Thomas Weißschuh (2):
MIPS: Don't crash in stack_top() for tasks without ABI or vDSO
kunit: qemu_configs: Add MIPS configurations
arch/mips/kernel/process.c | 16 +++++++++-------
tools/testing/kunit/qemu_configs/mips.py | 18 ++++++++++++++++++
tools/testing/kunit/qemu_configs/mips64.py | 19 +++++++++++++++++++
tools/testing/kunit/qemu_configs/mips64el.py | 19 +++++++++++++++++++
tools/testing/kunit/qemu_configs/mipsel.py | 18 ++++++++++++++++++
5 files changed, 83 insertions(+), 7 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20241014-kunit-mips-e4fe1c265ed7
Best regards,
--
Thomas Weißschuh <thomas.weissschuh(a)linutronix.de>
Refactors the netpoll UDP transmit path to improve code clarity,
maintainability, and protocol-layer encapsulation.
Function netpoll_send_udp() has more than 100 LoC, which is hard to
understand and review. After this patchset, it has only 32 LoC, which is
more manageable.
The series systematically moves the construction of protocol headers
(UDP, IPv4, IPv6, Ethernet) out of the core `netpoll_send_udp()`
function into dedicated static helpers:
- `push_udp()` for UDP header setup
- `push_ipv4()` and `push_ipv6()` for IP header setup
- `push_eth()` for Ethernet header setup
This results in a clean, layered abstraction that mirrors the protocol
stack, reduces code duplication, and improves readability.
Also, to make sure this is not breaking anything, add IPv6 selftest to
netconsole tests, which will exercise this code. This test would also pick
problems similiar to the one fixed by f599020702698 ("net: netpoll:
Initialize UDP checksum field before checksumming"), which was
embarrassin we didn't have a selftest catch it.
Anyway, there are **no functional changes** intended in this patchset.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Breno Leitao (7):
netpoll: Improve code clarity with explicit struct size calculations
netpoll: factor out UDP checksum calculation into helper
netpoll: factor out IPv6 header setup into push_ipv6() helper
netpoll: factor out IPv4 header setup into push_ipv4() helper
netpoll: factor out UDP header setup into push_udp() helper
netpoll: move Ethernet setup to push_eth() helper
selftests: net: Add IPv6 support to netconsole basic tests
net/core/netpoll.c | 196 +++++++++++++--------
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 74 +++++++-
.../testing/selftests/drivers/net/netcons_basic.sh | 52 +++---
3 files changed, 216 insertions(+), 106 deletions(-)
---
base-commit: 8efa26fcbf8a7f783fd1ce7dd2a409e9b7758df0
change-id: 20250620-netpoll_untagle_ip-e37c799a6925
Best regards,
--
Breno Leitao <leitao(a)debian.org>
Fix cur_aux()->nospec_result test after do_check_insn() referring to the
to-be-analyzed (potentially unsafe) instruction, not the
already-analyzed (safe) instruction. This might allow a unsafe insn to
slip through on a speculative path. Create some tests from the
reproducer [1].
Commit d6f1c85f2253 ("bpf: Fall back to nospec for Spectre v1") should
not be in any stable kernel yet, therefore bpf-next should suffice.
[1] https://lore.kernel.org/bpf/685b3c1b.050a0220.2303ee.0010.GAE@google.com/
Changes since v1:
- Fix compiler error due to missed rename of prev_insn_idx in first
patch
- v1: https://lore.kernel.org/bpf/20250628125927.763088-1-luis.gerhorst@fau.de/
Changes since RFC:
- Introduce prev_aux() as suggested by Alexei. For this, we must move
the env->prev_insn_idx assignment to happen directly after
do_check_insn(), for which I have created a separate commit. This
patch could be simplified by using a local prev_aux variable as
sugested by Eduard, but I figured one might find the new
assignment-strategy easier to understand (before, prev_insn_idx and
env->prev_insn_idx were out-of-sync for the latter part of the loop).
Also, like this we do not have an additional prev_* variable that must
be kept in-sync and the local variable's usage (old prev_insn_idx, new
tmp) is much more local. If you think it would be better to not take
the risk and keep the fix simple by just introducing the prev_aux
variable, let me know.
- Change WARN_ON_ONCE() to verifier_bug_if() as suggested by Alexei
- Change assertion to check instruction is BPF_JMP[32] as suggested by
Eduard
- RFC: https://lore.kernel.org/bpf/8734bmoemx.fsf@fau.de/
Luis Gerhorst (3):
bpf: Update env->prev_insn_idx after do_check_insn()
bpf: Fix aux usage after do_check_insn()
selftests/bpf: Add Spectre v4 tests
kernel/bpf/verifier.c | 30 ++--
tools/testing/selftests/bpf/progs/bpf_misc.h | 4 +
.../selftests/bpf/progs/verifier_unpriv.c | 149 ++++++++++++++++++
3 files changed, 174 insertions(+), 9 deletions(-)
base-commit: d69bafe6ee2b5eff6099fa26626ecc2963f0f363
--
2.49.0
Hello,
binder_alloc_selftest provides a robust set of checks for the binder
allocator, but it rarely runs because it must hook into a running binder
process and block all other binder threads until it completes. The test
itself is a good candidate for conversion to KUnit, and it can be
further isolated from user processes by using a test-specific lru
freelist instead of the global one. This series converts the selftest
to KUnit to make it less burdensome to run and to set up a foundation
for unit testing future binder_alloc changes.
Thanks,
Tiffany
Tiffany Yang (5):
binder: Fix selftest page indexing
binder: Store lru freelist in binder_alloc
binder: Scaffolding for binder_alloc KUnit tests
binder: Convert binder_alloc selftests to KUnit
binder: encapsulate individual alloc test cases
drivers/android/Kconfig | 15 +-
drivers/android/Makefile | 2 +-
drivers/android/binder.c | 10 +-
drivers/android/binder_alloc.c | 39 +-
drivers/android/binder_alloc.h | 14 +-
drivers/android/binder_alloc_selftest.c | 306 -----------
drivers/android/binder_internal.h | 4 +
drivers/android/tests/.kunitconfig | 3 +
drivers/android/tests/Makefile | 3 +
drivers/android/tests/binder_alloc_kunit.c | 573 +++++++++++++++++++++
include/kunit/test.h | 12 +
lib/kunit/user_alloc.c | 4 +-
12 files changed, 645 insertions(+), 340 deletions(-)
delete mode 100644 drivers/android/binder_alloc_selftest.c
create mode 100644 drivers/android/tests/.kunitconfig
create mode 100644 drivers/android/tests/Makefile
create mode 100644 drivers/android/tests/binder_alloc_kunit.c
--
2.50.0.727.gbf7dc18ff4-goog
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the v10 AccECN protocol patch series, which covers the core
functionality of Accurate ECN, AccECN negotiation, AccECN TCP options,
and AccECN failure handling. The Accurate ECN draft can be found in
https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-accurate-ecn-28
This patch series is part of the full AccECN patch series, which is available at
https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/
Best Regards,
Chia-Yu
---
v10 (03-Jul-2025)
- Add new patch of separated header file include/net/tcp_ecn.h to include ECN and AccECN functions (Eric Dumazet <edumazet(a)google.com>)
- Add comments on the AccECN helper functions in tcp_ecn.h (Eric Dumazet <edumazet(a)google.com>)
- Add documentation of tcp_ecn, tcp_ecn_option, tcp_ecn_beacon in ip-sysctl.rst to the corresponding patch (Eric Dumazet <edumazet(a)google.com>)
- Split wait third ACK functionality into a separated patch from AccECN negotiation patch (Eric Dumazet <edumazet(a)google.com>)
- Add READ_ONCE() over every reads of sysctl for all patches in the series (Eric Dumazet <edumazet(a)google.com>)
- Merge heuristics of AccECN option ceb/cep and ACE field multi-wrap into a single patch
- Add a table of SACK block reduction and required AccECN field in patch #15 commit message (Eric Dumazet <edumazet(a)google.com>)
v9 (21-Jun-2025)
- Use tcp_data_ecn_check() to set TCP_ECN_SEE flag only for RFC3168 ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments about setting TCP_ECN_SEEN flag for RFC3168 and Accruate ECN (Paolo Abeni <pabeni(a)redhat.com>)
- Restruct the code in the for loop of tcp_accecn_process_option() (Paolo Abeni <pabeni(a)redhat.com>)
- Remove ecn_bytes and add use_synack_ecn_bytes flag to identify whether syn_ack_bytes or received_ecn_bytes is used (Paolo Abeni <pabeni(a)redhat.com>)
- Replace leftover_bytes and leftover_size with leftover_highbyte and leftover_lowbyte and add comments in tcp_options_write() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments and commit message about the 1st retx SYN still attempt AccECN negotiation (Paolo Abeni <pabeni(a)redhat.com>)
v8 (10-Jun-2025)
- Add new helper function tcp_ecn_received_counters_payload() in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Set opts->num_sack_blocks=0 to avoid potential undefined value in #8 (Paolo Abeni <pabeni(a)redhat.com>)
- Reset leftover_size to 2 once leftover_bytes is used in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_opt_demand_min() in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Add new helper function tcp_accecn_saw_opt_fail_recv() in #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Update tcp_options_fit_accecn() to avoid using recursion in #14 (Paolo Abeni <pabeni(a)redhat.com>)
v7 (14-May-2025)
- Modify group sizes of tcp_sock_write_txrx and tcp_sock_write_rx in #3 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Fix the issue in #4 and #5 where the RFC3168 ECN behavior in tcp_ecn_send() is changed (Paolo Abeni <pabeni(a)redhat.com>)
- Modify group size of tcp_sock_write_txrx in #4 and #6 based on pahole results (Paolo Abeni <pabeni(a)redhat.com>)
- Update commit message for #9 to explain the increase in tcp_sock_write_rx group size
- Modify group size of tcp_sock_write_tx in #10 based on pahole results
v6 (09-May-2025)
- Add #3 to utilize exisintg holes of tcp_sock_write_txrx group for later patches (#4, #9, #10) with new u8 members (Paolo Abeni <pabeni(a)redhat.com>)
- Add pahole outcomes before and after commit in #4, #5, #6, #9, #10, #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Define new helper function tcp_send_ack_reflect_ect() for sending ACK with reflected ECT in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments for function tcp_ecn_rcv_synack() in #5 (Paolo Abeni <pabeni(a)redhat.com>)
- Add enum/define to be used by sysctl_tcp_ecn in #5, sysctl_tcp_ecn_option in #9, and sysctl_tcp_ecn_option_beacon in #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move accecn_fail_mode and saw_accecn_opt in #5 and #11 to use exisintg holes of tcp_sock (Paolo Abeni <pabeni(a)redhat.com>)
- Change data type of new members of tcp_request_sock and move them to the end of struct in #5 and #11 (Paolo Abeni <pabeni(a)redhat.com>)
- Move new members of tcp_info to the end of struct in #6 (Paolo Abeni <pabeni(a)redhat.com>)
- Merge previous #7 into #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Mask ecnfield with INET_ECN_MASK to remove WARN_ONCE in #9 (Paolo Abeni <pabeni(a)redhat.com>)
- Reduce the indentation levels for reabability in #9 and #10 (Paolo Abeni <pabeni(a)redhat.com>)
- Move delivered_ecn_bytes to the RX group in #9, accecn_opt_tstamp to the TX group in #10, pkts_acked_ewma to the RX group in #15 (Paolo Abeni <pabeni(a)redhat.com>)
- Add changes in Documentation/networking/net_cachelines/tcp_sock.rst for new tcp_sock members in #3, #5, #6, #9, #10, #15
v5 (22-Apr-2025)
- Further fix for 32-bit ARM alignment in tcp.c (Simon Horman <horms(a)kernel.org>)
v4 (18-Apr-2025)
- Fix 32-bit ARM assertion for alignment requirement (Simon Horman <horms(a)kernel.org>)
v3 (14-Apr-2025)
- Fix patch apply issue in v2 (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Mar-2025)
- Add one missing patch from the previous AccECN protocol preparation patch series to this patch series.
---
Chia-Yu Chang (6):
tcp: reorganize tcp_sock_write_txrx group for variables later
tcp: ecn functions in separated include file
tcp: Add wait_third_ack flag for ECN negotiation in simultaneous
connect
tcp: accecn: AccECN option send control
tcp: accecn: AccECN option failure handling
tcp: accecn: try to fit AccECN option with SACK
Ilpo Järvinen (9):
tcp: reorganize SYN ECN code
tcp: fast path functions later
tcp: AccECN core
tcp: accecn: AccECN negotiation
tcp: accecn: add AccECN rx byte counters
tcp: accecn: AccECN needs to know delivered bytes
tcp: sack option handling improvements
tcp: accecn: AccECN option
tcp: accecn: AccECN option ceb/cep and ACE field multi-wrap heuristics
Documentation/networking/ip-sysctl.rst | 55 +-
.../networking/net_cachelines/tcp_sock.rst | 13 +
include/linux/tcp.h | 33 +-
include/net/netns/ipv4.h | 2 +
include/net/tcp.h | 87 ++-
include/net/tcp_ecn.h | 647 ++++++++++++++++++
include/uapi/linux/tcp.h | 7 +
net/ipv4/syncookies.c | 4 +
net/ipv4/sysctl_net_ipv4.c | 19 +
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_input.c | 371 ++++++++--
net/ipv4/tcp_ipv4.c | 8 +-
net/ipv4/tcp_minisocks.c | 40 +-
net/ipv4/tcp_output.c | 297 ++++++--
net/ipv6/syncookies.c | 2 +
net/ipv6/tcp_ipv6.c | 1 +
16 files changed, 1428 insertions(+), 187 deletions(-)
create mode 100644 include/net/tcp_ecn.h
--
2.34.1
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find DUALPI2 iproute2 patch v10.
For more details of DualPI2, please refer IETF RFC9332
(https://datatracker.ietf.org/doc/html/rfc9332).
Best Regards,
Chia-Yu
---
v10 (02-Jul-2025)
- Replace STEP_THRESH and STEP_PACKETS w/ STEP_THRESH_PKTS and STEP_THRESH_US of net-next patch (Jakub Kicinski <kuba(a)kernel.org>)
v9 (13-Jun-2025)
- Fix space issue and typos (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Change 'rtt_typical' to 'typical_rtt' in tc/q_dualpi2.c (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Add the num of enum used by DualPI2 in pkt_sched.h
v8 (09-May-2025)
- Update pkt_sched.h with the one in nex-next
- Correct a typo in the comment within pkt_sched.h (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update manual content in man/man8/tc-dualpi2.8 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update tc/q_dualpi2.c to fix missing blank lines and add missing case (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
v7 (05-May-2025)
- Align pkt_sched.h with the v14 version of net-next due to spec modification in tc.yaml
- Reorganize dualpi2_print_opt() to match the order in tc.yaml
- Remove credit-queue in PRINT_JSON
v6 (26-Apr-2025)
- Update JSON file output due to spec modification in tc.yaml of net-next
v5 (25-Mar-2025)
- Use matches() to replace current strcmp() (Stephen Hemminger <stephen(a)networkplumber.org>)
- Use general parse_percent() for handling scaled percentage values (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 stats (Stephen Hemminger <stephen(a)networkplumber.org>)
v4 (16-Mar-2025)
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step marking.
v3 (21-Feb-2025)
- Add memlimit to the dualpi2 attribute, and add memory_used, max_memory_used, and memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update the manual to align with the latest implementation and clarify the queue naming and default unit
- Use common "get_scaled_alpha_beta" and clean print_opt for Dualpi2
v2 (23-Oct-2024)
- Rename get_float in dualpi2 to get_float_min_max in utils.c
- Move get_float from iplink_can.c in utils.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Add print function for JSON of dualpi2 (Stephen Hemminger <stephen(a)networkplumber.org>)
---
Chia-Yu Chang (1):
tc: add dualpi2 scheduler module
bash-completion/tc | 11 +-
include/uapi/linux/pkt_sched.h | 68 +++++
include/utils.h | 2 +
ip/iplink_can.c | 14 -
lib/utils.c | 30 ++
man/man8/tc-dualpi2.8 | 249 ++++++++++++++++
tc/Makefile | 1 +
tc/q_dualpi2.c | 528 +++++++++++++++++++++++++++++++++
8 files changed, 888 insertions(+), 15 deletions(-)
create mode 100644 man/man8/tc-dualpi2.8
create mode 100644 tc/q_dualpi2.c
--
2.34.1
This patch set improves the documentation and selftests for XDP Rx metadata
handling. The first patch clarifies the documentation around XDP metadata
layout and the use of bpf_xdp_adjust_meta. The second patch enhances the
BPF selftests to make XDP metadata handling more robust and portable across
different NICs.
Prior to this patch set, the user application retrieved the xdp_meta by
calculating backward from the data pointer, while the XDP program fill in
the xdp_meta by calculating backward from data_meta. This approach will
cause mismatch if there is device-reserved metadata.
|<---sizeof(xdp_meta)--|
| |
struct xdp_meta rx_desc->address
^ ^
| |
+----------+----------------------+------------+------+
| headroom | custom metadata | reserved | data |
+----------+----------------------+------------+------+
^ ^ ^
| | |
struct xdp_meta xdp_buff->data_meta xdp_buff->data
| |
|<---sizeof(xdp_meta)--|
V2:
- unconditionally do bpf_xdp_adjust_meta with -XDP_METADATA_SIZE (Stanislav)
V1: https://lore.kernel.org/netdev/20250701042940.3272325-1-yoong.siang.song@in…
Song Yoong Siang (2):
doc: clarify XDP Rx metadata layout and bpf_xdp_adjust_meta usage
selftests/bpf: Enhance XDP Rx Metadata Handling
Documentation/networking/xdp-rx-metadata.rst | 38 +++++++++++++++++++
.../selftests/bpf/prog_tests/xdp_metadata.c | 2 +-
.../selftests/bpf/progs/xdp_hw_metadata.c | 2 +-
.../selftests/bpf/progs/xdp_metadata.c | 2 +-
tools/testing/selftests/bpf/xdp_hw_metadata.c | 2 +-
tools/testing/selftests/bpf/xdp_metadata.h | 7 ++++
6 files changed, 49 insertions(+), 4 deletions(-)
--
2.34.1
The config snippet specifies CONFIG_NET_EMATCH_IPSET. This option
depends on CONFIG_IP_SET.
Set CONFIG_IP_SET to be enabled at part for tc-testing.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
---
tools/testing/selftests/tc-testing/config | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/tc-testing/config b/tools/testing/selftests/tc-testing/config
index db176fe7d0c3f..8e902f7f1a181 100644
--- a/tools/testing/selftests/tc-testing/config
+++ b/tools/testing/selftests/tc-testing/config
@@ -21,6 +21,7 @@ CONFIG_NF_NAT=m
CONFIG_NETFILTER_XT_TARGET_LOG=m
CONFIG_NET_SCHED=y
+CONFIG_IP_SET=m
#
# Queueing/Scheduling
--
2.50.0
From: Chia-Yu Chang <chia-yu.chang(a)nokia-bell-labs.com>
Hello,
Please find the DualPI2 patch v20.
This patch serise adds DualPI Improved with a Square (DualPI2) with following features:
* Supports congestion controls that comply with the Prague requirements in RFC9331 (e.g. TCP-Prague)
* Coupled dual-queue that separates the L4S traffic in a low latency queue (L-queue), without harming remaining traffic that is scheduled in classic queue (C-queue) due to congestion-coupling using PI2 as defined in RFC9332
* Configurable overload strategies
* Use of sojourn time to reliably estimate queue delay
* Supports ECN L4S-identifier (IP.ECN==0b*1) to classify traffic into respective queues
For more details of DualPI2, please refer IETF RFC9332 (https://datatracker.ietf.org/doc/html/rfc9332).
Best regards,
Chia-Yu
---
v20 (21-Jun-2025)
- Add one more commit to fix warning and style check on tdc.sh reported by shellcheck
- Remove double-prefixed of "tc_tc_dualpi2_attrs" in tc-user.h (Donald Hunter <donald.hunter(a)gmail.com>)
v19 (14-Jun-2025)
- Fix one typo in the comment of #1 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Update commit message of #4 (ALOK TIWARI <alok.a.tiwari(a)oracle.com>)
- Wrap long lines of Documentation/netlink/specs/tc.yaml to within 80 characters (Jakub Kicinski <kuba(a)kernel.org>)
v18 (13-Jun-2025)
- Add the num of enum used by DualPI2 and fix name and name-prefix of DualPI2 enum and attribute
- Replace from_timer() with timer_container_of() (Pedro Tammela <pctammela(a)mojatatu.com>)
v17 (25-May-2025, Resent at 11-Jun-2025)
- Replace 0xffffffff with U32_MAX (Paolo Abeni <pabeni(a)redhat.com>)
- Use helper function qdisc_dequeue_internal() and add new helper function skb_apply_step() (Paolo Abeni <pabeni(a)redhat.com>)
- Add s64 casting when calculating the delta of the PI controller (Paolo Abeni <pabeni(a)redhat.com>)
- Change the drop reason into SKB_DROP_REASON_QDISC_CONGESTED for drop_early (Paolo Abeni <pabeni(a)redhat.com>)
- Modify the condition to remove the original skb when enqueuing multiple GSO segments (Paolo Abeni <pabeni(a)redhat.com>)
- Add READ_ONCE() in dualpi2_dump_stat() (Paolo Abeni <pabeni(a)redhat.com>)
- Add comments, brackets, and brackets for readability (Paolo Abeni <pabeni(a)redhat.com>)
v16 (16-MAy-2025)
- Add qdisc_lock() to dualpi2_timer() in dualpi2_timer (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce convert_ns_to_usec() to convert usec to nsec without overflow in #1 (Paolo Abeni <pabeni(a)redhat.com>)
- Update convert_us_tonsec() to convert nsec to usec without overflow in #2 (Paolo Abeni <pabeni(a)redhat.com>)
- Add more descriptions with respect to DualPI2 in the cover ltter and add changelog in each patch (Paolo Abeni <pabeni(a)redhat.com>)
v15 (09-May-2025)
- Add enum of TCA_DUALPI2_ECN_MASK_CLA_ECT to remove potential leakeage in #1 (Simon Horman <horms(a)kernel.org>)
- Fix one typo in comment of #2
- Update tc.yaml in #5 to aligh with the updated enum of pkt_sched.h
v14 (05-May-2025)
- Modify tc.yaml: (1) Replace flags with enum and remove enum-as-flags, (2) Remove credit-queue in xstats, and (3) Change attribute types (Donald Hunter <donald.hun
- Add enum and fix the ordering of variables in pkt_sched.h to align with the modified tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add validators for DROP_OVERLOAD, DROP_EARLY, ECN_MASK, and SPLIT_GSO in sch_dualpi2.c (Donald Hunter <donald.hunter(a)gmail.com>)
- Update dualpi2.json to align with the updated variable order in pkt_sched.h
- Reorder patches (Donald Hunter <donald.hunter(a)gmail.com>)
v13 (26-Apr-2025)
- Use dashes in member names to follow YNL conventions in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Define enumerations separately for flags of drop-early, drop-overload, ecn-mask, credit-queue in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the types of split-gso and step-packets into flag in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Revert to u32/u8 types for tc-dualpi2-xstats members in tc.yaml (Donald Hunter <donald.hunter(a)gmail.com>)
- Add new test cases in tc-tests/qdiscs/dualpi2.json to cover all dualpi2 parameters (Donald Hunter <donald.hunter(a)gmail.com>)
- Change the type of TCA_DUALPI2_STEP_PACKETS into NLA_FLAG (Donald Hunter <donald.hunter(a)gmail.com>)
v12 (22-Apr-2025)
- Remove anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Replace u32/u8 with uint and s32 with int in tc spec document (Paolo Abeni <pabeni(a)redhat.com>)
- Introduce get_memory_limit function to handle potential overflow when multipling limit with MTU (Paolo Abeni <pabeni(a)redhat.com>)
- Double the packet length to further include packet overhead in memory_limit (Paolo Abeni <pabeni(a)redhat.com>)
- Remove the check of qdisc_qlen(sch) when calling qdisc_tree_reduce_backlog (Paolo Abeni <pabeni(a)redhat.com>)
v11 (15-Apr-2025)
- Replace hstimer_init with hstimer_setup in sch_dualpi2.c
v10 (25-Mar-2025)
- Remove leftover include in include/linux/netdevice.h and anonymous struct in sch_dualpi2.c (Paolo Abeni <pabeni(a)redhat.com>)
- Use kfree_skb_reason() and add SKB_DROP_REASON_DUALPI2_STEP_DROP drop reason (Paolo Abeni <pabeni(a)redhat.com>)
- Split sch_dualpi2.c into 3 patches (and overall 5 patches): Struct definition & parsing, Dump stats & configuration, Enqueue/Dequeue (Paolo Abeni <pabeni(a)redhat.com>)
v9 (16-Mar-2025)
- Fix mem_usage error in previous version
- Add min_qlen_step to the dualpi2 attribute as the minimum queue length in number of packets in the L-queue to start step threshold marking.
In previous versions, this value was fixed to 2, so the step threshold was applied to mark packets in the L queue only when the queue length of the L queue was greater than or equal to 2 packets.
This will cause larger queuing delays for L4S traffic at low rates (<20Mbps). So we parameterize it and change the default value to 0.
Comparison of tcp_1down run 'HTB 20Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 11.55 11.70 ms 350
TCP upload avg : 18.96 N/A Mbits/s 350
TCP upload sum : 18.96 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 10.81 10.70 ms 350
TCP upload avg : 18.91 N/A Mbits/s 350
TCP upload sum : 18.91 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 12.61 12.80 ms 350
TCP upload avg : 9.48 N/A Mbits/s 350
TCP upload sum : 9.48 N/A Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.06 10.80 ms 350
TCP upload avg : 9.43 N/A Mbits/s 350
TCP upload sum : 9.43 N/A Mbits/s 350
Comparison of tcp_1down run 'HTB 10Mbit + DUALPI2 + 10ms base delay'
Old versions:
avg median # data pts
Ping (ms) ICMP : 40.86 37.45 ms 350
TCP upload avg : 0.88 N/A Mbits/s 350
TCP upload sum : 0.88 N/A Mbits/s 350
TCP upload::1 : 0.88 0.97 Mbits/s 350
New version (v9):
avg median # data pts
Ping (ms) ICMP : 11.07 10.40 ms 350
TCP upload avg : 0.55 N/A Mbits/s 350
TCP upload sum : 0.55 N/A Mbits/s 350
TCP upload::1 : 0.55 0.59 Mbits/s 350
v8 (11-Mar-2025)
- Fix warning messages in v7
v7 (07-Mar-2025)
- Separate into 3 patches to avoid mixing changes of documentation, selftest, and code. (Cong Wang <xiyou.wangcong(a)gmail.com>)
v6 (04-Mar-2025)
- Add modprobe for dulapi2 in tc-testing script tc-testing/tdc.sh (Jakub Kicinski <kuba(a)kernel.org>)
- Update test cases in dualpi2.json
- Update commit message
v5 (22-Feb-2025)
- A comparison was done between MQ + DUALPI2, MQ + FQ_PIE, MQ + FQ_CODEL:
Unshaped 1gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
- Summary of tcp_4down run 'MQ + FQ_PIE'
avg median # data pts
Ping (ms) ICMP : 1.21 1.37 ms 350
TCP download avg : 235.42 N/A Mbits/s 350
TCP download sum : 941.61 N/A Mbits/s 350
TCP download::1 : 232.54 233.13 Mbits/s 350
TCP download::2 : 232.52 232.80 Mbits/s 350
TCP download::3 : 233.14 233.78 Mbits/s 350
TCP download::4 : 243.41 241.48 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2'
avg median # data pts
Ping (ms) ICMP : 1.19 1.34 ms 349
TCP download avg : 235.42 N/A Mbits/s 349
TCP download sum : 941.68 N/A Mbits/s 349
TCP download::1 : 235.19 235.39 Mbits/s 349
TCP download::2 : 235.03 235.35 Mbits/s 349
TCP download::3 : 236.89 235.44 Mbits/s 349
TCP download::4 : 234.57 235.19 Mbits/s 349
Unshaped 1gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 1.88 1.86 ms 350
TCP download avg : 7.39 N/A Mbits/s 350
TCP download sum : 946.47 N/A Mbits/s 350
Unshaped 10gigE with 4 download streams test:
- Summary of tcp_4down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 0.22 0.23 ms 350
TCP download avg : 2354.08 N/A Mbits/s 350
TCP download sum : 9416.31 N/A Mbits/s 350
TCP download::1 : 2353.65 2352.81 Mbits/s 350
TCP download::2 : 2354.54 2354.21 Mbits/s 350
TCP download::3 : 2353.56 2353.78 Mbits/s 350
TCP download::4 : 2354.56 2354.45 Mbits/s 350
- Summary of tcp_4down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 0.20 0.19 ms 350
TCP download avg : 2354.76 N/A Mbits/s 350
TCP download sum : 9419.04 N/A Mbits/s 350
TCP download::1 : 2354.77 2353.89 Mbits/s 350
TCP download::2 : 2353.41 2354.29 Mbits/s 350
TCP download::3 : 2356.18 2354.19 Mbits/s 350
TCP download::4 : 2354.68 2353.15 Mbits/s 350
- Summary of tcp_4down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 0.24 0.24 ms 350
TCP download avg : 2354.11 N/A Mbits/s 350
TCP download sum : 9416.43 N/A Mbits/s 350
TCP download::1 : 2354.75 2353.93 Mbits/s 350
TCP download::2 : 2353.15 2353.75 Mbits/s 350
TCP download::3 : 2353.49 2353.72 Mbits/s 350
TCP download::4 : 2355.04 2353.73 Mbits/s 350
Unshaped 10gigE with 128 download streams test:
- Summary of tcp_128down run 'MQ + FQ_CODEL':
avg median # data pts
Ping (ms) ICMP : 7.57 8.69 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9467.82 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + FQ_PIE':
avg median # data pts
Ping (ms) ICMP : 7.82 8.91 ms 350
TCP download avg : 73.97 N/A Mbits/s 350
TCP download sum : 9468.42 N/A Mbits/s 350
- Summary of tcp_128down run 'MQ + DUALPI2':
avg median # data pts
Ping (ms) ICMP : 6.87 7.93 ms 350
TCP download avg : 73.95 N/A Mbits/s 350
TCP download sum : 9465.87 N/A Mbits/s 350
From the results shown above, we see small differences between combinations.
- Update commit message to include results of no_split_gso and split_gso (Dave Taht <dave.taht(a)gmail.com> and Paolo Abeni <pabeni(a)redhat.com>)
- Add memlimit in the dualpi2 attribute, and add memory_used, max_memory_used, memory_limit in dualpi2 stats (Dave Taht <dave.taht(a)gmail.com>)
- Update note in sch_dualpi2.c related to BBRv3 status (Dave Taht <dave.taht(a)gmail.com>)
- Update license identifier (Dave Taht <dave.taht(a)gmail.com>)
- Add selftest in tools/testing/selftests/tc-testing (Cong Wang <xiyou.wangcong(a)gmail.com>)
- Use netlink policies for parameter checks (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Modify texts & fix typos in Documentation/netlink/specs/tc.yaml (Dave Taht <dave.taht(a)gmail.com>)
- Add descriptions of packet counter statistics and the reset function of sch_dualpi2.c
- Fix step_thresh in packets
- Update code comments in sch_dualpi2.c
v4 (22-Oct-2024)
- Update statement in Kconfig for DualPI2 (Stephen Hemminger <stephen(a)networkplumber.org>)
- Put a blank line after #define in sch_dualpi2.c (Stephen Hemminger <stephen(a)networkplumber.org>)
- Fix line length warning.
v3 (19-Oct-2024)
- Fix compilaiton error
- Update Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
v2 (18-Oct-2024)
- Add Documentation/netlink/specs/tc.yaml (Jakub Kicinski <kuba(a)kernel.org>)
- Use dualpi2 instead of skb prefix (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Replace nla_parse_nested_deprecated with nla_parse_nested (Jamal Hadi Salim <jhs(a)mojatatu.com>)
- Fix line length warning
---
Chia-Yu Chang (5):
sched: Struct definition and parsing of dualpi2 qdisc
sched: Dump configuration and statistics of dualpi2 qdisc
selftests/tc-testing: Fix warning and style check on tdc.sh
selftests/tc-testing: Add selftests for qdisc DualPI2
Documentation: netlink: specs: tc: Add DualPI2 specification
Koen De Schepper (1):
sched: Add enqueue/dequeue of dualpi2 qdisc
Documentation/netlink/specs/tc.yaml | 166 +++
include/net/dropreason-core.h | 6 +
include/uapi/linux/pkt_sched.h | 70 +-
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/sch_dualpi2.c | 1146 +++++++++++++++++
tools/testing/selftests/tc-testing/config | 1 +
.../tc-testing/tc-tests/qdiscs/dualpi2.json | 254 ++++
tools/testing/selftests/tc-testing/tdc.sh | 6 +-
9 files changed, 1658 insertions(+), 4 deletions(-)
create mode 100644 net/sched/sch_dualpi2.c
create mode 100644 tools/testing/selftests/tc-testing/tc-tests/qdiscs/dualpi2.json
--
2.34.1
Hello,
binder_alloc_selftest provides a robust set of checks for the binder allocator,
but it rarely runs because it must hook into a running binder process and block
all other binder threads until it completes. The test itself is a good candidate
for conversion to KUnit, and it can be further isolated from user processes by
using a test-specific lru freelist instead of the global one. This series
converts the selftest to KUnit to make it less burdensome to run and to set up a
foundation for testing future binder_alloc changes.
Thanks,
Tiffany
Tiffany Yang (5):
binder: Fix selftest page indexing
binder: Store lru freelist in binder_alloc
binder: Scaffolding for binder_alloc KUnit tests
binder: Convert binder_alloc selftests to KUnit
binder: encapsulate individual alloc test cases
drivers/android/Kconfig | 15 +-
drivers/android/Makefile | 2 +-
drivers/android/binder.c | 10 +-
drivers/android/binder_alloc.c | 39 +-
drivers/android/binder_alloc.h | 14 +-
drivers/android/binder_alloc_selftest.c | 306 -----------
drivers/android/binder_internal.h | 4 +
drivers/android/tests/.kunitconfig | 3 +
drivers/android/tests/Makefile | 3 +
drivers/android/tests/binder_alloc_kunit.c | 572 +++++++++++++++++++++
include/kunit/test.h | 12 +
lib/kunit/user_alloc.c | 4 +-
12 files changed, 644 insertions(+), 340 deletions(-)
delete mode 100644 drivers/android/binder_alloc_selftest.c
create mode 100644 drivers/android/tests/.kunitconfig
create mode 100644 drivers/android/tests/Makefile
create mode 100644 drivers/android/tests/binder_alloc_kunit.c
--
2.50.0.727.gbf7dc18ff4-goog
Some libc's like musl libc don't provide execinfo.h since it's not part
of POSIX. In order to fix compilation on musl, only include execinfo.h
if available (HAVE_BACKTRACE_SUPPORT)
This was discovered with c104c16073b7 ("Kunit to check the longest symbol length")
which starts to include linux/kallsyms.h with Alpine Linux' configs.
Signed-off-by: Achill Gilgenast <fossdd(a)pwned.life>
Cc: stable(a)vger.kernel.org
---
tools/include/linux/kallsyms.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/include/linux/kallsyms.h b/tools/include/linux/kallsyms.h
index 5a37ccbec54f..f61a01dd7eb7 100644
--- a/tools/include/linux/kallsyms.h
+++ b/tools/include/linux/kallsyms.h
@@ -18,6 +18,7 @@ static inline const char *kallsyms_lookup(unsigned long addr,
return NULL;
}
+#ifdef HAVE_BACKTRACE_SUPPORT
#include <execinfo.h>
#include <stdlib.h>
static inline void print_ip_sym(const char *loglvl, unsigned long ip)
@@ -30,5 +31,8 @@ static inline void print_ip_sym(const char *loglvl, unsigned long ip)
free(name);
}
+#else
+static inline void print_ip_sym(const char *loglvl, unsigned long ip) {}
+#endif
#endif
--
2.50.0