RFC v2: https://patchwork.kernel.org/project/netdevbpf/list/?series=920056&state=*
=======
RFC v2 addresses much of the feedback from RFC v1. I plan on sending
something close to this as net-next reopens, sending it slightly early
to get feedback if any.
Major changes:
--------------
- much improved UAPI as suggested by Stan. We now interpret the iov_base
of the passed in iov from userspace as the offset into the dmabuf to
send from. This removes the need to set iov.iov_base = NULL which may
be confusing to users, and enables us to send multiple iovs in the
same sendmsg() call. ncdevmem and the docs show a sample use of that.
- Removed the duplicate dmabuf iov_iter in binding->iov_iter. I think
this is good improvment as it was confusing to keep track of
2 iterators for the same sendmsg, and mistracking both iterators
caused a couple of bugs reported in the last iteration that are now
resolved with this streamlining.
- Improved test coverage in ncdevmem. Now muliple sendmsg() are tested,
and sending multiple iovs in the same sendmsg() is tested.
- Fixed issue where dmabuf unmapping was happening in invalid context
(Stan).
====================================================================
The TX path had been dropped from the Device Memory TCP patch series
post RFCv1 [1], to make that series slightly easier to review. This
series rebases the implementation of the TX path on top of the
net_iov/netmem framework agreed upon and merged. The motivation for
the feature is thoroughly described in the docs & cover letter of the
original proposal, so I don't repeat the lengthy descriptions here, but
they are available in [1].
Sending this series as RFC as the winder closure is immenient. I plan on
reposting as non-RFC once the tree re-opens, addressing any feedback
I receive in the meantime.
Full outline on usage of the TX path is detailed in the documentation
added in the first patch.
Test example is available via the kselftest included in the series as well.
The series is relatively small, as the TX path for this feature largely
piggybacks on the existing MSG_ZEROCOPY implementation.
Patch Overview:
---------------
1. Documentation & tests to give high level overview of the feature
being added.
2. Add netmem refcounting needed for the TX path.
3. Devmem TX netlink API.
4. Devmem TX net stack implementation.
Testing:
--------
Testing is very similar to devmem TCP RX path. The ncdevmem test used
for the RX path is now augemented with client functionality to test TX
path.
* Test Setup:
Kernel: net-next with this RFC and memory provider API cherry-picked
locally.
Hardware: Google Cloud A3 VMs.
NIC: GVE with header split & RSS & flow steering support.
Performance results are not included with this version, unfortunately.
I'm having issues running the dma-buf exporter driver against the
upstream kernel on my test setup. The issues are specific to that
dma-buf exporter and do not affect this patch series. I plan to follow
up this series with perf fixes if the tests point to issues once they're
up and running.
Special thanks to Stan who took a stab at rebasing the TX implementation
on top of the netmem/net_iov framework merged. Parts of his proposal [2]
that are reused as-is are forked off into their own patches to give full
credit.
[1] https://lore.kernel.org/netdev/20240909054318.1809580-1-almasrymina@google.…
[2] https://lore.kernel.org/netdev/20240913150913.1280238-2-sdf@fomichev.me/T/#…
Cc: sdf(a)fomichev.me
Cc: asml.silence(a)gmail.com
Cc: dw(a)davidwei.uk
Cc: Jamal Hadi Salim <jhs(a)mojatatu.com>
Cc: Victor Nogueira <victor(a)mojatatu.com>
Cc: Pedro Tammela <pctammela(a)mojatatu.com>
Mina Almasry (5):
net: add devmem TCP TX documentation
selftests: ncdevmem: Implement devmem TCP TX
net: add get_netmem/put_netmem support
net: devmem: Implement TX path
net: devmem: make dmabuf unbinding scheduled work
Stanislav Fomichev (1):
net: devmem: TCP tx netlink api
Documentation/netlink/specs/netdev.yaml | 12 +
Documentation/networking/devmem.rst | 144 ++++++++-
include/linux/skbuff.h | 15 +-
include/linux/skbuff_ref.h | 4 +-
include/net/netmem.h | 3 +
include/net/sock.h | 1 +
include/uapi/linux/netdev.h | 1 +
include/uapi/linux/uio.h | 6 +-
net/core/datagram.c | 41 ++-
net/core/devmem.c | 110 ++++++-
net/core/devmem.h | 70 ++++-
net/core/netdev-genl-gen.c | 13 +
net/core/netdev-genl-gen.h | 1 +
net/core/netdev-genl.c | 67 ++++-
net/core/skbuff.c | 36 ++-
net/core/sock.c | 8 +
net/ipv4/tcp.c | 36 ++-
net/vmw_vsock/virtio_transport_common.c | 3 +-
tools/include/uapi/linux/netdev.h | 1 +
.../selftests/drivers/net/hw/ncdevmem.c | 276 +++++++++++++++++-
20 files changed, 802 insertions(+), 46 deletions(-)
--
2.48.1.362.g079036d154-goog
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows Patches::
Patch "consolidate send buffers into netconsole_target struct"
=================================================
Move the static buffers to netconsole target, from static declaration
in send_msg_no_fragmentation() and send_msg_fragmented().
Patch "netconsole: Rename userdata to extradata"
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch "netconsole: Helper to count number of used entries"
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch "Introduce configfs helpers for sysdata features"
======================================================
Create the concept of sysdata feature in the netconsole target, and
create the configfs helpers to enable the bit in nt->sysdata
Patch "Include sysdata in extradata entry count"
================================================
Add the concept of sysdata when counting for available space in the
buffer. This will protect users from creating new userdata/sysdata if
there is no more space
Patch "netconsole: add support for sysdata and CPU population"
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v3:
- Moved the buffer into netconsole_target, avoiding static functions in
the send path (Jakub).
- Fix a documentation error (Randy Dunlap)
- Created a function that handle all the extradata, consolidating it in
a single place (Jakub)
- Split the patch even more, trying to simplify the review.
- Link to v2: https://lore.kernel.org/r/20250115-netcon_cpu-v2-0-95971b44dc56@debian.org
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (8):
netconsole: consolidate send buffers into netconsole_target struct
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: Introduce configfs helpers for sysdata features
netconsole: Include sysdata in extradata entry count
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 ++++
drivers/net/netconsole.c | 260 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 167 +++++++++++++
5 files changed, 426 insertions(+), 64 deletions(-)
---
base-commit: fa10c9f5aa705deb43fd65623508074bee942764
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
As noted in [0], SeaBIOS (QEMU default) makes a mess of the terminal,
qboot does not.
It turns out this is actually useful with kunit.py, since the user is
exposed to this issue if they set --raw_output=all.
qboot is also faster than SeaBIOS, but it's is marginal for this
usecase.
[0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDUR…
Both SeaBIOS and qboot are x86-specific.
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
---
tools/testing/kunit/qemu_configs/x86_64.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/qemu_configs/x86_64.py b/tools/testing/kunit/qemu_configs/x86_64.py
index dc794907686304b325dbe180149169dd79bcd44f..4a6bf4e048f5b05c889e3b9b03046f14cc9b0bcc 100644
--- a/tools/testing/kunit/qemu_configs/x86_64.py
+++ b/tools/testing/kunit/qemu_configs/x86_64.py
@@ -7,4 +7,6 @@ CONFIG_SERIAL_8250_CONSOLE=y''',
qemu_arch='x86_64',
kernel_path='arch/x86/boot/bzImage',
kernel_command_line='console=ttyS0',
- extra_qemu_params=[])
+ # qboot is faster than SeaBIOS and doesn't mess up
+ # the terminal.
+ extra_qemu_params=['-bios', 'qboot.rom'])
---
base-commit: 8ea24baaaa869adeb39c6b9ce7542657a7251b56
change-id: 20250124-kunit-qboot-5201945f7e86
Best regards,
--
Brendan Jackman <jackmanb(a)google.com>
This series introduce the dynptr counterpart of the
bpf_probe_read_{kernel,user} helpers and bpf_copy_from_user helper.
These helpers are helpful for reading variable-length data from kernel
memory into dynptr without going through an intermediate buffer.
Link: https://lore.kernel.org/bpf/MEYP282MB2312CFCE5F7712FDE313215AC64D2@MEYP282M…
Suggested-by: Andrii Nakryiko <andrii.nakryiko(a)gmail.com>
Signed-off-by: Levi Zim <rsworktech(a)outlook.com>
---
Changes in v2:
- Add missing bpf-next prefix. I forgot it in the initial series. Sorry
about that.
- Link to v1: https://lore.kernel.org/r/20250125-bpf_dynptr_probe-v1-0-c3cb121f6951@outlo…
---
Levi Zim (7):
bpf: Implement bpf_probe_read_kernel_dynptr helper
bpf: Implement bpf_probe_read_user_dynptr helper
bpf: Implement bpf_copy_from_user_dynptr helper
tools headers UAPI: Update tools's copy of bpf.h header
selftests/bpf: probe_read_kernel_dynptr test
selftests/bpf: probe_read_user_dynptr test
selftests/bpf: copy_from_user_dynptr test
include/linux/bpf.h | 3 +
include/uapi/linux/bpf.h | 49 ++++++++++
kernel/bpf/helpers.c | 53 ++++++++++-
kernel/trace/bpf_trace.c | 72 ++++++++++++++
tools/include/uapi/linux/bpf.h | 49 ++++++++++
tools/testing/selftests/bpf/prog_tests/dynptr.c | 45 ++++++++-
tools/testing/selftests/bpf/progs/dynptr_success.c | 106 +++++++++++++++++++++
7 files changed, 374 insertions(+), 3 deletions(-)
---
base-commit: d0d106a2bd21499901299160744e5fe9f4c83ddb
change-id: 20250124-bpf_dynptr_probe-ab483c554f1a
Best regards,
--
Levi Zim <rsworktech(a)outlook.com>
Greetings:
This is an attempt to followup on something Jakub asked me about [1],
adding an xsk attribute to queues and more clearly documenting which
queues are linked to NAPIs...
But:
1. I couldn't pick a good "thing" to expose as "xsk", so I chose 0 or 1.
Happy to take suggestions on what might be better to expose for the
xsk queue attribute.
2. I create a silly C helper program to create an XDP socket in order to
add a new test to queues.py. I'm not particularly good at python
programming, so there's probably a better way to do this. Notably,
python does not seem to have a socket.AF_XDP, so I needed the C
helper to make a socket and bind it to a queue to perform the test.
Tested this on my mlx5 machine and the test seems to pass.
Happy to take any suggestions / feedback on this one; sorry in advance
if I missed many obvious better ways to do things.
Thanks,
Joe
[1]: https://lore.kernel.org/netdev/20250113143109.60afa59a@kernel.org/
Joe Damato (2):
netdev-genl: Add an XSK attribute to queues
selftests: drv-net: Test queue xsk attribute
Documentation/netlink/specs/netdev.yaml | 10 ++-
include/uapi/linux/netdev.h | 1 +
net/core/netdev-genl.c | 6 ++
tools/include/uapi/linux/netdev.h | 1 +
tools/testing/selftests/drivers/.gitignore | 1 +
tools/testing/selftests/drivers/net/Makefile | 3 +
tools/testing/selftests/drivers/net/queues.py | 32 ++++++-
.../selftests/drivers/net/xdp_helper.c | 90 +++++++++++++++++++
8 files changed, 141 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/drivers/net/xdp_helper.c
base-commit: 0ad9617c78acbc71373fb341a6f75d4012b01d69
--
2.25.1
xtheadvector is a custom extension that is based upon riscv vector
version 0.7.1 [1]. All of the vector routines have been modified to
support this alternative vector version based upon whether xtheadvector
was determined to be supported at boot.
vlenb is not supported on the existing xtheadvector hardware, so a
devicetree property thead,vlenb is added to provide the vlenb to Linux.
There is a new hwprobe key RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 that is
used to request which thead vendor extensions are supported on the
current platform. This allows future vendors to allocate hwprobe keys
for their vendor.
Support for xtheadvector is also added to the vector kselftests.
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
[1] https://github.com/T-head-Semi/thead-extension-spec/blob/95358cb2cca9489361…
---
This series is a continuation of a different series that was fragmented
into two other series in an attempt to get part of it merged in the 6.10
merge window. The split-off series did not get merged due to a NAK on
the series that added the generic riscv,vlenb devicetree entry. This
series has converted riscv,vlenb to thead,vlenb to remedy this issue.
The original series is titled "riscv: Support vendor extensions and
xtheadvector" [3].
I have tested this with an Allwinner Nezha board. I used SkiffOS [1] to
manage building the image, but upgraded the U-Boot version to Samuel
Holland's more up-to-date version [2] and changed out the device tree
used by U-Boot with the device trees that are present in upstream linux
and this series. Thank you Samuel for all of the work you did to make
this task possible.
[1] https://github.com/skiffos/SkiffOS/tree/master/configs/allwinner/nezha
[2] https://github.com/smaeul/u-boot/commit/2e89b706f5c956a70c989cd31665f1429e9…
[3] https://lore.kernel.org/all/20240503-dev-charlie-support_thead_vector_6_9-v…
[4] https://lore.kernel.org/lkml/20240719-support_vendor_extensions-v3-4-0af758…
---
Changes in v11:
- Fix an issue where the mitigation was not being properly skipped when
requested
- Fix vstate_discard issue
- Fix issue when -1 was passed into
__riscv_isa_vendor_extension_available()
- Remove some artifacts from being placed in the test directory
- Link to v10: https://lore.kernel.org/r/20240911-xtheadvector-v10-0-8d3930091246@rivosinc…
Changes in v10:
- In DT probing disable vector with new function to clear vendor
extension bits for xtheadvector
- Add ghostwrite mitigations for c9xx CPUs. This disables xtheadvector
unless mitigations=off is set as a kernel boot arg
- Link to v9: https://lore.kernel.org/r/20240806-xtheadvector-v9-0-62a56d2da5d0@rivosinc.…
Changes in v9:
- Rebase onto palmer's for-next
- Fix sparse error in arch/riscv/kernel/vendor_extensions/thead.c
- Fix maybe-uninitialized warning in arch/riscv/include/asm/vendor_extensions/vendor_hwprobe.h
- Wrap some long lines
- Link to v8: https://lore.kernel.org/r/20240724-xtheadvector-v8-0-cf043168e137@rivosinc.…
Changes in v8:
- Rebase onto palmer's for-next
- Link to v7: https://lore.kernel.org/r/20240724-xtheadvector-v7-0-b741910ada3e@rivosinc.…
Changes in v7:
- Add defs for has_xtheadvector_no_alternatives() and has_xtheadvector()
when vector disabled. (Palmer)
- Link to v6: https://lore.kernel.org/r/20240722-xtheadvector-v6-0-c9af0130fa00@rivosinc.…
Changes in v6:
- Fix return type of is_vector_supported()/is_xthead_supported() to be bool
- Link to v5: https://lore.kernel.org/r/20240719-xtheadvector-v5-0-4b485fc7d55f@rivosinc.…
Changes in v5:
- Rebase on for-next
- Link to v4: https://lore.kernel.org/r/20240702-xtheadvector-v4-0-2bad6820db11@rivosinc.…
Changes in v4:
- Replace inline asm with C (Samuel)
- Rename VCSRs to CSRs (Samuel)
- Replace .insn directives with .4byte directives
- Link to v3: https://lore.kernel.org/r/20240619-xtheadvector-v3-0-bff39eb9668e@rivosinc.…
Changes in v3:
- Add back Heiko's signed-off-by (Conor)
- Mark RISCV_HWPROBE_KEY_VENDOR_EXT_THEAD_0 as a bitmask
- Link to v2: https://lore.kernel.org/r/20240610-xtheadvector-v2-0-97a48613ad64@rivosinc.…
Changes in v2:
- Removed extraneous references to "riscv,vlenb" (Jess)
- Moved declaration of "thead,vlenb" into cpus.yaml and added
restriction that it's only applicable to thead cores (Conor)
- Check CONFIG_RISCV_ISA_XTHEADVECTOR instead of CONFIG_RISCV_ISA_V for
thead,vlenb (Jess)
- Fix naming of hwprobe variables (Evan)
- Link to v1: https://lore.kernel.org/r/20240609-xtheadvector-v1-0-3fe591d7f109@rivosinc.…
---
Charlie Jenkins (13):
dt-bindings: riscv: Add xtheadvector ISA extension description
dt-bindings: cpus: add a thead vlen register length property
riscv: dts: allwinner: Add xtheadvector to the D1/D1s devicetree
riscv: Add thead and xtheadvector as a vendor extension
riscv: vector: Use vlenb from DT for thead
riscv: csr: Add CSR encodings for CSR_VXRM/CSR_VXSAT
riscv: Add xtheadvector instruction definitions
riscv: vector: Support xtheadvector save/restore
riscv: hwprobe: Add thead vendor extension probing
riscv: hwprobe: Document thead vendor extensions and xtheadvector extension
selftests: riscv: Fix vector tests
selftests: riscv: Support xtheadvector in vector tests
riscv: Add ghostwrite vulnerability
Heiko Stuebner (1):
RISC-V: define the elements of the VCSR vector CSR
Documentation/arch/riscv/hwprobe.rst | 10 +
Documentation/devicetree/bindings/riscv/cpus.yaml | 19 ++
.../devicetree/bindings/riscv/extensions.yaml | 10 +
arch/riscv/Kconfig.errata | 11 +
arch/riscv/Kconfig.vendor | 26 ++
arch/riscv/boot/dts/allwinner/sun20i-d1s.dtsi | 3 +-
arch/riscv/errata/thead/errata.c | 28 ++
arch/riscv/include/asm/bugs.h | 22 ++
arch/riscv/include/asm/cpufeature.h | 2 +
arch/riscv/include/asm/csr.h | 15 +
arch/riscv/include/asm/errata_list.h | 3 +-
arch/riscv/include/asm/hwprobe.h | 3 +-
arch/riscv/include/asm/switch_to.h | 2 +-
arch/riscv/include/asm/vector.h | 222 +++++++++++----
arch/riscv/include/asm/vendor_extensions/thead.h | 47 ++++
.../include/asm/vendor_extensions/thead_hwprobe.h | 19 ++
.../include/asm/vendor_extensions/vendor_hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/hwprobe.h | 3 +-
arch/riscv/include/uapi/asm/vendor/thead.h | 3 +
arch/riscv/kernel/Makefile | 2 +
arch/riscv/kernel/bugs.c | 60 ++++
arch/riscv/kernel/cpufeature.c | 59 +++-
arch/riscv/kernel/kernel_mode_vector.c | 8 +-
arch/riscv/kernel/process.c | 4 +-
arch/riscv/kernel/signal.c | 6 +-
arch/riscv/kernel/sys_hwprobe.c | 5 +
arch/riscv/kernel/vector.c | 24 +-
arch/riscv/kernel/vendor_extensions.c | 10 +
arch/riscv/kernel/vendor_extensions/Makefile | 2 +
arch/riscv/kernel/vendor_extensions/thead.c | 29 ++
.../riscv/kernel/vendor_extensions/thead_hwprobe.c | 19 ++
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 1 +
tools/testing/selftests/riscv/vector/.gitignore | 3 +-
tools/testing/selftests/riscv/vector/Makefile | 17 +-
.../selftests/riscv/vector/v_exec_initval_nolibc.c | 94 +++++++
tools/testing/selftests/riscv/vector/v_helpers.c | 68 +++++
tools/testing/selftests/riscv/vector/v_helpers.h | 8 +
tools/testing/selftests/riscv/vector/v_initval.c | 22 ++
.../selftests/riscv/vector/v_initval_nolibc.c | 68 -----
.../selftests/riscv/vector/vstate_exec_nolibc.c | 20 +-
.../testing/selftests/riscv/vector/vstate_prctl.c | 305 +++++++++++++--------
42 files changed, 1051 insertions(+), 271 deletions(-)
---
base-commit: 0eb512779d642b21ced83778287a0f7a3ca8f2a1
change-id: 20240530-xtheadvector-833d3d17b423
--
- Charlie
Here is a series from Geliang, adding mptcp_subflow bpf_iter support.
We are working on extending MPTCP with BPF, e.g. to control the path
manager -- in charge of the creation, deletion, and announcements of
subflows (paths) -- and the packet scheduler -- in charge of selecting
which available path the next data will be sent to. These extensions
need to iterate over the list of subflows attached to an MPTCP
connection, and do some specific actions via some new kfunc that will be
added later on.
This preparation work is split in different patches:
- Patch 1: extend bpf_skc_to_mptcp_sock() to be called with msk.
- Patch 2: allow using skc_to_mptcp_sock() in CGroup sockopt hooks.
- Patch 3: register some "basic" MPTCP kfunc.
- Patch 4: add mptcp_subflow bpf_iter support. Note that previous
versions of this single patch have already been shared to the
BPF mailing list. The changelog has been kept with a comment,
but the version number has been reset to avoid confusions.
- Patch 5: add kfunc to make sure the msk is valid
- Patch 6: add more MPTCP endpoints in the selftests, in order to create
more than 2 subflows.
- Patch 7: add a very simple test validating mptcp_subflow bpf_iter
support. This test could be written without the new bpf_iter,
but it is there only to make sure this specific feature works
as expected.
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Changes in v2:
- Patches 1-2: new ones.
- Patch 3: remove two kfunc, more restrictions. (Martin)
- Patch 4: add BUILD_BUG_ON(), more restrictions. (Martin)
- Patch 7: adaptations due to modifications in patches 1-4.
- Link to v1: https://lore.kernel.org/r/20241108-bpf-next-net-mptcp-bpf_iter-subflows-v1-…
---
Geliang Tang (7):
bpf: Extend bpf_skc_to_mptcp_sock to MPTCP sock
bpf: Allow use of skc_to_mptcp_sock in cg_sockopt
bpf: Register mptcp common kfunc set
bpf: Add mptcp_subflow bpf_iter
bpf: Acquire and release mptcp socket
selftests/bpf: More endpoints for endpoint_init
selftests/bpf: Add mptcp_subflow bpf_iter subtest
include/net/mptcp.h | 4 +-
kernel/bpf/cgroup.c | 2 +
net/core/filter.c | 2 +-
net/mptcp/bpf.c | 113 +++++++++++++++++-
tools/testing/selftests/bpf/bpf_experimental.h | 8 ++
tools/testing/selftests/bpf/prog_tests/mptcp.c | 129 ++++++++++++++++++++-
tools/testing/selftests/bpf/progs/mptcp_bpf.h | 9 ++
.../testing/selftests/bpf/progs/mptcp_bpf_iters.c | 63 ++++++++++
8 files changed, 318 insertions(+), 12 deletions(-)
---
base-commit: dad704ebe38642cd405e15b9c51263356391355c
change-id: 20241108-bpf-next-net-mptcp-bpf_iter-subflows-027f6d87770e
Best regards,
--
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
This patch series provides workingset reporting of user pages in
lruvecs, of which coldness can be tracked by accessed bits and fd
references. However, the concept of workingset applies generically to
all types of memory, which could be kernel slab caches, discardable
userspace caches (databases), or CXL.mem. Therefore, data sources might
come from slab shrinkers, device drivers, or the userspace.
Another interesting idea might be hugepage workingset, so that we can
measure the proportion of hugepages backing cold memory. However, with
architectures like arm, there may be too many hugepage sizes leading to
a combinatorial explosion when exporting stats to the userspace.
Nonetheless, the kernel should provide a set of workingset interfaces
that is generic enough to accommodate the various use cases, and extensible
to potential future use cases.
Use cases
==========
Job scheduling
On overcommitted hosts, workingset information improves efficiency and
reliability by allowing the job scheduler to have better stats on the
exact memory requirements of each job. This can manifest in efficiency by
landing more jobs on the same host or NUMA node. On the other hand, the
job scheduler can also ensure each node has a sufficient amount of memory
and does not enter direct reclaim or the kernel OOM path. With workingset
information and job priority, the userspace OOM killing or proactive
reclaim policy can kick in before the system is under memory pressure.
If the job shape is very different from the machine shape, knowing the
workingset per-node can also help inform page allocation policies.
Proactive reclaim
Workingset information allows the a container manager to proactively
reclaim memory while not impacting a job's performance. While PSI may
provide a reactive measure of when a proactive reclaim has reclaimed too
much, workingset reporting allows the policy to be more accurate and
flexible.
Ballooning (similar to proactive reclaim)
The last patch of the series extends the virtio-balloon device to report
the guest workingset.
Balloon policies benefit from workingset to more precisely determine the
size of the memory balloon. On end-user devices where memory is scarce and
overcommitted, the balloon sizing in multiple VMs running on the same
device can be orchestrated with workingset reports from each one.
On the server side, workingset reporting allows the balloon controller to
inflate the balloon without causing too much file cache to be reclaimed in
the guest.
Promotion/Demotion
If different mechanisms are used for promition and demotion, workingset
information can help connect the two and avoid pages being migrated back
and forth.
For example, given a promotion hot page threshold defined in reaccess
distance of N seconds (promote pages accessed more often than every N
seconds). The threshold N should be set so that ~80% (e.g.) of pages on
the fast memory node passes the threshold. This calculation can be done
with workingset reports.
To be directly useful for promotion policies, the workingset report
interfaces need to be extended to report hotness and gather hotness
information from the devices[1].
[1]
https://www.opencompute.org/documents/ocp-cms-hotness-tracking-requirements…
Sysfs and Cgroup Interfaces
==========
The interfaces are detailed in the patches that introduce them. The main
idea here is we break down the workingset per-node per-memcg into time
intervals (ms), e.g.
1000 anon=137368 file=24530
20000 anon=34342 file=0
30000 anon=353232 file=333608
40000 anon=407198 file=206052
9223372036854775807 anon=4925624 file=892892
Implementation
==========
The reporting of user pages is based off of MGLRU, and therefore requires
CONFIG_LRU_GEN=y. We would benefit from more MGLRU generations for a more
fine-grained workingset report, but we can already gather a lot of data
with just four generations. The workingset reporting mechanism is gated
behind CONFIG_WORKINGSET_REPORT, and the aging thread is behind
CONFIG_WORKINGSET_REPORT_AGING.
Benchmarks
==========
Ghait Ouled Amar Ben Cheikh has implemented a simple policy and ran Linux
compile and redis benchmarks from openbenchmarking.org. The policy and
runner is referred to as WMO (Workload Memory Optimization).
The results were based on v3 of the series, but v4 doesn't change the core
of the working set reporting and just adds the ballooning counterpart.
The timed Linux kernel compilation benchmark shows improvements in peak
memory usage with a policy of "swap out all bytes colder than 10 seconds
every 40 seconds". A swapfile is configured on SSD.
--------------------------------------------
peak memory usage (with WMO): 4982.61328 MiB
peak memory usage (control): 9569.1367 MiB
peak memory reduction: 47.9%
--------------------------------------------
Benchmark | Experimental |Control | Experimental_Std_Dev | Control_Std_Dev
Timed Linux Kernel Compilation - allmodconfig (sec) | 708.486 (95.91%) | 679.499 (100%) | 0.6% | 0.1%
--------------------------------------------
Seconds, fewer is better
The redis benchmark shows employs the same policy:
--------------------------------------------
peak memory usage (with WMO): 375.9023 MiB
peak memory usage (control): 509.765 MiB
peak memory reduction: 26%
--------------------------------------------
Benchmark | Experimental | Control | Experimental_Std_Dev | Control_Std_Dev
Redis - LPOP (Reqs/sec) | 2023130 (98.22%) | 2059849 (100%) | 1.2% | 2%
Redis - SADD (Reqs/sec) | 2539662 (98.63%) | 2574811 (100%) | 2.3% | 1.4%
Redis - LPUSH (Reqs/sec)| 2024880 (100%) | 2000884 (98.81%) | 1.1% | 0.8%
Redis - GET (Reqs/sec) | 2835764 (100%) | 2763722 (97.46%) | 2.7% | 1.6%
Redis - SET (Reqs/sec) | 2340723 (100%) | 2327372 (99.43%) | 2.4% | 1.8%
--------------------------------------------
Reqs/sec, more is better
The detailed report and benchmarking results are in Ghait's repo:
https://github.com/miloudi98/WMO
Changelog
==========
Changes from PATCH v3 -> v4:
- Added documentation for cgroup-v2
(Waiman Long)
- Fixed types in documentation
(Randy Dunlap)
- Added implementation for the ballooning use case
- Added detailed description of benchmark results
(Andrew Morton)
Changes from PATCH v2 -> v3:
- Fixed typos in commit messages and documentation
(Lance Yang, Randy Dunlap)
- Split out the force_scan patch to be reviewed separately
- Added benchmarks from Ghait Ouled Amar Ben Cheikh
- Fixed reported compile error without CONFIG_MEMCG
Changes from PATCH v1 -> v2:
- Updated selftest to use ksft_test_result_code instead of switch-case
(Muhammad Usama Anjum)
- Included more use cases in the cover letter
(Huang, Ying)
- Added documentation for sysfs and memcg interfaces
- Added an aging-specific struct lru_gen_mm_walk in struct pglist_data
to avoid allocating for each lruvec.
[v1] https://lore.kernel.org/linux-mm/20240504073011.4000534-1-yuanchu@google.co…
[v2] https://lore.kernel.org/linux-mm/20240604020549.1017540-1-yuanchu@google.co…
[v3] https://lore.kernel.org/linux-mm/20240813165619.748102-1-yuanchu@google.com/
Yuanchu Xie (9):
mm: aggregate workingset information into histograms
mm: use refresh interval to rate-limit workingset report aggregation
mm: report workingset during memory pressure driven scanning
mm: extend workingset reporting to memcgs
mm: add kernel aging thread for workingset reporting
selftest: test system-wide workingset reporting
Docs/admin-guide/mm/workingset_report: document sysfs and memcg
interfaces
Docs/admin-guide/cgroup-v2: document workingset reporting
virtio-balloon: add workingset reporting
Documentation/admin-guide/cgroup-v2.rst | 35 +
Documentation/admin-guide/mm/index.rst | 1 +
.../admin-guide/mm/workingset_report.rst | 105 +++
drivers/base/node.c | 6 +
drivers/virtio/virtio_balloon.c | 390 ++++++++++-
include/linux/balloon_compaction.h | 1 +
include/linux/memcontrol.h | 21 +
include/linux/mmzone.h | 13 +
include/linux/workingset_report.h | 167 +++++
include/uapi/linux/virtio_balloon.h | 30 +
mm/Kconfig | 15 +
mm/Makefile | 2 +
mm/internal.h | 19 +
mm/memcontrol.c | 162 ++++-
mm/mm_init.c | 2 +
mm/mmzone.c | 2 +
mm/vmscan.c | 56 +-
mm/workingset_report.c | 653 ++++++++++++++++++
mm/workingset_report_aging.c | 127 ++++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +
tools/testing/selftests/mm/run_vmtests.sh | 5 +
.../testing/selftests/mm/workingset_report.c | 306 ++++++++
.../testing/selftests/mm/workingset_report.h | 39 ++
.../selftests/mm/workingset_report_test.c | 330 +++++++++
25 files changed, 2482 insertions(+), 9 deletions(-)
create mode 100644 Documentation/admin-guide/mm/workingset_report.rst
create mode 100644 include/linux/workingset_report.h
create mode 100644 mm/workingset_report.c
create mode 100644 mm/workingset_report_aging.c
create mode 100644 tools/testing/selftests/mm/workingset_report.c
create mode 100644 tools/testing/selftests/mm/workingset_report.h
create mode 100644 tools/testing/selftests/mm/workingset_report_test.c
--
2.47.0.338.g60cca15819-goog
A previous commit described in this topic
http://lore.kernel.org/bpf/20230523025618.113937-9-john.fastabend@gmail.com
directly updated 'sk->copied_seq' in the tcp_eat_skb() function when the
action of a BPF program was SK_REDIRECT. For other actions, like SK_PASS,
the update logic for 'sk->copied_seq' was moved to
tcp_bpf_recvmsg_parser() to ensure the accuracy of the 'fionread' feature.
That commit works for a single stream_verdict scenario, as it also
modified 'sk_data_ready->sk_psock_verdict_data_ready->tcp_read_skb'
to remove updating 'sk->copied_seq'.
However, for programs where both stream_parser and stream_verdict are
active (strparser purpose), tcp_read_sock() was used instead of
tcp_read_skb() (sk_data_ready->strp_data_ready->tcp_read_sock).
tcp_read_sock() now still updates 'sk->copied_seq', leading to duplicated
updates.
In summary, for strparser + SK_PASS, copied_seq is redundantly calculated
in both tcp_read_sock() and tcp_bpf_recvmsg_parser().
The issue causes incorrect copied_seq calculations, which prevent
correct data reads from the recv() interface in user-land.
Also we added test cases for bpf + strparser and separated them from
sockmap_basic, as strparser has more encapsulation and parsing
capabilities compared to sockmap.
---
V8 -> v9
https://lore.kernel.org/bpf/20250121050707.55523-1-mrpre@163.com/
Fixed some issues suggested by Jakub Sitnicki.
V7 -> V8
https://lore.kernel.org/bpf/20250116140531.108636-1-mrpre@163.com/
Avoid using add read_sock to psock. (Jakub Sitnicki)
Avoid using warpper function to check whether strparser is supported.
V3 -> V7:
https://lore.kernel.org/bpf/20250109094402.50838-1-mrpre@163.com/https://lore.kernel.org/bpf/20241218053408.437295-1-mrpre@163.com/
Avoid introducing new proto_ops. (Jakub Sitnicki).
Add more edge test cases for strparser + bpf.
Fix patchwork fail of test cases code.
Fix psock fetch without rcu lock.
Move code of modifying to tcp_bpf.c.
V1 -> V3:
https://lore.kernel.org/bpf/20241209152740.281125-1-mrpre@163.com/
Fix patchwork fail by adding Fixes tag.
Save skb data offset for ENOMEM. (John Fastabend)
---
Jiayuan Chen (5):
strparser: add read_sock callback
bpf: fix wrong copied_seq calculation
bpf: disable non stream socket for strparser
selftests/bpf: fix invalid flag of recv()
selftests/bpf: add strparser test for bpf
Documentation/networking/strparser.rst | 9 +-
include/linux/skmsg.h | 2 +
include/net/strparser.h | 2 +
include/net/tcp.h | 8 +
net/core/skmsg.c | 7 +
net/core/sock_map.c | 5 +-
net/ipv4/tcp.c | 29 +-
net/ipv4/tcp_bpf.c | 36 ++
net/strparser/strparser.c | 11 +-
.../selftests/bpf/prog_tests/sockmap_basic.c | 59 +--
.../selftests/bpf/prog_tests/sockmap_strp.c | 454 ++++++++++++++++++
.../selftests/bpf/progs/test_sockmap_strp.c | 53 ++
12 files changed, 610 insertions(+), 65 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/sockmap_strp.c
create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_strp.c
--
2.43.5
This series remove compatibility with Python 2.x from scripts that have some
backward compatibility logic on it. The rationale is that, since
commit 627395716cc3 ("docs: document python version used for compilation"),
the minimal Python version was set to 3.x. Also, Python 2.x is EOL since Jan, 2020.
Patch 1: fix a script that was compatible only with Python 2.x;
Patches 2-4: remove backward-compat code;
Patches 5-6 solves forward-compat with modern Python which warns about using
raw strings without using "r" format.
Mauro Carvalho Chehab (6):
docs: trace: decode_msr.py: make it compatible with python 3
tools: perf: exported-sql-viewer: drop support for Python 2
tools: perf: tools: perf: exported-sql-viewer: drop support for Python
2
tools: perf: task-analyzer: drop support for Python 2
tools: selftests/bpf: test_bpftool_synctypes: escape raw symbols
comedi: convert_csv_to_c.py: use r-string for a regex expression
Documentation/trace/postprocess/decode_msr.py | 2 +-
.../ni_routing/tools/convert_csv_to_c.py | 2 +-
.../scripts/python/exported-sql-viewer.py | 5 ++--
tools/perf/scripts/python/task-analyzer.py | 23 ++++----------
tools/perf/tests/shell/lib/attr.py | 6 +---
.../selftests/bpf/test_bpftool_synctypes.py | 30 +++++++++----------
6 files changed, 25 insertions(+), 43 deletions(-)
--
2.48.1
PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of
system calls the tracee is blocked in.
This API allows ptracers to obtain and modify system call details
in a straightforward and architecture-agnostic way.
Current implementation supports changing only those bits of system call
information that are used by strace, namely, syscall number, syscall
arguments, and syscall return value.
Support of changing additional details returned by PTRACE_GET_SYSCALL_INFO,
such as instruction pointer and stack pointer, could be added later if
needed, by using struct ptrace_syscall_info.flags to specify the additional
details that should be set. Currently, "flags", "reserved", and
"seccomp.reserved2" fields of struct ptrace_syscall_info must be
initialized with zeroes; "arch", "instruction_pointer", and "stack_pointer"
fields are ignored.
PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
Ideally, PTRACE_SET_SYSCALL_INFO should have been introduced along with
PTRACE_GET_SYSCALL_INFO, but it didn't happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was apparent failure
to provide an API of changing the first system call argument on riscv
architecture [1].
ptrace(2) man page:
long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info)).
[1] https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Notes:
v3:
* powerpc: Submit syscall_set_return_value fix for "sc" case separately
* mips: Do not introduce erroneous argument truncation on mips n32,
add a detailed description to the commit message of the
mips_get_syscall_arg change
* ptrace: Add explicit padding to the end of struct ptrace_syscall_info,
simplify obtaining of user ptrace_syscall_info,
do not introduce PTRACE_SYSCALL_INFO_SIZE_VER0
* ptrace: Change the return type of ptrace_set_syscall_info_* functions
from "unsigned long" to "int"
* ptrace: Add -ERANGE check to ptrace_set_syscall_info_exit,
add comments to -ERANGE checks
* ptrace: Update comments about supported syscall stops
* selftests: Extend set_syscall_info test, fix for mips n32
* Add Tested-by and Reviewed-by
v2:
* Add patch to fix syscall_set_return_value() on powerpc
* Add patch to fix mips_get_syscall_arg() on mips
* Add syscall_set_return_value() implementation on hexagon
* Add syscall_set_return_value() invocation to syscall_set_nr()
on arm and arm64.
* Fix syscall_set_nr() and mips_set_syscall_arg() on mips
* Add a comment to syscall_set_nr() on arc, powerpc, s390, sh,
and sparc
* Remove redundant ptrace_syscall_info.op assignments in
ptrace_get_syscall_info_*
* Minor style tweaks in ptrace_get_syscall_info_op()
* Remove syscall_set_return_value() invocation from
ptrace_set_syscall_info_entry()
* Skip syscall_set_arguments() invocation in case of syscall number -1
in ptrace_set_syscall_info_entry()
* Split ptrace_syscall_info.reserved into ptrace_syscall_info.reserved
and ptrace_syscall_info.flags
* Use __kernel_ulong_t instead of unsigned long in set_syscall_info test
Dmitry V. Levin (6):
mips: fix mips_get_syscall_arg() for o32
syscall.h: add syscall_set_arguments() and syscall_set_return_value()
syscall.h: introduce syscall_set_nr()
ptrace_get_syscall_info: factor out ptrace_get_syscall_info_op
ptrace: introduce PTRACE_SET_SYSCALL_INFO request
selftests/ptrace: add a test case for PTRACE_SET_SYSCALL_INFO
arch/arc/include/asm/syscall.h | 25 +
arch/arm/include/asm/syscall.h | 37 ++
arch/arm64/include/asm/syscall.h | 29 +
arch/csky/include/asm/syscall.h | 13 +
arch/hexagon/include/asm/syscall.h | 21 +
arch/loongarch/include/asm/syscall.h | 15 +
arch/m68k/include/asm/syscall.h | 7 +
arch/microblaze/include/asm/syscall.h | 7 +
arch/mips/include/asm/syscall.h | 70 ++-
arch/nios2/include/asm/syscall.h | 16 +
arch/openrisc/include/asm/syscall.h | 13 +
arch/parisc/include/asm/syscall.h | 19 +
arch/powerpc/include/asm/syscall.h | 20 +
arch/riscv/include/asm/syscall.h | 16 +
arch/s390/include/asm/syscall.h | 24 +
arch/sh/include/asm/syscall_32.h | 24 +
arch/sparc/include/asm/syscall.h | 22 +
arch/um/include/asm/syscall-generic.h | 19 +
arch/x86/include/asm/syscall.h | 43 ++
arch/xtensa/include/asm/syscall.h | 18 +
include/asm-generic/syscall.h | 30 +
include/uapi/linux/ptrace.h | 7 +-
kernel/ptrace.c | 179 +++++-
tools/testing/selftests/ptrace/Makefile | 2 +-
.../selftests/ptrace/set_syscall_info.c | 514 ++++++++++++++++++
25 files changed, 1143 insertions(+), 47 deletions(-)
create mode 100644 tools/testing/selftests/ptrace/set_syscall_info.c
--
ldv
From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>
Hi,
Following Peter's comments [1] these patches rework handling of ROX caches
for module text allocations.
Instead of using a writable copy that really complicates alternatives
patching, temporarily remap parts of a large ROX page as RW for the time of
module formation and then restore it's ROX protections when the module is
ready.
To keep the ROX memory mapped with large pages, make set_memory_rox()
capable of restoring large pages (more details are in patch 3).
Since this is really about x86, I believe this should go in via tip tree.
The patches also available in git
https://git.kernel.org/rppt/h/execmem/x86-rox/v10
v3 changes:
* instead of adding a new module state handle ROX restoration locally in
load_module() as Petr suggested
v2: https://lore.kernel.org/all/20250121095739.986006-1-rppt@kernel.org
* only collapse large mappings in set_memory_rox()
* simplify RW <-> ROX remapping
* don't remove ROX cache pages from the direct map (patch 4)
v1: https://lore.kernel.org/all/20241227072825.1288491-1-rppt@kernel.org
[1] https://lore.kernel.org/all/20241209083818.GK8562@noisy.programming.kicks-a…
Kirill A. Shutemov (1):
x86/mm/pat: restore large ROX pages after fragmentation
Mike Rapoport (Microsoft) (8):
x86/mm/pat: cpa-test: fix length for CPA_ARRAY test
x86/mm/pat: drop duplicate variable in cpa_flush()
execmem: don't remove ROX cache from the direct map
execmem: add API for temporal remapping as RW and restoring ROX afterwards
module: switch to execmem API for remapping as RW and restoring ROX
Revert "x86/module: prepare module loading for ROX allocations of text"
module: drop unused module_writable_address()
x86: re-enable EXECMEM_ROX support
arch/um/kernel/um_arch.c | 11 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/pgtable_types.h | 2 +
arch/x86/kernel/alternative.c | 181 +++++++++-------------
arch/x86/kernel/ftrace.c | 30 ++--
arch/x86/kernel/module.c | 45 ++----
arch/x86/mm/pat/cpa-test.c | 2 +-
arch/x86/mm/pat/set_memory.c | 220 ++++++++++++++++++++++++++-
include/linux/execmem.h | 31 ++++
include/linux/module.h | 16 --
include/linux/moduleloader.h | 4 -
include/linux/vm_event_item.h | 2 +
kernel/module/main.c | 78 +++-------
kernel/module/strict_rwx.c | 9 +-
mm/execmem.c | 39 +++--
mm/vmstat.c | 2 +
18 files changed, 422 insertions(+), 268 deletions(-)
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
--
2.45.2
This patch series extends the sev_init2 and the sev_smoke test to
exercise the SEV-SNP VM launch workflow.
Primarily, it introduces the architectural defines, its support in the SEV
library and extends the tests to interact with the SEV-SNP ioctl()
wrappers.
Patch 1 - Do not advertize SNP on incompatible firmware
Patch 2 - Remove SEV support on platform init failure
Patch 3 - SNP test for KVM_SEV_INIT2
Patch 4 - Add VMGEXIT helper
Patch 5 - Introduce SEV+ VM type check
Patch 6 - SNP iotcl() plumbing for the SEV library
Patch 7 - Force set GUEST_MEMFD for SNP
Patch 8 - Cleanups of smoke test - Decouple policy from type
Patch 9 - SNP smoke test
The series is based on
git.kernel.org/pub/scm/virt/kvm/kvm.git next
v4..v5:
* Introduced a check to disable advertising support for SEV, SEV-ES
and SNP when platform initialization fails (Nikunj)
* Remove the redundant SNP check within is_sev_vm() (Nikunj)
* Cleanup of the encrypt_region flow for better readability (Nikunj)
* Refactor paths to use the canonical $(ARCH) to rebase for kvm/next
v3..v4:
https://lore.kernel.org/kvm/20241114234104.128532-1-pratikrajesh.sampat@amd…
* Remove SNP FW API version check in the test and ensure the KVM
capability advertizes the presence of the feature. Retain the minimum
version definitions to exercise these API versions in the smoke test
* Retained only the SNP smoke test and SNP_INIT2 test
* The SNP architectural defined merged with SNP_INIT2 test patch
* SNP shutdown merged with SNP smoke test patch
* Add SEV VM type check to abstract comparisons and reduce clutter
* Define a SNP default policy which sets bits based on the presence of
SMT
* Decouple privatization and encryption for it to be SNP agnostic
* Assert for only positive tests using vm_ioctl()
* Dropped tested-by tags
In summary - based on comments from Sean, I have primarily reduced the
scope of this patch series to focus on breaking down the SNP smoke test
patch (v3 - patch2) to first introduce SEV-SNP support and use this
interface to extend the sev_init2 and the sev_smoke test.
The rest of the v3 patchset that introduces ioctl, pre fault, fallocate
and negative tests, will be re-worked and re-introduced subsequently in
future patch series post addressing the issues discussed.
v2..v3:
https://lore.kernel.org/kvm/20240905124107.6954-1-pratikrajesh.sampat@amd.c…
* Remove the assignments for the prefault and fallocate test type
enums.
* Fix error message for sev launch measure and finish.
* Collect tested-by tags [Peter, Srikanth]
Pratik R. Sampat (9):
KVM: SEV: Disable SEV-SNP on FW validation failure
KVM: SEV: Disable SEV on platform init failure
KVM: selftests: SEV-SNP test for KVM_SEV_INIT2
KVM: selftests: Add VMGEXIT helper
KVM: selftests: Introduce SEV VM type check
KVM: selftests: Add library support for interacting with SNP
KVM: selftests: Force GUEST_MEMFD flag for SNP VM type
KVM: selftests: Abstractions for SEV to decouple policy from type
KVM: selftests: Add a basic SEV-SNP smoke test
arch/x86/kvm/svm/sev.c | 6 +-
drivers/crypto/ccp/sev-dev.c | 16 +++
include/linux/psp-sev.h | 6 ++
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/include/x86/sev.h | 55 ++++++++++-
tools/testing/selftests/kvm/lib/kvm_util.c | 7 +-
.../testing/selftests/kvm/lib/x86/processor.c | 4 +-
tools/testing/selftests/kvm/lib/x86/sev.c | 99 ++++++++++++++++++-
.../selftests/kvm/x86/sev_init2_tests.c | 13 +++
.../selftests/kvm/x86/sev_smoke_test.c | 96 ++++++++++++++----
10 files changed, 272 insertions(+), 31 deletions(-)
--
2.43.0
Signed-off-by: Miguel García <miguelgarciaroman8(a)gmail.com>
---
tools/testing/selftests/alsa/mixer-test.c | 2 +-
tools/testing/selftests/arm64/gcs/libc-gcs.c | 2 +-
tools/testing/selftests/cgroup/test_cpuset.c | 2 +-
tools/testing/selftests/mm/gup_longterm.c | 2 +-
tools/testing/selftests/mm/mseal_test.c | 2 +-
tools/testing/selftests/mm/protection_keys.c | 4 ++--
tools/testing/selftests/mm/test_vmalloc.sh | 2 +-
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 2 +-
.../pmu/event_code_tests/group_constraint_thresh_sel_test.c | 4 ++--
tools/testing/selftests/safesetid/safesetid-test.c | 2 +-
10 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/tools/testing/selftests/alsa/mixer-test.c b/tools/testing/selftests/alsa/mixer-test.c
index 2a4b2662035e..dc7b290fc4ad 100644
--- a/tools/testing/selftests/alsa/mixer-test.c
+++ b/tools/testing/selftests/alsa/mixer-test.c
@@ -685,7 +685,7 @@ static int write_and_verify(struct ctl_data *ctl,
}
/*
- * Use the libray to compare values, if there's a mismatch
+ * Use the library to compare values, if there's a mismatch
* carry on and try to provide a more useful diagnostic than
* just "mismatch".
*/
diff --git a/tools/testing/selftests/arm64/gcs/libc-gcs.c b/tools/testing/selftests/arm64/gcs/libc-gcs.c
index 17b2fabfec38..482e1c634a65 100644
--- a/tools/testing/selftests/arm64/gcs/libc-gcs.c
+++ b/tools/testing/selftests/arm64/gcs/libc-gcs.c
@@ -129,7 +129,7 @@ TEST(gcs_find_terminator)
* We can access a GCS via ptrace
*
* This could usefully have a fixture but note that each test is
- * fork()ed into a new child whcih causes issues. Might be better to
+ * fork()ed into a new child which causes issues. Might be better to
* lift at least some of this out into a separate, non-harness, test
* program.
*/
diff --git a/tools/testing/selftests/cgroup/test_cpuset.c b/tools/testing/selftests/cgroup/test_cpuset.c
index 4034d14ba69a..3f802e3e8480 100644
--- a/tools/testing/selftests/cgroup/test_cpuset.c
+++ b/tools/testing/selftests/cgroup/test_cpuset.c
@@ -160,7 +160,7 @@ static int test_cpuset_perms_object_deny(const char *root)
}
/*
- * Migrate a process between parent and child implicitely
+ * Migrate a process between parent and child implicitly
* Implicit migration happens when a controller is enabled/disabled.
*
*/
diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c
index 9423ad439a61..d2dc3b59a084 100644
--- a/tools/testing/selftests/mm/gup_longterm.c
+++ b/tools/testing/selftests/mm/gup_longterm.c
@@ -154,7 +154,7 @@ static void do_test(int fd, size_t size, enum test_type type, bool shared)
/*
* R/O pinning or pinning in a private mapping is always
* expected to work. Otherwise, we expect long-term R/W pinning
- * to only succeed for special fielesystems.
+ * to only succeed for special filesystems.
*/
should_work = !shared || !rw ||
fs_supports_writable_longterm_pinning(fs_type);
diff --git a/tools/testing/selftests/mm/mseal_test.c b/tools/testing/selftests/mm/mseal_test.c
index 01675c412b2a..2ec2c5aea44a 100644
--- a/tools/testing/selftests/mm/mseal_test.c
+++ b/tools/testing/selftests/mm/mseal_test.c
@@ -732,7 +732,7 @@ static void test_seal_mprotect_two_vma_with_split(bool seal)
else
FAIL_TEST_IF_FALSE(!ret);
- /* the fouth page is not sealed. */
+ /* the fourth page is not sealed. */
ret = sys_mprotect(ptr + 3 * page_size, page_size,
PROT_READ | PROT_WRITE);
FAIL_TEST_IF_FALSE(!ret);
diff --git a/tools/testing/selftests/mm/protection_keys.c b/tools/testing/selftests/mm/protection_keys.c
index 4990f7ab4cb7..fcac7bb26b7a 100644
--- a/tools/testing/selftests/mm/protection_keys.c
+++ b/tools/testing/selftests/mm/protection_keys.c
@@ -900,7 +900,7 @@ void expected_pkey_fault(int pkey)
#if defined(__i386__) || defined(__x86_64__) /* arch */
/*
- * The signal handler shold have cleared out PKEY register to let the
+ * The signal handler should have cleared out PKEY register to let the
* test program continue. We now have to restore it.
*/
if (__read_pkey_reg() != 0)
@@ -1372,7 +1372,7 @@ void test_ptrace_of_child(int *ptr, u16 pkey)
long ret;
int status;
/*
- * This is the "control" for our little expermient. Make sure
+ * This is the "control" for our little experiment. Make sure
* we can always access it when ptracing.
*/
int *plain_ptr_unaligned = malloc(HPAGE_SIZE);
diff --git a/tools/testing/selftests/mm/test_vmalloc.sh b/tools/testing/selftests/mm/test_vmalloc.sh
index d73b846736f1..2d4b3e0a6a17 100755
--- a/tools/testing/selftests/mm/test_vmalloc.sh
+++ b/tools/testing/selftests/mm/test_vmalloc.sh
@@ -21,7 +21,7 @@ ksft_skip=4
#
# Static templates for performance, stressing and smoke tests.
-# Also it is possible to pass any supported parameters manualy.
+# Also it is possible to pass any supported parameters manually.
#
PERF_PARAM="sequential_test_order=1 test_repeat_count=3"
SMOKE_PARAM="test_loop_count=10000 test_repeat_count=10"
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 70f65eb320a7..a6d9f7bd1443 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -682,7 +682,7 @@ TEST_F(mount_setattr, mount_has_writers)
ASSERT_GE(fd, 0);
/*
- * We're holding a fd open to a mount somwhere in the middle so this
+ * We're holding a fd open to a mount somewhere in the middle so this
* needs to fail somewhere in the middle. After this the mount options
* need to be unchanged.
*/
diff --git a/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c b/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
index 50a8cd843ce7..1b1336c1ddb1 100644
--- a/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
+++ b/tools/testing/selftests/powerpc/pmu/event_code_tests/group_constraint_thresh_sel_test.c
@@ -34,7 +34,7 @@ static int group_constraint_thresh_sel(void)
/* Check for platform support for the test */
SKIP_IF(platform_check_for_tests());
- /* Init the events for the group contraint thresh select test */
+ /* Init the events for the group constraint thresh select test */
event_init(&leader, EventCode_1);
FAIL_IF(event_open(&leader));
@@ -45,7 +45,7 @@ static int group_constraint_thresh_sel(void)
event_close(&event);
- /* Init the event for the group contraint thresh select test */
+ /* Init the event for the group constraint thresh select test */
event_init(&event, EventCode_3);
/* Expected to succeed as sibling and leader event request same thresh_sel bits */
diff --git a/tools/testing/selftests/safesetid/safesetid-test.c b/tools/testing/selftests/safesetid/safesetid-test.c
index eb9bf0aee951..80f736d545a9 100644
--- a/tools/testing/selftests/safesetid/safesetid-test.c
+++ b/tools/testing/selftests/safesetid/safesetid-test.c
@@ -19,7 +19,7 @@
/*
* NOTES about this test:
- * - requries libcap-dev to be installed on test system
+ * - requires libcap-dev to be installed on test system
* - requires securityfs to me mounted at /sys/kernel/security, e.g.:
* mount -n -t securityfs -o nodev,noexec,nosuid securityfs /sys/kernel/security
* - needs CONFIG_SECURITYFS and CONFIG_SAFESETID to be enabled
--
2.34.1
From: Brian Norris <briannorris(a)chromium.org>
[ Upstream commit 7687c66c18c66d4ccd9949c6f641c0e7b5773483 ]
If the <kunit/platform_device.h> header is included in a test without
certain other headers, it produces compiler warnings like:
In file included from [...]
../include/kunit/platform_device.h:15:57: warning: ‘struct completion’
declared inside parameter list will not be visible outside of this
definition or declaration
15 | struct completion *x);
| ^~~~~~~~~~
Add a 'struct completion' forward declaration to resolve this.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412241958.dbAImJsA-lkp@intel.com/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Link: https://lore.kernel.org/r/20241213180841.3023843-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/kunit/platform_device.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
index 0fc0999d2420a..f8236a8536f7e 100644
--- a/include/kunit/platform_device.h
+++ b/include/kunit/platform_device.h
@@ -2,6 +2,7 @@
#ifndef _KUNIT_PLATFORM_DRIVER_H
#define _KUNIT_PLATFORM_DRIVER_H
+struct completion;
struct kunit;
struct platform_device;
struct platform_driver;
--
2.39.5
From: Brian Norris <briannorris(a)chromium.org>
[ Upstream commit 7687c66c18c66d4ccd9949c6f641c0e7b5773483 ]
If the <kunit/platform_device.h> header is included in a test without
certain other headers, it produces compiler warnings like:
In file included from [...]
../include/kunit/platform_device.h:15:57: warning: ‘struct completion’
declared inside parameter list will not be visible outside of this
definition or declaration
15 | struct completion *x);
| ^~~~~~~~~~
Add a 'struct completion' forward declaration to resolve this.
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412241958.dbAImJsA-lkp@intel.com/
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
Link: https://lore.kernel.org/r/20241213180841.3023843-1-briannorris@chromium.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
include/kunit/platform_device.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/kunit/platform_device.h b/include/kunit/platform_device.h
index 0fc0999d2420a..f8236a8536f7e 100644
--- a/include/kunit/platform_device.h
+++ b/include/kunit/platform_device.h
@@ -2,6 +2,7 @@
#ifndef _KUNIT_PLATFORM_DRIVER_H
#define _KUNIT_PLATFORM_DRIVER_H
+struct completion;
struct kunit;
struct platform_device;
struct platform_driver;
--
2.39.5
This patch series introduces "kci-gitlab," a GitLab CI pipeline
specifically designed for kernel testing. It provides kernel
developers with an integrated, efficient, and flexible testing
framework using GitLab's CI/CD capabilities. This patch includes
a .gitlab-ci file in the tools/ci/gitlab-ci/ folder, along with
additional YAML and script files, to define a basic test pipeline
triggered by code pushes to a GitLab-CI instance.
The initial version implements:
- Static checks: Includes checkpatch and smatch for code validation.
- Build tests: Covers various architectures and configurations.
- Boot tests: Utilizes virtme for basic boot testing.
Additionally, it introduces a flexible "scenarios" mechanism to
support subsystem-specific extensions.
This series also introduces a drm scenario that adds a job to run IGT
tests for vkms. This scenario includes helper scripts to build deqp-runner
and IGT, leveraging approaches from the drm-ci/mesa-ci project.
We are working towards creating a generic, upstream GitLab-CI pipeline
(kci-gitlab) that will replace DRM-CI [1]. The proposed GitLab-CI pipeline
is designed with a distributed infrastructure model, making it possible
to run on any gitLab instance. We plan to leverage KernelCI [2] as the
backend, utilizing its hardware, rootfs, test plans, and KCIDB [3]
integration.
For an example of a fully executed pipeline with drm scenario set,
including documentation generation,
see: https://gitlab.freedesktop.org/vigneshraman/kernel/-/pipelines/1350262
Please refer to the documentation included in the patch, or check the
rendered version, here:
https://vigneshraman.pages.freedesktop.org/-/kernel/-/jobs/69787927/artifac…
Differences from v1 to v2:
- moved to tools/ci as suggested by Linus on the previous version
- add arm64 containers for native compilation
- added boot tests using virtme: this is the base structure for boot tests,
next steps would be adding other tests such as kselftests/kunit tests
- added DRM scenario testing on vkms: this should replace current vkms test
in drm-ci. This work shows how a test scenario can be used by different
subsystems to add their tests.
- update documentation
For more details on the motivation behind this work, please refer to the
cover letter of v1: https://patchwork.kernel.org/project/linux-kselftest/cover/20240228225527.1…
[1] https://www.collabora.com/news-and-blog/blog/2024/02/08/drm-ci-a-gitlab-ci-…
[2] https://kernelci.org/
[3] https://docs.kernelci.org/kcidb/
Helen Koike (3):
kci-gitlab: Introducing GitLab-CI Pipeline for Kernel Testing
kci-gitlab: Add documentation
kci-gitlab: docs: Add images
Vignesh Raman (2):
MAINTAINERS: Add an entry for ci automated testing
kci-gitlab: Add drm scenario
Documentation/ci/gitlab-ci/gitlab-ci.rst | 471 ++++++++++
.../ci/gitlab-ci/images/drm-vkms.png | Bin 0 -> 73810 bytes
.../ci/gitlab-ci/images/job-matrix.png | Bin 0 -> 20000 bytes
.../gitlab-ci/images/new-project-runner.png | Bin 0 -> 607737 bytes
.../ci/gitlab-ci/images/pipelines-on-push.png | Bin 0 -> 532143 bytes
.../ci/gitlab-ci/images/the-pipeline.png | Bin 0 -> 62464 bytes
.../ci/gitlab-ci/images/variables.png | Bin 0 -> 277518 bytes
Documentation/index.rst | 7 +
MAINTAINERS | 10 +
tools/ci/gitlab-ci/arm_cross_compile.yml | 9 +
tools/ci/gitlab-ci/arm_native_compile.yml | 20 +
tools/ci/gitlab-ci/bootstrap-gitlab-runner.sh | 55 ++
tools/ci/gitlab-ci/build.yml | 43 +
tools/ci/gitlab-ci/cache.yml | 24 +
tools/ci/gitlab-ci/ci-scripts/build-docs.sh | 35 +
tools/ci/gitlab-ci/ci-scripts/build-kernel.sh | 43 +
.../ci/gitlab-ci/ci-scripts/ici-functions.sh | 106 +++
.../ci/gitlab-ci/ci-scripts/install-smatch.sh | 13 +
.../ci-scripts/parse_commit_message.sh | 27 +
.../ci/gitlab-ci/ci-scripts/run-checkpatch.sh | 20 +
tools/ci/gitlab-ci/ci-scripts/run-smatch.sh | 45 +
tools/ci/gitlab-ci/ci-scripts/run-virtme.sh | 52 ++
tools/ci/gitlab-ci/ci-scripts/test-boot.sh | 14 +
tools/ci/gitlab-ci/container.yml | 114 +++
tools/ci/gitlab-ci/docker-compose.yaml | 18 +
tools/ci/gitlab-ci/gitlab-ci.yml | 72 ++
tools/ci/gitlab-ci/scenarios.yml | 15 +
.../scenarios/drm/build-deqp-runner.sh | 42 +
tools/ci/gitlab-ci/scenarios/drm/build-igt.sh | 80 ++
.../ci/gitlab-ci/scenarios/drm/build-rust.sh | 42 +
.../scenarios/drm/create-cross-file.sh | 65 ++
tools/ci/gitlab-ci/scenarios/drm/drm.yml | 44 +
.../scenarios/drm/prepare-container.sh | 18 +
tools/ci/gitlab-ci/scenarios/drm/run-igt.sh | 83 ++
tools/ci/gitlab-ci/scenarios/drm/test.yml | 32 +
.../scenarios/drm/xfails/vkms-none-fails.txt | 22 +
.../scenarios/drm/xfails/vkms-none-flakes.txt | 90 ++
.../scenarios/drm/xfails/vkms-none-skips.txt | 812 ++++++++++++++++++
.../scenarios/file-systems/file-systems.yml | 11 +
tools/ci/gitlab-ci/scenarios/media/media.yml | 11 +
.../gitlab-ci/scenarios/network/network.yml | 11 +
tools/ci/gitlab-ci/static-checks.yml | 21 +
tools/ci/gitlab-ci/test.yml | 16 +
43 files changed, 2613 insertions(+)
create mode 100644 Documentation/ci/gitlab-ci/gitlab-ci.rst
create mode 100644 Documentation/ci/gitlab-ci/images/drm-vkms.png
create mode 100644 Documentation/ci/gitlab-ci/images/job-matrix.png
create mode 100644 Documentation/ci/gitlab-ci/images/new-project-runner.png
create mode 100644 Documentation/ci/gitlab-ci/images/pipelines-on-push.png
create mode 100644 Documentation/ci/gitlab-ci/images/the-pipeline.png
create mode 100644 Documentation/ci/gitlab-ci/images/variables.png
create mode 100644 tools/ci/gitlab-ci/arm_cross_compile.yml
create mode 100644 tools/ci/gitlab-ci/arm_native_compile.yml
create mode 100755 tools/ci/gitlab-ci/bootstrap-gitlab-runner.sh
create mode 100644 tools/ci/gitlab-ci/build.yml
create mode 100644 tools/ci/gitlab-ci/cache.yml
create mode 100755 tools/ci/gitlab-ci/ci-scripts/build-docs.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/build-kernel.sh
create mode 100644 tools/ci/gitlab-ci/ci-scripts/ici-functions.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/install-smatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/parse_commit_message.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-checkpatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-smatch.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/run-virtme.sh
create mode 100755 tools/ci/gitlab-ci/ci-scripts/test-boot.sh
create mode 100644 tools/ci/gitlab-ci/container.yml
create mode 100644 tools/ci/gitlab-ci/docker-compose.yaml
create mode 100644 tools/ci/gitlab-ci/gitlab-ci.yml
create mode 100644 tools/ci/gitlab-ci/scenarios.yml
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-deqp-runner.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-igt.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/build-rust.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/create-cross-file.sh
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/drm.yml
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/prepare-container.sh
create mode 100755 tools/ci/gitlab-ci/scenarios/drm/run-igt.sh
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/test.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-fails.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-flakes.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/drm/xfails/vkms-none-skips.txt
create mode 100644 tools/ci/gitlab-ci/scenarios/file-systems/file-systems.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/media/media.yml
create mode 100644 tools/ci/gitlab-ci/scenarios/network/network.yml
create mode 100644 tools/ci/gitlab-ci/static-checks.yml
create mode 100644 tools/ci/gitlab-ci/test.yml
--
2.43.0
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 Add an helper to generate unique names
- PATCH 2 to 9 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 10 adds test cases for 'classic' bpf_redirect_map()
- PATCH 11 and 12 cover the broadcast flags
- PATCH 13 covers the XDP egress programs
- PATCH 14 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v3:
- Add append_tid() helper and use unique names to allow parallel testing
- Check create_network()'s return value through ASSERT_OK()
- Remove check_ping() and unused defines
- Change next_veth type (from string to int)
- Link to v2: https://lore.kernel.org/r/20250121-redirect-multi-v2-0-fc9cacabc6b2@bootlin…
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (14):
selftests/bpf: helpers: Add append_tid()
selftests/bpf: test_xdp_veth: Remove unused defines
selftests/bpf: test_xdp_veth: Remove unecessarry check_ping()
selftests/bpf: test_xdp_veth: Use int to describe next veth
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Use unique names
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
tools/testing/selftests/bpf/network_helpers.c | 11 +
tools/testing/selftests/bpf/network_helpers.h | 10 +
.../selftests/bpf/prog_tests/test_xdp_veth.c | 589 ++++++++++++++++-----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 --------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 --------
8 files changed, 608 insertions(+), 574 deletions(-)
---
base-commit: e65d1a6e096af0648ffdbc9d7e8bd7a44dcc700b
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
The upcoming new Idle HLT Intercept feature allows for the HLT
instruction execution by a vCPU to be intercepted by the hypervisor
only if there are no pending V_INTR and V_NMI events for the vCPU.
When the vCPU is expected to service the pending V_INTR and V_NMI
events, the Idle HLT intercept won’t trigger. The feature allows the
hypervisor to determine if the vCPU is actually idle and reduces
wasteful VMEXITs.
The Idle HLT intercept feature is used for enlightened guests who wish
to securely handle the events. When an enlightened guest does a HLT
while an interrupt is pending, hypervisor will not have a way to
figure out whether the guest needs to be re-entered or not. The Idle
HLT intercept feature allows the HLT execution only if there are no
pending V_INTR and V_NMI events.
Presence of the Idle HLT Intercept feature is indicated via CPUID
function Fn8000_000A_EDX[30].
Document for the Idle HLT intercept feature is available at [1].
This series is based on kvm-x86/next (13e98294d7ce) + [2] + [3].
Testing Done:
- Tested the functionality for the Idle HLT intercept feature
using selftest ipi_hlt_test.
- Tested on normal, SEV, SEV-ES, SEV-SNP guest for the Idle HLT intercept
functionality.
- Tested the Idle HLT intercept functionality on nested guest.
v4 -> v5
- Incorporated Sean's review comments on nested Idle HLT intercept support.
- Make svm_idle_hlt_test independent of the Idle HLT to run on all hardware.
v3 -> v4
- Drop the patches to add vcpu_get_stat() into a new series [2].
- Added nested Idle HLT intercept support.
v2 -> v3
- Incorporated Andrew's suggestion to structure vcpu_stat_types in
a way that each architecture can share the generic types and also
provide its own.
v1 -> v2
- Done changes in svm_idle_hlt_test based on the review comments from Sean.
- Added an enum based approach to get binary stats in vcpu_get_stat() which
doesn't use string to get stat data based on the comments from Sean.
- Added safe_halt() and cli() helpers based on the comments from Sean.
[1]: AMD64 Architecture Programmer's Manual Pub. 24593, April 2024,
Vol 2, 15.9 Instruction Intercepts (Table 15-7: IDLE_HLT).
https://bugzilla.kernel.org/attachment.cgi?id=306250
[2]: https://lore.kernel.org/kvm/20241220013906.3518334-1-seanjc@google.com/T/#u
[3]: https://lore.kernel.org/kvm/20241220012617.3513898-1-seanjc@google.com/T/#u
---
V4: https://lore.kernel.org/kvm/20241022054810.23369-1-manali.shukla@amd.com/
V3: https://lore.kernel.org/kvm/20240528041926.3989-4-manali.shukla@amd.com/T/
V2: https://lore.kernel.org/kvm/20240501145433.4070-1-manali.shukla@amd.com/
V1: https://lore.kernel.org/kvm/20240307054623.13632-1-manali.shukla@amd.com/
Manali Shukla (3):
x86/cpufeatures: Add CPUID feature bit for Idle HLT intercept
KVM: SVM: Add Idle HLT intercept support
KVM: selftests: Add self IPI HLT test
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/svm.h | 1 +
arch/x86/include/uapi/asm/svm.h | 2 +
arch/x86/kvm/svm/svm.c | 13 ++-
tools/testing/selftests/kvm/Makefile.kvm | 1 +
.../selftests/kvm/include/x86/processor.h | 1 +
tools/testing/selftests/kvm/ipi_hlt_test.c | 85 +++++++++++++++++++
7 files changed, 101 insertions(+), 3 deletions(-)
create mode 100644 tools/testing/selftests/kvm/ipi_hlt_test.c
base-commit: 13e98294d7cec978e31138d16824f50556a62d17
prerequisite-patch-id: cb345fc0d814a351df2b5788b76eee0eef9de549
prerequisite-patch-id: 71806f400cffe09f47d6231cb072cbdbd540de1b
prerequisite-patch-id: 9ea0412aab7ecd8555fcee3e9609dbfe8456d47b
prerequisite-patch-id: 3504df50cdd33958456f2e56139d76867273525c
prerequisite-patch-id: 674e56729a56cc487cb85be1a64ef561eb7bac8a
prerequisite-patch-id: 48e87354f9d6e6bd121ca32ab73cd0d7f1dce74f
prerequisite-patch-id: 74daffd7677992995f37e5a5cb784b8d4357e342
prerequisite-patch-id: 509018dc2fc1657debc641544e86f5a92d04bc1a
prerequisite-patch-id: 4a50c6a4dc3b3c8c8c640a86072faafb7bae4384
--
2.34.1
From: Kevin Brodsky <kevin.brodsky(a)arm.com>
[ Upstream commit 46036188ea1f5266df23a6149dea0df1c77cd1c7 ]
The mm kselftests are currently built with no optimisation (-O0). It's
unclear why, and besides being obviously suboptimal, this also prevents
the pkeys tests from working as intended. Let's build all the tests with
-O2.
[kevin.brodsky(a)arm.com: silence unused-result warnings]
Link: https://lkml.kernel.org/r/20250107170110.2819685-1-kevin.brodsky@arm.com
Link: https://lkml.kernel.org/r/20241209095019.1732120-6-kevin.brodsky@arm.com
Signed-off-by: Kevin Brodsky <kevin.brodsky(a)arm.com>
Cc: Aruna Ramakrishna <aruna.ramakrishna(a)oracle.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Joey Gouly <joey.gouly(a)arm.com>
Cc: Keith Lucas <keith.lucas(a)oracle.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 46036188ea1f5266df23a6149dea0df1c77cd1c7)
[Yifei: This commit also fix the failure of pkey_sighandler_tests_64,
which is also in linux-6.12.y, thus backport this commit]
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
tools/testing/selftests/mm/Makefile | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 02e1204971b0..c0138cb19705 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -33,9 +33,16 @@ endif
# LDLIBS.
MAKEFLAGS += --no-builtin-rules
-CFLAGS = -Wall -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
+CFLAGS = -Wall -O2 -I $(top_srcdir) $(EXTRA_CFLAGS) $(KHDR_INCLUDES) $(TOOLS_INCLUDES)
LDLIBS = -lrt -lpthread -lm
+# Some distributions (such as Ubuntu) configure GCC so that _FORTIFY_SOURCE is
+# automatically enabled at -O1 or above. This triggers various unused-result
+# warnings where functions such as read() or write() are called and their
+# return value is not checked. Disable _FORTIFY_SOURCE to silence those
+# warnings.
+CFLAGS += -U_FORTIFY_SOURCE
+
TEST_GEN_FILES = cow
TEST_GEN_FILES += compaction_test
TEST_GEN_FILES += gup_longterm
--
2.46.0
Package build environments like Fedora rpmbuild introduced hardening
options (e.g. -pie -Wl,-z,now) by passing a -spec option to CFLAGS
and LDFLAGS.
Some Makefiles currently override CFLAGS but not LDFLAGS, which leads
to a mismatch and build failure, for example:
/usr/bin/ld: /tmp/ccd2apay.o: relocation R_X86_64_32 against
`.rodata.str1.1' can not be used when making a PIE object; recompile with -fPIE
/usr/bin/ld: failed to set dynamic section sizes: bad value
collect2: error: ld returned 1 exit status
make[1]: *** [../../lib.mk:222: tools/testing/selftests/net/lib/csum] Error 1
openvswitch/Makefile CFLAGS currently do not appear to be used, but
fix it anyway for the case when new tests are introduced in future.
Fixes: 1d0dc857b5d8 ("selftests: drv-net: add checksum tests")
Signed-off-by: Jan Stancek <jstancek(a)redhat.com>
---
tools/testing/selftests/net/lib/Makefile | 2 +-
tools/testing/selftests/net/openvswitch/Makefile | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/net/lib/Makefile b/tools/testing/selftests/net/lib/Makefile
index 18b9443454a9..578de40cc5e3 100644
--- a/tools/testing/selftests/net/lib/Makefile
+++ b/tools/testing/selftests/net/lib/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
-CFLAGS = -Wall -Wl,--no-as-needed -O2 -g
+CFLAGS += -Wall -Wl,--no-as-needed -O2 -g
CFLAGS += -I../../../../../usr/include/ $(KHDR_INCLUDES)
# Additional include paths needed by kselftest.h
CFLAGS += -I../../
diff --git a/tools/testing/selftests/net/openvswitch/Makefile b/tools/testing/selftests/net/openvswitch/Makefile
index 2f1508abc826..1567a549ba34 100644
--- a/tools/testing/selftests/net/openvswitch/Makefile
+++ b/tools/testing/selftests/net/openvswitch/Makefile
@@ -2,7 +2,7 @@
top_srcdir = ../../../../..
-CFLAGS = -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
+CFLAGS += -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include $(KHDR_INCLUDES)
TEST_PROGS := openvswitch.sh
--
2.43.0
This series expands the XDP TX metadata framework to allow user
applications to pass per packet 64-bit launch time directly to the kernel
driver, requesting launch time hardware offload support. The XDP TX
metadata framework will not perform any clock conversion or packet
reordering.
Please note that the role of Tx metadata is just to pass the launch time,
not to enable the offload feature. Users will need to enable the launch
time hardware offload feature of the device by using the respective
command, such as the tc-etf command.
Although some devices use the tc-etf command to enable their launch time
hardware offload feature, xsk packets will not go through the etf qdisc.
Therefore, in my opinion, the launch time should always be based on the PTP
Hardware Clock (PHC). Thus, i did not include a clock ID to indicate the
clock source.
To simplify the test steps, I modified the xdp_hw_metadata bpf self-test
tool in such a way that it will set the launch time based on the offset
provided by the user and the value of the Receive Hardware Timestamp, which
is against the PHC. This will eliminate the need to discipline System Clock
with the PHC and then use clock_gettime() to get the time.
Please note that AF_XDP lacks a feedback mechanism to inform the
application if the requested launch time is invalid. So, users are expected
to familiar with the horizon of the launch time of the device they use and
not request a launch time that is beyond the horizon. Otherwise, the driver
might interpret the launch time incorrectly and react wrongly. For stmmac
and igc, where modulo computation is used, a launch time larger than the
horizon will cause the device to transmit the packet earlier that the
requested launch time.
Although there is no feedback mechanism for the launch time request
for now, user still can check whether the requested launch time is
working or not, by requesting the Transmit Completion Hardware Timestamp.
Changes since v1:
- renamed to use Earliest TxTime First (Willem)
- renamed to use txtime (Willem)
Changes since v2:
- renamed to use launch time (Jesper & Willem)
- changed the default launch time in xdp_hw_metadata apps from 1s to 0.1s
because some NICs do not support such a large future time.
Changes since v3:
- added XDP launch time support to the igc driver (Jesper & Florian)
- added per-driver launch time limitation on xsk-tx-metadata.rst (Jesper)
- added explanation on FIFO behavior on xsk-tx-metadata.rst (Jakub)
- added step to enable launch time in the commit message (Jesper & Willem)
- explicitly documented the type of launch_time and which clock source
it is against (Willem)
Changes since v4:
- change netdev feature name from tx-launch-time to tx-launch-time-fifo
to explicitly state the FIFO behaviour (Stanislav)
- improve the looping of xdp_hw_metadata app to wait for packet tx
completion to be more readable by using clock_gettime() (Stanislav)
- add launch time setup steps into xdp_hw_metadata app (Stanislav)
Changes since v5:
- fix selftest build errors by using asprintf() and realloc() instead of
managing the buffer sizes manually (Daniel, Stanislav)
v1: https://patchwork.kernel.org/project/netdevbpf/cover/20231130162028.852006-…
v2: https://patchwork.kernel.org/project/netdevbpf/cover/20231201062421.1074768…
v3: https://patchwork.kernel.org/project/netdevbpf/cover/20231203165129.1740512…
v4: https://patchwork.kernel.org/project/netdevbpf/cover/20250106135506.9687-1-…
v5: https://patchwork.kernel.org/project/netdevbpf/cover/20250114152718.120588-…
Song Yoong Siang (4):
xsk: Add launch time hardware offload support to XDP Tx metadata
selftests/bpf: Add launch time request to xdp_hw_metadata
net: stmmac: Add launch time support to XDP ZC
igc: Add launch time support to XDP ZC
Documentation/netlink/specs/netdev.yaml | 4 +
Documentation/networking/xsk-tx-metadata.rst | 62 +++++++
drivers/net/ethernet/intel/igc/igc_main.c | 78 +++++---
drivers/net/ethernet/stmicro/stmmac/stmmac.h | 2 +
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 13 ++
include/net/xdp_sock.h | 10 ++
include/net/xdp_sock_drv.h | 1 +
include/uapi/linux/if_xdp.h | 10 ++
include/uapi/linux/netdev.h | 3 +
net/core/netdev-genl.c | 2 +
net/xdp/xsk.c | 3 +
tools/include/uapi/linux/if_xdp.h | 10 ++
tools/include/uapi/linux/netdev.h | 3 +
tools/testing/selftests/bpf/xdp_hw_metadata.c | 168 +++++++++++++++++-
14 files changed, 342 insertions(+), 27 deletions(-)
--
2.34.1
Enabling a (modular) test should not silently enable additional kernel
functionality, as that may increase the attack vector of a product.
Fix this by making FW_CS_DSP_KUNIT_TEST (and FW_CS_DSP_KUNIT_TEST_UTILS)
depend on REGMAP instead of selecting it.
After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build
modules for all appropriate tests for ones system, without pulling in
extra unwanted functionality, while still allowing a tester to manually
enable REGMAP_BUILD and this test suite on a system where REGMAP is not
enabled by default.
Fixes: dd0b6b1f29b92202 ("firmware: cs_dsp: Add KUnit testing of bin file download")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
See also commits 70a640c0efa76674 ("regmap: REGMAP_KUNIT should not
select REGMAP") and 47ee108a113c72e ("regmap: Provide user selectable
option to enable regmap").
BTW, what's the point in having separate FW_CS_DSP_KUNIT_TEST and
FW_CS_DSP_KUNIT_TEST_UTILS symbols? They are always enabled or disabled
together.
---
drivers/firmware/cirrus/Kconfig | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/cirrus/Kconfig b/drivers/firmware/cirrus/Kconfig
index ee09269c63b51173..0a883091259a2c11 100644
--- a/drivers/firmware/cirrus/Kconfig
+++ b/drivers/firmware/cirrus/Kconfig
@@ -6,15 +6,13 @@ config FW_CS_DSP
config FW_CS_DSP_KUNIT_TEST_UTILS
tristate
- depends on KUNIT
- select REGMAP
+ depends on KUNIT && REGMAP
select FW_CS_DSP
config FW_CS_DSP_KUNIT_TEST
tristate "KUnit tests for Cirrus Logic cs_dsp" if !KUNIT_ALL_TESTS
- depends on KUNIT
+ depends on KUNIT && REGMAP
default KUNIT_ALL_TESTS
- select REGMAP
select FW_CS_DSP
select FW_CS_DSP_KUNIT_TEST_UTILS
help
--
2.43.0
Create a dedicated .gitignore for the tpm2 tests.
Move tpm2 related entries from parent directory's .gitignore.
Signed-off-by: Khaled Elnaggar <khaledelnaggarlinux(a)gmail.com>
---
Hello, as per Shuah's review, instead of adding another entry at
selftests/.gitignore, I created the dedicated .gitignore for
tpm2 tests.
Aside: CCing linux-kernel-mentees as I am working on the mentorship
application tasks.
Thanks
Changes in v2:
- Created a dedicated .gitignore
---
tools/testing/selftests/.gitignore | 1 -
tools/testing/selftests/tpm2/.gitignore | 4 ++++
2 files changed, 4 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/tpm2/.gitignore
diff --git a/tools/testing/selftests/.gitignore b/tools/testing/selftests/.gitignore
index cb24124ac5b9..674aaa02e396 100644
--- a/tools/testing/selftests/.gitignore
+++ b/tools/testing/selftests/.gitignore
@@ -4,7 +4,6 @@ gpiogpio-hammer
gpioinclude/
gpiolsgpio
kselftest_install/
-tpm2/SpaceTest.log
# Python bytecode and cache
__pycache__/
diff --git a/tools/testing/selftests/tpm2/.gitignore b/tools/testing/selftests/tpm2/.gitignore
new file mode 100644
index 000000000000..910bbdbb336a
--- /dev/null
+++ b/tools/testing/selftests/tpm2/.gitignore
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0-only
+AsyncTest.log
+SpaceTest.log
+
--
2.45.2
Changes in v6:
- Move la57 check to using mmap().
- Merge kernel_has_lam() and cpu_has_lam() into lam_is_available() since
the syscall (if CONFIG_ADDRESS_MASKING is set) and cpuid check
provides the same information.
Recent change in how get_user() handles pointers [1] has a specific case
for LAM. It assigns a different bitmask that's later used to check
whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel)
it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information
instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip
running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The
test was ran without issues with both the LAM lines from [1] untouched
and commented out. The test was also ran without issues with LAM_SUP
both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics
on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundati…
[2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@lin…
Previous series entries:
[v1] https://lore.kernel.org/all/20241029141421.715686-1-maciej.wieczor-retman@i…
[v2] https://lore.kernel.org/all/20241121090651.254054-1-maciej.wieczor-retman@i…
[v3] https://lore.kernel.org/all/20241122085521.270802-1-maciej.wieczor-retman@i…
[v4] https://lore.kernel.org/all/cover.1732627541.git.maciej.wieczor-retman@inte…
[v5] https://lore.kernel.org/all/cover.1732728879.git.maciej.wieczor-retman@inte…
Maciej Wieczor-Retman (3):
selftests/lam: Move cpu_has_la57() to use cpuinfo flag
selftests/lam: Skip test if LAM is disabled
selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 147 +++++++++++++++++++++++++++---
1 file changed, 136 insertions(+), 11 deletions(-)
--
2.47.1
From: Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
[ Upstream commit cbd8730aea8d79cda6b0f3c18b406dfdef0c1b80 ]
The verifier log when leaking resources on BPF_EXIT may be a bit
confusing, as it's a problem only when finally existing from the main
prog, not from any of the subprogs. Hence, update the verifier error
string and the corresponding selftests matching on it.
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Suggested-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
Link: https://lore.kernel.org/r/20241204030400.208005-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/bpf/verifier.c | 2 +-
.../testing/selftests/bpf/progs/exceptions_fail.c | 4 ++--
tools/testing/selftests/bpf/progs/preempt_lock.c | 14 +++++++-------
.../selftests/bpf/progs/verifier_spin_lock.c | 2 +-
4 files changed, 11 insertions(+), 11 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 77f56674aaa99..4f02345b764fd 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -18803,7 +18803,7 @@ static int do_check(struct bpf_verifier_env *env)
* match caller reference state when it exits.
*/
err = check_resource_leak(env, exception_exit, !env->cur_state->curframe,
- "BPF_EXIT instruction");
+ "BPF_EXIT instruction in main prog");
if (err)
return err;
diff --git a/tools/testing/selftests/bpf/progs/exceptions_fail.c b/tools/testing/selftests/bpf/progs/exceptions_fail.c
index fe0f3fa5aab68..8a0fdff899271 100644
--- a/tools/testing/selftests/bpf/progs/exceptions_fail.c
+++ b/tools/testing/selftests/bpf/progs/exceptions_fail.c
@@ -131,7 +131,7 @@ int reject_subprog_with_lock(void *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
int reject_with_rcu_read_lock(void *ctx)
{
bpf_rcu_read_lock();
@@ -147,7 +147,7 @@ __noinline static int throwing_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_rcu_read_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_rcu_read_lock-ed region")
int reject_subprog_with_rcu_read_lock(void *ctx)
{
bpf_rcu_read_lock();
diff --git a/tools/testing/selftests/bpf/progs/preempt_lock.c b/tools/testing/selftests/bpf/progs/preempt_lock.c
index 885377e836077..5269571cf7b57 100644
--- a/tools/testing/selftests/bpf/progs/preempt_lock.c
+++ b/tools/testing/selftests/bpf/progs/preempt_lock.c
@@ -6,7 +6,7 @@
#include "bpf_experimental.h"
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_1(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -14,7 +14,7 @@ int preempt_lock_missing_1(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -23,7 +23,7 @@ int preempt_lock_missing_2(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_3(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -33,7 +33,7 @@ int preempt_lock_missing_3(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_3_minus_2(struct __sk_buff *ctx)
{
bpf_preempt_disable();
@@ -55,7 +55,7 @@ static __noinline void preempt_enable(void)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -63,7 +63,7 @@ int preempt_lock_missing_1_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
{
preempt_disable();
@@ -72,7 +72,7 @@ int preempt_lock_missing_2_subprog(struct __sk_buff *ctx)
}
SEC("?tc")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_preempt_disable-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_preempt_disable-ed region")
int preempt_lock_missing_2_minus_1_subprog(struct __sk_buff *ctx)
{
preempt_disable();
diff --git a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
index 3f679de73229f..25599eac9a702 100644
--- a/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
+++ b/tools/testing/selftests/bpf/progs/verifier_spin_lock.c
@@ -187,7 +187,7 @@ l0_%=: r6 = r0; \
SEC("cgroup/skb")
__description("spin_lock: test6 missing unlock")
-__failure __msg("BPF_EXIT instruction cannot be used inside bpf_spin_lock-ed region")
+__failure __msg("BPF_EXIT instruction in main prog cannot be used inside bpf_spin_lock-ed region")
__failure_unpriv __msg_unpriv("")
__naked void spin_lock_test6_missing_unlock(void)
{
--
2.39.5
The fixed commit adds NETIF_F_GSO_ESP bit for bonding gso_partial_features.
However, if we don't set the dev NETIF_F_GSO_PARTIAL bit, the later
netdev_change_features() -> netdev_fix_features() will remove the
NETIF_F_GSO_ESP bit from the dev features. This causes ethtool to show
that the bond does not support tx-esp-segmentation. For example
# ethtool -k bond0 | grep esp
tx-esp-segmentation: off [requested on]
esp-hw-offload: on
esp-tx-csum-hw-offload: on
Add the NETIF_F_GSO_PARTIAL bit to bond dev features when set
gso_partial_features to fix this issue.
Fixes: 4861333b4217 ("bonding: add ESP offload features when slaves support")
Reported-by: Liang Li <liali(a)redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
v2: remove NETIF_F_GSO_PARTIAL bit if not set gso_partial_features.
The issue is reported internally, so there is no Closes tag.
BTW, I saw some drivers set NETIF_F_GSO_PARTIAL on dev->features. Some
other drivers set NETIF_F_GSO_PARTIAL on dev->hw_enc_features. I haven't
see a doc about where we should set. So I just set it on dev->features.
---
drivers/net/bonding/bond_main.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7b78c2bada81..09d5a8433d86 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1598,10 +1598,13 @@ static void bond_compute_features(struct bonding *bond)
}
bond_dev->hard_header_len = max_hard_header_len;
- if (gso_partial_features & NETIF_F_GSO_ESP)
+ if (gso_partial_features & NETIF_F_GSO_ESP) {
bond_dev->gso_partial_features |= NETIF_F_GSO_ESP;
- else
+ bond_dev->features |= NETIF_F_GSO_PARTIAL;
+ } else {
bond_dev->gso_partial_features &= ~NETIF_F_GSO_ESP;
+ bond_dev->features &= ~NETIF_F_GSO_PARTIAL;
+ }
done:
bond_dev->vlan_features = vlan_features;
--
2.39.5 (Apple Git-154)
Recent change in how get_user() handles pointers [1] has a specific case
for LAM. It assigns a different bitmask that's later used to check
whether a pointer comes from userland in get_user().
While currently commented out (until LASS [2] is merged into the kernel)
it's worth making changes to the LAM selftest ahead of time.
Modify cpu_has_la57() so it provides current paging level information
instead of the cpuid one.
Add test case to LAM that utilizes a ioctl (FIOASYNC) syscall which uses
get_user() in its implementation. Execute the syscall with differently
tagged pointers to verify that valid user pointers are passing through
and invalid kernel/non-canonical pointers are not.
Also to avoid unhelpful test failures add a check in main() to skip
running tests if LAM was not compiled into the kernel.
Code was tested on a Sierra Forest Xeon machine that's LAM capable. The
test was ran without issues with both the LAM lines from [1] untouched
and commented out. The test was also ran without issues with LAM_SUP
both enabled and disabled.
4/5 level pagetables code paths were also successfully tested in Simics
on a 5-level capable machine.
[1] https://lore.kernel.org/all/20241024013214.129639-1-torvalds@linux-foundati…
[2] https://lore.kernel.org/all/20241028160917.1380714-1-alexander.shishkin@lin…
Maciej Wieczor-Retman (3):
selftests/lam: Move cpu_has_la57() to use cpuinfo flag
selftests/lam: Skip test if LAM is disabled
selftests/lam: Test get_user() LAM pointer handling
tools/testing/selftests/x86/lam.c | 120 ++++++++++++++++++++++++++++--
1 file changed, 115 insertions(+), 5 deletions(-)
--
2.47.1
The current implementation of netconsole sends all log messages in
parallel, which can lead to an intermixed and interleaved output on the
receiving side. This makes it challenging to demultiplex the messages
and attribute them to their originating CPUs.
As a result, users and developers often struggle to effectively analyze
and debug the parallel log output received through netconsole.
Example of a message got from produciton hosts:
------------[ cut here ]------------
------------[ cut here ]------------
refcount_t: saturated; leaking memory.
WARNING: CPU: 2 PID: 1613668 at lib/refcount.c:22 refcount_warn_saturate+0x5e/0xe0
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 26 PID: 4139916 at lib/refcount.c:25 refcount_warn_saturate+0x7d/0xe0
Modules linked in: bpf_preload(E) vhost_net(E) tun(E) vhost(E)
This series of patches introduces a new feature to the netconsole
subsystem that allows the automatic population of the CPU number in the
userdata field for each log message. This enhancement provides several
benefits:
* Improved demultiplexing of parallel log output: When multiple CPUs are
sending messages concurrently, the added CPU number in the userdata
makes it easier to differentiate and attribute the messages to their
originating CPUs.
* Better visibility into message sources: The CPU number information
gives users and developers more insight into which specific CPU a
particular log message came from, which can be valuable for debugging
and analysis.
The changes in this series are as follows:
Patch 1: netconsole: Rename userdata to extradata
=================================================
Create the a concept of extradata, which encompasses the concept of
userdata and the upcoming sysdatao
Sysdata is a new concept being added, which is basically fields that are
populated by the kernel. At this time only the CPU#, but, there is a
desire to add current task name, kernel release version, etc.
Patch 2: netconsole: Helper to count number of used entries
===========================================================
Create a simple helper to count number of entries in extradata. I am
separating this in a function since it will need to count userdata and
sysdata. For instance, when the user adds an extra userdata, we need to
check if there is space, counting the previous data entries (from
userdata and cpu data)
Patch 3: netconsole: add support for sysdata and CPU population
===============================================================
This is the core patch. Basically add a new option to enable automatic
CPU number population in the netconsole userdata Provides a new "cpu_nr"
sysfs attribute to control this feature
Patch 4: "netconsole: selftest: test CPU number auto-population"
=============================================================
Expands the existing netconsole selftest to verify the CPU number
auto-population functionality Ensures the received netconsole messages
contain the expected "cpu=<CPU>" entry in the message. Test different
permutation with userdata
Patch 5: "netconsole: docs: Add documentation for CPU number auto-population"
=============================================================================
Updates the netconsole documentation to explain the new CPU number
auto-population feature Provides instructions on how to enable and use
the feature
I believe these changes will be a valuable addition to the netconsole
subsystem, enhancing its usefulness for kernel developers and users.
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changes in v2:
- Create the concept of extradata and sysdata. This will make the design
easier to understand, and the code easier to read.
* Basically extradata encompasses userdata and the new sysdata.
Userdata originates from user, and sysdata originates in kernel.
- Improved the test to send from a very specific CPU, which can be
checked to be correct on the other side, as suggested by Jakub.
- Fixed a bug where CPU # was populated at the wrong place
- Link to v1: https://lore.kernel.org/r/20241113-netcon_cpu-v1-0-d187bf7c0321@debian.org
---
Breno Leitao (5):
netconsole: Rename userdata to extradata
netconsole: Helper to count number of used entries
netconsole: add support for sysdata and CPU population
netconsole: selftest: test for sysdata CPU
netconsole: docs: Add documentation for CPU number auto-population
Documentation/networking/netconsole.rst | 45 +++++
drivers/net/netconsole.c | 223 ++++++++++++++++-----
tools/testing/selftests/drivers/net/Makefile | 1 +
.../selftests/drivers/net/lib/sh/lib_netcons.sh | 17 ++
.../selftests/drivers/net/netcons_sysdata.sh | 166 +++++++++++++++
5 files changed, 407 insertions(+), 45 deletions(-)
---
base-commit: 7b24f164cf005b9649138ef6de94aaac49c9f3d1
change-id: 20241108-netcon_cpu-ce3917e88f4b
Best regards,
--
Breno Leitao <leitao(a)debian.org>
From: "Mike Rapoport (Microsoft)" <rppt(a)kernel.org>
Hi,
Following Peter's comments [1] these patches rework handling of ROX caches
for module text allocations.
Instead of using a writable copy that really complicates alternatives
patching, temporarily remap parts of a large ROX page as RW for the time of
module formation and then restore it's ROX protections when the module is
ready.
To keep the ROX memory mapped with large pages, make set_memory_rox()
capable of restoring large pages (more details are in patch 3).
Since this is really about x86, I believe this should go in via tip tree.
The patches also available in git
https://git.kernel.org/rppt/h/execmem/x86-rox/v9
v2 changes:
* only collapse large mappings in set_memory_rox()
* simplify RW <-> ROX remapping
* don't remove ROX cache pages from the direct map (patch 4)
v1: https://lore.kernel.org/all/20241227072825.1288491-1-rppt@kernel.org
[1] https://lore.kernel.org/all/20241209083818.GK8562@noisy.programming.kicks-a…
Kirill A. Shutemov (1):
x86/mm/pat: restore large ROX pages after fragmentation
Mike Rapoport (Microsoft) (9):
x86/mm/pat: cpa-test: fix length for CPA_ARRAY test
x86/mm/pat: drop duplicate variable in cpa_flush()
execmem: don't remove ROX cache from the direct map
execmem: add API for temporal remapping as RW and restoring ROX afterwards
module: introduce MODULE_STATE_GONE
module: switch to execmem API for remapping as RW and restoring ROX
Revert "x86/module: prepare module loading for ROX allocations of text"
module: drop unused module_writable_address()
x86: re-enable EXECMEM_ROX support
arch/um/kernel/um_arch.c | 11 +-
arch/x86/Kconfig | 1 +
arch/x86/entry/vdso/vma.c | 3 +-
arch/x86/include/asm/alternative.h | 14 +-
arch/x86/include/asm/pgtable_types.h | 2 +
arch/x86/kernel/alternative.c | 181 ++++++--------
arch/x86/kernel/ftrace.c | 30 ++-
arch/x86/kernel/module.c | 45 ++--
arch/x86/mm/pat/cpa-test.c | 2 +-
arch/x86/mm/pat/set_memory.c | 220 +++++++++++++++++-
include/linux/execmem.h | 31 +++
include/linux/module.h | 22 +-
include/linux/moduleloader.h | 4 -
include/linux/vm_event_item.h | 2 +
kernel/module/kallsyms.c | 8 +-
kernel/module/kdb.c | 2 +-
kernel/module/main.c | 86 ++-----
kernel/module/procfs.c | 2 +-
kernel/module/strict_rwx.c | 9 +-
kernel/tracepoint.c | 2 +
lib/kunit/test.c | 2 +
mm/execmem.c | 39 ++--
mm/vmstat.c | 2 +
samples/livepatch/livepatch-callbacks-demo.c | 1 +
.../test_modules/test_klp_callbacks_demo.c | 1 +
.../test_modules/test_klp_callbacks_demo2.c | 1 +
.../livepatch/test_modules/test_klp_state.c | 1 +
.../livepatch/test_modules/test_klp_state2.c | 1 +
28 files changed, 442 insertions(+), 283 deletions(-)
base-commit: ffd294d346d185b70e28b1a28abe367bbfe53c04
--
2.45.2
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
test_xdp_redirect_multi.sh tests the XDP redirections done through
bpf_redirect_map().
This is already partly covered by test_xdp_veth.c that already tests
map redirections at XDP level. What isn't covered yet by test_xdp_veth is
the use of the broadcast flags (BPF_F_BROADCAST or BPF_F_EXCLUDE_INGRESS)
and XDP egress programs.
Hence, this patch series add test cases to test_xdp_veth.c to get rid of
the test_xdp_redirect_multi.sh:
- PATCH 1 to 5 rework test_xdp_veth to make it more generic and allow to
configure different test cases
- PATCH 6 adds test cases for 'classic' bpf_redirect_map()
- PATCH 7 & 8 covers the broadcast flags
- PATCH 9 covers the XDP egress programs
- PATCH 10 removes test_xdp_redirect_multi.sh
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
---
Changes in v2:
- Use serial_test_* to avoid conflict between tests
- Link to v1: https://lore.kernel.org/r/20250121-redirect-multi-v1-0-b215e35ff505@bootlin…
---
Bastien Curutchet (eBPF Foundation) (10):
selftests/bpf: test_xdp_veth: Split network configuration
selftests/bpf: Remove unused argument
selftests/bpf: test_xdp_veth: Rename config[]
selftests/bpf: test_xdp_veth: Add prog_config[] table
selftests/bpf: test_xdp_veth: Add XDP flags to prog_configuration
selftests/bpf: test_xdp_veth: Add new test cases for XDP flags
selftests/bpf: Optionally select broadcasting flags
selftests/bpf: test_xdp_veth: Add XDP broadcast redirection tests
selftests/bpf: test_xdp_veth: Add XDP program on egress test
selftests/bpf: Remove test_xdp_redirect_multi.sh
tools/testing/selftests/bpf/Makefile | 2 -
.../selftests/bpf/prog_tests/test_xdp_veth.c | 534 +++++++++++++++++----
.../testing/selftests/bpf/progs/xdp_redirect_map.c | 89 ++++
.../selftests/bpf/progs/xdp_redirect_multi_kern.c | 41 +-
.../selftests/bpf/test_xdp_redirect_multi.sh | 214 ---------
tools/testing/selftests/bpf/xdp_redirect_multi.c | 226 ---------
6 files changed, 553 insertions(+), 553 deletions(-)
---
base-commit: 349e0551b929b4712b4d6127f67dfa41ed48d5a2
change-id: 20250103-redirect-multi-245d6eafb5d1
Best regards,
--
Bastien Curutchet (eBPF Foundation) <bastien.curutchet(a)bootlin.com>
Hi All,
This series contains a fix for a warning emitted when a uffd-registered region,
which doesn't have UFFD_FEATURE_EVENT_REMAP, is mremap()ed. patch 1 describes
the problem and fixes it, and patch 2 adds a selftest to verify the fix.
Thanks to Mikołaj Lenczewski who originally created the patch, which I have
subsequently extended.
Applies on top of mm-unstable (f349e79bfbf3)
Thanks,
Ryan
Ryan Roberts (2):
mm: Clear uffd-wp PTE/PMD state on mremap()
selftests/mm: Introduce uffd-wp-mremap regression test
include/linux/userfaultfd_k.h | 12 +
mm/huge_memory.c | 12 +
mm/hugetlb.c | 14 +-
mm/mremap.c | 32 +-
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/run_vmtests.sh | 1 +
tools/testing/selftests/mm/uffd-wp-mremap.c | 380 ++++++++++++++++++++
8 files changed, 452 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/mm/uffd-wp-mremap.c
--
2.43.0