January 2024 - Linux-kselftest-mirror

[PATCH v12 00/20] KVM: xen: update shared_info and vcpu_info handling

by Paul Durrant

From: Paul Durrant <pdurrant(a)amazon.com>

This series has one small fix to what was in v11 [1]:

* KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set

The v11 patch failed to set the return code of the ioctl if the mode was
not actually changed, leading to a spurious failure.

This version of the series also contains a new bug-fix to the pfncache
code from David Woodhouse.

[1] https://lore.kernel.org/kvm/20231219161109.1318-1-paul@xen.org/

David Woodhouse (1):
  KVM: pfncache: rework __kvm_gpc_refresh() to fix locking issues

Paul Durrant (19):
  KVM: pfncache: Add a map helper function
  KVM: pfncache: remove unnecessary exports
  KVM: xen: mark guest pages dirty with the pfncache lock held
  KVM: pfncache: add a mark-dirty helper
  KVM: pfncache: remove KVM_GUEST_USES_PFN usage
  KVM: pfncache: stop open-coding offset_in_page()
  KVM: pfncache: include page offset in uhva and use it consistently
  KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
  KVM: xen: separate initialization of shared_info cache and content
  KVM: xen: re-initialize shared_info if guest (32/64-bit) mode is set
  KVM: xen: allow shared_info to be mapped by fixed HVA
  KVM: xen: allow vcpu_info to be mapped by fixed HVA
  KVM: selftests / xen: map shared_info using HVA rather than GFN
  KVM: selftests / xen: re-map vcpu_info using HVA rather than GPA
  KVM: xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
  KVM: xen: split up kvm_xen_set_evtchn_fast()
  KVM: xen: don't block on pfncache locks in kvm_xen_set_evtchn_fast()
  KVM: pfncache: check the need for invalidation under read lock first
  KVM: xen: allow vcpu_info content to be 'safely' copied

 Documentation/virt/kvm/api.rst | 53 ++-
 arch/x86/kvm/x86.c | 7 +-
 arch/x86/kvm/xen.c | 356 +++++++++++-------
 include/linux/kvm_host.h | 40 +-
 include/linux/kvm_types.h | 8 -
 include/uapi/linux/kvm.h | 9 +-
 .../selftests/kvm/x86_64/xen_shinfo_test.c | 59 ++-
 virt/kvm/pfncache.c | 340 +++++++++--------
 8 files changed, 535 insertions(+), 337 deletions(-)

base-commit: 1c6d984f523f67ecfad1083bb04c55d91977bb15
--
2.39.2
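
The return-code problem described in the cover letter above can be illustrated with a small userspace sketch. This is not code from the series: `struct xen_state`, `set_long_mode()` and the re-initialization counter are hypothetical stand-ins, and the only point is that the "mode unchanged" path must also report success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-VM Xen state; not the kernel's structure. */
struct xen_state {
	bool long_mode;          /* true for a 64-bit guest */
	int shinfo_reinit_count; /* how many times shared_info was re-initialized */
};

/*
 * Set the guest mode and re-initialize shared_info only when the mode
 * actually changes. Returns 0 on success, including the no-change case.
 */
static int set_long_mode(struct xen_state *xen, bool long_mode)
{
	int r = 0; /* set up front so the no-change path also returns success */

	assert(xen != 0);
	if (xen->long_mode != long_mode) {
		xen->long_mode = long_mode;
		xen->shinfo_reinit_count++; /* models the re-initialization step */
	}
	return r;
}
```

Calling `set_long_mode()` twice with the same value should succeed both times and bump the counter at most once.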

3 months, 2 weeks


[RFC PATCH net-next v5 00/14] Device Memory TCP

by Mina Almasry

Major changes in RFC v5:
------------------------

1. Rebased on top of 'Abstract page from net stack' series and used the
   new netmem type to refer to LSB set pointers instead of re-using
   struct page.

2. Downgraded this series back to RFC and called it RFC v5. This is
   because this series is now dependent on 'Abstract page from net
   stack'[1] and the queue API. Both have been split out of the series
   as prerequisite work.

3. Reworked the page_pool devmem support to use netmem and for some
   more unified handling.

4. Reworked the reference counting of net_iov (renamed from
   page_pool_iov) to use pp_ref_count for refcounting.

The full changes including the dependent series and GVE page pool
support are here:

https://github.com/mina/linux/commits/tcpdevmem-rfcv5/

[1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810774

Major changes in v1:
--------------------

1. Implemented MVP queue API ndos to remove the userspace-visible
   driver reset.

2. Fixed issues in the napi_pp_put_page() devmem frag unref path.

3. Removed RFC tag.

Many smaller addressed comments across all the patches (patches have
individual change log).

Full tree including the rest of the GVE driver changes:
https://github.com/mina/linux/commits/tcpdevmem-v1

Changes in RFC v3:
------------------

1. Pulled in the memory-provider dependency from Jakub's RFC[1] to make
   the series reviewable and mergeable.

2. Implemented multi-rx-queue binding which was a todo in v2.

3. Fix to cmsg handling.

The sticking point in RFC v2[2] was the device reset required to refill
the device rx-queues after the dmabuf bind/unbind. The solution
suggested, as I understand it, is a subset of the per-queue management
ops Jakub suggested or similar:

https://lore.kernel.org/netdev/20230815171638.4c057dcd@kernel.org/

This is not addressed in this revision, because:

1. This point was discussed at netconf & netdev and there is openness
   to using the current approach of requiring a device reset.

2. Implementing individual queue resetting seems to be difficult for my
   test bed with GVE. My prototype to test this ran into issues with
   the rx-queues not coming back up properly if reset individually. At
   the moment I'm unsure if it's a mistake in the POC or a genuine
   issue in the virtualization stack behind GVE, which currently
   doesn't test individual rx-queue restart.

3. Our usecases are not bothered by requiring a device reset to refill
   the buffer queues, and we'd like to support NICs that run into this
   limitation with resetting individual queues.

My thought is that drivers that have trouble with per-queue configs can
use the support in this series, while drivers that support new netdev
ops to reset individual queues can automatically reset the queue as
part of the dma-buf bind/unbind.

The same approach with device resets is presented again for
consideration with other sticking points addressed.

This proposal includes the rx devmem path only proposed for merge. For a
snapshot of my entire tree which includes the GVE POC page pool support
& device memory support:

https://github.com/torvalds/linux/compare/master...mina:linux:tcpdevmem-v3

[1] https://lore.kernel.org/netdev/f8270765-a27b-6ccf-33ea-cda097168d79@redhat.…
[2] https://lore.kernel.org/netdev/CAHS8izOVJGJH5WF68OsRWFKJid1_huzzUK+hpKbLcL4…

Changes in RFC v2:
------------------

The sticking point in RFC v1[1] was the dma-buf pages approach we used
to deliver the device memory to the TCP stack. RFC v2 is a
proof-of-concept that attempts to resolve this by implementing
scatterlist support in the networking stack, such that we can import
the dma-buf scatterlist directly. This is the approach proposed at a
high level here[2].

Detailed changes:

1. Replaced dma-buf pages approach with importing scatterlist into the
   page pool.

2. Replaced the dma-buf pages centric API with a netlink API.

3. Removed the TX path implementation - there is no issue with
   implementing the TX path with scatterlist approach, but leaving out
   the TX path makes it easier to review.

4. Functionality is tested with this proposal, but I have not conducted
   perf testing yet. I'm not sure there are regressions, but I removed
   perf claims from the cover letter until they can be re-confirmed.

5. Added Signed-off-by: contributors to the implementation.

6. Fixed some bugs with the RX path since RFC v1.

Any feedback welcome, but specifically the biggest pending questions
needing feedback IMO are:

1. Feedback on the scatterlist-based approach in general.

2. Netlink API (Patch 1 & 2).

3. Approach to handle all the drivers that expect to receive pages from
   the page pool (Patch 6).

[1] https://lore.kernel.org/netdev/dfe4bae7-13a0-3c5d-d671-f61b375cb0b4@gmail.c…
[2] https://lore.kernel.org/netdev/CAHS8izPm6XRS54LdCDZVd0C75tA1zHSu6jLVO8nzTLX…

----------------------

* TL;DR:

Device memory TCP (devmem TCP) is a proposal for transferring data to
and/or from device memory efficiently, without bouncing the data to a
host memory buffer.

* Problem:

A large number of data transfers have device memory as the source
and/or destination. Accelerators drastically increased the volume of
such transfers. Some examples include:

- ML accelerators transferring large amounts of training data from
  storage into GPU/TPU memory. In some cases ML training setup time can
  be as long as 50% of TPU compute time; improving data transfer
  throughput & efficiency can help improve GPU/TPU utilization.

- Distributed training, where ML accelerators, such as GPUs on
  different hosts, exchange data among them.

- Distributed raw block storage applications transfer large amounts of
  data with remote SSDs; much of this data does not require host
  processing.

Today, the majority of the Device-to-Device data transfers over the
network are implemented as the following low level operations:
Device-to-Host copy, Host-to-Host network transfer, and Host-to-Device
copy.

The implementation is suboptimal, especially for bulk data transfers,
and can put significant strain on system resources, such as host memory
bandwidth, PCIe bandwidth, etc. One important reason behind the current
state is the kernel's lack of semantics to express device to network
transfers.

* Proposal:

In this patch series we attempt to optimize this use case by
implementing socket APIs that enable the user to:

1. send device memory across the network directly, and
2. receive incoming network packets directly into device memory.

Packet _payloads_ go directly from the NIC to device memory for receive
and from device memory to NIC for transmit. Packet _headers_ go to/from
host memory and are processed by the TCP/IP stack normally. The NIC
_must_ support header split to achieve this.

Advantages:

- Alleviate host memory bandwidth pressure, compared to existing
  network-transfer + device-copy semantics.

- Alleviate PCIe BW pressure, by limiting data transfer to the lowest
  level of the PCIe tree, compared to traditional path which sends data
  through the root complex.

* Patch overview:

** Part 1: netlink API

Gives user ability to bind dma-buf to an RX queue.

** Part 2: scatterlist support

Currently the standard for device memory sharing is DMABUF, which
doesn't generate struct pages. On the other hand, the networking stack
(skbs, drivers, and page pool) operates on pages. We have 2 options:

1. Generate struct pages for dmabuf device memory, or,
2. Modify the networking stack to process scatterlist.

** Part 3: page pool support

We piggyback on the page pool memory providers proposal:
https://github.com/kuba-moo/linux/tree/pp-providers

It allows the page pool to define a memory provider that provides the
page allocation and freeing. It helps abstract most of the device memory
TCP changes from the driver.

** Part 4: support for unreadable skb frags

Page pool iovs are not accessible by the host; we implement changes
throughout the networking stack to correctly handle skbs with unreadable
frags.

** Part 5: recvmsg() APIs

We define user APIs for the user to send and receive device memory.

Not included with this series is the GVE devmem TCP support, just to
simplify the review. Code available here if desired:
https://github.com/mina/linux/tree/tcpdevmem

This series is built on top of net-next with Jakub's pp-providers
changes cherry-picked.

* NIC dependencies:

1. (strict) Devmem TCP requires the NIC to support header split, i.e.
   the capability to split incoming packets into a header + payload and
   to put each into a separate buffer. Devmem TCP works by using device
   memory for the packet payload, and host memory for the packet
   headers.

2. (optional) Devmem TCP works better with flow steering support & RSS
   support, i.e. the NIC's ability to steer flows into certain rx
   queues. This allows the sysadmin to enable devmem TCP on a subset of
   the rx queues, and steer devmem TCP traffic onto these queues and
   non devmem TCP elsewhere.

The NIC I have access to with these properties is the GVE with DQO
support running in Google Cloud, but any NIC that supports these
features would suffice. I may be able to help reviewers bring up devmem
TCP on their NICs.

* Testing:

The series includes a udmabuf kselftest that shows a simple use case of
devmem TCP and validates the entire data path end to end without a
dependency on a specific dmabuf provider.

** Test Setup

Kernel: net-next with this series and memory provider API cherry-picked
locally.

Hardware: Google Cloud A3 VMs.

NIC: GVE with header split & RSS & flow steering support.

Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: David Wei <dw(a)davidwei.uk>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Yunsheng Lin <linyunsheng(a)huawei.com>
Cc: Shailend Chand <shailend(a)google.com>
Cc: Harshitha Ramamurthy <hramamurthy(a)google.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Jeroen de Borst <jeroendb(a)google.com>
Cc: Praveen Kaligineedi <pkaligineedi(a)google.com>

Jakub Kicinski (1):
  net: page_pool: create hooks for custom page providers

Mina Almasry (13):
  net: page_pool: factor out page_pool recycle check
  net: netdev netlink api to bind dma-buf to a net device
  netdev: support binding dma-buf to netdevice
  netdev: netdevice devmem allocator
  page_pool: convert to use netmem
  page_pool: devmem support
  memory-provider: dmabuf devmem memory provider
  net: support non paged skb frags
  net: add support for skbs with unreadable frags
  tcp: RX path for devmem TCP
  net: add SO_DEVMEM_DONTNEED setsockopt to release RX frags
  net: add devmem TCP documentation
  selftests: add ncdevmem, netcat for devmem TCP

 Documentation/netlink/specs/netdev.yaml | 52 +++
 Documentation/networking/devmem.rst | 271 +++++++++++++
 Documentation/networking/index.rst | 1 +
 arch/alpha/include/uapi/asm/socket.h | 8 +-
 arch/mips/include/uapi/asm/socket.h | 6 +
 arch/parisc/include/uapi/asm/socket.h | 6 +
 arch/sparc/include/uapi/asm/socket.h | 6 +
 include/linux/skbuff.h | 68 +++-
 include/linux/socket.h | 1 +
 include/net/devmem.h | 127 ++++++
 include/net/netdev_rx_queue.h | 1 +
 include/net/netmem.h | 223 ++++++++++-
 include/net/page_pool/helpers.h | 117 ++++--
 include/net/page_pool/types.h | 27 +-
 include/net/sock.h | 2 +
 include/net/tcp.h | 5 +-
 include/trace/events/page_pool.h | 28 +-
 include/uapi/asm-generic/socket.h | 6 +
 include/uapi/linux/netdev.h | 19 +
 include/uapi/linux/uio.h | 14 +
 net/bpf/test_run.c | 5 +-
 net/core/datagram.c | 6 +
 net/core/dev.c | 314 ++++++++++++++-
 net/core/gro.c | 7 +-
 net/core/netdev-genl-gen.c | 19 +
 net/core/netdev-genl-gen.h | 2 +
 net/core/netdev-genl.c | 123 ++++++
 net/core/page_pool.c | 419 +++++++++++++-------
 net/core/skbuff.c | 92 ++++-
 net/core/sock.c | 45 +++
 net/ipv4/tcp.c | 196 +++++++++-
 net/ipv4/tcp_input.c | 13 +-
 net/ipv4/tcp_ipv4.c | 9 +
 net/ipv4/tcp_output.c | 5 +-
 net/packet/af_packet.c | 4 +-
 tools/include/uapi/linux/netdev.h | 19 +
 tools/testing/selftests/net/.gitignore | 1 +
 tools/testing/selftests/net/Makefile | 5 +
 tools/testing/selftests/net/ncdevmem.c | 489 ++++++++++++++++++++++++
 39 files changed, 2527 insertions(+), 234 deletions(-)
 create mode 100644 Documentation/networking/devmem.rst
 create mode 100644 include/net/devmem.h
 create mode 100644 tools/testing/selftests/net/ncdevmem.c

--
2.43.0.472.g3155946c3a-goog
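
The RFC v5 changelog mentions a netmem type that refers to "LSB set pointers" instead of re-using struct page. The general technique (tagging the low bit of an aligned pointer to tell two kinds of object apart) can be sketched in userspace as below. All names here (`mem_ref`, `host_page`, `dev_iov`, the helper functions) are made up for illustration and are not the series' actual API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: one host-readable object, one device-memory object. */
struct host_page { int id; };
struct dev_iov { int id; };

/* An opaque reference; the low bit says which kind of object it points to. */
typedef uintptr_t mem_ref;
#define MEM_REF_IOV_BIT ((uintptr_t)1)

static mem_ref ref_from_page(struct host_page *page)
{
	/* Aligned pointers have a clear low bit, which leaves it free as a tag. */
	assert(((uintptr_t)page & MEM_REF_IOV_BIT) == 0);
	return (uintptr_t)page;
}

static mem_ref ref_from_iov(struct dev_iov *iov)
{
	assert(((uintptr_t)iov & MEM_REF_IOV_BIT) == 0);
	return (uintptr_t)iov | MEM_REF_IOV_BIT;
}

static bool ref_is_iov(mem_ref ref)
{
	return (ref & MEM_REF_IOV_BIT) != 0;
}

/* Returns NULL when the reference is of the other kind. */
static struct host_page *ref_to_page(mem_ref ref)
{
	return ref_is_iov(ref) ? NULL : (struct host_page *)ref;
}

static struct dev_iov *ref_to_iov(mem_ref ref)
{
	return ref_is_iov(ref) ? (struct dev_iov *)(ref & ~MEM_REF_IOV_BIT) : NULL;
}
```

Code that only handles host memory can call `ref_to_page()` and treat a NULL result as "unreadable", which is roughly the distinction the cover letter draws for skbs with unreadable frags.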

3 months, 2 weeks


resctrl selftests ready for inclusion

by Reinette Chatre

Hi Shuah,

Could you please consider Ilpo's resctrl selftest enhancements [1] for
inclusion into kselftest's "next" branch in preparation for the next
merge window?

Thank you very much.

Reinette

[1] https://lore.kernel.org/lkml/20231215150515.36983-1-ilpo.jarvinen@linux.int…

3 months, 2 weeks


[PATCH v3 0/7] Split a folio to any lower order folios

by Zi Yan

From: Zi Yan <ziy(a)nvidia.com>

Hi all,

File folio supports any order and people would like to support flexible
orders for anonymous folio[1] too. Currently, split_huge_page() only
splits a huge page to order-0 pages, but splitting to orders higher than
0 is also useful. This patchset adds support for splitting a huge page
to any lower order pages and uses it during file folio truncate
operations.

The patchset is on top of mm-everything-2023-03-27-21-20.

Changelog
===

Since v2
---
1. Fixed an issue in __split_page_owner() introduced during my rebase

Since v1
---
1. Changed split_page_memcg() and split_page_owner() parameter to use
   order
2. Used folio_test_pmd_mappable() in place of the equivalent code

Details
===

* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patches 3 and 4 add a new_order parameter to split_page_memcg() and
  split_page_owner() and prepare for upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page
  to any lower order. The original split_huge_page_to_list() calls
  split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache
  folio truncation instead of splitting the large folio all the way
  down to order-0.
* Patch 7 adds a test API to debugfs and test cases in
  split_huge_page_test selftests.

Comments and/or suggestions are welcome.

[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/

Zi Yan (7):
  mm/memcg: use order instead of nr in split_page_memcg()
  mm/page_owner: use order instead of nr in split_page_owner()
  mm: memcg: make memcg huge page split support any order split.
  mm: page_owner: add support for splitting to any order in split
    page_owner.
  mm: thp: split huge page to any lower order pages.
  mm: truncate: split huge page cache page to a non-zero order if
    possible.
  mm: huge_memory: enable debugfs to split huge pages to any order.

 include/linux/huge_mm.h | 10 +-
 include/linux/memcontrol.h | 4 +-
 include/linux/page_owner.h | 10 +-
 mm/huge_memory.c | 137 ++++++++---
 mm/memcontrol.c | 10 +-
 mm/page_alloc.c | 8 +-
 mm/page_owner.c | 8 +-
 mm/truncate.c | 21 +-
 .../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
 9 files changed, 365 insertions(+), 68 deletions(-)

--
2.39.2
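
As a worked example of the order arithmetic behind the series above: an order-N folio covers 2^N base pages, so splitting it into order-M pieces (M <= N) yields 2^(N-M) pieces. The helpers below are a hypothetical userspace sketch of that arithmetic only; they are not functions from the patchset.

```c
#include <assert.h>

/* Number of base pages covered by a folio of the given order. */
static unsigned long pages_per_folio(unsigned int order)
{
	return 1UL << order;
}

/*
 * Number of order-new_order pieces produced by splitting one
 * order-old_order folio. Splitting only goes to a lower (or equal) order.
 */
static unsigned long split_piece_count(unsigned int old_order,
				       unsigned int new_order)
{
	assert(new_order <= old_order);
	return 1UL << (old_order - new_order);
}
```

For an order-9 folio (512 base pages), a split to order-0 gives 512 pieces, while a split to order-4 gives 32 pieces of 16 pages each.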

3 months, 2 weeks


[RFC PATCH 0/8] cgroup/cpuset: Support RCU_NOCB on isolated partitions

by Waiman Long

This patch series is based on the RFC patch from Frederic [1]. Instead
of offering RCU_NOCB as a separate option, it is now lumped into a
root-only cpuset.cpus.isolation_full flag that will enable all the
additional CPU isolation capabilities available for isolated partitions
if set. RCU_NOCB is just the first one to this party. Additional dynamic
CPU isolation capabilities will be added in the future.

The first 2 patches are adopted from Frederic with minor twists to fix
merge conflicts and compilation issues. The rest are for implementing
the new cpuset.cpus.isolation_full interface which is essentially a flag
to globally enable or disable full CPU isolation on isolated partitions.
On read, it also shows the CPU isolation capabilities that are currently
enabled. RCU_NOCB requires that the rcu_nocbs option be present in the
kernel boot command line. Without that, the rcu_nocb functionality
cannot be enabled even if the isolation_full flag is set. So we allow
users to check the isolation_full file to verify whether the desired CPU
isolation capability is enabled or not.

Only sanity checking has been done so far. More testing, especially on
the RCU side, will be needed.

[1] https://lore.kernel.org/lkml/20220525221055.1152307-1-frederic@kernel.org/

Frederic Weisbecker (2):
  rcu/nocb: Pass a cpumask instead of a single CPU to offload/deoffload
  rcu/nocb: Prepare to change nocb cpumask from CPU-hotplug protected
    cpuset caller

Waiman Long (6):
  rcu/no_cb: Add rcu_nocb_enabled() to expose the rcu_nocb state
  cgroup/cpuset: Better tracking of addition/deletion of isolated CPUs
  cgroup/cpuset: Add cpuset.cpus.isolation_full
  cgroup/cpuset: Enable dynamic rcu_nocb mode on isolated CPUs
  cgroup/cpuset: Document the new cpuset.cpus.isolation_full control
    file
  cgroup/cpuset: Update test_cpuset_prs.sh to handle
    cpuset.cpus.isolation_full

 Documentation/admin-guide/cgroup-v2.rst | 24 ++
 include/linux/rcupdate.h | 15 +-
 kernel/cgroup/cpuset.c | 237 ++++++++++++++----
 kernel/rcu/rcutorture.c | 6 +-
 kernel/rcu/tree_nocb.h | 118 ++++++---
 .../selftests/cgroup/test_cpuset_prs.sh | 23 +-
 6 files changed, 337 insertions(+), 86 deletions(-)

--
2.39.3

3 months, 3 weeks


[PATCH v2 0/7] Use TAP in some more x86 KVM selftests

by Thomas Huth

Here's a follow-up from my RFC series last year:

 https://lore.kernel.org/lkml/20221004093131.40392-1-thuth@redhat.com/T/

and from v1 earlier this year:

 https://lore.kernel.org/kvm/20230712075910.22480-1-thuth@redhat.com/

Basic idea of this series is now to use the kselftest_harness.h
framework to get TAP output in the tests, so that it is easier for the
user to see what is going on, and e.g. to be able to detect whether a
certain test is part of the test binary or not (which is useful when
tests get extended in the course of time).

v2:
- Dropped the "Rename the ASSERT_EQ macro" patch (already merged)
- Split the fixes in the sync_regs_test into separate patches
  (see the first two patches)
- Introduce the KVM_ONE_VCPU_TEST_SUITE() macro as suggested
  by Sean (see third patch) and use it in the following patches
- Add a new patch to convert vmx_pmu_caps_test.c, too

Thomas Huth (7):
  KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate
  KVM: selftests: x86: sync_regs_test: Get regs structure before
    modifying it
  KVM: selftests: Add a macro to define a test with one vcpu
  KVM: selftests: x86: Use TAP interface in the sync_regs test
  KVM: selftests: x86: Use TAP interface in the fix_hypercall test
  KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test
  KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test

 .../selftests/kvm/include/kvm_test_harness.h | 35 +++++
 .../selftests/kvm/x86_64/fix_hypercall_test.c | 27 ++--
 .../selftests/kvm/x86_64/sync_regs_test.c | 121 +++++++++++++-----
 .../kvm/x86_64/userspace_msr_exit_test.c | 19 +--
 .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 50 ++------
 5 files changed, 160 insertions(+), 92 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h

--
2.41.0
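
For readers unfamiliar with TAP, the shape of the output this series is after can be sketched by hand. The helpers below are hypothetical and deliberately do not use kselftest_harness.h (which is only available inside the kernel tree); they just format a TAP plan line and a per-test result line, so a tool or a person can see which named subtests a binary contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Format the TAP plan line, e.g. "1..2\n". Returns snprintf()'s result. */
static int tap_format_plan(char *buf, size_t len, unsigned int count)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "1..%u\n", count);
}

/*
 * Format one TAP result line, e.g. "ok 1 - name\n" or "not ok 2 - name\n".
 * The test name is whatever the caller passes; names are not checked.
 */
static int tap_format_result(char *buf, size_t len, unsigned int num,
			     const char *name, bool passed)
{
	assert(buf != NULL && len > 0 && name != NULL);
	return snprintf(buf, len, "%s %u - %s\n",
			passed ? "ok" : "not ok", num, name);
}
```

A test binary would print the plan once and then one result line per subtest, in order.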

3 months, 3 weeks


[PATCH] selftests: Add missing gitignore entries

by Bernd Edlinger

Prevent them from polluting git status after building selftests.

Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
 tools/testing/selftests/damon/.gitignore | 1 +
 tools/testing/selftests/thermal/intel/power_floor/.gitignore | 2 ++
 tools/testing/selftests/thermal/intel/workload_hint/.gitignore | 2 ++
 tools/testing/selftests/uevent/.gitignore | 2 ++
 4 files changed, 7 insertions(+)
 create mode 100644 tools/testing/selftests/thermal/intel/power_floor/.gitignore
 create mode 100644 tools/testing/selftests/thermal/intel/workload_hint/.gitignore
 create mode 100644 tools/testing/selftests/uevent/.gitignore

diff --git a/tools/testing/selftests/damon/.gitignore b/tools/testing/selftests/damon/.gitignore
index c6c2965a6607..79b32e30fce3 100644
--- a/tools/testing/selftests/damon/.gitignore
+++ b/tools/testing/selftests/damon/.gitignore
@@ -1,2 +1,3 @@
 # SPDX-License-Identifier: GPL-2.0-only
 huge_count_read_write
+access_memory
diff --git a/tools/testing/selftests/thermal/intel/power_floor/.gitignore b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
new file mode 100644
index 000000000000..754810406b33
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/power_floor/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+power_floor_test
diff --git a/tools/testing/selftests/thermal/intel/workload_hint/.gitignore b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
new file mode 100644
index 000000000000..b5448c0576c9
--- /dev/null
+++ b/tools/testing/selftests/thermal/intel/workload_hint/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+workload_hint_test
diff --git a/tools/testing/selftests/uevent/.gitignore b/tools/testing/selftests/uevent/.gitignore
new file mode 100644
index 000000000000..15127939d872
--- /dev/null
+++ b/tools/testing/selftests/uevent/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+uevent_filtering
--
2.39.2

3 months, 3 weeks


[PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers

by David Finkel

Other mechanisms for querying the peak memory usage of either a process or v1 memory cgroup allow for resetting the high watermark. Restore parity with those mechanisms. For example: - Any write to memory.max_usage_in_bytes in a cgroup v1 mount resets the high watermark. - writing "5" to the clear_refs pseudo-file in a processes's proc directory resets the peak RSS. This change copies the cgroup v1 behavior so any write to the memory.peak and memory.swap.peak pseudo-files reset the high watermark to the current usage. This behavior is particularly useful for work scheduling systems that need to track memory usage of worker processes/cgroups per-work-item. Since memory can't be squeezed like CPU can (the OOM-killer has opinions), these systems need to track the peak memory usage to compute system/container fullness when binpacking workitems. Signed-off-by: David Finkel <davidf(a)vimeo.com> --- Documentation/admin-guide/cgroup-v2.rst | 20 +++--- mm/memcontrol.c | 23 ++++++ .../selftests/cgroup/test_memcontrol.c | 72 ++++++++++++++++--- 3 files changed, 99 insertions(+), 16 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 3f85254f3cef..95af0628dc44 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1305,11 +1305,13 @@ PAGE_SIZE multiple when read back. reclaim induced by memory.reclaim. memory.peak - A read-only single value file which exists on non-root - cgroups. + A read-write single value file which exists on non-root cgroups. + + The max memory usage recorded for the cgroup and its descendants since + either the creation of the cgroup or the most recent reset. - The max memory usage recorded for the cgroup and its - descendants since the creation of the cgroup. + Any non-empty write to this file resets it to the current memory usage. + All content written is completely ignored. 
memory.oom.group A read-write single value file which exists on non-root @@ -1626,11 +1628,13 @@ PAGE_SIZE multiple when read back. Healthy workloads are not expected to reach this limit. memory.swap.peak - A read-only single value file which exists on non-root - cgroups. + A read-write single value file which exists on non-root cgroups. + + The max swap usage recorded for the cgroup and its descendants since + the creation of the cgroup or the most recent reset. - The max swap usage recorded for the cgroup and its - descendants since the creation of the cgroup. + Any non-empty write to this file resets it to the current swap usage. + All content written is completely ignored. memory.swap.max A read-write single value file which exists on non-root diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1c1061df9cd1..b04af158922d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -25,6 +25,7 @@ * Copyright (C) 2020 Alibaba, Inc, Alex Shi */ +#include <linux/cgroup-defs.h> #include <linux/page_counter.h> #include <linux/memcontrol.h> #include <linux/cgroup.h> @@ -6635,6 +6636,16 @@ static u64 memory_peak_read(struct cgroup_subsys_state *css, return (u64)memcg->memory.watermark * PAGE_SIZE; } +static ssize_t memory_peak_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of)); + + page_counter_reset_watermark(&memcg->memory); + + return nbytes; +} + static int memory_min_show(struct seq_file *m, void *v) { return seq_puts_memcg_tunable(m, @@ -6947,6 +6958,7 @@ static struct cftype memory_files[] = { .name = "peak", .flags = CFTYPE_NOT_ON_ROOT, .read_u64 = memory_peak_read, + .write = memory_peak_write, }, { .name = "min", @@ -7917,6 +7929,16 @@ static u64 swap_peak_read(struct cgroup_subsys_state *css, return (u64)memcg->swap.watermark * PAGE_SIZE; } +static ssize_t swap_peak_write(struct kernfs_open_file *of, + char *buf, size_t nbytes, loff_t off) +{ + struct mem_cgroup *memcg = 
mem_cgroup_from_css(of_css(of)); + + page_counter_reset_watermark(&memcg->swap); + + return nbytes; +} + static int swap_high_show(struct seq_file *m, void *v) { return seq_puts_memcg_tunable(m, @@ -7999,6 +8021,7 @@ static struct cftype swap_files[] = { .name = "swap.peak", .flags = CFTYPE_NOT_ON_ROOT, .read_u64 = swap_peak_read, + .write = swap_peak_write, }, { .name = "swap.events", diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c index c7c9572003a8..0326c317f1f2 100644 --- a/tools/testing/selftests/cgroup/test_memcontrol.c +++ b/tools/testing/selftests/cgroup/test_memcontrol.c @@ -161,12 +161,12 @@ static int alloc_pagecache_50M_check(const char *cgroup, void *arg) /* * This test create a memory cgroup, allocates * some anonymous memory and some pagecache - * and check memory.current and some memory.stat values. + * and checks memory.current, memory.peak, and some memory.stat values. */ -static int test_memcg_current(const char *root) +static int test_memcg_current_peak(const char *root) { int ret = KSFT_FAIL; - long current; + long current, peak, peak_reset; char *memcg; memcg = cg_name(root, "memcg_test"); @@ -180,12 +180,32 @@ static int test_memcg_current(const char *root) if (current != 0) goto cleanup; + peak = cg_read_long(memcg, "memory.peak"); + if (peak != 0) + goto cleanup; + if (cg_run(memcg, alloc_anon_50M_check, NULL)) goto cleanup; + peak = cg_read_long(memcg, "memory.peak"); + if (peak < MB(50)) + goto cleanup; + + peak_reset = cg_write(memcg, "memory.peak", "\n"); + if (peak_reset != 0) + goto cleanup; + + peak = cg_read_long(memcg, "memory.peak"); + if (peak > MB(30)) + goto cleanup; + if (cg_run(memcg, alloc_pagecache_50M_check, NULL)) goto cleanup; + peak = cg_read_long(memcg, "memory.peak"); + if (peak < MB(50)) + goto cleanup; + ret = KSFT_PASS; cleanup: @@ -815,13 +835,14 @@ static int alloc_anon_50M_check_swap(const char *cgroup, void *arg) /* * This test checks that 
memory.swap.max limits the amount of - * anonymous memory which can be swapped out. + * anonymous memory which can be swapped out. Additionally, it verifies that + * memory.swap.peak reflects the high watermark and can be reset. */ -static int test_memcg_swap_max(const char *root) +static int test_memcg_swap_max_peak(const char *root) { int ret = KSFT_FAIL; char *memcg; - long max; + long max, peak; if (!is_swap_enabled()) return KSFT_SKIP; @@ -838,6 +859,12 @@ static int test_memcg_swap_max(const char *root) goto cleanup; } + if (cg_read_long(memcg, "memory.swap.peak")) + goto cleanup; + + if (cg_read_long(memcg, "memory.peak")) + goto cleanup; + if (cg_read_strcmp(memcg, "memory.max", "max\n")) goto cleanup; @@ -860,6 +887,27 @@ static int test_memcg_swap_max(const char *root) if (cg_read_key_long(memcg, "memory.events", "oom_kill ") != 1) goto cleanup; + peak = cg_read_long(memcg, "memory.peak"); + if (peak < MB(29)) + goto cleanup; + + peak = cg_read_long(memcg, "memory.swap.peak"); + if (peak < MB(29)) + goto cleanup; + + if (cg_write(memcg, "memory.swap.peak", "\n")) + goto cleanup; + + if (cg_read_long(memcg, "memory.swap.peak") > MB(10)) + goto cleanup; + + + if (cg_write(memcg, "memory.peak", "\n")) + goto cleanup; + + if (cg_read_long(memcg, "memory.peak")) + goto cleanup; + if (cg_run(memcg, alloc_anon_50M_check_swap, (void *)MB(30))) goto cleanup; @@ -867,6 +915,14 @@ static int test_memcg_swap_max(const char *root) if (max <= 0) goto cleanup; + peak = cg_read_long(memcg, "memory.peak"); + if (peak < MB(29)) + goto cleanup; + + peak = cg_read_long(memcg, "memory.swap.peak"); + if (peak < MB(19)) + goto cleanup; + ret = KSFT_PASS; cleanup: @@ -1293,7 +1349,7 @@ struct memcg_test { const char *name; } tests[] = { T(test_memcg_subtree_control), - T(test_memcg_current), + T(test_memcg_current_peak), T(test_memcg_min), T(test_memcg_low), T(test_memcg_high), @@ -1301,7 +1357,7 @@ struct memcg_test { T(test_memcg_max), T(test_memcg_reclaim), 
T(test_memcg_oom_events), - T(test_memcg_swap_max), + T(test_memcg_swap_max_peak), T(test_memcg_sock), T(test_memcg_oom_group_leaf_events), T(test_memcg_oom_group_parent_events), -- 2.39.2
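
The reset semantics described in the patch above (any write sets the high watermark back to the current usage) can be modelled in a few lines of userspace C. This is a simplified, single-threaded sketch: `struct counter_model` and the `model_*` helpers are hypothetical and only mirror the idea of a usage counter with a resettable peak, not the kernel's page_counter implementation.

```c
#include <assert.h>

/* Hypothetical model of a usage counter that tracks its peak value. */
struct counter_model {
	long usage;     /* current usage */
	long watermark; /* highest usage seen since creation or last reset */
};

/* Charge nr units and raise the watermark if usage exceeds it. */
static void model_charge(struct counter_model *c, long nr)
{
	assert(nr >= 0);
	c->usage += nr;
	if (c->usage > c->watermark)
		c->watermark = c->usage;
}

/* Uncharge nr units; the watermark is intentionally left untouched. */
static void model_uncharge(struct counter_model *c, long nr)
{
	assert(nr >= 0 && nr <= c->usage);
	c->usage -= nr;
}

/* Models a write to the peak file: the peak becomes the current usage. */
static void model_reset_watermark(struct counter_model *c)
{
	c->watermark = c->usage;
}
```

A scheduler tracking per-work-item peaks would reset before each item and read the watermark afterwards, which is the use case the commit message describes.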

3 months, 3 weeks


[PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths

by Marco Elver

To allow the compiler to inline the bpf_local_storage_lookup() fast-path,
factor it out by making bpf_local_storage_lookup() a static inline
function and move the slow-path to bpf_local_storage_lookup_slowpath().

Based on results from './benchs/run_bench_local_storage.sh' this produces
improvements in throughput and latency in the majority of cases:

| Hashmap Control
| ===============
| num keys: 10
|  hashmap (control) sequential get:
|                              <before>                | <after>
|   hits throughput:           13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s  (+0.9%)
|   hits latency:              71.968 ns/op            | 71.318 ns/op            (-0.9%)
|   important_hits throughput: 13.895 ± 0.024 M ops/s  | 14.022 ± 0.095 M ops/s  (+0.9%)
|
| num keys: 1000
|  hashmap (control) sequential get:
|                              <before>                | <after>
|   hits throughput:           11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s  (-1.3%)
|   hits latency:              84.794 ns/op            | 85.874 ns/op            (+1.3%)
|   important_hits throughput: 11.793 ± 0.018 M ops/s  | 11.645 ± 0.370 M ops/s  (-1.3%)
|
| num keys: 10000
|  hashmap (control) sequential get:
|                              <before>                | <after>
|   hits throughput:           7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s   (-1.1%)
|   hits latency:              140.581 ns/op           | 142.113 ns/op           (+1.1%)
|   important_hits throughput: 7.113 ± 0.012 M ops/s   | 7.037 ± 0.051 M ops/s   (-1.1%)
|
| num keys: 100000
|  hashmap (control) sequential get:
|                              <before>                | <after>
|   hits throughput:           4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s   (+4.1%)
|   hits latency:              208.623 ns/op           | 200.401 ns/op           (-3.9%)
|   important_hits throughput: 4.793 ± 0.034 M ops/s   | 4.990 ± 0.025 M ops/s   (+4.1%)
|
| num keys: 4194304
|  hashmap (control) sequential get:
|                              <before>                | <after>
|   hits throughput:           2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s   (+41.9%)
|   hits latency:              478.851 ns/op           | 337.648 ns/op           (-29.5%)
|   important_hits throughput: 2.088 ± 0.008 M ops/s   | 2.962 ± 0.004 M ops/s   (+41.9%)
|
| Local Storage
| =============
| num_maps: 1
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s  (+18.0%)
|   hits latency:              30.676 ns/op            | 25.988 ns/op            (-15.3%)
|   important_hits throughput: 32.598 ± 0.008 M ops/s  | 38.480 ± 0.054 M ops/s  (+18.0%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s  (+18.6%)
|   hits latency:              27.054 ns/op            | 22.807 ns/op            (-15.7%)
|   important_hits throughput: 36.963 ± 0.045 M ops/s  | 43.847 ± 0.037 M ops/s  (+18.6%)
|
| num_maps: 10
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           32.078 ± 0.004 M ops/s  | 37.813 ± 0.020 M ops/s  (+17.9%)
|   hits latency:              31.174 ns/op            | 26.446 ns/op            (-15.2%)
|   important_hits throughput: 3.208 ± 0.000 M ops/s   | 3.781 ± 0.002 M ops/s   (+17.9%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           34.564 ± 0.011 M ops/s  | 40.082 ± 0.037 M ops/s  (+16.0%)
|   hits latency:              28.932 ns/op            | 24.949 ns/op            (-13.8%)
|   important_hits throughput: 12.344 ± 0.004 M ops/s  | 14.315 ± 0.013 M ops/s  (+16.0%)
|
| num_maps: 16
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           32.493 ± 0.023 M ops/s  | 38.147 ± 0.029 M ops/s  (+17.4%)
|   hits latency:              30.776 ns/op            | 26.215 ns/op            (-14.8%)
|   important_hits throughput: 2.031 ± 0.001 M ops/s   | 2.384 ± 0.002 M ops/s   (+17.4%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           34.380 ± 0.521 M ops/s  | 41.605 ± 0.095 M ops/s  (+21.0%)
|   hits latency:              29.087 ns/op            | 24.035 ns/op            (-17.4%)
|   important_hits throughput: 10.939 ± 0.166 M ops/s  | 13.238 ± 0.030 M ops/s  (+21.0%)
|
| num_maps: 17
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           28.748 ± 0.028 M ops/s  | 32.248 ± 0.080 M ops/s  (+12.2%)
|   hits latency:              34.785 ns/op            | 31.009 ns/op            (-10.9%)
|   important_hits throughput: 1.693 ± 0.002 M ops/s   | 1.899 ± 0.005 M ops/s   (+12.2%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           31.313 ± 0.030 M ops/s  | 35.911 ± 0.020 M ops/s  (+14.7%)
|   hits latency:              31.936 ns/op            | 27.847 ns/op            (-12.8%)
|   important_hits throughput: 9.533 ± 0.009 M ops/s   | 10.933 ± 0.006 M ops/s  (+14.7%)
|
| num_maps: 24
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           18.475 ± 0.027 M ops/s  | 19.000 ± 0.006 M ops/s  (+2.8%)
|   hits latency:              54.127 ns/op            | 52.632 ns/op            (-2.8%)
|   important_hits throughput: 0.770 ± 0.001 M ops/s   | 0.792 ± 0.000 M ops/s   (+2.9%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           21.361 ± 0.028 M ops/s  | 22.388 ± 0.099 M ops/s  (+4.8%)
|   hits latency:              46.814 ns/op            | 44.667 ns/op            (-4.6%)
|   important_hits throughput: 6.009 ± 0.008 M ops/s   | 6.298 ± 0.028 M ops/s   (+4.8%)
|
| num_maps: 32
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           14.220 ± 0.006 M ops/s  | 14.168 ± 0.020 M ops/s  (-0.4%)
|   hits latency:              70.323 ns/op            | 70.580 ns/op            (+0.4%)
|   important_hits throughput: 0.445 ± 0.000 M ops/s   | 0.443 ± 0.001 M ops/s   (-0.4%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           17.250 ± 0.011 M ops/s  | 16.650 ± 0.021 M ops/s  (-3.5%)
|   hits latency:              57.971 ns/op            | 60.061 ns/op            (+3.6%)
|   important_hits throughput: 4.815 ± 0.003 M ops/s   | 4.647 ± 0.006 M ops/s   (-3.5%)
|
| num_maps: 100
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           5.212 ± 0.012 M ops/s   | 5.878 ± 0.004 M ops/s   (+12.8%)
|   hits latency:              191.877 ns/op           | 170.116 ns/op           (-11.3%)
|   important_hits throughput: 0.052 ± 0.000 M ops/s   | 0.059 ± 0.000 M ops/s   (+13.5%)
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           6.521 ± 0.053 M ops/s   | 7.086 ± 0.010 M ops/s   (+8.7%)
|   hits latency:              153.343 ns/op           | 141.116 ns/op           (-8.0%)
|   important_hits throughput: 1.703 ± 0.014 M ops/s   | 1.851 ± 0.003 M ops/s   (+8.7%)
|
| num_maps: 1000
|  local_storage cache sequential get:
|                              <before>                | <after>
|   hits throughput:           0.357 ± 0.005 M ops/s   | 0.325 ± 0.005 M ops/s   (-9.0%)
|   hits latency:              2803.738 ns/op          | 3076.923 ns/op          (+9.7%)
|   important_hits throughput: 0.000 ± 0.000 M ops/s   | 0.000 ± 0.000 M ops/s
|  local_storage cache interleaved get:
|                              <before>                | <after>
|   hits throughput:           0.434 ± 0.007 M ops/s   | 0.447 ± 0.007 M ops/s   (+3.0%)
|   hits latency:              2306.539 ns/op          | 2237.687 ns/op          (-3.0%)
|   important_hits throughput: 0.109 ± 0.002 M ops/s   | 0.112 ± 0.002 M ops/s   (+2.8%)

Signed-off-by: Marco Elver <elver(a)google.com>
---
 include/linux/bpf_local_storage.h               | 17 ++++++++++++++++-
 kernel/bpf/bpf_local_storage.c                  | 14 ++++----------
 .../selftests/bpf/progs/cgrp_ls_recursion.c     |  2 +-
 .../selftests/bpf/progs/task_ls_recursion.c     |  2 +-
 4 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 173ec7f43ed1..c8cecf7fff87 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -130,9 +130,24 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
 			    bool bpf_ma);
 
 struct bpf_local_storage_data *
+bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
+				  struct bpf_local_storage_map *smap,
+				  bool cacheit_lockit);
+static inline struct bpf_local_storage_data *
 bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
 			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit);
+			 bool cacheit_lockit)
+{
+	struct bpf_local_storage_data *sdata;
+
+	/* Fast path (cache hit) */
+	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
+				      bpf_rcu_lock_held());
+	if (likely(sdata && rcu_access_pointer(sdata->smap) == smap))
+		return sdata;
+
+	return bpf_local_storage_lookup_slowpath(local_storage, smap, cacheit_lockit);
+}
 
 void bpf_local_storage_destroy(struct bpf_local_storage *local_storage);
 
diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
index 146824cc9689..2ef782a1bd6f 100644
--- a/kernel/bpf/bpf_local_storage.c
+++ b/kernel/bpf/bpf_local_storage.c
@@ -415,20 +415,14 @@ void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
 }
 
 /* If cacheit_lockit is false, this lookup function is lockless */
-struct bpf_local_storage_data *
-bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
-			 struct bpf_local_storage_map *smap,
-			 bool cacheit_lockit)
+noinline struct bpf_local_storage_data *
+bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
+				  struct bpf_local_storage_map *smap,
+				  bool cacheit_lockit)
 {
 	struct bpf_local_storage_data *sdata;
 	struct bpf_local_storage_elem *selem;
 
-	/* Fast path (cache hit) */
-	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
-				      bpf_rcu_lock_held());
-	if (sdata && rcu_access_pointer(sdata->smap) == smap)
-		return sdata;
-
 	/* Slow path (cache miss) */
 	hlist_for_each_entry_rcu(selem, &local_storage->list, snode,
 				 rcu_read_lock_trace_held())
diff --git a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
index a043d8fefdac..9895087a9235 100644
--- a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
+++ b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
@@ -21,7 +21,7 @@ struct {
 	__type(value, long);
 } map_b SEC(".maps");
 
-SEC("fentry/bpf_local_storage_lookup")
+SEC("fentry/bpf_local_storage_lookup_slowpath")
 int BPF_PROG(on_lookup)
 {
 	struct task_struct *task = bpf_get_current_task_btf();
diff --git a/tools/testing/selftests/bpf/progs/task_ls_recursion.c b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
index 4542dc683b44..d73b33a4c153 100644
--- a/tools/testing/selftests/bpf/progs/task_ls_recursion.c
+++ b/tools/testing/selftests/bpf/progs/task_ls_recursion.c
@@ -27,7 +27,7 @@ struct {
 	__type(value, long);
 } map_b SEC(".maps");
 
-SEC("fentry/bpf_local_storage_lookup")
+SEC("fentry/bpf_local_storage_lookup_slowpath")
 int BPF_PROG(on_lookup)
 {
 	struct task_struct *task = bpf_get_current_task_btf();
-- 
2.43.0.429.g432eaa2c6b-goog
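
The fast-path/slow-path split in this patch can be tried outside the kernel. The following is only a userspace sketch of the same pattern, not the kernel code: struct store, struct entry, store_lookup() and store_lookup_slowpath() are hypothetical names, the cache slot is derived from the key rather than from a per-map cache_idx, there is no RCU or locking, and the slow path always caches (the kernel only does so when cacheit_lockit is set). It assumes a GCC- or Clang-compatible compiler for __builtin_expect and __attribute__((noinline)).

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 16
/* Branch hint, mirroring the kernel's likely(); a GCC/Clang builtin. */
#define likely(x) __builtin_expect(!!(x), 1)

/* Hypothetical element type: a singly linked list node keyed by an int. */
struct entry {
	int key;
	int value;
	struct entry *next;
};

/* Hypothetical container: a small lookup cache in front of a linked list. */
struct store {
	struct entry *cache[CACHE_SLOTS];
	struct entry *list;
	unsigned long slow_lookups;	/* counts slow-path calls, for illustration */
};

/* Slow path (cache miss): kept out of line so callers stay small. */
__attribute__((noinline))
static struct entry *store_lookup_slowpath(struct store *s, int key)
{
	struct entry *e;

	s->slow_lookups++;
	for (e = s->list; e; e = e->next) {
		if (e->key == key) {
			/* Remember the hit so the next lookup takes the fast path. */
			s->cache[(unsigned int)key % CACHE_SLOTS] = e;
			return e;
		}
	}
	return NULL;
}

/* Fast path (cache hit): small enough for the compiler to inline at call sites. */
static inline struct entry *store_lookup(struct store *s, int key)
{
	struct entry *e;

	assert(s != NULL);
	e = s->cache[(unsigned int)key % CACHE_SLOTS];
	if (likely(e && e->key == key))
		return e;

	return store_lookup_slowpath(s, key);
}
```

Only the first lookup of a given key should reach store_lookup_slowpath(); repeated lookups return from the inlined check. Marking the slow path noinline also keeps it a distinct symbol, which is what lets the selftests in the patch attach fentry programs to bpf_local_storage_lookup_slowpath.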

3 months, 3 weeks


[KTAP V2 PATCH v2] ktap_v2: add test metadata

by Rae Moar

Add specification for test metadata to the KTAP v2 spec.

KTAP v1 only specifies the output format of very basic test information:
test result and test name. Any additional test information either gets
added to general diagnostic data or is not included in the output at all.

The purpose of KTAP metadata is to create a framework to include and
easily identify additional important test information in KTAP.

KTAP metadata could include any test information that is pertinent for
user interaction before or after the running of the test. For example,
the test file path or the test speed.

Since this includes a large variety of information, this specification
will recognize notable types of KTAP metadata to ensure consistent format
across test frameworks. See the full list of types in the specification.

Example of KTAP Metadata:

 KTAP version 2
 # ktap_test: main
 # ktap_arch: uml
 1..1
     KTAP version 2
     # ktap_test: suite_1
     # ktap_subsystem: example
     # ktap_test_file: lib/test.c
     1..2
     ok 1 test_1
     # ktap_test: test_2
     # ktap_speed: very_slow
     # custom_is_flaky: true
     ok 2 test_2
 ok 1 test_suite

The changes to the KTAP specification outline the format, location, and
different types of metadata.

Here is a link to a version of the KUnit parser that is able to parse test
metadata lines for KTAP version 2. Note this includes test metadata
lines for the main level of KTAP.

Link: https://kunit-review.googlesource.com/c/linux/+/5889

Signed-off-by: Rae Moar <rmoar(a)google.com>
---
 Documentation/dev-tools/ktap.rst | 163 ++++++++++++++++++++++++++++++-
 1 file changed, 159 insertions(+), 4 deletions(-)

diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..4480eaf5bbc3 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -17,19 +17,20 @@ KTAP test results describe a series of tests (which may be nested: i.e., test
 can have subtests), each of which can contain both diagnostic data -- e.g., log
 lines -- and a final result. The test structure and results are
 machine-readable, whereas the diagnostic data is unstructured and is there to
-aid human debugging.
+aid human debugging. One exception to this is test metadata lines - a type
+of diagnostic lines. Test metadata is used to identify important supplemental
+test information and can be machine-readable.
 
 KTAP output is built from four different types of lines:
 - Version lines
 - Plan lines
 - Test case result lines
-- Diagnostic lines
+- Diagnostic lines (including test metadata)
 
 In general, valid KTAP output should also form valid TAP output, but some
 information, in particular nested test results, may be lost. Also note that
 there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
 
 Version lines
 -------------
@@ -166,6 +167,154 @@ even if they do not start with a "#": this is to capture any other useful
 kernel output which may help debug the test. It is nevertheless recommended
 that tests always prefix any diagnostic output they have with a "#" character.
 
+KTAP metadata lines
+-------------------
+
+KTAP metadata lines are a subset of diagnostic lines that are used to include
+and easily identify important supplemental test information in KTAP.
+
+.. code-block:: none
+
+	# <prefix>_<metadata type>: <metadata value>
+
+The <prefix> indicates where to find the specification for the type of
+metadata. The metadata types listed below use the prefix "ktap" (See Types of
+KTAP Metadata).
+
+Types that are instead specified by an individual test framework use the
+framework name as the prefix. For example, a metadata type documented by the
+kselftest specification would use the prefix "kselftest". Any metadata type
+that is not listed in a specification must use the prefix "custom". Note the
+prefix must not include spaces or the characters ":" or "_".
+
+The format of <metadata type> and <value> varies based on the type. See the
+individual specification. For "custom" types the <metadata type> can be any
+string excluding ":", spaces, or newline characters and the <value> can be any
+string.
+
+**Location:**
+
+The first KTAP metadata entry for a test must be "# ktap_test: <test name>",
+which acts as a header to associate metadata with the correct test.
+
+For test cases, the location of the metadata is between the prior test result
+line and the current test result line. For test suites, the location of the
+metadata is between the suite's version line and test plan line. See the
+example below.
+
+KTAP metadata for a test does not need to be contiguous. For example, a kernel
+warning or other diagnostic output could interrupt metadata lines. However, it
+is recommended to keep a test's metadata lines together when possible, as this
+improves readability.
+
+**Here is an example of using KTAP metadata:**
+
+::
+
+	KTAP version 2
+	# ktap_test: main
+	# ktap_arch: uml
+	1..1
+	  KTAP version 2
+	  # ktap_test: suite_1
+	  # ktap_subsystem: example
+	  # ktap_test_file: lib/test.c
+	  1..2
+	  ok 1 test_1
+	  # ktap_test: test_2
+	  # ktap_speed: very_slow
+	  # custom_is_flaky: true
+	  ok 2 test_2
+	  # suite_1 passed
+	ok 1 suite_1
+
+In this example, the tests are running on UML. The test suite "suite_1" is part
+of the subsystem "example" and belongs to the file "lib/example_test.c". It has
+two subtests, "test_1" and "test_2". The subtest "test_2" has a speed of
+"very_slow" and has been marked with a custom KTAP metadata type called
+"custom_is_flaky" with the value of "true".
+
+**Types of KTAP Metadata:**
+
+This is the current list of KTAP metadata types recognized in this
+specification. Note that all of these metadata types are optional (except for
+ktap_test as the KTAP metadata header).
+
+- ``ktap_test``: Name of test (used as header of KTAP metadata). This should
+  match the test name printed in the test result line: "ok 1 [test_name]".
+
+- ``ktap_module``: Name of the module containing the test
+
+- ``ktap_subsystem``: Name of the subsystem being tested
+
+- ``ktap_start_time``: Time tests started in ISO8601 format
+
+  - Example: "# ktap_start_time: 2024-01-09T13:09:01.990000+00:00"
+
+- ``ktap_duration``: Time taken (in seconds) to execute the test
+
+  - Example: "ktap_duration: 10.154s"
+
+- ``ktap_speed``: Category of how fast test runs: "normal", "slow", or
+  "very_slow"
+
+- ``ktap_test_file``: Path to source file containing the test. This metadata
+  line can be repeated if the test is spread across multiple files.
+
+  - Example: "# ktap_test_file: lib/test.c"
+
+- ``ktap_generated_file``: Description of and path to file generated during
+  test execution. This could be a core dump, generated filesystem image, some
+  form of visual output (for graphics drivers), etc. This metadata line can be
+  repeated to attach multiple files to the test.
+
+  - Example: "# ktap_generated_file: Core dump: /var/lib/systemd/coredump/hello.core"
+
+- ``ktap_log_file``: Path to file containing kernel log test output
+
+  - Example: "# ktap_log_file: /sys/kernel/debugfs/kunit/example/results"
+
+- ``ktap_error_file``: Path to file containing context for test failure or
+  error. This could include the difference between optimal test output and
+  actual test output.
+
+  - Example: "# ktap_error_file: fs/results/example.out.bad"
+
+- ``ktap_results_url``: Link to webpage describing this test run and its
+  results
+
+  - Example: "# ktap_results_url: https://kcidb.kernelci.org/hello"
+
+- ``ktap_arch``: Architecture used during test run
+
+  - Example: "# ktap_arch: x86_64"
+
+- ``ktap_compiler``: Compiler used during test run
+
+  - Example: "# ktap_compiler: gcc (GCC) 10.1.1 20200507 (Red Hat 10.1.1-1)"
+
+- ``ktap_respository_url``: Link to git repository of the checked out code.
+
+  - Example: "# ktap_respository_url: https://github.com/torvalds/linux.git"
+
+- ``ktap_git_branch``: Name of git branch of checked out code
+
+  - Example: "# ktap_git_branch: kselftest/kunit"
+
+- ``ktap_kernel_version``: Version of Linux Kernel being used during test run
+
+  - Example: "# ktap_kernel_version: 6.7-rc1"
+
+- ``ktap_commit_hash``: The full git commit hash of the checked out base code.
+
+  - Example: "# ktap_commit_hash: 064725faf8ec2e6e36d51e22d3b86d2707f0f47f"
+
+**Other Metadata Types:**
+
+There can also be KTAP metadata that is not included in the recognized list
+above. This metadata must be prefixed with the test framework, ie. "kselftest",
+or with the prefix "custom". For example, "# custom_batch: 20".
+
 Unknown lines
 -------------
 
@@ -206,6 +355,7 @@ An example of a test with two nested subtests:
 	KTAP version 2
 	1..1
 	  KTAP version 2
+	  # ktap_test: example
 	  1..2
 	  ok 1 test_1
 	  not ok 2 test_2
@@ -219,6 +369,7 @@ An example format with multiple levels of nested testing:
 	KTAP version 2
 	1..2
 	  KTAP version 2
+	  # ktap_test: example_test_1
 	  1..2
 	    KTAP version 2
 	    1..2
@@ -254,6 +405,7 @@ Example KTAP output
 	KTAP version 2
 	1..1
 	  KTAP version 2
+	  # ktap_test: main_test
 	  1..3
 	    KTAP version 2
 	    1..1
@@ -261,11 +413,14 @@ Example KTAP output
 	    ok 1 test_1
 	  ok 1 example_test_1
 	    KTAP version 2
+	    # ktap_test: example_test_2
+	    # ktap_speed: slow
 	    1..2
 	    ok 1 test_1 # SKIP test_1 skipped
 	    ok 2 test_2
 	  ok 2 example_test_2
 	    KTAP version 2
+	    # ktap_test: example_test_3
 	    1..3
 	    ok 1 test_1
 	    # test_2: FAIL

base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
-- 
2.43.0.429.g432eaa2c6b-goog
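
The "# <prefix>_<metadata type>: <metadata value>" format proposed above is simple enough to split with a few string calls. The following is a rough sketch only, not the KUnit parser linked from the patch: struct ktap_meta and ktap_parse_metadata() are hypothetical names, the buffer sizes are arbitrary, and it assumes the separator is exactly a colon followed by one space. It applies the stated rule that the prefix contains no "_", ":" or spaces, so the first "_" ends the prefix and the first ":" ends the metadata type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one parsed metadata line; sizes are arbitrary. */
struct ktap_meta {
	char prefix[32];	/* e.g. "ktap", "kselftest", "custom" */
	char type[64];		/* e.g. "test_file" */
	char value[256];	/* rest of the line, without the newline */
};

/*
 * Split "# <prefix>_<metadata type>: <metadata value>" into its parts.
 * Returns false if the line does not have that shape.
 */
static bool ktap_parse_metadata(const char *line, struct ktap_meta *out)
{
	const char *p, *us, *colon, *val;
	size_t plen, tlen, vlen;

	assert(line && out);

	/* Nested tests are indented, so skip leading whitespace. */
	p = line + strspn(line, " \t");
	if (p[0] != '#' || p[1] != ' ')
		return false;
	p += 2;

	/* The prefix ends at the first '_', the type at the first ':'. */
	us = strchr(p, '_');
	colon = strchr(p, ':');
	if (!us || !colon || us > colon)
		return false;

	plen = (size_t)(us - p);
	tlen = (size_t)(colon - us) - 1;
	if (plen == 0 || tlen == 0 ||
	    plen >= sizeof(out->prefix) || tlen >= sizeof(out->type))
		return false;

	/* Neither the prefix nor the type may contain whitespace. */
	if (memchr(p, ' ', (size_t)(colon - p)) ||
	    memchr(p, '\t', (size_t)(colon - p)))
		return false;

	/* Assumed separator: ':' followed by a single space. */
	if (colon[1] != ' ')
		return false;
	val = colon + 2;
	vlen = strcspn(val, "\r\n");
	if (vlen >= sizeof(out->value))
		return false;

	memcpy(out->prefix, p, plen);
	out->prefix[plen] = '\0';
	memcpy(out->type, us + 1, tlen);
	out->type[tlen] = '\0';
	memcpy(out->value, val, vlen);
	out->value[vlen] = '\0';
	return true;
}
```

A shape check alone cannot tell metadata from an ordinary diagnostic such as "# test_2: FAIL", which this function would accept with prefix "test". A consumer would still need to compare the prefix against "ktap", "custom", or a known framework name, and could use the "# ktap_test:" header rule from the patch to decide where a test's metadata begins.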

3 months, 3 weeks

