*Changes in v15*
- Build fix (Add missed build fix in RESEND)
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers; flush the TLB only
once
*Changes in v12*
- Update UFFD_FEATURE_WP_ASYNC and add support for other memory types
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Make many cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add a specific check to return an error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the
Windows GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses
of the pages that have been written to in a region of virtual memory.
This syscall is used by Windows applications, games in particular, and it is
currently emulated in userspace rather slowly. Our goal is to enhance the
kernel so that it can be translated efficiently.
Currently, some out-of-tree hack patches are used in some kernels to emulate
it efficiently. We intend to replace those with these patches, so that gaming
on Linux as a whole can benefit from this. It means this code would have a
large number of users.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei describes the following uses of this code:
* It is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits only for the entire
process. In addition, reading info about pages is a separate
operation. This means we must freeze the process to read information
about all its pages and reset dirty bits, and only then can we start dumping
pages. The information about pages becomes more and more outdated
while we are processing pages. The new interface solves both of these
downsides. First, it allows us to read PTE bits and clear the
soft-dirty bit atomically, so CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory, so CRIU will have up-to-date info
about pages at the moment of dumping them.
* The new interface should be much faster because basic page filtering
happens in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions, like:
* reset the soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear the soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But with the soft-dirty flag, we sometimes got extra pages
which had not even been written to. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag. This breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect etc. messes up the soft-dirty
flag while VM_SOFTDIRTY is ignored [5]. This wasn't happening until [6] was
introduced. We discussed whether these patches could be reverted, but could
not reach any conclusion. So at this point, I made a couple of attempts to
solve the whole VM_SOFTDIRTY issue by correcting the soft-dirty
implementation:
* [7] Correct the bug that was fixed wrongly back in 2014. It had the
potential to cause regressions, so we left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply was not to increase the size of the VMA by 8 bytes.
At this point, we abandoned soft-dirty, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. Since then, we have based
the soft-dirty emulation on the userfaultfd WP feature, where the kernel
resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only the
pages which have really been written to are reported as dirty (written-to).
(PS: another userfaultfd feature, WP_UNPOPULATED, is required to avoid
pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of the CRIU developers to
make the interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on using,
we use the written-to terminology when using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on the pagemap file, can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get whether the pages have been written-to (PAGE_IS_WRITTEN), are
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here [1]. This series adds features that weren't
present earlier:
- The kernel has no way to atomically get the soft-dirty/written-to status
and clear it.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this tracking and clearing mechanism
for a region of memory while the process is running, to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from the v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. They are too delicate and wrong, as they show more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting this [2][3], as this is how the feature was written years ago,
and its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of UFFD WP [4], as it is inherently based
on the PTEs.
So in this patch series, I've added a new mode to UFFD which is the
asynchronous version of write protect. When this variant of
UFFD WP is used, page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading the
pagemap file (pages without PM_UFFD_WP set). This feature can be used to
find which pages have been written to since the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information about whether a page is file mapped, present or
swapped is required for the CRIU project [5][6]. The required mask,
any mask, excluded mask and return mask are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specified
masks. The page addresses are returned in struct page_region in a compact
form. max_pages is needed to support the use case where the user only wants
to get a specific number of pages, so there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. max_pages is
optional. If max_pages is specified, it must be equal to or greater than
vec_size. This restriction is needed to handle the worst case, where each
page_region contains the info of only one page and cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 481 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2105 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Hi,
On vanilla AlmaLinux 8.7 (a CentOS fork), selftests/net/af_unix/diag_uid.c doesn't
compile out of the box, giving these errors:
make[2]: Entering directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix'
gcc diag_uid.c -o /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix/diag_uid
diag_uid.c:36:16: error: ‘UDIAG_SHOW_UID’ undeclared here (not in a function); did you mean ‘UDIAG_SHOW_VFS’?
.udiag_show = UDIAG_SHOW_UID
^~~~~~~~~~~~~~
UDIAG_SHOW_VFS
In file included from diag_uid.c:17:
diag_uid.c: In function ‘render_response’:
diag_uid.c:128:28: error: ‘UNIX_DIAG_UID’ undeclared (first use in this function); did you mean ‘UNIX_DIAG_VFS’?
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~~~~~
../../kselftest_harness.h:707:13: note: in definition of macro ‘__EXPECT’
__typeof__(_seen) __seen = (_seen); \
^~~~~
diag_uid.c:128:2: note: in expansion of macro ‘ASSERT_EQ’
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~
diag_uid.c:128:28: note: each undeclared identifier is reported only once for each function it appears in
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~~~~~
../../kselftest_harness.h:707:13: note: in definition of macro ‘__EXPECT’
__typeof__(_seen) __seen = (_seen); \
^~~~~
diag_uid.c:128:2: note: in expansion of macro ‘ASSERT_EQ’
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~
make[2]: *** [../../lib.mk:147: /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix/diag_uid] Error 1
The correct value is in <uapi/linux/unix_diag.h>:
include/uapi/linux/unix_diag.h:23:#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
The fix is as follows:
---
tools/testing/selftests/net/af_unix/diag_uid.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/net/af_unix/diag_uid.c b/tools/testing/selftests/net/af_unix/diag_uid.c
index 5b88f7129fea..66d75b646d35 100644
--- a/tools/testing/selftests/net/af_unix/diag_uid.c
+++ b/tools/testing/selftests/net/af_unix/diag_uid.c
@@ -16,6 +16,10 @@
#include "../../kselftest_harness.h"
+#ifndef UDIAG_SHOW_UID
+#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
+#endif
+
FIXTURE(diag_uid)
{
int netlink_fd;
--
However, this patch reveals another undefined value:
make[2]: Entering directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix'
gcc diag_uid.c -o /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix/diag_uid
In file included from diag_uid.c:17:
diag_uid.c: In function ‘render_response’:
diag_uid.c:132:28: error: ‘UNIX_DIAG_UID’ undeclared (first use in this function); did you mean ‘UNIX_DIAG_VFS’?
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~~~~~
../../kselftest_harness.h:707:13: note: in definition of macro ‘__EXPECT’
__typeof__(_seen) __seen = (_seen); \
^~~~~
diag_uid.c:132:2: note: in expansion of macro ‘ASSERT_EQ’
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~
diag_uid.c:132:28: note: each undeclared identifier is reported only once for each function it appears in
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~~~~~
../../kselftest_harness.h:707:13: note: in definition of macro ‘__EXPECT’
__typeof__(_seen) __seen = (_seen); \
^~~~~
diag_uid.c:132:2: note: in expansion of macro ‘ASSERT_EQ’
ASSERT_EQ(attr->rta_type, UNIX_DIAG_UID);
^~~~~~~~~
make[2]: *** [../../lib.mk:147: /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix/diag_uid] Error 1
Apparently, AlmaLinux 8.7 lacks the enum value UNIX_DIAG_UID:
diff -u /usr/include/linux/unix_diag.h include/uapi/linux/unix_diag.h
--- /usr/include/linux/unix_diag.h 2023-05-16 13:47:51.000000000 +0200
+++ include/uapi/linux/unix_diag.h 2022-10-12 07:35:58.253481367 +0200
@@ -20,6 +20,7 @@
#define UDIAG_SHOW_ICONS 0x00000008 /* show pending connections */
#define UDIAG_SHOW_RQLEN 0x00000010 /* show skb receive queue len */
#define UDIAG_SHOW_MEMINFO 0x00000020 /* show memory info of a socket */
+#define UDIAG_SHOW_UID 0x00000040 /* show socket's UID */
struct unix_diag_msg {
__u8 udiag_family;
@@ -40,6 +41,7 @@
UNIX_DIAG_RQLEN,
UNIX_DIAG_MEMINFO,
UNIX_DIAG_SHUTDOWN,
+ UNIX_DIAG_UID,
__UNIX_DIAG_MAX,
};
Now, this is a change in an enum, and there doesn't seem to be an easy way out
here, since #ifndef cannot detect enum members. (I think I saw an example, but
I cannot recall which thread. I will do more research.)
When I added the kernel include directory:
# gcc -I ../../../../../include diag_uid.c
I got the following error:
[marvin@pc-mtodorov linux_torvalds]$ cd tools/testing/selftests/net/af_unix/
[marvin@pc-mtodorov af_unix]$ gcc -I ../../../../../include diag_uid.c -o
/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/net/af_unix/diag_uid
In file included from ../../../../../include/linux/build_bug.h:5,
from ../../../../../include/linux/bits.h:21,
from ../../../../../include/linux/capability.h:18,
from ../../../../../include/linux/netlink.h:6,
from diag_uid.c:8:
../../../../../include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory
#include <asm/rwonce.h>
^~~~~~~~~~~~~~
compilation terminated.
[marvin@pc-mtodorov af_unix]$
At this point I gave up, as it would be overkill to change a kernel system
header to make a test pass, and this probably wouldn't be accepted upstream.
Hope this helps. (Do we still want to build on CentOS/AlmaLinux/Rocky 8?)
Best regards,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
Add documentation for the new Virtual PCM Test Driver. It covers all
possible usage cases: error and delay injection, random and
pattern-based data generation, and playback and ioctl redefinition
functionality testing.
We have a lot of different virtual media drivers, which can be used for
testing userspace applications and the media subsystem middle layer.
However, all of them are aimed at testing video functionality and
simulating video devices. For audio devices we only have the snd-dummy
module, which is good at simulating the correct behavior of an ALSA device.
I decided to write a tool which would help to test userspace ALSA
programs (and the PCM middle layer as well) under unusual circumstances
to figure out how they behave. So I came up with this Virtual PCM
Test Driver.
This new Virtual PCM Test Driver has several features which can be useful
during testing/fuzzing of userspace ALSA applications, or testing/fuzzing
of the PCM middle layer. Not all of them can be implemented using the
existing virtual drivers (like dummy or loopback). Here is what this
driver can do:
- Simulate both capture and playback processes
- Check the playback stream for containing the looped pattern
- Generate random or pattern-based capture data
- Inject delays into the playback and capturing processes
- Inject errors during the PCM callbacks
Also, this driver can check the playback stream for the
predefined pattern, which is used in the corresponding selftest to check
the PCM middle layer data transferring functionality. Additionally, this
driver redefines the default RESET ioctl, and the selftest covers this PCM
API functionality as well.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
V1 -> V2:
- Rename the driver from 'valsa' to 'pcmtest'.
- Implement support for interleaved and non-interleaved access modes
- Add support for 8 substreams and 4 channels
- Extend supported formats
- Extend and rewrite in C the selftest for the driver
Documentation/sound/cards/index.rst | 1 +
Documentation/sound/cards/pcmtest.rst | 119 ++++++++++++++++++++++++++
2 files changed, 120 insertions(+)
create mode 100644 Documentation/sound/cards/pcmtest.rst
diff --git a/Documentation/sound/cards/index.rst b/Documentation/sound/cards/index.rst
index c016f8c3b88b..49c1f2f688f8 100644
--- a/Documentation/sound/cards/index.rst
+++ b/Documentation/sound/cards/index.rst
@@ -17,3 +17,4 @@ Card-Specific Information
hdspm
serial-u16550
img-spdif-in
+ pcmtest
diff --git a/Documentation/sound/cards/pcmtest.rst b/Documentation/sound/cards/pcmtest.rst
new file mode 100644
index 000000000000..ea8070eaa44e
--- /dev/null
+++ b/Documentation/sound/cards/pcmtest.rst
@@ -0,0 +1,119 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual PCM Test Driver
+===========================
+
+The Virtual PCM Test Driver emulates a generic PCM device, and can be used for
+testing/fuzzing of the userspace ALSA applications, as well as for testing/fuzzing of
+the PCM middle layer. Additionally, it can be used for simulating hard to reproduce
+problems with PCM devices.
+
+What can this driver do?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+At this moment the driver can do the following things:
+ * Simulate both capture and playback processes
+ * Generate random or pattern-based capturing data
+ * Inject delays into the playback and capturing processes
+ * Inject errors during the PCM callbacks
+
+It supports up to 8 substreams and 4 channels. Also it supports both interleaved and
+non-interleaved access modes.
+
+Also, this driver can check the playback stream for the predefined pattern, which is
+used in the corresponding selftest (alsa/test-pcmtest-driver.c) to check the PCM
+middle layer data transferring functionality. Additionally, this driver redefines
+the default RESET ioctl, and the selftest covers this PCM API functionality as well.
+
+Configuration
+-------------
+
+The driver has several parameters besides the common ALSA module parameters:
+
+ * fill_mode (bool) - Buffer fill mode (see below)
+ * inject_delay (int)
+ * inject_hwpars_err (bool)
+ * inject_prepare_err (bool)
+ * inject_trigger_err (bool)
+
+
+Capture Data Generation
+-----------------------
+
+The driver has two modes of data generation: the first (0 in the fill_mode parameter)
+means random data generation, the second (1 in the fill_mode) - pattern-based
+data generation. Let's look at the second mode.
+
+First of all, you may want to specify the pattern for data generation. You can do it
+by writing the pattern to the debugfs file (/sys/kernel/debug/pcmtest/fill_pattern).
+Like that:
+
+.. code-block:: bash
+
+ echo -n mycoolpattern > /sys/kernel/debug/pcmtest/fill_pattern
+
+After this, every capture action performed on the 'pcmtest' device will return
+'mycoolpatternmycoolpatternmycoolpatternmy...' for every channel buffer.
+
+In case of interleaved access, the capture buffer will contain the repeated pattern
+for every channel. Otherwise, every channel buffer will contain the repeated pattern.
+
+The pattern itself can be up to 4096 bytes long.
+
+Delay injection
+---------------
+
+The driver has the 'inject_delay' parameter, which has a self-descriptive name and
+can be used for time delay/speedup simulations. The parameter has integer type, and
+it specifies the delay added between the module's internal timer ticks.
+
+If the 'inject_delay' value is positive, the buffer will be filled slower, if it is
+negative - faster. You can try it yourself by starting a recording in any
+audio recording application (like Audacity) and selecting the 'pcmtest' device as a
+source.
+
+This parameter can be also used for generating a huge amount of sound data in a very
+short period of time (with the negative 'inject_delay' value).
+
+Errors injection
+----------------
+
+This module can be used for injecting errors into the PCM communication process. This
+action can help you to figure out how the userspace ALSA program behaves under unusual
+circumstances.
+
+For example, you can make all 'hw_params' PCM callback calls return EBUSY error by
+writing '1' to the 'inject_hwpars_err' module parameter:
+
+.. code-block:: bash
+
+ echo 1 > /sys/module/snd_pcmtest/parameters/inject_hwpars_err
+
+Errors can be injected into the following PCM callbacks:
+
+ * hw_params (EBUSY)
+ * prepare (EINVAL)
+ * trigger (EINVAL)
+
+
+Playback test
+-------------
+
+This driver can also be used for testing playback functionality: every time you
+write the playback data to the 'pcmtest' PCM device and close it, the driver checks the
+buffer of each channel for the looped pattern (which is specified in the
+fill_pattern debugfs file). If the playback buffer content represents the looped pattern,
+the 'pc_test' debugfs entry is set to '1'. Otherwise, the driver sets it to '0'.
+
+ioctl redefinition test
+-----------------------
+
+The driver redefines the 'reset' ioctl, which is default for all PCM devices. To test
+this functionality, we can trigger the reset ioctl and check the 'ioctl_test' debugfs
+entry:
+
+.. code-block:: bash
+
+ cat /sys/kernel/debug/pcmtest/ioctl_test
+
+If the ioctl is triggered successfully, this file will contain '1', and '0' otherwise.
--
2.34.1
iommufd gives userspace the capability to manipulate the iommu subsystem,
e.g. DMA map/unmap. In the near future, it will support iommu nested
translation. Different platform vendors have different implementations of
nested translation, so before setting up nested translation, userspace
needs to know the hardware iommu information. For example, Intel VT-d
supports using the guest I/O page table as the stage-1 translation table,
which requires the guest I/O page table to be compatible with the hardware IOMMU.
This series reports the iommu hardware information for a given iommufd_device
which has been bound to iommufd. It is preparation work for userspace to
allocate a hwpt for a given device, as in the nested translation support [1].
This series introduces an iommu op to report the iommu hardware info,
and adds an ioctl IOMMU_DEVICE_GET_HW_INFO to report such hardware
info to the user. enum iommu_hw_info_type is defined to differentiate the
iommu hardware info reported to the user, so that the user can decode it. This
series only adds the framework for iommu hw info reporting; the complete
reporting path needs vendor-specific definitions and driver support. The
full picture is available in [1] as well.
base-commit: 35db4f4dac813ffaa987cf633694107fabf3aff5
[1] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
Change log:
v3:
- Add r-b from Baolu
- Rename IOMMU_HW_INFO_TYPE_DEFAULT to be IOMMU_HW_INFO_TYPE_NONE to
better suit what it means
- Let IOMMU_DEVICE_GET_HW_INFO succeed even if the underlying iommu driver
does not have driver-specific data to report, per the remark below.
https://lore.kernel.org/kvm/ZAcwJSK%2F9UVI9LXu@nvidia.com/
v2: https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
- Drop patch 05 of v1 as it is already covered by other series
- Rename the capability info to be iommu hardware info
v1: https://lore.kernel.org/linux-iommu/20230209041642.9346-1-yi.l.liu@intel.co…
Regards,
Yi Liu
Lu Baolu (1):
iommu: Add new iommu op to get iommu hardware information
Nicolin Chen (1):
iommufd/selftest: Add coverage for IOMMU_DEVICE_GET_HW_INFO ioctl
Yi Liu (2):
iommu: Move dev_iommu_ops() to private header
iommufd: Add IOMMU_DEVICE_GET_HW_INFO
drivers/iommu/iommu-priv.h | 11 +++
drivers/iommu/iommu.c | 2 +
drivers/iommu/iommufd/device.c | 73 +++++++++++++++++++
drivers/iommu/iommufd/iommufd_private.h | 1 +
drivers/iommu/iommufd/iommufd_test.h | 9 +++
drivers/iommu/iommufd/main.c | 3 +
drivers/iommu/iommufd/selftest.c | 16 ++++
include/linux/iommu.h | 27 ++++---
include/uapi/linux/iommufd.h | 44 +++++++++++
tools/testing/selftests/iommu/iommufd.c | 17 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 26 +++++++
11 files changed, 217 insertions(+), 12 deletions(-)
--
2.34.1
As suggested by Willy, it is possible to detect the availability of
stackprotector support via preprocessor defines.
Make use of that to simplify the code and interface of nolibc.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (7):
tools/nolibc: fix typo pint -> point
tools/nolibc: x86_64: disable stack protector for _start
tools/nolibc: ensure stack protector guard is never zero
tools/nolibc: add test for __stack_chk_guard initialization
tools/nolibc: reformat list of headers to be installed
tools/nolibc: add autodetection for stackprotector support
tools/nolibc: simplify stackprotector compiler flags
tools/include/nolibc/Makefile | 19 +++++++++++++++++--
tools/include/nolibc/arch-aarch64.h | 6 +++---
tools/include/nolibc/arch-arm.h | 6 +++---
tools/include/nolibc/arch-i386.h | 6 +++---
tools/include/nolibc/arch-loongarch.h | 6 +++---
tools/include/nolibc/arch-mips.h | 6 +++---
tools/include/nolibc/arch-riscv.h | 6 +++---
tools/include/nolibc/arch-x86_64.h | 8 ++++----
tools/include/nolibc/arch.h | 2 +-
tools/include/nolibc/compiler.h | 15 +++++++++++++++
tools/include/nolibc/stackprotector.h | 15 ++++++---------
tools/testing/selftests/nolibc/Makefile | 13 ++-----------
tools/testing/selftests/nolibc/nolibc-test.c | 10 +++++++++-
13 files changed, 72 insertions(+), 46 deletions(-)
---
base-commit: 606343b7478c319cb30291a39ecbceddb42229d6
change-id: 20230521-nolibc-automatic-stack-protector-b4f7fab9e625
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi,
On AlmaLinux 8.7, make kselftest-all fails at memfd/memfd_test.c:
make[2]: Entering directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/memfd'
gcc -D_FILE_OFFSET_BITS=64 -isystem /home/marvin/linux/kernel/linux_torvalds/usr/include memfd_test.c common.c -o
/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/memfd/memfd_test
memfd_test.c: In function ‘test_seal_future_write’:
memfd_test.c:916:27: error: ‘F_SEAL_FUTURE_WRITE’ undeclared (first use in this function); did you mean ‘F_SEAL_WRITE’?
mfd_assert_add_seals(fd, F_SEAL_FUTURE_WRITE);
^~~~~~~~~~~~~~~~~~~
F_SEAL_WRITE
memfd_test.c:916:27: note: each undeclared identifier is reported only once for each function it appears in
memfd_test.c: In function ‘test_exec_seal’:
memfd_test.c:36:7: error: ‘F_SEAL_FUTURE_WRITE’ undeclared (first use in this function); did you mean ‘F_SEAL_WRITE’?
F_SEAL_FUTURE_WRITE | \
^~~~~~~~~~~~~~~~~~~
memfd_test.c:1058:27: note: in expansion of macro ‘F_WX_SEALS’
mfd_assert_has_seals(fd, F_WX_SEALS);
^~~~~~~~~~
make[2]: *** [../lib.mk:147: /home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/memfd/memfd_test] Error 1
make[2]: Leaving directory '/home/marvin/linux/kernel/linux_torvalds/tools/testing/selftests/memfd'
Apparently, the include file include/uapi/linux/fcntl.h defines
F_SEAL_FUTURE_WRITE as 0x0010:
include/uapi/linux/fcntl.h:45:#define F_SEAL_FUTURE_WRITE 0x0010 /* prevent future writes while mapped */
This patch fixed the issue:
---
tools/testing/selftests/memfd/memfd_test.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index dba0e8ba002f..868f17c02e32 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -28,7 +28,13 @@
#define MFD_DEF_SIZE 8192
#define STACK_SIZE 65536
-#define F_SEAL_EXEC 0x0020
+#ifndef F_SEAL_FUTURE_WRITE
+#define F_SEAL_FUTURE_WRITE 0x0010
+#endif
+
+#ifndef F_SEAL_EXEC
+#define F_SEAL_EXEC 0x0020
+#endif
#define F_WX_SEALS (F_SEAL_SHRINK | \
F_SEAL_GROW | \
Hope this helps.
Best regards,
Mirsad
Changes since RFC v1:
* add two kselftests (patch 10-11)
* set virtual MSRs also on APs [Pawan]
* enable "virtualize IA32_SPEC_CTRL" for L2 to prevent L2 from changing
some bits of IA32_SPEC_CTRL (patch 4)
* other misc cleanup and cosmetic changes
RFC v1: https://lore.kernel.org/lkml/20221210160046.2608762-1-chen.zhang@intel.com/
This series introduces "virtualize IA32_SPEC_CTRL" support. Here are
introduction and use cases of this new feature.
### Virtualize IA32_SPEC_CTRL
"Virtualize IA32_SPEC_CTRL" [1] is a new VMX feature on Intel CPUs. This feature
allows VMM to lock some bits of IA32_SPEC_CTRL MSR even when the MSR is
pass-thru'd to a guest.
### Use cases of "virtualize IA32_SPEC_CTRL" [2]
Software mitigations like Retpoline and the software BHB-clearing sequence depend on
CPU microarchitectures, and a guest cannot know the underlying
microarchitecture exactly. When a guest is migrated between processors of different
microarchitectures, software mitigations which work perfectly on the previous
microarchitecture may not be effective on the new one. To fix the problem, some
hardware mitigations should be used in conjunction with software mitigations.
Using virtual IA32_SPEC_CTRL, the VMM can enforce hardware mitigations transparently
to guests and prevent those hardware mitigations from being unintentionally disabled
when the guest changes the IA32_SPEC_CTRL MSR.
### Intention of this series
This series adds to KVM the capability of enforcing hardware mitigations for guests
transparently and efficiently (i.e., without intercepting IA32_SPEC_CTRL MSR
accesses). The capability can be used to solve the VM migration issue in
a pool consisting of processors of different microarchitectures.
Specifically, below are two target scenarios of this series:
Scenario 1: If retpoline is used by a VM to mitigate IMBTI in CPL0, the VMM can set
RRSBA_DIS_S on parts that enumerate RRSBA. Note that the VM is presented
with a microarchitecture that doesn't enumerate RRSBA.
Scenario 2: If a VM uses the software BHB-clearing sequence on transitions into CPL0
to mitigate BHI, the VMM can use "virtualize IA32_SPEC_CTRL" to set
BHI_DIS_S on new parts that don't enumerate BHI_NO.
Intel defines some virtual MSRs [2] for guests to report in-use software
mitigations. This allows guests to opt in to the VMM deploying hardware mitigations
for them if they are running on, or are later migrated to, a system on which their
in-use software mitigations are not effective. The virtual MSRs interface is
also added in this series.
### Organization of this series
1. Patches 1-3: Advertise RRSBA_CTRL and BHI_CTRL to guests
2. Patch 4: Add "virtualize IA32_SPEC_CTRL" support
3. Patches 5-9: Allow guests to report in-use software mitigations to KVM so
that KVM can enable hardware mitigations for guests.
4. Patches 10-11: Add kselftests for virtual MSRs and IA32_SPEC_CTRL
[1]: https://cdrdv2.intel.com/v1/dl/getContent/671368 Ref. #319433-047 Chapter 12
[2]: https://www.intel.com/content/www/us/en/developer/articles/technical/softwa…
Chao Gao (3):
KVM: VMX: Advertise MITI_ENUM_RETPOLINE_S_SUPPORT
KVM: selftests: Add tests for virtual enumeration/mitigation MSRs
KVM: selftests: Add tests for IA32_SPEC_CTRL MSR
Pawan Gupta (1):
x86/bugs: Use Virtual MSRs to request hardware mitigations
Zhang Chen (7):
x86/msr-index: Add bit definitions for BHI_DIS_S and BHI_NO
KVM: x86: Advertise CPUID.7.2.EDX and RRSBA_CTRL support
KVM: x86: Advertise BHI_CTRL support
KVM: VMX: Add IA32_SPEC_CTRL virtualization support
KVM: x86: Advertise ARCH_CAP_VIRTUAL_ENUM support
KVM: VMX: Advertise MITIGATION_CTRL support
KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
arch/x86/include/asm/msr-index.h | 33 +++-
arch/x86/include/asm/vmx.h | 5 +
arch/x86/include/asm/vmxfeatures.h | 2 +
arch/x86/kernel/cpu/bugs.c | 25 +++
arch/x86/kvm/cpuid.c | 22 ++-
arch/x86/kvm/reverse_cpuid.h | 8 +
arch/x86/kvm/svm/svm.c | 3 +
arch/x86/kvm/vmx/capabilities.h | 5 +
arch/x86/kvm/vmx/nested.c | 13 ++
arch/x86/kvm/vmx/vmcs.h | 2 +
arch/x86/kvm/vmx/vmx.c | 112 ++++++++++-
arch/x86/kvm/vmx/vmx.h | 43 ++++-
arch/x86/kvm/x86.c | 19 +-
tools/arch/x86/include/asm/msr-index.h | 37 +++-
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/x86_64/processor.h | 5 +
.../selftests/kvm/x86_64/spec_ctrl_msr_test.c | 178 ++++++++++++++++++
.../kvm/x86_64/virtual_mitigation_msr_test.c | 175 +++++++++++++++++
18 files changed, 676 insertions(+), 13 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/spec_ctrl_msr_test.c
create mode 100644 tools/testing/selftests/kvm/x86_64/virtual_mitigation_msr_test.c
base-commit: 400d2132288edbd6d500f45eab5d85526ca94e46
--
2.40.0
syscall() is used by "normal" libcs to allow users to call syscalls
directly.
By having the same syntax inside nolibc, users can more easily write code
that works with different libcs.
The macro logic is adapted from systemtap's STAP_PROBEV() macro, which is
released into the public domain / CC0.
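To make the argument-counting trick easier to follow, here is a self-contained C sketch with the same macro shape. The demo_raw0()..demo_raw2() backends are hypothetical stand-ins for nolibc's my_syscallN() helpers, and plain errno replaces SET_ERRNO(). Like the patch, it relies on a GNU C statement expression, so it assumes GCC or Clang.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical backends standing in for my_syscall0()..my_syscall2().
 * Each returns its arity so the dispatch is observable; demo_raw0()
 * returns -ENOSYS for a negative number to exercise the errno path. */
static long demo_raw0(long nr) { return nr < 0 ? -ENOSYS : 0; }
static long demo_raw1(long nr, long a) { (void)nr; (void)a; return 1; }
static long demo_raw2(long nr, long a, long b)
{
	(void)nr; (void)a; (void)b;
	return 2;
}

/* Same shape as _syscall(): turn a -errno return into errno plus -1. */
#define DEMO_CALL_(N, ...)                          \
({                                                  \
	long _ret = demo_raw##N(__VA_ARGS__);       \
	if (_ret < 0) {                             \
		errno = (int)-_ret;                 \
		_ret = -1;                          \
	}                                           \
	_ret;                                       \
})

/* With the countdown appended, the argument landing in slot N equals the
 * number of arguments after the first one (the syscall number). */
#define DEMO_NARG_(_0, _1, _2, _3, _4, _5, _6, N, ...) N
#define DEMO_NARG(...) DEMO_NARG_(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)

/* Extra level so DEMO_NARG() is expanded before ## pastes the count. */
#define DEMO_CALL_N(N, ...) DEMO_CALL_(N, __VA_ARGS__)
#define demo_call(...) DEMO_CALL_N(DEMO_NARG(__VA_ARGS__), __VA_ARGS__)
```

A call with only the syscall number dispatches to the zero-argument backend, which is the case exercised by the syscall(__NR_getpid) test in the patch.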
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/include/nolibc/unistd.h | 15 +++++++++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 2 ++
2 files changed, 17 insertions(+)
diff --git a/tools/include/nolibc/unistd.h b/tools/include/nolibc/unistd.h
index ac7d53d986cd..6773e83c16a0 100644
--- a/tools/include/nolibc/unistd.h
+++ b/tools/include/nolibc/unistd.h
@@ -56,6 +56,21 @@ int tcsetpgrp(int fd, pid_t pid)
return ioctl(fd, TIOCSPGRP, &pid);
}
+#define _syscall(N, ...) \
+({ \
+ int _ret = my_syscall##N(__VA_ARGS__); \
+ if (_ret < 0) { \
+ SET_ERRNO(-_ret); \
+ _ret = -1; \
+ } \
+ _ret; \
+})
+
+#define _syscall_narg(...) __syscall_narg(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)
+#define __syscall_narg(_0, _1, _2, _3, _4, _5, _6, N, ...) N
+#define _syscall_n(N, ...) _syscall(N, __VA_ARGS__)
+#define syscall(...) _syscall_n(_syscall_narg(__VA_ARGS__), ##__VA_ARGS__)
+
/* make sure to include all global symbols */
#include "nolibc.h"
diff --git a/tools/testing/selftests/nolibc/nolibc-test.c b/tools/testing/selftests/nolibc/nolibc-test.c
index f042a6436b6b..54bf91847af3 100644
--- a/tools/testing/selftests/nolibc/nolibc-test.c
+++ b/tools/testing/selftests/nolibc/nolibc-test.c
@@ -588,6 +588,8 @@ int run_syscall(int min, int max)
CASE_TEST(waitpid_child); EXPECT_SYSER(1, waitpid(getpid(), &tmp, WNOHANG), -1, ECHILD); break;
CASE_TEST(write_badf); EXPECT_SYSER(1, write(-1, &tmp, 1), -1, EBADF); break;
CASE_TEST(write_zero); EXPECT_SYSZR(1, write(1, &tmp, 0)); break;
+ CASE_TEST(syscall_noargs); EXPECT_SYSEQ(1, syscall(__NR_getpid), getpid()); break;
+ CASE_TEST(syscall_args); EXPECT_SYSER(1, syscall(__NR_fstat, 0, NULL), -1, EFAULT); break;
case __LINE__:
return ret; /* must be last */
/* note: do not set any defaults so as to permit holes above */
---
base-commit: 063dcc53b416ae1e89f767330feab3d0842943ed
change-id: 20230517-nolibc-syscall-bd13da6468c6
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hello,
Here is v2 of the mremap start address optimization / fix for exec warning.
v1->v2:
1. Trigger the optimization for mremaps smaller than a PMD. I tested by tracing
that it works correctly.
2. Fix issue with bogus return value found by Linus if we broke out of the
above loop for the first PMD itself.
Description of patches:
These patches optimize the start addresses in move_page_tables() and test the
changes. It addresses a warning [1] that occurs due to a downward, overlapping
move on a mutually-aligned offset within a PMD during exec. By initiating the
copy process at the PMD level when such alignment is present, we can prevent
this warning and speed up the copying process at the same time. Linus Torvalds
suggested this idea.
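The core idea, moving the start of a mutually aligned move down to the PMD boundary, can be sketched in C as below. This is not the patch's code: PMD_SIZE assumes x86-64 with 4 KiB pages, and align_starts_to_pmd() is a hypothetical helper. Its two flags stand in for the checks the kernel must perform that the extra range below each start address contains no other mapping, and the length is extended by the same offset so the end addresses stay unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PMD_SIZE (1UL << 21)          /* assumption: x86-64, 4 KiB pages */
#define PMD_MASK (~(PMD_SIZE - 1))

/* Hypothetical helper: if old and new start addresses share the same
 * offset within a PMD, move both down to the PMD boundary so whole PMD
 * entries can be moved. Returns true if the addresses were adjusted. */
static bool align_starts_to_pmd(uintptr_t *old_addr, uintptr_t *new_addr,
				size_t *len, bool old_prefix_free,
				bool new_prefix_free)
{
	uintptr_t off = *old_addr & ~PMD_MASK;

	/* Already aligned, or not mutually aligned: nothing to do. */
	if (off == 0 || off != (*new_addr & ~PMD_MASK))
		return false;
	/* The added [aligned, start) ranges must not hold other mappings. */
	if (!old_prefix_free || !new_prefix_free)
		return false;

	*old_addr -= off;
	*new_addr -= off;
	*len += off;                  /* keep both end addresses unchanged */
	return true;
}
```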
Please check the individual patches for more details.
thanks,
- Joel
[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
Joel Fernandes (Google) (4):
mm/mremap: Optimize the start addresses in move_page_tables()
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
mm/mremap.c | 56 +++++++++++++++++++
tools/testing/selftests/mm/mremap_test.c | 69 +++++++++++++++++++++---
2 files changed, 119 insertions(+), 6 deletions(-)
--
2.40.1.698.g37aff9b760-goog
> From: Tian, Kevin
> Sent: Friday, May 19, 2023 4:42 PM
> > +struct iommu_hw_info {
> > + __u32 size;
> > + __u32 flags;
> > + __u32 dev_id;
> > + __u32 data_len;
> > + __aligned_u64 data_ptr;
> > + __u32 out_data_type;
> > + __u32 __reserved;
>
> it's unusual to have reserved field in the end. It makes more sense
> to move data_ptr to the end to make it meaningful.
>
Please ignore this comment. typed too fast...
At the end of the test, there will be an error message produced by the
`ip netns del ns1` command in cleanup():
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
V3: resubmit due to missing netdev ML in V2
Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/fib_tests.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 7da8ec8..35d89df 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -68,7 +68,7 @@ setup()
cleanup()
{
$IP link del dev dummy0 &> /dev/null
- ip netns del ns1
+ ip netns del ns1 &> /dev/null
ip netns del ns2 &> /dev/null
}
--
2.7.4
Hello,
I am posting this as an RFC for any feedback. I have tested them suitably and I
am continuing to test them.
These patches optimize the start addresses in move_page_tables(). They address a
warning [1] that occurs due to a downward, overlapping move on a mutually-aligned
offset within a PMD during exec. By initiating the copy process at the PMD
level when such alignment is present, we can prevent this warning and speed up
the copying process at the same time. Linus Torvalds suggested this idea.
Please check the individual patches for more details.
thanks,
- Joel
[1] https://lore.kernel.org/all/ZB2GTBD%2FLWTrkOiO@dhcp22.suse.cz/
Joel Fernandes (Google) (4):
mm/mremap: Optimize the start addresses in move_page_tables()
selftests: mm: Fix failure case when new remap region was not found
selftests: mm: Add a test for mutually aligned moves > PMD size
selftests: mm: Add a test for remapping to area immediately after
existing mapping
mm/mremap.c | 49 +++++++++++++++++
tools/testing/selftests/mm/mremap_test.c | 69 +++++++++++++++++++++---
2 files changed, 112 insertions(+), 6 deletions(-)
--
2.40.1.606.ga4b1b128d6-goog
KVM_GET_REG_LIST dumps all register IDs that are available to
KVM_GET/SET_ONE_REG, and it is very useful for identifying platform
regression issues during VM migration.
Patch 1 enables the KVM_GET_REG_LIST API on riscv and patch 2 adds
the corresponding kselftest for checking possible register regressions.
Both patches were ported from arm64 and tested with Linux 6.4-rc1 on a
QEMU riscv virt machine.
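For readers unfamiliar with the API, userspace usually calls KVM_GET_REG_LIST twice: first with n = 0, which fails with E2BIG and writes the required count into n, then with a buffer of that size. The helper below is a hypothetical sketch of that pattern, not code from these patches; vcpu_fd is assumed to be a vCPU file descriptor obtained with KVM_CREATE_VCPU.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: fetch all register IDs of a vCPU.
 * Returns a heap-allocated list (caller frees) or NULL with errno set. */
static struct kvm_reg_list *get_reg_list(int vcpu_fd)
{
	struct kvm_reg_list probe = { .n = 0 };
	struct kvm_reg_list *list;

	/* First call with n = 0: KVM fails with E2BIG and reports the count. */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, &probe) == 0)
		return calloc(1, sizeof(*list));      /* no registers at all */
	if (errno != E2BIG)
		return NULL;

	list = calloc(1, sizeof(*list) + probe.n * sizeof(list->reg[0]));
	if (!list)
		return NULL;
	list->n = probe.n;

	/* Second call fills list->reg[0..n-1]. */
	if (ioctl(vcpu_fd, KVM_GET_REG_LIST, list) < 0) {
		free(list);
		return NULL;
	}
	return list;
}
```

Each returned ID is a value accepted by KVM_GET_ONE_REG and KVM_SET_ONE_REG.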
Haibo Xu (2):
riscv: kvm: Add KVM_GET_REG_LIST API support
KVM: selftests: Add riscv get-reg-list test
Documentation/virt/kvm/api.rst | 2 +-
arch/riscv/kvm/vcpu.c | 346 +++++++
tools/testing/selftests/kvm/Makefile | 3 +
.../selftests/kvm/include/riscv/processor.h | 3 +
.../selftests/kvm/riscv/get-reg-list.c | 869 ++++++++++++++++++
5 files changed, 1222 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/kvm/riscv/get-reg-list.c
--
2.34.1
At the end of the test, there will be an error message produced by the
`ip netns del ns1` command in cleanup():
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
This can even be reproduced with just `./fib_tests.sh -h` as we're
calling cleanup() on exit.
Redirect the error message to /dev/null to mute it.
V2: Update commit message and fixes tag.
Fixes: b60417a9f2b8 ("selftest: fib_tests: Always cleanup before exit")
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/fib_tests.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 7da8ec8..35d89df 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -68,7 +68,7 @@ setup()
cleanup()
{
$IP link del dev dummy0 &> /dev/null
- ip netns del ns1
+ ip netns del ns1 &> /dev/null
ip netns del ns2 &> /dev/null
}
--
2.7.4
This patchset adds a stress test for kprobes and a test for checking
optimized probes.
The two tests are being added based on the below discussion:
https://lore.kernel.org/all/20230128101622.ce6f8e64d929e29d36b08b73@kernel.…
kprobe_opt_types.tc is modified as per the below review comments:
https://lore.kernel.org/all/1682506809.uus6y0ir3i.naveen@linux.ibm.com/#t
Changelog:
v3:
* Add Acked-by for kprobe_insn_boundary.tc
* Simplify test for optimized probe, as suggested by Masami
* Add exit_unresolved to exit as unresolved in case no probe was optimized
v2:
* Add an explicit fork after enabling the events ( echo "forked" )
* Remove the extended test from multiple_kprobe_types.tc which adds
multiple consecutive probes in a function and add it as a
separate test case.
* Add new test case which checks for optimized probes.
Akanksha J N (2):
selftests/ftrace: Add new test case which adds multiple consecutive
probes in a function
selftests/ftrace: Add new test case which checks for optimized probes
.../test.d/kprobe/kprobe_insn_boundary.tc | 19 +++++++++++
.../ftrace/test.d/kprobe/kprobe_opt_types.tc | 34 +++++++++++++++++++
2 files changed, 53 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_insn_boundary.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
--
2.31.1
At the end of the test, there will be an error message produced by the
`ip netns del ns1` command in cleanup():
Tests passed: 201
Tests failed: 0
Cannot remove namespace file "/run/netns/ns1": No such file or directory
Redirect the error message to /dev/null to mute it.
Fixes: a0e11da78f48 ("fib_tests: Add tests for metrics on routes")
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/fib_tests.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_tests.sh b/tools/testing/selftests/net/fib_tests.sh
index 7da8ec8..35d89df 100755
--- a/tools/testing/selftests/net/fib_tests.sh
+++ b/tools/testing/selftests/net/fib_tests.sh
@@ -68,7 +68,7 @@ setup()
cleanup()
{
$IP link del dev dummy0 &> /dev/null
- ip netns del ns1
+ ip netns del ns1 &> /dev/null
ip netns del ns2 &> /dev/null
}
--
2.7.4
Hi Linus,
Please pull the following Kselftest fixes update for Linux 6.4-rc3.
This Kselftest fixes update for Linux 6.4-rc3 consists of:
- sgx test fix for false negatives.
- ftrace output is hard to parse and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit ac9a78681b921877518763ba0e89202254349d1b:
Linux 6.4-rc1 (2023-05-07 13:34:35 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-6.4-rc3
for you to fetch changes up to dbcf76390eb9a65d5d0c37b0cd57335218564e37:
selftests/ftrace: Improve integration with kselftest runner (2023-05-08 11:10:13 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-6.4-rc3
This Kselftest fixes update for Linux 6.4-rc3 consists of:
- sgx test fix for false negatives.
- ftrace output is hard to parse and it masks inappropriate skips etc.
This fix addresses the problems by integrating with kselftest runner.
----------------------------------------------------------------
Mark Brown (1):
selftests/ftrace: Improve integration with kselftest runner
Yi Lai (1):
selftests/sgx: Add "test_encl.elf" to TEST_FILES
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
tools/testing/selftests/sgx/Makefile | 1 +
4 files changed, 71 insertions(+), 4 deletions(-)
create mode 100755 tools/testing/selftests/ftrace/ftracetest-ktap
----------------------------------------------------------------
v2:
---
* swap order of patches (thanks Claudio)
* add r-b
* add comment why memslots are zeroed
Add a new selftest for CMMA migration. Also fix a small issue found during
development of the test.
Nico Boehr (2):
KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
KVM: s390: selftests: add selftest for CMMA migration
arch/s390/kvm/kvm-s390.c | 4 +
tools/testing/selftests/kvm/Makefile | 1 +
tools/testing/selftests/kvm/s390x/cmma_test.c | 680 ++++++++++++++++++
3 files changed, 685 insertions(+)
create mode 100644 tools/testing/selftests/kvm/s390x/cmma_test.c
--
2.39.1
The cited commit added a stray colon to the 'v' option. That makes getopts
expect an argument for -v, so the option works incorrectly.
ex:
tools/testing/selftests/net# ./fib_nexthops.sh -v
(should enable verbose mode, instead it shows help text due to missing arg)
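bash's getopts uses the same optstring convention as getopt(3): a ':' after a letter means that option takes an argument, and a leading ':' selects silent error reporting. The C sketch below runs getopt(3) with the two optstrings from the patch to show why a bare -v falls through to the help path. parses_verbose() is a hypothetical helper, and its case handling only approximates the script's.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: parse "-v" with the given optstring and report
 * whether it enabled verbose mode (1) or fell through to help (0). */
static int parses_verbose(const char *optstring)
{
	char *argv[] = { "fib_nexthops.sh", "-v", NULL };
	int o, verbose = 0;

	optind = 1; /* restart parsing on every call */
	while ((o = getopt(2, argv, optstring)) != -1) {
		switch (o) {
		case 'v':
			verbose = 1;
			break;
		default:
			/* ':' = missing argument, '?' = unknown option */
			return 0;
		}
	}
	return verbose;
}
```

With "v:" the only remaining word is consumed as -v's missing argument, so getopt reports ':' instead of 'v'.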
Fixes: 5feba4727395 ("selftests: fib_nexthops: Make ping timeout configurable")
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Signed-off-by: Benjamin Poirier <bpoirier(a)nvidia.com>
---
tools/testing/selftests/net/fib_nexthops.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/fib_nexthops.sh b/tools/testing/selftests/net/fib_nexthops.sh
index a47b26ab48f2..0f5e88c8f4ff 100755
--- a/tools/testing/selftests/net/fib_nexthops.sh
+++ b/tools/testing/selftests/net/fib_nexthops.sh
@@ -2283,7 +2283,7 @@ EOF
################################################################################
# main
-while getopts :t:pP46hv:w: o
+while getopts :t:pP46hvw: o
do
case $o in
t) TESTS=$OPTARG;;
--
2.40.1
While KUnit tests that cannot be built as a loadable module must depend
on "KUNIT=y", this is not true for modular tests, where it adds an
unnecessary limitation.
Fix this by relaxing the dependency to "KUNIT".
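To see why KUNIT=y is stricter than needed, the hypothetical C model below encodes tristate values as n < m < y. A plain "depends on KUNIT" caps the symbol at KUNIT's value, while the comparison "KUNIT=y" evaluates to y only when KUNIT is built in and to n otherwise; combined dependencies take the minimum. This is a simplified model of Kconfig's rules, not Kconfig code.

```c
#include <assert.h>

/* Tristate values ordered so that min() expresses an upper limit. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

static enum tristate tri_min(enum tristate a, enum tristate b)
{
	return a < b ? a : b;
}

/* Upper limit for HID_KUNIT_TEST before the fix: "depends on KUNIT=y".
 * other_deps is the combined value of the remaining dependencies. */
static enum tristate limit_before(enum tristate kunit, enum tristate other_deps)
{
	enum tristate eq_y = (kunit == TRI_Y) ? TRI_Y : TRI_N;

	return tri_min(eq_y, other_deps);
}

/* Upper limit after the fix: "depends on KUNIT". */
static enum tristate limit_after(enum tristate kunit, enum tristate other_deps)
{
	return tri_min(kunit, other_deps);
}
```

With KUNIT=m, the old dependency limits the test to n, while the relaxed one allows m.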
Fixes: 08809e482a1c44d9 ("HID: uclogic: KUnit best practices and naming conventions")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
drivers/hid/Kconfig | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index 4ce012f83253ec9f..b977450cac75265d 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -1285,7 +1285,7 @@ config HID_MCP2221
config HID_KUNIT_TEST
tristate "KUnit tests for HID" if !KUNIT_ALL_TESTS
- depends on KUNIT=y
+ depends on KUNIT
depends on HID_BATTERY_STRENGTH
depends on HID_UCLOGIC
default KUNIT_ALL_TESTS
--
2.34.1
Add documentation for the new Virtual ALSA driver. It covers all possible
use cases: error and delay injection, random and pattern-based data
generation, and testing of the playback and ioctl redefinition functionality.
We have a lot of different virtual media drivers, which can be used for
testing of the userspace applications and media subsystem middle layer.
However, all of them are aimed at testing the video functionality and
simulating the video devices. For audio devices we have only the snd-dummy
module, which is good at simulating the correct behavior of an ALSA device.
I decided to write a tool, which would help to test the userspace ALSA
programs (and the PCM middle layer as well) under unusual circumstances
to figure out how they would behave. So I came up with this Virtual ALSA
Driver.
This new Virtual ALSA Driver has several features which can be useful
during testing/fuzzing of userspace ALSA applications, or testing/fuzzing
of the PCM middle layer. Not all of them can be implemented using the
existing virtual drivers (like dummy or loopback). Here is what this
driver can do:
- Simulate both capture and playback processes
- Check the playback stream for containing the looped pattern
- Generate random or pattern-based capture data
- Inject delays into the playback and capturing processes
- Inject errors during the PCM callbacks
Also, this driver can check whether the playback stream contains the
predefined pattern, which is used in the corresponding selftest to check
the PCM middle layer data transfer functionality. Additionally, this
driver redefines the default RESET ioctl, and the selftest covers this PCM
API functionality as well.
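The looped-pattern behavior described above can be sketched in C as two helpers: one fills a capture buffer with the pattern repeated back to back, and one checks whether a playback buffer is exactly that repetition. The function names are hypothetical, and the sketch assumes the check starts at offset 0 of the pattern for the whole buffer; the driver's actual bookkeeping across periods may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fill buf with pat repeated back to back; the last copy may be cut short. */
static void fill_looped(char *buf, size_t len, const char *pat, size_t plen)
{
	assert(plen > 0);
	for (size_t i = 0; i < len; i++)
		buf[i] = pat[i % plen];
}

/* True if buf consists only of pat repeated from its first byte. */
static bool is_looped(const char *buf, size_t len, const char *pat, size_t plen)
{
	assert(plen > 0);
	for (size_t i = 0; i < len; i++)
		if (buf[i] != pat[i % plen])
			return false;
	return true;
}
```

With the 13-byte pattern "mycoolpattern", filling 30 bytes yields "mycoolpatternmycoolpatternmyco".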
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/valsa.rst | 114 ++++++++++++++++++++++++++++
2 files changed, 115 insertions(+)
create mode 100644 Documentation/admin-guide/valsa.rst
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 43ea35613dfc..328cc59275a1 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -131,6 +131,7 @@ configure specific aspects of kernel behavior to your liking.
thunderbolt
ufs
unicode
+ valsa
vga-softcursor
video-output
xfs
diff --git a/Documentation/admin-guide/valsa.rst b/Documentation/admin-guide/valsa.rst
new file mode 100644
index 000000000000..64ffc130fb4c
--- /dev/null
+++ b/Documentation/admin-guide/valsa.rst
@@ -0,0 +1,114 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+The Virtual ALSA Driver
+=======================
+
+The Virtual ALSA Driver emulates a generic ALSA device, and can be used for
+testing/fuzzing of userspace ALSA applications, as well as for testing/fuzzing of
+the ALSA middle layer. Additionally, it can be used for simulating hard-to-reproduce
+problems with PCM devices.
+
+What can this driver do?
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+At this moment the driver can do the following things:
+ * Simulate both capture and playback processes
+ * Generate random or pattern-based capture data
+ * Inject delays into the playback and capturing processes
+ * Inject errors during the PCM callbacks
+
+Also, this driver can check whether the playback stream contains the
+predefined pattern, which is used in the corresponding selftest (alsa/valsa-test.sh)
+to check the PCM middle layer data transfer functionality. Additionally, this
+driver redefines the default RESET ioctl, and the selftest covers this PCM
+API functionality as well.
+
+Configuration
+-------------
+
+The driver has several parameters besides the common ALSA module parameters:
+
+ * fill_mode (bool) - Buffer fill mode (see below)
+ * inject_delay (int)
+ * inject_hwpars_err (bool)
+ * inject_prepare_err (bool)
+ * inject_trigger_err (bool)
+
+
+Capture Data Generation
+-----------------------
+
+The driver has two modes of data generation: the first (0 in the fill_mode parameter)
+means random data generation, and the second (1 in the fill_mode parameter) means
+pattern-based data generation. Let's look at the second mode.
+
+First of all, you may want to specify the pattern for data generation. You can do it
+by writing the pattern to the debugfs file (/sys/kernel/debug/valsa/fill_pattern).
+For example:
+
+.. code-block:: bash
+
+ echo -n mycoolpattern > /sys/kernel/debug/valsa/fill_pattern
+
+After this, every capture action performed on the 'valsa' device will return
+'mycoolpatternmycoolpatternmycoolpatternmy...' in the capturing buffer.
+
+The pattern itself can be up to 4096 bytes long.
+
+Delay injection
+---------------
+
+The driver has an 'inject_delay' parameter, which has a self-descriptive name and
+can be used for time delay/speedup simulations. The parameter has integer type, and
+it specifies the delay added between the module's internal timer ticks.
+
+If the 'inject_delay' value is positive, the buffer will be filled more slowly; if it is
+negative, faster. You can try it yourself by starting a recording in any
+audio recording application (like Audacity) and selecting the 'valsa' device as a
+source.
+
+This parameter can also be used for generating a huge amount of sound data in a very
+short period of time (with a negative 'inject_delay' value).
+
+Error injection
+---------------
+
+This module can be used for injecting errors into the PCM communication process. This
+action can help you to figure out how the userspace ALSA program behaves under unusual
+circumstances.
+
+For example, you can make all 'hw_params' PCM callback calls return an EBUSY error by
+writing '1' to the 'inject_hwpars_err' module parameter:
+
+.. code-block:: bash
+
+ echo 1 > /sys/module/snd_valsa/parameters/inject_hwpars_err
+
+Errors can be injected into the following PCM callbacks:
+
+ * hw_params (EBUSY)
+ * prepare (EINVAL)
+ * trigger (EINVAL)
+
+
+Playback test
+-------------
+
+This driver can also be used for testing the playback functionality: every time you
+write the playback data to the 'valsa' PCM device and close it, the driver checks
+whether the buffer contains the looped pattern (which is specified in the fill_pattern
+debugfs file). If the playback buffer content represents the looped pattern, the 'pc_test'
+debugfs entry is set to '1'. Otherwise, the driver sets it to '0'.
+
+ioctl redefinition test
+-----------------------
+
+The driver redefines the 'reset' ioctl, which is default for all PCM devices. To test
+this functionality, we can trigger the reset ioctl and check the 'ioctl_test' debugfs
+entry:
+
+.. code-block:: bash
+
+ cat /sys/kernel/debug/valsa/ioctl_test
+
+If the ioctl is triggered successfully, this file will contain '1', and '0' otherwise.
--
2.34.1
This patchset aims to improve, and make more robust, the selftests performed to
check whether the SRv6 End.DT4 behavior works as expected under different system
configurations.
Some Linux distributions enable the Duplicate Address Detection and Reverse
Path Filtering mechanisms by default, which can interfere with the SRv6 End.DT4
behavior and cause selftests to fail.
The following patches improve selftests for End.DT4 by taking these two
mechanisms into account. Specifically:
- patch 1/2: selftests: seg6: disable DAD on IPv6 router cfg for
srv6_end_dt4_l3vpn_test
- patch 2/2: selftests: seg6: disable rp_filter by default in
srv6_end_dt4_l3vpn_test
Thank you all,
Andrea
Andrea Mayer (2):
selftests: seg6: disable DAD on IPv6 router cfg for
srv6_end_dt4_l3vpn_test
selftests: seg6: disable rp_filter by default in
srv6_end_dt4_l3vpn_test
.../selftests/net/srv6_end_dt4_l3vpn_test.sh | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
--
2.20.1
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, in the event the test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed. For example:
# example_simple_test: test aborted during cleanup. continuing without cleaning up
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
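A userspace analogue of this arrangement may help. In the C sketch below, the test body and its cleanup each run in a fresh POSIX thread, "aborting" means calling pthread_exit(), and a thread-local pointer plays the role of current->kunit_test. Everything here (demo_test, demo_try_run() and so on) is hypothetical; it only mirrors the control flow described above, where cleanup always runs in its own thread and an aborted cleanup marks the test as failed. Build with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kunit. */
struct demo_test {
	bool test_aborts;      /* make the test body bail out early */
	bool cleanup_aborts;   /* make the cleanup bail out early */
	bool cleaned_up;       /* cleanup ran to completion */
	bool cleanup_had_ctx;  /* cleanup saw its context via demo_current */
	bool failed;
};

/* Per-thread "current test", in the spirit of current->kunit_test. */
static _Thread_local struct demo_test *demo_current;

struct demo_stage {
	void (*fn)(struct demo_test *);
	struct demo_test *test;
	bool finished;
};

static void *demo_stage_thread(void *arg)
{
	struct demo_stage *s = arg;

	demo_current = s->test;   /* each stage thread sets its own context */
	s->fn(s->test);           /* may "abort" via pthread_exit() */
	s->finished = true;       /* not reached after an abort */
	return NULL;
}

/* Run fn in a fresh thread; true if it returned normally. */
static bool demo_try_run(void (*fn)(struct demo_test *), struct demo_test *t)
{
	struct demo_stage s = { fn, t, false };
	pthread_t tid;

	if (pthread_create(&tid, NULL, demo_stage_thread, &s) != 0)
		return false;
	pthread_join(tid, NULL);
	return s.finished;
}

static void demo_body(struct demo_test *t)
{
	if (t->test_aborts) {
		t->failed = true;     /* like a failed assertion */
		pthread_exit(NULL);
	}
}

static void demo_cleanup(struct demo_test *t)
{
	t->cleanup_had_ctx = (demo_current == t);
	if (t->cleanup_aborts)
		pthread_exit(NULL);
	t->cleaned_up = true;
}

static void demo_run_case(struct demo_test *t)
{
	demo_try_run(demo_body, t);          /* an abort here no longer skips cleanup */
	if (!demo_try_run(demo_cleanup, t))  /* cleanup runs in its own thread */
		t->failed = true;            /* an aborted cleanup is always a failure */
}
```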
Reviewed-by: Benjamin Berg <benjamin.berg(a)intel.com>
Reviewed-by: Maxime Ripard <maxime(a)cerno.tech>
Tested-by: Maxime Ripard <maxime(a)cerno.tech>
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of / replacement for "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v3:
https://lore.kernel.org/all/20230421040218.2156548-1-davidgow@google.com/
- Get rid of a unused 'suite' variable (kernel test robot)
- Add Benjamin and Maxime's Reviewed-by tags.
Changes since v2:
https://lore.kernel.org/linux-kselftest/20230419085426.1671703-1-davidgow@g…
- Always run cleanup in its own kthread
- Therefore, never attempt to re-run it if it exits
- Thanks, Benjamin.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 56 +++++++++++++++++++++++++++++++++++++++++-------
1 file changed, 48 insertions(+), 8 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..f5e4ceffd282 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -419,15 +419,54 @@ static void kunit_try_run_case(void *data)
* thread will resume control and handle any necessary clean up.
*/
kunit_run_case_internal(test, suite, test_case);
- /* This line may never be reached. */
+}
+
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
kunit_run_case_cleanup(test, suite);
}
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
struct kunit_try_catch_context *ctx = data;
struct kunit *test = ctx->test;
- struct kunit_suite *suite = ctx->suite;
int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
if (try_exit_code) {
@@ -448,12 +487,6 @@ static void kunit_catch_run_case(void *data)
}
return;
}
-
- /*
- * Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
- */
- kunit_run_case_cleanup(test, suite);
}
/*
@@ -478,6 +511,13 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
context.test_case = test_case;
kunit_try_catch_run(try_catch, &context);
+ /* Now run the cleanup */
+ kunit_try_catch_init(try_catch,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ kunit_try_catch_run(try_catch, &context);
+
/* Propagate the parameter result to the test case. */
if (test->status == KUNIT_FAILURE)
test_case->status = KUNIT_FAILURE;
--
2.40.1.521.gf1e218fcd8-goog
Hi All,
In TDX guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest.
The TDX guest attestation process consists of two steps:
1. TDREPORT generation
2. Quote generation.
The first step (TDREPORT generation) involves getting the TDX guest
measurement data in the format of TDREPORT which is further used to
validate the authenticity of the TDX guest. The second step involves
sending the TDREPORT to a Quoting Enclave (QE) server to generate a
remotely verifiable Quote. TDREPORT by design can only be verified on
the local platform. To support remote verification of the TDREPORT,
TDX leverages Intel SGX Quoting Enclave to verify the TDREPORT
locally and convert it to a remotely verifiable Quote. Although
attestation software can use communication methods like TCP/IP or
vsock to send the TDREPORT to QE, not all platforms support these
communication models. So the TDX GHCI specification [1] defines a method
for Quote generation via hypercalls. See the discussions from
Google [2] and Alibaba [3], which clarify the need for hypercall-based
Quote generation support. This patch set adds this support.
Support for TDREPORT generation already exists in the TDX guest driver.
This patch set extends the same driver to add Quote generation
support.
Following are the details of the patch set:
Patch 1/3 -> Adds event notification IRQ support.
Patch 2/3 -> Adds Quote generation support.
Patch 3/3 -> Adds selftest support for Quote generation feature.
[1] https://cdrdv2.intel.com/v1/dl/getContent/726790, section titled "TDG.VP.VMCALL<GetQuote>".
[2] https://lore.kernel.org/lkml/CAAYXXYxxs2zy_978GJDwKfX5Hud503gPc8=1kQ-+JwG_k…
[3] https://lore.kernel.org/lkml/a69faebb-11e8-b386-d591-dbd08330b008@linux.ali…
Kuppuswamy Sathyanarayanan (3):
x86/tdx: Add TDX Guest event notify interrupt support
virt: tdx-guest: Add Quote generation support
selftests/tdx: Test GetQuote TDX attestation feature
Documentation/virt/coco/tdx-guest.rst | 11 ++
arch/x86/coco/tdx/tdx.c | 196 +++++++++++++++++++
arch/x86/include/asm/tdx.h | 8 +
drivers/virt/coco/tdx-guest/tdx-guest.c | 168 +++++++++++++++-
include/uapi/linux/tdx-guest.h | 43 ++++
tools/testing/selftests/tdx/tdx_guest_test.c | 68 ++++++-
6 files changed, 487 insertions(+), 7 deletions(-)
--
2.34.1
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
[ Upstream commit 735b0e0f2d001b7ed9486db84453fb860e764a4d ]
There is a 'malloc' call in the vcpu_save_state function which can
fail. This patch adds a check for malloc failure to avoid a possible
NULL dereference and to give more information about why the test
failed.
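For illustration, here is a minimal userspace sketch of the allocation pattern involved: a header followed by a variable number of trailing entries, sized in a single malloc() call and checked before use. The struct and function names are hypothetical stand-ins, not the KVM selftest types; in the selftest itself the NULL check takes the form of TEST_ASSERT().

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the saved MSR state; not the real KVM types. */
struct msr_entry {
	uint32_t index;
	uint64_t data;
};

struct saved_state {
	size_t nmsrs;
	struct msr_entry entries[];	/* flexible array member */
};

/*
 * Allocate the header plus nmsrs trailing entries. Returns NULL on failure
 * instead of handing back memory the caller would dereference unchecked.
 */
static struct saved_state *alloc_saved_state(size_t nmsrs)
{
	struct saved_state *state;

	/* Reject sizes whose byte count would overflow size_t. */
	if (nmsrs > (SIZE_MAX - sizeof(*state)) / sizeof(state->entries[0]))
		return NULL;

	state = malloc(sizeof(*state) + nmsrs * sizeof(state->entries[0]));
	if (!state)
		return NULL;	/* caller reports this as -ENOMEM */

	state->nmsrs = nmsrs;
	return state;
}
```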
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Link: https://lore.kernel.org/r/20230322144528.704077-1-ivan.orlov0322@gmail.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index acfa1d01e7df0..d9365a9d1c490 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -950,6 +950,7 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vcpu *vcpu)
vcpu_run_complete_io(vcpu);
state = malloc(sizeof(*state) + msr_list->nmsrs * sizeof(state->msrs.entries[0]));
+ TEST_ASSERT(state, "-ENOMEM when allocating kvm state");
vcpu_events_get(vcpu, &state->events);
vcpu_mp_state_get(vcpu, &state->mp_state);
--
2.39.2
From: Ivan Orlov <ivan.orlov0322(a)gmail.com>
[ Upstream commit 735b0e0f2d001b7ed9486db84453fb860e764a4d ]
There is a 'malloc' call in the vcpu_save_state function which can
fail. This patch adds a check for malloc failure to avoid a possible
NULL dereference and to give more information about why the test
failed.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
Link: https://lore.kernel.org/r/20230322144528.704077-1-ivan.orlov0322@gmail.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index c39a4353ba194..827647ff3d41b 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -954,6 +954,7 @@ struct kvm_x86_state *vcpu_save_state(struct kvm_vcpu *vcpu)
vcpu_run_complete_io(vcpu);
state = malloc(sizeof(*state) + msr_list->nmsrs * sizeof(state->msrs.entries[0]));
+ TEST_ASSERT(state, "-ENOMEM when allocating kvm state");
vcpu_events_get(vcpu, &state->events);
vcpu_mp_state_get(vcpu, &state->mp_state);
--
2.39.2
The generic fork() implementation in nolibc falls back to the clone()
syscall. On s390 the first two arguments to clone() are swapped compared
to other architectures, breaking the implementation in nolibc.
Add a custom implementation of fork() to s390 that works.
While at it, also add a testcase for fork().
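The argument-order difference can be sketched with the libc syscall(2) wrapper: on most architectures the raw clone syscall takes the flags first and the new stack second, while s390 expects the stack first. The helper name raw_fork() is hypothetical and this is not the nolibc code from the patch; it also bypasses libc's own fork() bookkeeping, so treat it as an illustration only.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork via the raw clone syscall. A stack pointer of 0
 * means the child continues on a copy of the parent's stack, as with fork(),
 * and SIGCHLD is the signal delivered to the parent when the child exits.
 */
static pid_t raw_fork(void)
{
#if defined(__s390__) || defined(__s390x__)
	/* s390: clone(stack, flags, parent_tid, child_tid, tls) */
	return (pid_t)syscall(SYS_clone, 0, SIGCHLD, 0, 0, 0);
#else
	/* Most other architectures: clone(flags, stack, ...) */
	return (pid_t)syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0);
#endif
}
```

The parent then reaps the child with waitpid(), which is what the new fork()/waitpid() testcase exercises.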
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
tools/nolibc: s390: provide custom implementation for sys_fork
tools/nolibc: add testcase for fork()/waitpid()
tools/include/nolibc/arch-s390.h | 8 ++++++++
tools/include/nolibc/sys.h | 2 ++
tools/testing/selftests/nolibc/nolibc-test.c | 20 ++++++++++++++++++++
3 files changed, 30 insertions(+)
---
base-commit: c1c4f33b6be9b3412d9e0ba01b367f4ffe47c379
change-id: 20230415-nolibc-fork-b7087a345166
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
[ This series depends on the VFIO device cdev series ]
Changelog
v7:
* Rebased on top of v6.4-rc1 and cdev v11 candidate
* Fixed a wrong file in replace() API patch
* Added Kevin's "Reviewed-by" to replace() API patch
v6:
https://lore.kernel.org/all/cover.1679939952.git.nicolinc@nvidia.com/
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/all/cover.1679559476.git.nicolinc@nvidia.com/
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in the emulated pathway, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/all/cover.1678284812.git.nicolinc@nvidia.com/
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/all/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added an unmap function pointer sanity check before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/all/cover.1677288789.git.nicolinc@nvidia.com/
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/all/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@nvidia.co…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/all/cover.1675802050.git.nicolinc@nvidia.com/
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/all/cover.1675320212.git.nicolinc@nvidia.com/
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers can likely switch an attached domain to
another one, without staging the device in a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
the iommu_group_replace_domain API and IOMMUFD infrastructure for IOMMUFD
"physical" devices. IOMMUFD "emulated" devices need some extra steps to
replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v7
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 53 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++++
drivers/vfio/iommufd.c | 11 ++--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +++
tools/testing/selftests/iommu/iommufd.c | 29 +++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++
9 files changed, 127 insertions(+), 19 deletions(-)
--
2.40.1
In the unlikely case that CLOCK_REALTIME is not defined, the variable ret is
not initialized, and further accumulation of return values into ret can leave
ret in an undefined state. Fix this by initializing ret to zero and changing
the assignment of ret to an accumulation for the CLOCK_REALTIME case.
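The pattern behind the fix, sketched under the assumption that each check returns 0 on success and nonzero on failure: initialize the accumulator once, then have every conditionally compiled check add to it rather than overwrite it. check_clock() is a hypothetical stand-in for vdso_test_clock() that only calls clock_getres().

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical stand-in for vdso_test_clock(): 0 on success, 1 on failure. */
static int check_clock(clockid_t clock_id)
{
	struct timespec res;

	return clock_getres(clock_id, &res) == 0 ? 0 : 1;
}

/* Returns the number of failed checks; 0 if all passed or none were built. */
static int run_clock_checks(void)
{
	int ret = 0;	/* defined even if every #ifdef below is compiled out */

#ifdef CLOCK_REALTIME
	ret += check_clock(CLOCK_REALTIME);	/* accumulate, never overwrite */
#endif
#ifdef CLOCK_MONOTONIC
	ret += check_clock(CLOCK_MONOTONIC);
#endif
	return ret;
}
```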
Fixes: 03f55c7952c9 ("kselftest: Extend vDSO selftest to clock_getres")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/vDSO/vdso_test_clock_getres.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vDSO/vdso_test_clock_getres.c b/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
index 15dcee16ff72..38d46a8bf7cb 100644
--- a/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
+++ b/tools/testing/selftests/vDSO/vdso_test_clock_getres.c
@@ -84,12 +84,12 @@ static inline int vdso_test_clock(unsigned int clock_id)
int main(int argc, char **argv)
{
- int ret;
+ int ret = 0;
#if _POSIX_TIMERS > 0
#ifdef CLOCK_REALTIME
- ret = vdso_test_clock(CLOCK_REALTIME);
+ ret += vdso_test_clock(CLOCK_REALTIME);
#endif
#ifdef CLOCK_BOOTTIME
--
2.30.2
When we added fd-based file streams we created references to STx_FILENO in
stdio.h, but these constants are declared in unistd.h, which is the last
file included by the top-level nolibc.h. As a result, those constants are
not yet defined when stdio.h is processed, which causes programs using
nolibc.h to fail to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Acked-by: Willy Tarreau <w(a)1wt.eu>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Rebase onto v6.4-rc1.
- This is now a fix for Linus' tree.
- Link to v1: https://lore.kernel.org/r/20230413-nolibc-stdio-fix-v1-1-fa05fc3ba1fe@kerne…
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: ac9a78681b921877518763ba0e89202254349d1b
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The ftrace selftests do not currently produce KTAP output; instead they
produce a custom format that is much nicer for human consumption. This means
that when they run in automated test systems we get only a single result for
the suite as a whole rather than results for individual test cases, making it
harder to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper, invoked by the kselftest runner, which generates
output in this format by default. Users running ftracetest directly will
continue to get the existing output.
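For reference, the shape of the output that the new -K mode aims for can be sketched in C: a version line, a plan line, then one line per test case in which the "ok"/"not ok" keyword comes before the case number, with skips and expected failures marked by a "# DIRECTIVE" comment. The functions below are a hypothetical formatter for illustration, not part of the patch; the directive strings mirror the ones the script emits.

```c
#include <stdio.h>

/* Result categories, loosely mirroring the ftracetest outcomes. */
enum ktap_result { KTAP_PASS, KTAP_FAIL, KTAP_SKIP, KTAP_XFAIL };

/* Print the version and plan lines that precede all test lines. */
static void ktap_header(int count)
{
	printf("TAP version 13\n");
	printf("1..%d\n", count);
}

/*
 * Format one test line into buf as "<result> <number> <name>[ # DIRECTIVE]".
 * Returns the value of snprintf().
 */
static int ktap_line(char *buf, size_t len, int num, const char *name,
		     enum ktap_result r)
{
	switch (r) {
	case KTAP_PASS:
		return snprintf(buf, len, "ok %d %s", num, name);
	case KTAP_SKIP:
		return snprintf(buf, len, "ok %d %s # SKIP", num, name);
	case KTAP_XFAIL:
		return snprintf(buf, len, "ok %d %s # XFAIL", num, name);
	case KTAP_FAIL:
	default:
		return snprintf(buf, len, "not ok %d %s", num, name);
	}
}
```

Skips are reported as "ok" with a SKIP directive, so automated parsers can count them separately instead of treating them as failures.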
This is not the most elegant solution, but it is simple and effective. I
did consider implementing this by post-processing the existing output
format, but that felt more complex and more likely to result in all output
being lost if something goes seriously wrong during the run, which would not
be helpful. I also considered writing a separate runner script, but there
is enough going on (signal handling, for example) that it would duplicate
too much.
Acked-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
It might make sense to merge this via the ftrace tree - kselftests often
get merged with the code they test.
Changes in v2:
- Rebase onto v6.4-rc1.
- Link to v1: https://lore.kernel.org/r/20230302-ftrace-kselftest-ktap-v1-1-a84a0765b7ad@…
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..2506621e75df 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--ktap Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $result $CASENO $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " fail:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: ac9a78681b921877518763ba0e89202254349d1b
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx then causes a false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
Enabling a (modular) test must not silently enable additional kernel
functionality, as that may increase the attack surface of a product.
Fix this by:
1. making REGMAP visible if CONFIG_KUNIT_ALL_TESTS is enabled,
2. making REGMAP_KUNIT depend on REGMAP instead of selecting it.
After this, one can safely enable CONFIG_KUNIT_ALL_TESTS=m to build
modules for all appropriate tests for one's system, without pulling in
extra unwanted functionality, while still allowing a tester to manually
enable REGMAP and its test suite on a system where REGMAP is not enabled
by default.
Fixes: 2238959b6ad27040 ("regmap: Add some basic kunit tests")
Signed-off-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
v2:
- Make REGMAP visible if CONFIG_KUNIT_ALL_TESTS is enabled.
---
drivers/base/regmap/Kconfig | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/base/regmap/Kconfig b/drivers/base/regmap/Kconfig
index 33a8366e22a584a5..0db2021f7477f2ab 100644
--- a/drivers/base/regmap/Kconfig
+++ b/drivers/base/regmap/Kconfig
@@ -4,16 +4,23 @@
# subsystems should select the appropriate symbols.
config REGMAP
+ bool "Register Map support" if KUNIT_ALL_TESTS
default y if (REGMAP_I2C || REGMAP_SPI || REGMAP_SPMI || REGMAP_W1 || REGMAP_AC97 || REGMAP_MMIO || REGMAP_IRQ || REGMAP_SOUNDWIRE || REGMAP_SOUNDWIRE_MBQ || REGMAP_SCCB || REGMAP_I3C || REGMAP_SPI_AVMM || REGMAP_MDIO || REGMAP_FSI)
select IRQ_DOMAIN if REGMAP_IRQ
select MDIO_BUS if REGMAP_MDIO
- bool
+ help
+ Enable support for the Register Map (regmap) access API.
+
+ Usually, this option is automatically selected when needed.
+ However, you may want to enable it manually for running the regmap
+ KUnit tests.
+
+ If unsure, say N.
config REGMAP_KUNIT
tristate "KUnit tests for regmap"
- depends on KUNIT
+ depends on KUNIT && REGMAP
default KUNIT_ALL_TESTS
- select REGMAP
select REGMAP_RAM
config REGMAP_AC97
--
2.34.1
These patches relax a few verifier requirements around dynptrs.
Patches 1-3 are unchanged from v2, apart from rebasing.
Patch 4 is the same as in v1, see
https://lore.kernel.org/bpf/CA+PiJmST4WUH061KaxJ4kRL=fqy3X6+Wgb2E2rrLT5OYjU…
Patch 5 adds a test for the change in Patch 4
Daniel Rosenberg (5):
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Test allowing NULL buffer in dynptr slice
selftests/bpf: Check overflow in optional buffer
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Accept mem from dynptr in helper funcs
Documentation/bpf/kfuncs.rst | 23 ++++++++++-
include/linux/skbuff.h | 2 +-
kernel/bpf/helpers.c | 30 +++++++++------
kernel/bpf/verifier.c | 21 ++++++++--
.../testing/selftests/bpf/prog_tests/dynptr.c | 2 +
.../testing/selftests/bpf/progs/dynptr_fail.c | 20 ++++++++++
.../selftests/bpf/progs/dynptr_success.c | 38 +++++++++++++++++++
7 files changed, 118 insertions(+), 18 deletions(-)
base-commit: f4dea9689c5fea3d07170c2cb0703e216f1a0922
--
2.40.1.521.gf1e218fcd8-goog
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing sched-related functions such as enqueue_task_fair, it is
necessary to check whether a specified task, rather than the current task,
is within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v6->v7: Addressed comments from Hao Luo
- Get rid of unnecessary check
Details in here:
https://lore.kernel.org/all/20230505060818.60037-1-zhoufeng.zf@bytedance.co…
v5->v6: Addressed comments from Yonghong Song
- Some code format modifications.
- Add ack-by
Details in here:
https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.co…
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 17 ++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 53 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 122 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
This patch series improves guest vCPU performance on Arm during dirty-log
clearing operations by taking the MMU read lock instead of the MMU write
lock.
On Arm, vCPU write-protection faults are handled under the MMU read lock.
However, when userspace clears dirty logs via the KVM_CLEAR_DIRTY_LOG
ioctl, the kernel takes the MMU write lock. This blocks vCPU
write-protection faults and degrades guest performance, and the
degradation gets worse as the guest VM grows in memory size and vCPU
count.
In this series, using the MMU read lock is made possible by passing the
KVM_PGTABLE_WALK_SHARED flag to the page table walker.
Patches 1 to 5:
These patches modify dirty_log_perf_test. The intent is to mimic
production scenarios where the guest keeps executing while userspace
threads collect and clear dirty logs independently.
Three new command line options are added:
1. j: Run guest vCPUs and the main thread collecting dirty logs
independently of each other after initialization is complete.
2. k: Clear dirty logs in smaller chunks instead of clearing the
whole memslot in one call.
3. l: Add a customizable wait time between consecutive clear dirty
log calls to mimic sending dirty memory to the destination.
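Chunked clearing can be sketched as a loop that splits a memslot into (first_page, num_pages) ranges, one per KVM_CLEAR_DIRTY_LOG call. This sketch only computes the ranges: the callback type and function names are hypothetical, and a real caller would fill struct kvm_clear_dirty_log, issue the ioctl, and optionally sleep between calls. It assumes the chunk size is a multiple of 64 pages, because the KVM API documentation requires first_page to be a multiple of 64 and num_pages to be a multiple of 64 unless the range ends at the end of the memslot.

```c
#include <stdint.h>

/* Hypothetical callback standing in for one KVM_CLEAR_DIRTY_LOG ioctl. */
typedef void (*clear_fn)(uint64_t first_page, uint32_t num_pages, void *ctx);

/*
 * Split [0, slot_pages) into chunks of at most chunk_pages pages and call
 * clear() once per chunk. Only the final chunk may be shorter. Returns the
 * number of calls, or -1 if chunk_pages is zero or not a multiple of 64.
 */
static int clear_in_chunks(uint64_t slot_pages, uint32_t chunk_pages,
			   clear_fn clear, void *ctx)
{
	int calls = 0;

	if (chunk_pages == 0 || chunk_pages % 64)
		return -1;

	for (uint64_t first = 0; first < slot_pages; first += chunk_pages) {
		uint64_t left = slot_pages - first;
		uint32_t n = left < chunk_pages ? (uint32_t)left : chunk_pages;

		clear(first, n, ctx);	/* a real caller might wait here */
		calls++;
	}
	return calls;
}
```

Smaller chunks mean each call holds the MMU lock for a shorter time, which is the effect the new option is meant to measure.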
Patches 7-9:
These patches move MMU lock operations to arch-specific code, refactor
Arm's page table walker APIs, and switch clearing dirty logs from the MMU
write lock to the read lock. Patch 9 has results showing the improvements
measured with dirty_log_perf_test.
Vipin Sharma (9):
KVM: selftests: Allow dirty_log_perf_test to clear dirty memory in
chunks
KVM: selftests: Add optional delay between consecutive Clear-Dirty-Log
calls
KVM: selftests: Pass count of read and write accesses from guest to
host
KVM: selftests: Print read and write accesses of pages by vCPUs in
dirty_log_perf_test
KVM: selftests: Allow independent execution of vCPUs in
dirty_log_perf_test
KVM: arm64: Correct the kvm_pgtable_stage2_flush() documentation
KVM: mmu: Move mmu lock/unlock to arch code for clear dirty log
KVM: arm64: Allow stage2_apply_range_sched() to pass page table walker
flags
KVM: arm64: Run clear-dirty-log under MMU read lock
arch/arm64/include/asm/kvm_pgtable.h | 17 ++-
arch/arm64/kvm/hyp/nvhe/mem_protect.c | 4 +-
arch/arm64/kvm/hyp/pgtable.c | 16 ++-
arch/arm64/kvm/mmu.c | 36 ++++--
arch/mips/kvm/mmu.c | 2 +
arch/riscv/kvm/mmu.c | 2 +
arch/x86/kvm/mmu/mmu.c | 3 +
.../selftests/kvm/dirty_log_perf_test.c | 108 ++++++++++++++----
.../testing/selftests/kvm/include/memstress.h | 13 ++-
tools/testing/selftests/kvm/lib/memstress.c | 43 +++++--
virt/kvm/dirty_ring.c | 2 -
virt/kvm/kvm_main.c | 4 -
12 files changed, 185 insertions(+), 65 deletions(-)
base-commit: 95b9779c1758f03cf494e8550d6249a40089ed1c
--
2.40.0.634.g4ca3ef3211-goog
There is a spelling mistake in a message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/cachestat/test_cachestat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/cachestat/test_cachestat.c b/tools/testing/selftests/cachestat/test_cachestat.c
index c3823b809c25..9be2262e5c17 100644
--- a/tools/testing/selftests/cachestat/test_cachestat.c
+++ b/tools/testing/selftests/cachestat/test_cachestat.c
@@ -191,7 +191,7 @@ bool test_cachestat_shmem(void)
}
if (ftruncate(fd, filesize)) {
- ksft_print_msg("Unable to trucate shmem file.\n");
+ ksft_print_msg("Unable to truncate shmem file.\n");
ret = false;
goto close_fd;
}
--
2.30.2
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing sched-related functions such as enqueue_task_fair, it is
necessary to check whether a specified task, rather than the current task,
is within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v5->v6: Addressed comments from Yonghong Song
- Some code format modifications.
- Add ack-by
Details in here:
https://lore.kernel.org/all/20230504031513.13749-1-zhoufeng.zf@bytedance.co…
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 53 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 125 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing sched-related functions such as enqueue_task_fair, it is
necessary to check whether a specified task, rather than the current task,
is within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v4->v5: Addressed comments from Yonghong Song
- Some code format modifications.
Details in here:
https://lore.kernel.org/all/20230428071737.43849-1-zhoufeng.zf@bytedance.co…
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 54 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 ++++++++++++++++++
4 files changed, 126 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
Verify that calling clone3 with an exit signal (SIGCHLD) in flags will
fail.
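A standalone sketch of what the new case checks: clone3() takes the child's exit signal in the dedicated exit_signal field and is expected to reject the legacy CSIGNAL bits (such as SIGCHLD) in flags with EINVAL. This is not the selftest's code. It defines its own copy of the 64-byte v0 layout of struct clone_args instead of relying on <linux/sched.h>, falls back to syscall number 435 when SYS_clone3 is missing (an assumption that holds for the generic syscall table used by x86-64 and arm64), and leaves it to the caller to treat ENOSYS as "clone3 unavailable" (older kernels, some seccomp sandboxes).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435 /* assumption: generic syscall number (x86-64, arm64) */
#endif

/* Mirrors the first (v0, 64-byte) layout of the UAPI struct clone_args. */
struct clone_args_v0 {
	uint64_t flags;
	uint64_t pidfd;
	uint64_t child_tid;
	uint64_t parent_tid;
	uint64_t exit_signal;
	uint64_t stack;
	uint64_t stack_size;
	uint64_t tls;
};

/*
 * Call clone3() with SIGCHLD placed in flags instead of exit_signal.
 * Returns the errno reported by the kernel, or 0 if the call
 * unexpectedly succeeded.
 */
static int clone3_sigchld_in_flags(void)
{
	struct clone_args_v0 args;
	long ret;

	memset(&args, 0, sizeof(args));
	args.flags = SIGCHLD; /* wrong field on purpose: belongs in exit_signal */

	ret = syscall(SYS_clone3, &args, sizeof(args));
	if (ret == 0)
		_exit(0); /* child of an unexpected success: leave immediately */
	if (ret > 0)
		return 0; /* parent of an unexpected success */
	return errno;
}
```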
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/clone3/clone3.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index e495f895a2cd..e60cf4da8fb0 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
uid_t uid = getuid();
ksft_print_header();
- ksft_set_plan(18);
+ ksft_set_plan(19);
test_clone3_supported();
/* Just a simple clone3() should return 0.*/
@@ -198,5 +198,8 @@ int main(int argc, char *argv[])
/* Do a clone3() in a new time namespace */
test_clone3(CLONE_NEWTIME, 0, 0, CLONE3_ARGS_NO_TEST);
+ /* Do a clone3() with exit signal (SIGCHLD) in flags */
+ test_clone3(SIGCHLD, 0, -EINVAL, CLONE3_ARGS_NO_TEST);
+
ksft_finished();
}
--
2.40.0
Hi all,
This patch series fixes a crash in the new input selftest, and makes the
test available when the KUnit framework is modular.
Unfortunately test 3 still fails for me (tested on Koelsch (R-Car M2-W)
and ARAnyM):
KTAP version 1
# Subtest: input_core
1..3
input: Test input device as /devices/virtual/input/input1
ok 1 input_test_polling
input: Test input device as /devices/virtual/input/input2
ok 2 input_test_timestamp
input: Test input device as /devices/virtual/input/input3
# input_test_match_device_id: ASSERTION FAILED at # drivers/input/tests/input_test.c:99
Expected input_match_device_id(input_dev, &id) to be true, but is false
not ok 3 input_test_match_device_id
# input_core: pass:2 fail:1 skip:0 total:3
# Totals: pass:2 fail:1 skip:0 total:3
not ok 1 input_core
Thanks!
Geert Uytterhoeven (2):
Input: tests - fix use-after-free and refcount underflow in
input_test_exit()
Input: tests - modular KUnit tests should not depend on KUNIT=y
drivers/input/Kconfig | 2 +-
drivers/input/tests/input_test.c | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
--
2.34.1
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
When the documentation was updated for modular tests, the "KUNIT=y"
dependency was not updated, which now encourages people to create tests
that cannot be enabled when the KUnit framework itself is modular.
Fix this by changing the dependency to "KUNIT".
Document when it is appropriate (and required) to depend on "KUNIT=y".
Fixes: c9ef2d3e3f3b3e56 ("KUnit: Docs: make start.rst example Kconfig follow style.rst")
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
---
Documentation/dev-tools/kunit/start.rst | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/Documentation/dev-tools/kunit/start.rst b/Documentation/dev-tools/kunit/start.rst
index c736613c9b199bff..9619a044093042ce 100644
--- a/Documentation/dev-tools/kunit/start.rst
+++ b/Documentation/dev-tools/kunit/start.rst
@@ -256,9 +256,12 @@ Now we are ready to write the test cases.
config MISC_EXAMPLE_TEST
tristate "Test for my example" if !KUNIT_ALL_TESTS
- depends on MISC_EXAMPLE && KUNIT=y
+ depends on MISC_EXAMPLE && KUNIT
default KUNIT_ALL_TESTS
+Note: If your test does not support being built as a loadable module (which is
+discouraged), replace tristate by bool, and depend on KUNIT=y instead of KUNIT.
+
3. Add the following lines to ``drivers/misc/Makefile``:
.. code-block:: make
--
2.34.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing sched-related functions such as enqueue_task_fair, it is necessary to
check whether a specified task, rather than the current task, is within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v3->v4: Addressed comments from Yonghong Song
- Modify test cases and test other tasks, not the current task.
Details in here:
https://lore.kernel.org/all/20230427023019.73576-1-zhoufeng.zf@bytedance.co…
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 +++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 55 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 51 +++++++++++++++++
4 files changed, 127 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
These patches relax a few verifier requirements around dynptrs.
I was unable to test the patch in 0003 due to unrelated issues compiling the
bpf selftests, but did run an equivalent local test program.
This is the issue I was running into:
progs/cgrp_ls_attach_cgroup.c:17:15: error: use of undeclared identifier 'BPF_MAP_TYPE_CGRP_STORAGE'; did you mean 'BPF_MAP_TYPE_CGROUP_STORAGE'?
__uint(type, BPF_MAP_TYPE_CGRP_STORAGE);
^~~~~~~~~~~~~~~~~~~~~~~~~
BPF_MAP_TYPE_CGROUP_STORAGE
/ssd/kernel/fuse-bpf/tools/testing/selftests/bpf/tools/include/bpf/bpf_helpers.h:13:39: note: expanded from macro '__uint'
#define __uint(name, val) int (*name)[val]
^
/ssd/kernel/fuse-bpf/tools/testing/selftests/bpf/tools/include/vmlinux.h:27892:2: note: 'BPF_MAP_TYPE_CGROUP_STORAGE' declared here
BPF_MAP_TYPE_CGROUP_STORAGE = 19,
^
1 error generated.
Daniel Rosenberg (3):
bpf: verifier: Accept dynptr mem as mem in helpers
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Test allowing NULL buffer in dynptr slice
Documentation/bpf/kfuncs.rst | 23 ++++++++++++-
kernel/bpf/helpers.c | 32 ++++++++++++-------
kernel/bpf/verifier.c | 21 ++++++++++++
.../testing/selftests/bpf/prog_tests/dynptr.c | 1 +
.../selftests/bpf/progs/dynptr_success.c | 21 ++++++++++++
5 files changed, 85 insertions(+), 13 deletions(-)
base-commit: 5af607a861d43ffff830fc1890033e579ec44799
--
2.40.0.577.gac1e443424-goog
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by taking the incoming device's
VRF attachment into account when performing socket lookups from tc/xdp.
The first two patches are refactoring changes which factor out the tc
helper's logic, which was shared with cg/sk_skb (those operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows, as there does not seem to be a strict criterion for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
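The effect of missing VRF information can be illustrated with a simplified model of how a lookup decides whether a socket's bound device matches a packet's ingress device. The sketch is modeled loosely on the kernel's bound-device check, and the exact kernel semantics are an assumption here: `sdif` carries the ifindex of the VRF master (0 when the ingress device is not enslaved to a VRF), and `l3mdev_accept` stands for the sysctl that lets unbound sockets accept VRF traffic. `bound_dev_matches()` and the ifindex values in the usage are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model of bound-device matching during socket lookup.
 *   bound_dev_if:  ifindex the socket is bound to (0 = unbound)
 *   dif:           ifindex the packet arrived on
 *   sdif:          ifindex of dif's VRF master (0 = dif not in a VRF)
 *   l3mdev_accept: whether unbound sockets may accept VRF traffic
 */
static bool bound_dev_matches(int bound_dev_if, int dif, int sdif,
			      bool l3mdev_accept)
{
	if (!bound_dev_if)
		return !sdif || l3mdev_accept;
	return bound_dev_if == dif || bound_dev_if == sdif;
}
```

If sdif is forced to 0, which is what the cover letter describes for lookups from tc/xdp, a socket bound to the VRF no longer matches traffic arriving on a VRF slave, while an unbound socket in the default VRF can match it instead.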
---
v4: - Move dev_sdif() to include/linux/netdevice.h as suggested by Stanislav Fomichev
- Remove SYS and SYS_NOFAIL duplicate definitions
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
include/linux/netdevice.h | 9 +
net/core/filter.c | 123 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 312 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
4 files changed, 511 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
> This adds the general_profit KSM sysfs knob and the process profit metric
> knobs to ksm_stat.
>
> 1) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 2) document general_profit sysfs knob
>
> 3) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 4) mm: expose ksm process profit metric in ksm_stat
>
> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
> The documentation mentions the formula for the ksm process profit
> metric, however it does not calculate it. In addition the formula
> depends on the size of internal structures. So it makes sense to
> expose it.
>
Hi Stefan, I think you should give me some credit for my contributions to
the concept and formula of the KSM profit (process-wide and system-wide); as it
is, it's a kind of idea stealing.
Besides, the idea of per-process KSM control was proposed by me last year, although
you use prctl instead of the /proc fs. You didn't even CC me. I think you should CC my email
(xu.xin16(a)zte.com.cn) at least.
> 5) document new procfs ksm knobs
>
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Acked-by: David Hildenbrand <david(a)redhat.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +++++++
> Documentation/admin-guide/mm/ksm.rst | 5 ++++-
> fs/proc/base.c | 3 +++
> include/linux/ksm.h | 4 ++++
> mm/ksm.c | 21 +++++++++++++++++++
> 5 files changed, 40 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> index d244674a9480..6041a025b65a 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> @@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
>
> When it is set to 0 only pages from the same node are merged,
> otherwise pages from all nodes can be merged together (default).
> +
> +What: /sys/kernel/mm/ksm/general_profit
> +Date: April 2023
> +KernelVersion: 6.4
> +Contact: Linux memory management mailing list <linux-mm(a)kvack.org>
> +Description: Measure how effective KSM is.
> + general_profit: how effective is KSM. The formula for the
> + calculation is in Documentation/admin-guide/mm/ksm.rst.
> diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
On some distributions, the rp_filter is automatically set (=1) by
default on a netdev basis (also on VRFs).
In an SRv6 End.DT46 behavior, decapsulated IPv4 packets are routed using
the table associated with the VRF bound to that tunnel. During lookup
operations, the rp_filter can lead to packet loss when activated on the
VRF.
Therefore, we chose to make this selftest more robust by explicitly
disabling the rp_filter during tests (as it is automatically set by some
Linux distributions).
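Why both `all` and `default` are cleared, and why this is done before the devices are created, follows from how these knobs combine: per Documentation/networking/ip-sysctl.rst, the maximum of conf/all/rp_filter and conf/<interface>/rp_filter is used for source validation on an interface, and a new interface takes its initial value from conf/default. The C snippet below is only an arithmetic model of that rule; `struct ns_rp_filter` and both helpers are hypothetical, and nothing here touches a real sysctl.

```c
#include <assert.h>

/* Hypothetical per-netns view of the two global rp_filter knobs. */
struct ns_rp_filter {
	int all;  /* net.ipv4.conf.all.rp_filter */
	int dflt; /* net.ipv4.conf.default.rp_filter */
};

/* A newly created interface copies its per-device value from "default". */
static int new_dev_rp_filter(const struct ns_rp_filter *ns)
{
	return ns->dflt;
}

/*
 * Mode applied on an interface: the larger of "all" and the per-device
 * value (0 = off, 1 = strict, 2 = loose).
 */
static int effective_rp_filter(const struct ns_rp_filter *ns, int dev)
{
	return ns->all > dev ? ns->all : dev;
}
```

A device created while `default` is still 1 keeps filtering even after `all` is cleared; clearing `default` first makes later devices start at 0.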
Fixes: 03a0b567a03d ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior")
Reported-by: Hangbin Liu <liuhangbin(a)gmail.com>
Signed-off-by: Andrea Mayer <andrea.mayer(a)uniroma2.it>
Tested-by: Hangbin Liu <liuhangbin(a)gmail.com>
---
.../testing/selftests/net/srv6_end_dt46_l3vpn_test.sh | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
index aebaab8ce44c..441eededa031 100755
--- a/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
+++ b/tools/testing/selftests/net/srv6_end_dt46_l3vpn_test.sh
@@ -292,6 +292,11 @@ setup_hs()
ip netns exec ${hsname} sysctl -wq net.ipv6.conf.all.accept_dad=0
ip netns exec ${hsname} sysctl -wq net.ipv6.conf.default.accept_dad=0
+ # disable the rp_filter otherwise the kernel gets confused about how
+ # to route decap ipv4 packets.
+ ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${rtname} sysctl -wq net.ipv4.conf.default.rp_filter=0
+
ip -netns ${hsname} link add veth0 type veth peer name ${rtveth}
ip -netns ${hsname} link set ${rtveth} netns ${rtname}
ip -netns ${hsname} addr add ${IPv6_HS_NETWORK}::${hs}/64 dev veth0 nodad
@@ -316,11 +321,6 @@ setup_hs()
ip netns exec ${rtname} sysctl -wq net.ipv6.conf.${rtveth}.proxy_ndp=1
ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.proxy_arp=1
- # disable the rp_filter otherwise the kernel gets confused about how
- # to route decap ipv4 packets.
- ip netns exec ${rtname} sysctl -wq net.ipv4.conf.all.rp_filter=0
- ip netns exec ${rtname} sysctl -wq net.ipv4.conf.${rtveth}.rp_filter=0
-
ip netns exec ${rtname} sh -c "echo 1 > /proc/sys/net/vrf/strict_mode"
}
--
2.20.1
memalign() is obsolete according to its man page.
Replace memalign() with posix_memalign() and remove the malloc.h include
that was only needed for memalign().
As a pointer to s is passed to posix_memalign(), initialize s to NULL
to silence a warning about s possibly being used uninitialized (a false
positive, because the error is properly checked before s is returned).
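For reference, a standalone sketch of the replacement pattern. posix_memalign() reports failure through its return value (0 on success, a positive error number such as EINVAL or ENOMEM otherwise) and does not set errno, so the check should be `ret != 0` rather than `ret < 0`, and perror() would not describe the failure. Passing the address of a `void *` also avoids the `(void **)&s` cast. The helper name `alloc_aligned()` is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Allocate @size bytes aligned to @align (a power of two multiple of sizeof(void *)). */
static char *alloc_aligned(size_t align, size_t size)
{
	void *p = NULL;
	int ret = posix_memalign(&p, align, size);

	if (ret != 0) {
		/* errno is not set: report the returned error number instead. */
		fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
		return NULL;
	}
	return p;
}
```

Usage mirrors the selftest: `char *s = alloc_aligned(128, SIZE);` followed by a NULL check.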
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret) {
perror("memalign");
exit(1);
}
--
2.27.0
For cases like IPv6 addresses, having a means to supply tracing
predicates for fields with more than 8 bytes would be convenient.
This series provides a simple way to support this by allowing
simple ==, != memory comparison with the predicate supplied when
the size of the field exceeds 8 bytes. For example, to trace
::1, the predicate
"dst == 0x00000000000000000000000000000001"
..could be used.
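The idea can be illustrated outside the kernel: parse the hex constant into a byte array the size of the field, then let == and != reduce to a memcmp(). This is a hypothetical userspace sketch (`parse_hex_bytes()` and `eval_mem_pred()` are illustrative names), not the parsing code in trace_events_filter.c. It assumes the constant is written most-significant byte first and is meant to match the field's in-memory byte layout, which is how an IPv6 address stored in network byte order lines up.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Parse "0x" followed by exactly 2 * @len hex digits into @out,
 * first digit pair into out[0]. Returns false on malformed input.
 */
static bool parse_hex_bytes(const char *s, unsigned char *out, size_t len)
{
	if (s[0] != '0' || (s[1] != 'x' && s[1] != 'X'))
		return false;
	s += 2;
	if (strlen(s) != 2 * len)
		return false;
	for (size_t i = 0; i < len; i++) {
		unsigned int byte = 0;

		for (int j = 0; j < 2; j++) {
			int c = tolower((unsigned char)s[2 * i + j]);

			if (!isxdigit(c))
				return false;
			byte = byte * 16 + (isdigit(c) ? c - '0' : c - 'a' + 10);
		}
		out[i] = (unsigned char)byte;
	}
	return true;
}

/* Evaluate "field == value" (negate = false) or "field != value" bytewise. */
static bool eval_mem_pred(const unsigned char *field, const unsigned char *value,
			  size_t len, bool negate)
{
	bool equal = memcmp(field, value, len) == 0;

	return negate ? !equal : equal;
}
```

Because a byte-wise comparison only establishes equality, ordering operators such as < or > have no natural meaning in this scheme.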
Patch 1 provides support for > 8 byte fields via a memcmp()-style
predicate. Patch 2 adds tests for filter predicates, and patch 3
documents the fact that for fields > 8 bytes, only == and != are supported.
Changes since RFC [1]:
- originally a fix was intermixed with the new functionality as
patch 1 in series [1]; the fix landed separately
- small tweaks to how filter predicates are defined via fn_num as
opposed to via fn directly
[1] https://lore.kernel.org/lkml/1659910883-18223-1-git-send-email-alan.maguire…
Alan Maguire (3):
tracing: support > 8 byte array filter predicates
selftests/ftrace: add test coverage for filter predicates
tracing: document > 8 byte numeric filtering support
Documentation/trace/events.rst | 9 +++
kernel/trace/trace_events_filter.c | 55 +++++++++++++++-
.../selftests/ftrace/test.d/event/filter.tc | 62 +++++++++++++++++++
3 files changed, 125 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/ftrace/test.d/event/filter.tc
--
2.31.1
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by taking the incoming device's
VRF attachment into account when performing socket lookups from tc/xdp.
The first two patches are refactoring changes which factor out the tc
helper's logic, which was shared with cg/sk_skb (those operate correctly).
This refactoring is needed in order to avoid affecting the cgroup/sk_skb
flows, as there does not seem to be a strict criterion for discerning which
flow the helper is called from based on the net device or packet
information.
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v3: - Rename bpf_l2_sdif() to dev_sdif() as suggested by Stanislav Fomichev
- Added xdp tests as suggested by Daniel Borkmann
- Use start_server() to avoid duplicate code as suggested by Stanislav Fomichev
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add vrf_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/vrf_socket_lookup.c | 327 ++++++++++++++++++
.../selftests/bpf/progs/vrf_socket_lookup.c | 88 +++++
3 files changed, 526 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/vrf_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/vrf_socket_lookup.c
--
2.34.1
From: SeongJae Park <sjpark(a)amazon.de>
When running a test program, 'run_one()' checks whether the program has
execute permission and fails if it doesn't. However, it's easy to lose
the execute permission by mistake, as some common tools like 'diff'
don't handle permission changes well[1]. In contrast, mistakes in the
test program's path should be rare, as test programs are explicitly
listed in 'TEST_PROGS'. Therefore, it makes more sense to handle the
situation ourselves and run the program anyway.
For this reason, this commit makes the test program runner still print
the warning message but, in that case, parse the program's interpreter
from its shebang line and run the program explicitly with it.
[1] https://lore.kernel.org/mm-commits/YRJisBs9AunccCD4@kroah.com/
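The runner logic boils down to reading the script's first line and, when it starts with `#!`, running `<interpreter> ./<test>`. The same parsing is sketched in C below for clarity only; `shebang_interpreter()` is a hypothetical helper, not part of kselftest. Like the shell version, it keeps interpreter arguments (e.g. `/usr/bin/env python3`) as one string and leaves word splitting to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Extract the interpreter command from the first line of a script.
 * Copies everything after "#!" (minus leading blanks) up to the end of
 * the line into @out. Returns false if there is no shebang, the command
 * is empty, or it does not fit in @out_len bytes.
 */
static bool shebang_interpreter(const char *first_line, char *out, size_t out_len)
{
	size_t n;

	if (strncmp(first_line, "#!", 2) != 0)
		return false;
	first_line += 2;
	while (*first_line == ' ' || *first_line == '\t')
		first_line++;
	n = strcspn(first_line, "\r\n");
	if (n == 0 || n >= out_len)
		return false;
	memcpy(out, first_line, n);
	out[n] = '\0';
	return true;
}
```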
Suggested-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: SeongJae Park <sjpark(a)amazon.de>
---
Changes from v1
(https://lore.kernel.org/linux-kselftest/20210810140459.23990-1-sj38.park@gm…)
- Parse and use the interpreter instead of changing the file
tools/testing/selftests/kselftest/runner.sh | 28 +++++++++++++--------
1 file changed, 18 insertions(+), 10 deletions(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index cc9c846585f0..a9ba782d8ca0 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -33,9 +33,9 @@ tap_timeout()
{
# Make sure tests will time out if utility is available.
if [ -x /usr/bin/timeout ] ; then
- /usr/bin/timeout --foreground "$kselftest_timeout" "$1"
+ /usr/bin/timeout --foreground "$kselftest_timeout" $1
else
- "$1"
+ $1
fi
}
@@ -65,17 +65,25 @@ run_one()
TEST_HDR_MSG="selftests: $DIR: $BASENAME_TEST"
echo "# $TEST_HDR_MSG"
- if [ ! -x "$TEST" ]; then
- echo -n "# Warning: file $TEST is "
- if [ ! -e "$TEST" ]; then
- echo "missing!"
- else
- echo "not executable, correct this."
- fi
+ if [ ! -e "$TEST" ]; then
+ echo "# Warning: file $TEST is missing!"
echo "not ok $test_num $TEST_HDR_MSG"
else
+ cmd="./$BASENAME_TEST"
+ if [ ! -x "$TEST" ]; then
+ echo "# Warning: file $TEST is not executable"
+
+ if [ "$(head -n 1 "$TEST" | cut -c -2)" = "#!" ]
+ then
+ interpreter=$(head -n 1 "$TEST" | cut -c 3-)
+ cmd="$interpreter ./$BASENAME_TEST"
+ else
+ echo "not ok $test_num $TEST_HDR_MSG"
+ return
+ fi
+ fi
cd `dirname $TEST` > /dev/null
- ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout "$cmd" 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.17.1
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing sched-related functions such as enqueue_task_fair, it is necessary to
check whether a specified task, rather than the current task, is within a given cgroup.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup() kfunc
selftests/bpf: Add testcase for bpf_task_under_cgroup
Changelog:
v2->v3: Addressed comments from Alexei Starovoitov
- Modify the comment information of the function.
- Narrow down the testcase's hook point
Details in here:
https://lore.kernel.org/all/20230421090403.15515-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Alexei Starovoitov
- Add kfunc instead.
Details in here:
https://lore.kernel.org/all/20230420072657.80324-1-zhoufeng.zf@bytedance.co…
kernel/bpf/helpers.c | 20 ++++++++
tools/testing/selftests/bpf/DENYLIST.s390x | 1 +
.../bpf/prog_tests/task_under_cgroup.c | 47 +++++++++++++++++++
.../selftests/bpf/progs/cgrp_kfunc_common.h | 1 +
.../bpf/progs/test_task_under_cgroup.c | 37 +++++++++++++++
5 files changed, 106 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Goals
=====
Support new key and signature formats with the same kernel component.
Verify the authenticity of system data with newly supported data formats.
Mitigate the risk of parsing arbitrary data in the kernel.
Motivation
==========
Adding new functionality to the kernel comes with an increased risk of
introducing new bugs which, once exploited, can lead to a partial or full
system compromise.
Parsing arbitrary data is particularly critical, since it allows an
attacker to send a malicious sequence of bytes to exploit vulnerabilities
in the parser code. The attacker might be able to overwrite kernel memory
to bypass kernel protections, and obtain more privileges.
User Mode Drivers (UMDs) can effectively mitigate this risk. If the parser
runs in user space, even if it has a bug, it won't allow the attacker to
overwrite kernel memory.
The communication protocol between the UMD and the kernel should be simple
enough that the kernel can immediately recognize malformed data sent by an
attacker controlling the UMD, and discard it.
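To illustrate what "simple enough" can mean, here is a sketch of the kind of checks a receiver could apply to a fixed-layout reply before trusting any of its fields: exact size, matching command, success status, and an in-bounds length. The structure, constants and `reply_is_valid()` are hypothetical; the actual protocol is defined in crypto/asymmetric_keys/umd_key_sig_umh.h and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HYPO_MAX_KEY_LEN 1024 /* hypothetical upper bound on the key blob */

enum { HYPO_CMD_KEY = 1, HYPO_CMD_SIG = 2 };

/* Hypothetical fixed-layout reply; NOT the real umd_key_msg_out. */
struct hypo_key_reply {
	uint32_t cmd;     /* must echo the request's command */
	int32_t result;   /* 0 on success, negative errno otherwise */
	uint32_t key_len; /* number of valid bytes in key[] */
	uint8_t key[HYPO_MAX_KEY_LEN];
};

/* Reject anything that is not exactly one well-formed reply to @expected_cmd. */
static bool reply_is_valid(const void *buf, size_t buf_len, uint32_t expected_cmd)
{
	struct hypo_key_reply r;

	if (buf_len != sizeof(r))
		return false;           /* short or oversized message */
	memcpy(&r, buf, sizeof(r));     /* copy out: no unaligned access on @buf */
	if (r.cmd != expected_cmd)
		return false;           /* reply to a different request */
	if (r.result != 0)
		return false;           /* UMD reported a failure */
	if (r.key_len == 0 || r.key_len > sizeof(r.key))
		return false;           /* length field out of bounds */
	return true;
}
```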
Solution
========
Register a new parser of the asymmetric key type which, instead of parsing
the key blob, forwards it to a UMD, and populates the key fields from the
UMD response. That response contains the data for each field of the public
key structure, defined in the kernel, and possibly a key description.
Supporting new data formats can be achieved by simply extending the UMD. As
long as the UMD recognizes them, and provides the crypto material to the
kernel Crypto API in the expected format, the kernel does not need to be
aware of the UMD changes.
Add a new API to verify the authenticity of system data, similar to the one
for PKCS#7 signatures. As for the key parser, send the signature to a UMD,
and fill the public_key_signature structure from the UMD response.
For now, the API supports only a very basic trust model: it accepts a key for
signature verification if it is in the supplied keyring. The API can be
extended later to support more sophisticated models.
Use cases
=========
eBPF
----
The eBPF infrastructure already offers to eBPF programs the ability to
verify PKCS#7 signatures, through the bpf_verify_pkcs7_signature() kfunc.
Add the new bpf_verify_umd_signature() kfunc, to allow eBPF programs to verify
signatures in a data format that is not PKCS#7 (for example PGP).
IMA Appraisal
-------------
An alternative to appraising each file with its signature (as in Fedora 38) is to
build a repository of reference file digests from signed RPM headers, and
look up the calculated digest of files being accessed in that repository
(DIGLIM[1]).
With this patch set, the kernel can verify the authenticity of RPM headers
from their PGP signature against the Linux distribution GPG keys. Once
verified, RPM headers can be parsed with a UMD to build the repository of
reference file digests.
With DIGLIM, Linux distributions are not required to change anything in
their building infrastructure (no extra data in the RPM header, no new PKI
for IMA signatures).
[1]: https://lore.kernel.org/linux-integrity/20210914163401.864635-1-roberto.sas…
UMD development
===============
The header file crypto/asymmetric_keys/umd_key_sig_umh.h contains the
details of the communication protocol between the kernel and the UMD
handler.
The UMD handler should implement the defined commands, CMD_KEY and CMD_SIG,
set the result of the processing, and fill the key- and
signature-specific structures umd_key_msg_out and umd_sig_msg_out.
The UMD handler should provide the key and signature blobs in a format that
is understood by the kernel. For example, for RSA keys, it should provide
them in ASN.1 format (SEQUENCE of INTEGER).
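For RSA, "SEQUENCE of INTEGER" refers to a DER-encoded SEQUENCE holding the modulus and the public exponent (the PKCS#1 RSAPublicKey layout). A minimal sketch of how a UMD handler could produce that blob from raw big-endian n and e follows. The function names and the size limits (moduli up to 4096 bits, exponents up to 16 bytes) are assumptions, and real code would more likely rely on an existing ASN.1 library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a DER length (short form, or long form up to 0xffff); returns bytes written. */
static size_t der_put_len(uint8_t *out, size_t len)
{
	if (len < 0x80) {
		out[0] = (uint8_t)len;
		return 1;
	}
	if (len <= 0xff) {
		out[0] = 0x81;
		out[1] = (uint8_t)len;
		return 2;
	}
	out[0] = 0x82;
	out[1] = (uint8_t)(len >> 8);
	out[2] = (uint8_t)len;
	return 3;
}

/* Encode an unsigned big-endian magnitude (mag_len >= 1) as a DER INTEGER. */
static size_t der_put_uint(uint8_t *out, const uint8_t *mag, size_t mag_len)
{
	size_t pad, n = 0;

	while (mag_len > 1 && mag[0] == 0) { /* drop redundant leading zeros */
		mag++;
		mag_len--;
	}
	pad = (mag[0] & 0x80) ? 1 : 0;     /* 0x00 prefix keeps the value positive */
	out[n++] = 0x02;                    /* INTEGER tag */
	n += der_put_len(out + n, mag_len + pad);
	if (pad)
		out[n++] = 0x00;
	memcpy(out + n, mag, mag_len);
	return n + mag_len;
}

/*
 * Encode SEQUENCE { INTEGER n, INTEGER e } into @out, which must have room
 * for at least 540 bytes (the worst case under the limits below).
 * Returns the encoded length, or 0 if the inputs are out of range.
 */
static size_t der_rsa_pubkey(uint8_t *out, const uint8_t *n, size_t n_len,
			     const uint8_t *e, size_t e_len)
{
	uint8_t body[600]; /* worst case is 536 bytes with the limits below */
	size_t body_len = 0, hdr;

	if (!n_len || !e_len || n_len > 512 || e_len > 16)
		return 0;
	body_len += der_put_uint(body + body_len, n, n_len);
	body_len += der_put_uint(body + body_len, e, e_len);

	out[0] = 0x30; /* SEQUENCE, constructed */
	hdr = 1 + der_put_len(out + 1, body_len);
	memcpy(out + hdr, body, body_len);
	return hdr + body_len;
}
```

For n = 0xC3 and e = 65537, this yields `30 09 02 02 00 C3 02 03 01 00 01`.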
The auth IDs of the keys and signatures should match, for signature
verification. Auth ID matching can be partial.
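A sketch of how the basic trust model (the key must be in the supplied keyring) could combine with partial auth ID matching. The precise partial-matching rule is not spelled out above, so the suffix rule used here is an assumption, modeled on PGP key IDs, which are the low-order bytes of the key fingerprint. `struct hypo_key_id`, `key_id_partial_match()` and `find_signing_key()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key identifier: raw bytes, e.g. a fingerprint or a key ID. */
struct hypo_key_id {
	size_t len;
	unsigned char data[32];
};

/* Partial match (assumed rule): the shorter ID equals the tail of the longer one. */
static bool key_id_partial_match(const struct hypo_key_id *a,
				 const struct hypo_key_id *b)
{
	const struct hypo_key_id *longer = a->len >= b->len ? a : b;
	const struct hypo_key_id *shorter = longer == a ? b : a;

	if (shorter->len == 0)
		return false;
	return memcmp(longer->data + longer->len - shorter->len,
		      shorter->data, shorter->len) == 0;
}

/* Basic trust model: only keys present in @keyring may verify the signature. */
static const struct hypo_key_id *find_signing_key(const struct hypo_key_id *keyring,
						  size_t nkeys,
						  const struct hypo_key_id *sig_id)
{
	for (size_t i = 0; i < nkeys; i++)
		if (key_id_partial_match(&keyring[i], sig_id))
			return &keyring[i];
	return NULL;
}
```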
Patch set dependencies
======================
This patch set depends on 'usermode_driver: Add management library and
API':
https://lore.kernel.org/bpf/20230317145240.363908-1-roberto.sassu@huaweiclo…
Patch set content
=================
Patch 1 introduces the new parser for the asymmetric key type.
Patch 2 introduces the parser for signatures and its API.
Patch 3 introduces the system-level API for signature verification.
Patch 4 extends eBPF to use the new system-level API.
Patch 5 adds a test for UMD-parser signatures (not executed until the UMD
supports PGP).
Patch 6 introduces the skeleton of the UMD handler.
PGP
===
A work in progress implementation of the PGP format (RFC 4880 and RFC 6637)
in the UMD handler is available at:
https://github.com/robertosassu/linux/commits/pgp-signatures-umd-v1-devel-v…
It is based on a previous work of David Howells, available at:
https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-modsign.git/…
The patches have been adapted for use in user space.
Roberto Sassu (6):
KEYS: asymmetric: Introduce UMD-based asymmetric key parser
KEYS: asymmetric: Introduce UMD-based asymmetric key signature parser
verification: Introduce verify_umd_signature() and
verify_umd_message_sig()
bpf: Introduce bpf_verify_umd_signature() kfunc
selftests/bpf: Prepare a test for UMD-parsed signatures
KEYS: asymmetric: Add UMD handler
.gitignore | 3 +
MAINTAINERS | 1 +
certs/system_keyring.c | 125 ++++++
crypto/asymmetric_keys/Kconfig | 32 ++
crypto/asymmetric_keys/Makefile | 23 +
crypto/asymmetric_keys/asymmetric_type.c | 3 +-
crypto/asymmetric_keys/umd_key.h | 28 ++
crypto/asymmetric_keys/umd_key_parser.c | 203 +++++++++
crypto/asymmetric_keys/umd_key_sig_loader.c | 32 ++
crypto/asymmetric_keys/umd_key_sig_umh.h | 71 +++
crypto/asymmetric_keys/umd_key_sig_umh_blob.S | 7 +
crypto/asymmetric_keys/umd_key_sig_umh_user.c | 84 ++++
crypto/asymmetric_keys/umd_sig_parser.c | 416 ++++++++++++++++++
include/crypto/umd_sig.h | 71 +++
include/keys/asymmetric-type.h | 1 +
include/linux/verification.h | 48 ++
kernel/trace/bpf_trace.c | 69 ++-
...ify_pkcs7_sig.c => verify_pkcs7_umd_sig.c} | 109 +++--
...kcs7_sig.c => test_verify_pkcs7_umd_sig.c} | 18 +-
.../testing/selftests/bpf/verify_sig_setup.sh | 82 +++-
20 files changed, 1378 insertions(+), 48 deletions(-)
create mode 100644 crypto/asymmetric_keys/umd_key.h
create mode 100644 crypto/asymmetric_keys/umd_key_parser.c
create mode 100644 crypto/asymmetric_keys/umd_key_sig_loader.c
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh.h
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh_blob.S
create mode 100644 crypto/asymmetric_keys/umd_key_sig_umh_user.c
create mode 100644 crypto/asymmetric_keys/umd_sig_parser.c
create mode 100644 include/crypto/umd_sig.h
rename tools/testing/selftests/bpf/prog_tests/{verify_pkcs7_sig.c => verify_pkcs7_umd_sig.c} (75%)
rename tools/testing/selftests/bpf/progs/{test_verify_pkcs7_sig.c => test_verify_pkcs7_umd_sig.c} (82%)
--
2.25.1
Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
	u8 val;
	int ret;

	ret = kstrtou8(buf, 10, &val);
	if (ret)
		return ret;

	mutex_lock(&test_fw_mutex);
	*(u8 *)cfg = val;
	mutex_unlock(&test_fw_mutex);

	/* Always return full write size even if we didn't consume all */
	return size;
}

static ssize_t config_num_requests_store(struct device *dev,
					 struct device_attribute *attr,
					 const char *buf, size_t count)
{
	int rc;

	mutex_lock(&test_fw_mutex);
	if (test_fw_config->reqs) {
		pr_err("Must call release_all_firmware prior to changing config\n");
		rc = -EINVAL;
		mutex_unlock(&test_fw_mutex);
		goto out;
	}
	mutex_unlock(&test_fw_mutex);

	rc = test_dev_config_update_u8(buf, count,
				       &test_fw_config->num_requests);

out:
	return rc;
}

static ssize_t config_read_fw_idx_store(struct device *dev,
					struct device_attribute *attr,
					const char *buf, size_t count)
{
	return test_dev_config_update_u8(buf, count,
					 &test_fw_config->read_fw_idx);
}
The function test_dev_config_update_u8() is called from both a locked and
an unlocked context: config_num_requests_store() and
config_read_fw_idx_store(), respectively. Both are driver methods and can
be called asynchronously, while test_dev_config_update_u8() and its
siblings modify the value pointed to by u8 *cfg or a similar pointer.
To avoid a deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition: the state checked under the
lock (test_fw_config->reqs) can change before the update is made.
Using two locks would not guarantee race-free mutual exclusion either.
This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:
static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
	int ret;

	mutex_lock(&test_fw_mutex);
	ret = __test_dev_config_update_u8(buf, size, cfg);
	mutex_unlock(&test_fw_mutex);

	return ret;
}
doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.
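The same split can be sketched in userspace with a pthread mutex: the `_nolock` helper assumes the lock is already held, the wrapper takes it, and the store-style function holds it across both the check and the update, so no other thread can change the state in between. All names are hypothetical stand-ins for the driver's functions (the `_nolock` suffix replaces the `__` prefix, which is reserved in userspace C), and strtoul() is a simplified substitute for kstrtou8() that tolerates trailing characters kstrtou8() would reject.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

static pthread_mutex_t cfg_mutex = PTHREAD_MUTEX_INITIALIZER;
static unsigned char cfg_num_requests;
static bool cfg_reqs_allocated; /* stands in for test_fw_config->reqs */

/* Unlocked primitive: the caller must hold cfg_mutex. */
static int update_u8_nolock(const char *buf, size_t size, unsigned char *cfg)
{
	char *end;
	unsigned long val = strtoul(buf, &end, 10);

	if (end == buf || val > 0xff)
		return -EINVAL;
	*cfg = (unsigned char)val;
	return (int)size;
}

/* Locked wrapper for callers that do not already hold the lock. */
static int update_u8(const char *buf, size_t size, unsigned char *cfg)
{
	int ret;

	pthread_mutex_lock(&cfg_mutex);
	ret = update_u8_nolock(buf, size, cfg);
	pthread_mutex_unlock(&cfg_mutex);
	return ret;
}

/* Check and update inside one critical section: no window for a race. */
static int num_requests_store(const char *buf, size_t count)
{
	int rc;

	pthread_mutex_lock(&cfg_mutex);
	if (cfg_reqs_allocated)
		rc = -EINVAL;
	else
		rc = update_u8_nolock(buf, count, &cfg_num_requests);
	pthread_mutex_unlock(&cfg_mutex);
	return rc;
}
```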
A similar approach was applied to all functions called from both the locked
and the unlocked context, which safely avoids both deadlocks and race
conditions in the driver.
The unlocked variants __test_dev_config_update_bool(),
__test_dev_config_update_u8() and __test_dev_config_update_size_t()
were introduced to be called from locked contexts, so that callers no
longer have to release the driver's main lock, and thereby cause a race
condition, before calling them.
The locked variants test_dev_config_update_bool(),
test_dev_config_update_u8() and test_dev_config_update_size_t() are
called from driver methods. This avoids duplicating the locking and
unlocking code in each method and complicating the code by saving the
return value across the lock.
Fixes: 7feebfa487b92 ("test_firmware: add support for request_firmware_into_buf")
Cc: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Russ Weight <russell.h.weight(a)intel.com>
Cc: Takashi Iwai <tiwai(a)suse.de>
Cc: Tianfei Zhang <tianfei.zhang(a)intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Randy Dunlap <rdunlap(a)infradead.org>
Cc: linux-kselftest(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27(a)gmail.com>
Signed-off-by: Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
---
lib/test_firmware.c | 52 ++++++++++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 17 deletions(-)
diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 05ed84c2fc4c..35417e0af3f4 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -353,16 +353,26 @@ static ssize_t config_test_show_str(char *dst,
return len;
}
-static int test_dev_config_update_bool(const char *buf, size_t size,
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
bool *cfg)
{
int ret;
- mutex_lock(&test_fw_mutex);
if (kstrtobool(buf, cfg) < 0)
ret = -EINVAL;
else
ret = size;
+
+ return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+ bool *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_bool(buf, size, cfg);
mutex_unlock(&test_fw_mutex);
return ret;
@@ -373,7 +383,8 @@ static ssize_t test_dev_config_show_bool(char *buf, bool val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_size_t(const char *buf,
+static int __test_dev_config_update_size_t(
+ const char *buf,
size_t size,
size_t *cfg)
{
@@ -384,9 +395,7 @@ static int test_dev_config_update_size_t(const char *buf,
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(size_t *)cfg = new;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
@@ -402,7 +411,7 @@ static ssize_t test_dev_config_show_int(char *buf, int val)
return snprintf(buf, PAGE_SIZE, "%d\n", val);
}
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
u8 val;
int ret;
@@ -411,14 +420,23 @@ static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
if (ret)
return ret;
- mutex_lock(&test_fw_mutex);
*(u8 *)cfg = val;
- mutex_unlock(&test_fw_mutex);
/* Always return full write size even if we didn't consume all */
return size;
}
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+ int ret;
+
+ mutex_lock(&test_fw_mutex);
+ ret = __test_dev_config_update_u8(buf, size, cfg);
+ mutex_unlock(&test_fw_mutex);
+
+ return ret;
+}
+
static ssize_t test_dev_config_show_u8(char *buf, u8 val)
{
return snprintf(buf, PAGE_SIZE, "%u\n", val);
@@ -471,10 +489,10 @@ static ssize_t config_num_requests_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_u8(buf, count,
- &test_fw_config->num_requests);
+ rc = __test_dev_config_update_u8(buf, count,
+ &test_fw_config->num_requests);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -518,10 +536,10 @@ static ssize_t config_buf_size_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->buf_size);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->buf_size);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
@@ -548,10 +566,10 @@ static ssize_t config_file_offset_store(struct device *dev,
mutex_unlock(&test_fw_mutex);
goto out;
}
- mutex_unlock(&test_fw_mutex);
- rc = test_dev_config_update_size_t(buf, count,
- &test_fw_config->file_offset);
+ rc = __test_dev_config_update_size_t(buf, count,
+ &test_fw_config->file_offset);
+ mutex_unlock(&test_fw_mutex);
out:
return rc;
--
2.30.2
This patchset adds a stress test for kprobes and a test for checking
optimized probes.
The two tests are being added based on the below discussion:
https://lore.kernel.org/all/20230128101622.ce6f8e64d929e29d36b08b73@kernel.…
Changelog:
* Add an explicit fork after enabling the events ( echo "forked" )
* Remove the extended test from multiple_kprobe_types.tc which adds
multiple consecutive probes in a function and add it as a
separate test case.
* Add new test case which checks for optimized probes.
Akanksha J N (2):
selftests/ftrace: Add new test case which adds multiple consecutive
probes in a function
selftests/ftrace: Add new test case which checks for optimized probes
.../test.d/kprobe/kprobe_insn_boundary.tc | 19 +++++++++++
.../ftrace/test.d/kprobe/kprobe_opt_types.tc | 34 +++++++++++++++++++
2 files changed, 53 insertions(+)
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_insn_boundary.tc
create mode 100644 tools/testing/selftests/ftrace/test.d/kprobe/kprobe_opt_types.tc
--
2.31.1
This is v1 of the KUnit deferred actions API, which implements an
equivalent of devm_add_action[1] on top of KUnit managed resources. This
provides a simple way of scheduling a function to run when the test
terminates (whether successfully, or with an error). It's therefore very
useful for freeing resources, or otherwise cleaning up.
The notable changes since RFCv2[2] are:
- Got rid of the 'cancellation token' concept. It was overcomplicated,
and we can add it back if we need to.
- kunit_add_action() therefore now returns 0 on success, and an error
otherwise (like devm_add_action()). Though you may wish to use:
- Added kunit_add_action_or_reset(), which will call the deferred
function if an error occurs. (See devm_add_action_or_reset()). This
also returns an error on failure, which can be asserted safely.
- Got rid of the function pointer typedef. Personally, I liked it, but
it's more typedef-y than most kernel code.
- Got rid of the 'internal_gfp' argument: all internal state is now
allocated with GFP_KERNEL. The main KUnit resource API can be used
instead if this doesn't work for your use-case.
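To make the deferred-action semantics concrete, here is a minimal user-space sketch of a LIFO action list with behaviour modelled on the description above: adding returns 0 or an error, and the "or_reset" variant runs the function immediately if it cannot be deferred. The names (struct action_list, add_action(), add_action_or_reset(), run_actions()) are hypothetical and are not the KUnit API; the real kunit_add_action() is built on KUnit managed resources.

```c
#include <errno.h>
#include <stdlib.h>

/* One deferred call: fn(ctx), run when the "test" exits. */
struct action {
	void (*fn)(void *);
	void *ctx;
	struct action *next;
};

struct action_list {
	struct action *head;   /* most recently added action first */
};

/* Defer fn(ctx). Returns 0 on success or -ENOMEM, like devm_add_action(). */
static int add_action(struct action_list *list, void (*fn)(void *), void *ctx)
{
	struct action *a = malloc(sizeof(*a));

	if (!a)
		return -ENOMEM;
	a->fn = fn;
	a->ctx = ctx;
	a->next = list->head;
	list->head = a;
	return 0;
}

/* As add_action(), but calls fn(ctx) right away if deferring fails. */
static int add_action_or_reset(struct action_list *list,
			       void (*fn)(void *), void *ctx)
{
	int ret = add_action(list, fn, ctx);

	if (ret)
		fn(ctx);
	return ret;
}

/* At exit: run actions in reverse order of registration, then free them. */
static void run_actions(struct action_list *list)
{
	while (list->head) {
		struct action *a = list->head;

		list->head = a->next;
		a->fn(a->ctx);
		free(a);
	}
}
```

Running actions in reverse order means a resource registered later (which may depend on an earlier one) is released first, which matches how devm-style cleanup is usually expected to behave.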
I'd love to hear any further thoughts!
Cheers,
-- David
[1]: https://docs.kernel.org/driver-api/basics.html#c.devm_add_action
[2]: https://patchwork.kernel.org/project/linux-kselftest/list/?series=735720
David Gow (3):
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
include/kunit/resource.h | 76 +++++++++++++++++++++++++++++++
include/kunit/test.h | 10 ++++-
lib/kunit/executor_test.c | 11 ++---
lib/kunit/kunit-test.c | 88 +++++++++++++++++++++++++++++++++++-
lib/kunit/resource.c | 95 +++++++++++++++++++++++++++++++++++++++
lib/kunit/test.c | 48 ++++----------------
6 files changed, 279 insertions(+), 49 deletions(-)
--
2.40.0.634.g4ca3ef3211-goog
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, if the test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed. For example:
# example_simple_test: test aborted during cleanup. continuing without cleaning up
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
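The try/catch-in-a-thread idea can be sketched in user space with POSIX threads: the test body and its cleanup each run in their own thread, an "abort" is modelled as pthread_exit(), and an aborted cleanup is reported and never retried. This is an illustrative sketch only; struct test_ctx, try_run(), run_case(), failing_assertion() and release_resources() are hypothetical names, not KUnit's kthread-based implementation.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-test state (KUnit keeps this in struct kunit). */
struct test_ctx {
	bool failed;
	bool cleaned_up;
};

struct try_ctx {
	void (*fn)(struct test_ctx *);
	struct test_ctx *test;
	bool completed;
};

static void *try_trampoline(void *arg)
{
	struct try_ctx *t = arg;

	t->fn(t->test);
	t->completed = true;   /* not reached if fn() calls pthread_exit() */
	return NULL;
}

/* Run fn(test) in a fresh thread; return true if it ran to completion. */
static bool try_run(void (*fn)(struct test_ctx *), struct test_ctx *test)
{
	struct try_ctx t = { fn, test, false };
	pthread_t thread;

	if (pthread_create(&thread, NULL, try_trampoline, &t) != 0)
		return false;
	pthread_join(thread, NULL);   /* join makes t.completed visible here */
	return t.completed;
}

/* Run the body, then always run cleanup in its own thread. */
static void run_case(struct test_ctx *test,
		     void (*body)(struct test_ctx *),
		     void (*cleanup)(struct test_ctx *))
{
	/* An aborted body has already recorded its own failure, if any. */
	(void)try_run(body, test);

	if (!try_run(cleanup, test)) {
		/* Cleanup aborted: always a failure, and never retried. */
		test->failed = true;
		fprintf(stderr, "test aborted during cleanup. continuing without cleaning up\n");
	}
}

/* Example body/cleanup: simulates a failed assertion aborting the thread. */
static void failing_assertion(struct test_ctx *test)
{
	test->failed = true;
	pthread_exit(NULL);
}

/* Example cleanup that completes normally. */
static void release_resources(struct test_ctx *test)
{
	test->cleaned_up = true;
}
```

Because the cleanup always gets a fresh thread, an abort in the body cannot prevent cleanup from running, and an abort in the cleanup only ends that thread rather than the runner.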
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of, and replacement for, "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v2:
https://lore.kernel.org/linux-kselftest/20230419085426.1671703-1-davidgow@g…
- Always run cleanup in its own kthread
- Therefore, never attempt to re-run it if it exits
- Thanks, Benjamin.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 55 ++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 48 insertions(+), 7 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..2025e51941e6 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -419,10 +419,50 @@ static void kunit_try_run_case(void *data)
* thread will resume control and handle any necessary clean up.
*/
kunit_run_case_internal(test, suite, test_case);
- /* This line may never be reached. */
+}
+
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
kunit_run_case_cleanup(test, suite);
}
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
struct kunit_try_catch_context *ctx = data;
@@ -448,12 +488,6 @@ static void kunit_catch_run_case(void *data)
}
return;
}
-
- /*
- * Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
- */
- kunit_run_case_cleanup(test, suite);
}
/*
@@ -478,6 +512,13 @@ static void kunit_run_case_catch_errors(struct kunit_suite *suite,
context.test_case = test_case;
kunit_try_catch_run(try_catch, &context);
+ /* Now run the cleanup */
+ kunit_try_catch_init(try_catch,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ kunit_try_catch_run(try_catch, &context);
+
/* Propagate the parameter result to the test case. */
if (test->status == KUNIT_FAILURE)
test_case->status = KUNIT_FAILURE;
--
2.40.0.634.g4ca3ef3211-goog
The ftrace selftests do not currently produce KTAP output; they produce a
custom format that is much nicer for human consumption. This means that when
run in automated test systems we just get a single result for the suite as a
whole rather than results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default. Users running ftracetest
directly will continue to get the existing output.
This is not the most elegant solution, but it is simple and effective. I
did consider implementing this by post-processing the existing output
format, but that felt more complex and more likely to result in all output
being lost if something goes seriously wrong during the run, which would
not be helpful. I did also consider just writing a separate runner script,
but there is enough going on with things like signal handling that it
seemed like it would duplicate too much.
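For reference, a KTAP result line puts the result keyword first, then the test number, then the description, with an optional "# DIRECTIVE" comment; the patch maps PASS to "ok", FAIL to "not ok", UNTESTED/UNSUPPORTED to "ok # SKIP" and XFAIL to "ok # XFAIL". The minimal C sketch below shows that line layout and the "# Totals:" summary; ktap_result_line() and ktap_totals_line() are hypothetical helpers, since ftracetest itself does this in shell.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one KTAP result line into buf:
 *   "<ok|not ok> <number> <description>[ # <directive>]"
 * Returns the snprintf() result (the untruncated length).
 */
static int ktap_result_line(char *buf, size_t len, bool ok, int num,
			    const char *desc, const char *directive)
{
	const char *result = ok ? "ok" : "not ok";

	if (directive && strlen(directive) > 0)
		return snprintf(buf, len, "%s %d %s # %s",
				result, num, desc, directive);
	return snprintf(buf, len, "%s %d %s", result, num, desc);
}

/* Format the kselftest "# Totals:" summary line. */
static int ktap_totals_line(char *buf, size_t len, int pass, int fail,
			    int xfail, int xpass, int skip, int error)
{
	return snprintf(buf, len,
			"# Totals: pass:%d fail:%d xfail:%d xpass:%d skip:%d error:%d",
			pass, fail, xfail, xpass, skip, error);
}
```

A parser that is too lenient may accept the number and result keyword in either order, so checking the exact layout against the KTAP specification is worthwhile.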
Acked-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Update help text.
- Link to v1: https://lore.kernel.org/r/20230302-ftrace-kselftest-ktap-v1-1-a84a0765b7ad@…
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..2506621e75df 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--ktap Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $result $CASENO $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " fail:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: 457391b0380335d5e9a5babdec90ac53928b23b4
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The ftrace selftests do not currently produce KTAP output; they produce a
custom format that is much nicer for human consumption. This means that when
run in automated test systems we just get a single result for the suite as a
whole rather than results for individual test cases, making it harder
to look at the test data and masking things like inappropriate skips.
Address this by adding support for KTAP output to the ftracetest script and
providing a trivial wrapper which will be invoked by the kselftest runner
to generate output in this format by default. Users running ftracetest
directly will continue to get the existing output.
This is not the most elegant solution, but it is simple and effective. I
did consider implementing this by post-processing the existing output
format, but that felt more complex and more likely to result in all output
being lost if something goes seriously wrong during the run, which would
not be helpful. I did also consider just writing a separate runner script,
but there is enough going on with things like signal handling that it
seemed like it would duplicate too much.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/ftrace/Makefile | 3 +-
tools/testing/selftests/ftrace/ftracetest | 63 ++++++++++++++++++++++++--
tools/testing/selftests/ftrace/ftracetest-ktap | 8 ++++
3 files changed, 70 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/ftrace/Makefile b/tools/testing/selftests/ftrace/Makefile
index d6e106fbce11..a1e955d2de4c 100644
--- a/tools/testing/selftests/ftrace/Makefile
+++ b/tools/testing/selftests/ftrace/Makefile
@@ -1,7 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
-TEST_PROGS := ftracetest
+TEST_PROGS_EXTENDED := ftracetest
+TEST_PROGS := ftracetest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index c3311c8c4089..539c8d6d5d71 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -13,6 +13,7 @@ echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
echo " Options:"
echo " -h|--help Show help message"
echo " -k|--keep Keep passed test logs"
+echo " -K|--KTAP Output in KTAP format"
echo " -v|--verbose Increase verbosity of test messages"
echo " -vv Alias of -v -v (Show all results in stdout)"
echo " -vvv Alias of -v -v -v (Show all commands immediately)"
@@ -85,6 +86,10 @@ parse_opts() { # opts
KEEP_LOG=1
shift 1
;;
+ --ktap|-K)
+ KTAP=1
+ shift 1
+ ;;
--verbose|-v|-vv|-vvv)
if [ $VERBOSE -eq -1 ]; then
usage "--console can not use with --verbose"
@@ -178,6 +183,7 @@ TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR`
LOG_DIR=$TOP_DIR/logs/`date +%Y%m%d-%H%M%S`/
KEEP_LOG=0
+KTAP=0
DEBUG=0
VERBOSE=0
UNSUPPORTED_RESULT=0
@@ -229,7 +235,7 @@ prlog() { # messages
newline=
shift
fi
- printf "$*$newline"
+ [ "$KTAP" != "1" ] && printf "$*$newline"
[ "$LOG_FILE" ] && printf "$*$newline" | strip_esc >> $LOG_FILE
}
catlog() { #file
@@ -260,11 +266,11 @@ TOTAL_RESULT=0
INSTANCE=
CASENO=0
+CASENAME=
testcase() { # testfile
CASENO=$((CASENO+1))
- desc=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
- prlog -n "[$CASENO]$INSTANCE$desc"
+ CASENAME=`grep "^#[ \t]*description:" $1 | cut -f2- -d:`
}
checkreq() { # testfile
@@ -277,40 +283,68 @@ test_on_instance() { # testfile
grep -q "^#[ \t]*flags:.*instance" $1
}
+ktaptest() { # result comment
+ if [ "$KTAP" != "1" ]; then
+ return
+ fi
+
+ local result=
+ if [ "$1" = "1" ]; then
+ result="ok"
+ else
+ result="not ok"
+ fi
+ shift
+
+ local comment=$*
+ if [ "$comment" != "" ]; then
+ comment="# $comment"
+ fi
+
+ echo $result $CASENO $INSTANCE$CASENAME $comment
+}
+
eval_result() { # sigval
case $1 in
$PASS)
prlog " [${color_green}PASS${color_reset}]"
+ ktaptest 1
PASSED_CASES="$PASSED_CASES $CASENO"
return 0
;;
$FAIL)
prlog " [${color_red}FAIL${color_reset}]"
+ ktaptest 0
FAILED_CASES="$FAILED_CASES $CASENO"
return 1 # this is a bug.
;;
$UNRESOLVED)
prlog " [${color_blue}UNRESOLVED${color_reset}]"
+ ktaptest 0 UNRESOLVED
UNRESOLVED_CASES="$UNRESOLVED_CASES $CASENO"
return $UNRESOLVED_RESULT # depends on use case
;;
$UNTESTED)
prlog " [${color_blue}UNTESTED${color_reset}]"
+ ktaptest 1 SKIP
UNTESTED_CASES="$UNTESTED_CASES $CASENO"
return 0
;;
$UNSUPPORTED)
prlog " [${color_blue}UNSUPPORTED${color_reset}]"
+ ktaptest 1 SKIP
UNSUPPORTED_CASES="$UNSUPPORTED_CASES $CASENO"
return $UNSUPPORTED_RESULT # depends on use case
;;
$XFAIL)
prlog " [${color_green}XFAIL${color_reset}]"
+ ktaptest 1 XFAIL
XFAILED_CASES="$XFAILED_CASES $CASENO"
return 0
;;
*)
prlog " [${color_blue}UNDEFINED${color_reset}]"
+ ktaptest 0 error
UNDEFINED_CASES="$UNDEFINED_CASES $CASENO"
return 1 # this must be a test bug
;;
@@ -371,6 +405,7 @@ __run_test() { # testfile
run_test() { # testfile
local testname=`basename $1`
testcase $1
+ prlog -n "[$CASENO]$INSTANCE$CASENAME"
if [ ! -z "$LOG_FILE" ] ; then
local testlog=`mktemp $LOG_DIR/${CASENO}-${testname}-log.XXXXXX`
else
@@ -405,6 +440,17 @@ run_test() { # testfile
# load in the helper functions
. $TEST_DIR/functions
+if [ "$KTAP" = "1" ]; then
+ echo "TAP version 13"
+
+ casecount=`echo $TEST_CASES | wc -w`
+ for t in $TEST_CASES; do
+ test_on_instance $t || continue
+ casecount=$((casecount+1))
+ done
+ echo "1..${casecount}"
+fi
+
# Main loop
for t in $TEST_CASES; do
run_test $t
@@ -439,6 +485,17 @@ prlog "# of unsupported: " `echo $UNSUPPORTED_CASES | wc -w`
prlog "# of xfailed: " `echo $XFAILED_CASES | wc -w`
prlog "# of undefined(test bug): " `echo $UNDEFINED_CASES | wc -w`
+if [ "$KTAP" = "1" ]; then
+ echo -n "# Totals:"
+ echo -n " pass:"`echo $PASSED_CASES | wc -w`
+ echo -n " fail:"`echo $FAILED_CASES | wc -w`
+ echo -n " xfail:"`echo $XFAILED_CASES | wc -w`
+ echo -n " xpass:0"
+ echo -n " skip:"`echo $UNTESTED_CASES $UNSUPPORTED_CASES | wc -w`
+ echo -n " error:"`echo $UNRESOLVED_CASES $UNDEFINED_CASES | wc -w`
+ echo
+fi
+
cleanup
# if no error, return 0
diff --git a/tools/testing/selftests/ftrace/ftracetest-ktap b/tools/testing/selftests/ftrace/ftracetest-ktap
new file mode 100755
index 000000000000..b3284679ef3a
--- /dev/null
+++ b/tools/testing/selftests/ftrace/ftracetest-ktap
@@ -0,0 +1,8 @@
+#!/bin/sh -e
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
+#
+# Copyright (C) Arm Ltd., 2023
+
+./ftracetest -K
---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230302-ftrace-kselftest-ktap-9d7878691557
Best regards,
--
Mark Brown <broonie(a)kernel.org>
Good day,
have you considered developing your employees' language skills?
We have developed language courses for various industries, in which we focus on improving vocabulary and the quality of communication using our own method, created specifically for demanding businesses.
A custom online course, tailored to the company's profile and the services it provides, will quickly produce results that improve the comfort and quality of work and expand business opportunities.
Remote language training includes, among other things, classes with native speakers, who will quickly teach employees to communicate in clear and concise Business English.
May I present more details and explain how we work?
Regards
Krzysztof Maj
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by taking the incoming device's
VRF attachment into account when performing socket lookups from tc/xdp.
The first two patches are refactoring changes which facilitate this fix by
factoring out the tc helper's logic, which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
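The VRF rule can be illustrated with a simplified model of how a socket's bound device is compared against the packet's ingress device (dif) and, when that device is enslaved to a VRF, the VRF's ifindex (sdif). This is a hedged user-space sketch modelled loosely on the kernel's bound-device check for inet sockets, not the code in this series; struct sock_model and sock_bound_dev_matches() are hypothetical, and l3mdev_accept roughly corresponds to the net.ipv4.tcp_l3mdev_accept / udp_l3mdev_accept sysctls.

```c
#include <stdbool.h>

/* Hypothetical model of the socket fields that matter for the match. */
struct sock_model {
	int bound_dev_if;    /* 0 if not bound to any device or VRF */
	bool l3mdev_accept;  /* let unbound sockets serve traffic in any VRF */
};

/*
 * dif:  ifindex of the ingress device.
 * sdif: ifindex of the VRF (l3mdev) the ingress device belongs to,
 *       or 0 if it is in the default VRF.
 */
static bool sock_bound_dev_matches(const struct sock_model *sk, int dif, int sdif)
{
	if (!sk->bound_dev_if)
		/* Unbound: default-VRF traffic only, unless l3mdev_accept. */
		return !sdif || sk->l3mdev_accept;
	/* Bound: match the ingress device itself or its VRF. */
	return sk->bound_dev_if == dif || sk->bound_dev_if == sdif;
}
```

In this model, a lookup that passes sdif = 0 for a packet that actually arrived on a VRF member would let an unbound default-VRF socket match it, which illustrates the kind of boundary violation the cover letter describes.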
---
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is now in mm-stable. Ideally we'd get at least patch #1 into the
same kernel release as [1], so the semantics of setting
PR_SET_MEMORY_MERGE=0 are unchanged between kernel versions.
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
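From user space, the setting discussed here is toggled with prctl(2). The sketch below disables process-wide KSM merging and reads the flag back; with patch #1 applied, setting PR_SET_MEMORY_MERGE=0 is described as also unmerging pages, as MADV_UNMERGEABLE does. The fallback values 67 and 68 are those in include/uapi/linux/prctl.h as of Linux 6.4 and are only used if the libc headers lack the constants; on kernels without this prctl (or without CONFIG_KSM) the call is expected to fail, typically with EINVAL. disable_ksm_merge_any() and ksm_merge_any_enabled() are hypothetical helper names.

```c
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67   /* assumption: value from Linux 6.4 uapi */
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Disable KSM merge-any for this process. Returns 0 or a negative errno. */
static int disable_ksm_merge_any(void)
{
	if (prctl(PR_SET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL) != 0)
		return -errno;
	return 0;
}

/* Returns 1 if merge-any is enabled, 0 if not, or a negative errno. */
static int ksm_merge_any_enabled(void)
{
	int ret = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

	return ret < 0 ? -errno : ret;
}
```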
v1 -> v2:
- "mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0"
-> Cleanup one if/else
-> Add doc for ksm_disable_merge_any()
- Added ACKs
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +-----
include/linux/ksm.h | 7 ++
kernel/sys.c | 12 +---
mm/ksm.c | 70 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 ++++++++++--
5 files changed, 121 insertions(+), 34 deletions(-)
--
2.40.0
Hi Linus,
Please pull the following KUnit next update for Linux 6.4-rc1.
linux-kselftest-kunit-6.4-rc1
This KUnit update for Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-6.4-rc1
for you to fetch changes up to a42077b787680cbc365a96446b30f32399fa3f6f:
kunit: add tests for using current KUnit test field (2023-04-05 12:51:30 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-6.4-rc1
This KUnit update for Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
----------------------------------------------------------------
Andy Shevchenko (1):
.gitignore: Unignore .kunitconfig
Daniel Latypov (3):
kunit: tool: add subscripts for type annotations where appropriate
kunit: tool: remove unused imports and variables
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
Geert Uytterhoeven (3):
kunit: tool: Add support for m68k under QEMU
kunit: tool: Add support for overriding the QEMU serial port
kunit: tool: Add support for SH under QEMU
Heiko Carstens (1):
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
Rae Moar (4):
kunit: fix bug in debugfs logs of parameterized tests
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug of extra newline characters in debugfs logs
kunit: add tests for using current KUnit test field
Sadiya Kazi (1):
list: test: Test the klist structure
Stephen Boyd (1):
kunit: Use gfp in kunit_alloc_resource() kernel-doc
.gitignore | 1 +
include/kunit/resource.h | 2 +-
include/kunit/test.h | 4 +-
lib/kunit/debugfs.c | 14 +-
lib/kunit/kunit-test.c | 77 ++++++--
lib/kunit/test.c | 57 ++++--
lib/list-test.c | 300 ++++++++++++++++++++++++++++++-
tools/testing/kunit/kunit.py | 26 +--
tools/testing/kunit/kunit_config.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 39 ++--
tools/testing/kunit/kunit_parser.py | 1 -
tools/testing/kunit/kunit_printer.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 2 +-
tools/testing/kunit/qemu_config.py | 1 +
tools/testing/kunit/qemu_configs/m68k.py | 10 ++
tools/testing/kunit/qemu_configs/sh.py | 17 ++
tools/testing/kunit/run_checks.py | 6 +-
17 files changed, 491 insertions(+), 72 deletions(-)
create mode 100644 tools/testing/kunit/qemu_configs/m68k.py
create mode 100644 tools/testing/kunit/qemu_configs/sh.py
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 6.4-rc1.
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failure checks in cgroup test_memcontrol
- a new prctl test
- enhancements to the sched test with additional core schedule prctl calls
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-next-6.4-rc1
for you to fetch changes up to 50ad2fb7ec2b18186b8a4fa1c0e00f78b3de5119:
selftests/resctrl: Fix incorrect error return on test complete (2023-04-14 11:13:18 -0600)
----------------------------------------------------------------
linux-kselftest-next-6.4-rc1
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements to sched test with additional core schedule prctl calls
----------------------------------------------------------------
Fenghua Yu (1):
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
Ilpo Järvinen (8):
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Correct get_llc_perf() param in function comment
Ivan Orlov (4):
selftests: Refactor 'peeksiginfo' ptrace test part
selftests: cgroup: Add 'malloc' failures checks in test_memcontrol
selftests: sched: Add more core schedule prctl calls
selftests: prctl: Add new prctl test for PR_SET_VMA action
Mark Brown (3):
tools/nolibc/stdio: Implement vprintf()
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
Peter Newman (1):
selftests/resctrl: Use correct exit code when tests fail
Reinette Chatre (1):
selftests/resctrl: Fix incorrect error return on test complete
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Remove duplicate codes that clear each test result file
Sukrut Bellary (1):
kselftest: amd-pstate: Fix spelling mistakes
tools/include/nolibc/stdio.h | 6 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/amd-pstate/gitsource.sh | 4 +-
tools/testing/selftests/amd-pstate/run.sh | 4 +-
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 ++++-------------
tools/testing/selftests/cgroup/test_memcontrol.c | 15 +++
tools/testing/selftests/kselftest.h | 2 +
tools/testing/selftests/prctl/.gitignore | 1 +
tools/testing/selftests/prctl/Makefile | 2 +-
tools/testing/selftests/prctl/config | 1 +
.../selftests/prctl/set-anon-vma-name-test.c | 104 +++++++++++++++++++++
tools/testing/selftests/ptrace/peeksiginfo.c | 14 +--
tools/testing/selftests/resctrl/cache.c | 17 ++--
tools/testing/selftests/resctrl/cat_test.c | 33 ++++---
tools/testing/selftests/resctrl/cmt_test.c | 16 ++--
tools/testing/selftests/resctrl/fill_buf.c | 21 +----
tools/testing/selftests/resctrl/mba_test.c | 34 ++++---
tools/testing/selftests/resctrl/mbm_test.c | 22 ++---
tools/testing/selftests/resctrl/resctrl.h | 8 +-
tools/testing/selftests/resctrl/resctrl_tests.c | 14 +--
tools/testing/selftests/resctrl/resctrl_val.c | 88 +++++++++++------
tools/testing/selftests/resctrl/resctrlfs.c | 7 +-
tools/testing/selftests/sched/cs_prctl_test.c | 6 ++
24 files changed, 306 insertions(+), 204 deletions(-)
create mode 100644 tools/testing/selftests/prctl/config
create mode 100644 tools/testing/selftests/prctl/set-anon-vma-name-test.c
----------------------------------------------------------------
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packet is captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz          | 1...
..                  |
abcd...xyz          |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
..                  |
The issue is that the receive-side test for the expected payload pattern
{AB..Z}+ fails for GRO packets if a segment payload does not end on a Z.
The issue still exists when GRO is enabled with -G but -S is not used
to obtain the gso size. Therefore, a message has been added to remind users.
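The per-segment check that do_verify_udp_gro() performs can be illustrated in isolation. This is a minimal standalone sketch, not the selftest code: the helper names verify_segment() and verify_gro() are hypothetical, and unlike the selftest they return the index of the first mismatching byte (or -1) instead of calling error().

```c
/* Hypothetical helper: check that data[start, start + len) follows the
 * cyclic a..z pattern, starting from whatever letter its first byte is.
 * Returns -1 on success, else the index of the first bad byte. */
static int verify_segment(const char *data, int start, int len)
{
	char cur = data[start];
	int i;

	if (cur < 'a' || cur > 'z')
		return start;

	for (i = start + 1; i < start + len; i++) {
		cur = (cur == 'z') ? 'a' : cur + 1;
		if (data[i] != cur)
			return i;
	}
	return -1;
}

/* Hypothetical helper: verify a GRO-coalesced buffer one segment at a
 * time, so each original datagram is checked from its own first byte.
 * The last segment may be shorter than segment_size. */
static int verify_gro(const char *data, int len, int segment_size)
{
	int start, bad;

	for (start = 0; start < len; start += segment_size) {
		int n = (len - start < segment_size) ? len - start : segment_size;

		bad = verify_segment(data, start, n);
		if (bad >= 0)
			return bad;
	}
	return -1;
}
```

For the 32-byte buffer "abcdefghijklmnopabcdefghijklmnop" (two 16-byte datagrams merged by GRO), checking it as one datagram reports index 16, while checking it in 16-byte segments succeeds. This is why the patch asks for -S: without the gso size, the receiver cannot know where one datagram's pattern ends and the next begins.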
Changes in v3:
- Simplify description and adjust judgment order.
Changes in v2:
- Fix confusing descriptions.
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 34 +++++++++++++++++++++++----
1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..3ad18cbc570d 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,45 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize to guide "
+ "splitting and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int segment_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > segment_size)
+ do_verify_udp(data, start, segment_size);
+ else
+ do_verify_udp(data, start, len - start);
+
+ start += segment_size;
}
}
@@ -268,7 +287,12 @@ static void do_flush_udp(int fd)
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
error(1, 0, "recv: bad gso size, got %d, expected %d "
--
2.15.2
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is not in mm-unstable yet (but soon? :) ). I'll be on vacation for
~2 weeks, so sending it out now as reply to [1].
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
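A minimal userspace sketch of toggling process-wide KSM, assuming a kernel built with CONFIG_KSM that provides PR_SET_MEMORY_MERGE and PR_GET_MEMORY_MERGE (values 67 and 68 in include/uapi/linux/prctl.h; defined here in case the libc headers predate them). The function name is hypothetical. With change (1) applied, the disable step also unmerges already-merged pages, as MADV_UNMERGEABLE does; without it, disabling only stops future merging.

```c
#include <errno.h>
#include <sys/prctl.h>

#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Hypothetical helper: enable KSM for the whole process, confirm it via
 * PR_GET_MEMORY_MERGE, then disable it again and confirm. Returns 0 on
 * success or -errno (e.g. -EINVAL on kernels without these prctls). */
static int ksm_toggle_process_merge(void)
{
	if (prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0))
		return -errno;
	if (prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0) != 1)
		return -EIO;

	/* Disabling: with this series, merged pages get unmerged here. */
	if (prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0))
		return -errno;
	return prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0) == 0 ? 0 : -EIO;
}
```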
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +------
include/linux/ksm.h | 7 +++
kernel/sys.c | 7 +--
mm/ksm.c | 58 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 +++++++++++++--
5 files changed, 107 insertions(+), 31 deletions(-)
--
2.39.2
This series consolidates the behavior of the 2 drivers that implement
the ethtool MAC Merge layer by making NXP ENETC commit its preemptible
traffic classes to hardware only when MM TX is active (same as Ocelot).
Then, after resolving an issue with the ENETC driver, it restricts user
space from entering 2 states which don't make sense:
- pmac-enabled off tx-enabled on verify-enabled *
- pmac-enabled * tx-enabled off verify-enabled on
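The two rejected combinations can be expressed as a simple predicate. This is a hypothetical sketch (mm_config_valid() is not a kernel function) that treats each setting as a boolean and rejects exactly the two states listed above; the actual checks live in net/ethtool/mm.c and may be structured differently.

```c
#include <stdbool.h>

/* Hypothetical predicate: returns false for the two MAC Merge
 * configurations that the series rejects from user space. */
static bool mm_config_valid(bool pmac_enabled, bool tx_enabled,
			    bool verify_enabled)
{
	/* pmac-enabled off, tx-enabled on, verify-enabled any value */
	if (!pmac_enabled && tx_enabled)
		return false;

	/* pmac-enabled any value, tx-enabled off, verify-enabled on */
	if (!tx_enabled && verify_enabled)
		return false;

	return true;
}
```

Of the eight boolean combinations, this leaves four valid ones: everything off, pMAC only, pMAC plus TX, and pMAC plus TX plus verification.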
Then, it introduces a selftest (ethtool_mm.sh) which puts everything
together and tests all valid configurations known to me.
This is simultaneously the v2 of "[PATCH net-next 0/2] ethtool mm API
improvements":
https://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp…
which had caused some problems for openlldp. Those were solved in the
meantime, see:
https://github.com/intel/openlldp/commit/11171b474f6f3cbccac5d608b7f26b32ff…
and of "[RFC PATCH net-next] selftests: forwarding: add a test for MAC
Merge layer":
https://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.…
Petr Machata (2):
selftests: forwarding: sch_tbf_*: Add a pre-run hook
selftests: forwarding: generalize bail_on_lldpad from mlxsw
Vladimir Oltean (7):
net: enetc: fix MAC Merge layer remaining enabled until a link down
event
net: enetc: report mm tx-active based on tx-enabled and verify-status
net: enetc: only commit preemptible TCs to hardware when MM TX is
active
net: enetc: include MAC Merge / FP registers in register dump
net: ethtool: mm: sanitize some UAPI configurations
selftests: forwarding: introduce helper for standard ethtool counters
selftests: forwarding: add a test for MAC Merge layer
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 5 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 94 +++++-
.../net/ethernet/freescale/enetc/enetc_hw.h | 3 +
net/ethtool/mm.c | 10 +
.../drivers/net/mlxsw/qos_headroom.sh | 3 +-
.../selftests/drivers/net/mlxsw/qos_lib.sh | 28 --
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 3 +-
.../selftests/drivers/net/mlxsw/sch_ets.sh | 3 +-
.../drivers/net/mlxsw/sch_red_core.sh | 1 -
.../drivers/net/mlxsw/sch_red_ets.sh | 2 +-
.../drivers/net/mlxsw/sch_red_root.sh | 2 +-
.../drivers/net/mlxsw/sch_tbf_ets.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_prio.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_root.sh | 6 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../selftests/net/forwarding/ethtool_mm.sh | 288 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 60 ++++
.../net/forwarding/sch_tbf_etsprio.sh | 4 +
.../selftests/net/forwarding/sch_tbf_root.sh | 4 +
20 files changed, 486 insertions(+), 66 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/ethtool_mm.sh
--
2.34.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that failure
leaves things unchanged, and utilizes the new iommu_group_replace_domain()
API to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v6:
- Go back to the v4 locking arrangement, now with both the attach/detach
igroup->locks inside the functions. Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
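The optional abort callback changes how iommufd_object_abort_and_destroy() dispatches: it calls the type's abort() if one is set and falls back to destroy() otherwise. The sketch below models only that dispatch in plain C. The struct, enum and function names are simplified, hypothetical stand-ins, and the real hwpt abort additionally asserts that ioas->mutex is held and removes the domain from the io_pagetable before destroying.

```c
/* Hypothetical, simplified object model; not the real iommufd structs. */
enum obj_type { OBJ_IOAS, OBJ_HW_PAGETABLE, OBJ_MAX };

struct object {
	enum obj_type type;
	int destroyed;	/* set when destroy() runs */
	int aborted;	/* set when abort() runs */
};

struct object_ops {
	void (*destroy)(struct object *obj);
	void (*abort)(struct object *obj);	/* optional */
};

static void generic_destroy(struct object *obj)
{
	obj->destroyed = 1;
}

/* Stand-in for a type-specific abort: undo partial setup, then destroy. */
static void hwpt_abort(struct object *obj)
{
	obj->aborted = 1;
	generic_destroy(obj);
}

static const struct object_ops object_ops[OBJ_MAX] = {
	[OBJ_IOAS] = { .destroy = generic_destroy },
	[OBJ_HW_PAGETABLE] = { .destroy = generic_destroy,
			       .abort = hwpt_abort },
};

/* Prefer the type's abort() when present, else fall back to destroy(). */
static void object_abort_and_destroy(struct object *obj)
{
	if (object_ops[obj->type].abort)
		object_ops[obj->type].abort(obj);
	else
		object_ops[obj->type].destroy(obj);
}
```

Keeping abort() optional means only object types with extra unwind requirements, such as the hwpt, need to provide one.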
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Compared to v3 the diff for the whole series looks like:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 024ed8ee9939cd..2770087059ba73 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -341,14 +341,15 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
- lockdep_assert_held(&idev->igroup->lock);
-
+ mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ mutex_unlock(&idev->igroup->lock);
+
/* Caller must destroy hwpt */
return hwpt;
}
@@ -515,8 +516,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
hwpt->auto_domain = true;
*pt_id = hwpt->obj.id;
- mutex_unlock(&ioas->mutex);
iommufd_object_finalize(idev->ictx, &hwpt->obj);
+ mutex_unlock(&ioas->mutex);
return destroy_hwpt;
out_abort:
@@ -610,7 +611,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
* This is the same as
* iommufd_device_detach();
* iommufd_device_attach();
- *
* If it fails then no change is made to the attachment. The iommu driver may
* implement this so there is no disruption in translation. This can only be
* called if iommufd_device_attach() has already succeeded.
@@ -633,10 +633,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt;
- mutex_lock(&idev->igroup->lock);
hwpt = iommufd_hw_pagetable_detach(idev);
- mutex_unlock(&idev->igroup->lock);
-
iommufd_hw_pagetable_put(idev->ictx, hwpt);
refcount_dec(&idev->obj.users);
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8aa9ac130b5960..655ed32144f62e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -26,6 +26,21 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
refcount_dec(&hwpt->ioas->obj.users);
}
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ /* The ioas->mutex must be held until finalize is called. */
+ lockdep_assert_held(&hwpt->ioas->mutex);
+
+ if (!list_empty(&hwpt->hwpt_item)) {
+ list_del_init(&hwpt->hwpt_item);
+ iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain);
+ }
+ iommufd_hw_pagetable_destroy(obj);
+}
+
int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->enforce_cache_coherency)
@@ -50,6 +65,10 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
* Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
* will be linked to the given ioas and upon return the underlying iommu_domain
* is fully popoulated.
+ *
+ * The caller must hold the ioas->mutex until after
+ * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
+ * the returned hwpt.
*/
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
@@ -93,9 +112,6 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
* directly allocate a domain. These drivers do not finish creating the
* domain until attach is completed. Thus we must have this call
* sequence. Once those drivers are fixed this should be removed.
- *
- * Note we hold the igroup->lock here which prevents any other thread
- * from observing igroup->hwpt until we finish setting it up.
*/
if (immediate_attach) {
rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -140,10 +156,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
mutex_lock(&ioas->mutex);
hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
- mutex_unlock(&ioas->mutex);
if (IS_ERR(hwpt)) {
rc = PTR_ERR(hwpt);
- goto out_put_ioas;
+ goto out_unlock;
}
cmd->out_hwpt_id = hwpt->obj.id;
@@ -151,11 +166,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (rc)
goto out_hwpt;
iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
+ goto out_unlock;
out_hwpt:
iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
+out_unlock:
+ mutex_unlock(&ioas->mutex);
iommufd_put_object(&ioas->obj);
out_put_idev:
iommufd_put_object(&idev->obj);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f842768b2e250b..21052f64f95649 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1172,6 +1172,9 @@ int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
unsigned int num_sw_msi = 0;
int rc;
+ if (iommufd_should_fail())
+ return -EINVAL;
+
down_write(&iopt->iova_rwsem);
/* FIXME: drivers allocate memory but there is no failure propogated */
iommu_get_resv_regions(dev, &resv_regions);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index cb693190bf51c5..ba50eb4661e217 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_detach(struct iommufd_device *idev);
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 694da191e4b155..73a91e96896252 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -24,6 +24,7 @@
struct iommufd_object_ops {
void (*destroy)(struct iommufd_object *obj);
+ void (*abort)(struct iommufd_object *obj);
};
static const struct iommufd_object_ops iommufd_object_ops[];
static struct miscdevice vfio_misc_dev;
@@ -104,7 +105,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
- iommufd_object_ops[obj->type].destroy(obj);
+ if (iommufd_object_ops[obj->type].abort)
+ iommufd_object_ops[obj->type].abort(obj);
+ else
+ iommufd_object_ops[obj->type].destroy(obj);
iommufd_object_abort(ictx, obj);
}
@@ -413,6 +417,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_HW_PAGETABLE] = {
.destroy = iommufd_hw_pagetable_destroy,
+ .abort = iommufd_hw_pagetable_abort,
},
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c07252dbf62d72..8b2c18ac6a2864 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -9,9 +9,6 @@
#include "iommufd_utils.h"
-static void *buffer;
-
-static unsigned long PAGE_SIZE;
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 7e1afb6ff9bd8d..d4c552e5694812 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -41,6 +41,8 @@ static int writeat(int dfd, const char *fn, const char *val)
static __attribute__((constructor)) void setup_buffer(void)
{
+ PAGE_SIZE = sysconf(_SC_PAGE_SIZE);
+
BUFFER_SIZE = 2*1024*1024;
buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
@@ -579,6 +581,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
uint32_t stdev_id;
uint32_t idev_id;
uint32_t hwpt_id;
+ __u64 iova;
self->fd = open("/dev/iommu", O_RDWR);
if (self->fd == -1)
@@ -590,6 +593,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2))
return -1;
+ iova = MOCK_APERTURE_START;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id2, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+
fail_nth_enable();
if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 9b6dcb921750b6..53b4d3f2d9fc6c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -19,6 +19,8 @@
static void *buffer;
static unsigned long BUFFER_SIZE;
+static unsigned long PAGE_SIZE;
+
/*
* Have the kernel check the refcount on pages. I don't know why a freshly
* mmap'd anon non-compound page starts out with a ref of 3
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 525 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 30 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 853 insertions(+), 211 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When tracing scheduler-related functions such as enqueue_task_fair, it is
necessary to check whether a specified task, rather than the current task,
is within a given cgroup stored in a map.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup helper
selftests/bpf: Add testcase for bpf_task_under_cgroup
include/uapi/linux/bpf.h | 13 +++++
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 31 ++++++++++++
tools/include/uapi/linux/bpf.h | 13 +++++
.../bpf/prog_tests/task_under_cgroup.c | 49 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 31 ++++++++++++
6 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(i.e., is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
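To make the key/value model and the vDSO fast path concrete, here is a userspace sketch under stated assumptions: the pair layout, the cached table and its placeholder values, and the fallback callback (standing in for the real syscall) are all hypothetical. It follows two behaviours described in this cover letter: keys that are not recognized come back with the key set to -1, and a NULL CPU set with a size of 0 means "all CPUs", which is the only case the prepopulated table can answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key/value pair: the user fills in key, the callee fills
 * in value. */
struct hwprobe_pair {
	int64_t key;
	uint64_t value;
};

#define NUM_CACHED_KEYS 3

/* Hypothetical table of answers valid for all CPUs, indexed by key.
 * The values are placeholders, not real hardware properties. */
static const uint64_t cached_all_cpus[NUM_CACHED_KEYS] = { 100, 200, 300 };

/* Stand-in for the syscall the real vDSO function falls back to. */
typedef long (*hwprobe_fallback_fn)(struct hwprobe_pair *pairs, size_t count,
				    size_t cpu_count, const void *cpus);

/* Answer from the cache when asked about all CPUs; otherwise defer. */
static long hwprobe_front_end(struct hwprobe_pair *pairs, size_t count,
			      size_t cpu_count, const void *cpus,
			      hwprobe_fallback_fn fallback)
{
	size_t i;

	if (cpu_count != 0 || cpus != NULL)
		return fallback(pairs, count, cpu_count, cpus);

	for (i = 0; i < count; i++) {
		int64_t k = pairs[i].key;

		if (k >= 0 && k < NUM_CACHED_KEYS) {
			pairs[i].value = cached_all_cpus[k];
		} else {
			/* Unrecognized key: report it as -1. This sketch
			 * also clears the value. */
			pairs[i].key = -1;
			pairs[i].value = 0;
		}
	}
	return 0;
}
```

Calling it with pairs {1, 0} and {42, 7} and no CPU set fills the first value from the table and rewrites the second key to -1, so the caller learns which keys this "kernel" does not know without a second round trip.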
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: 0.0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
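The percentages above follow from the measured times, comparing four sysfs reads and one vDSO call against one 4.8ms fork/exec/wait iteration:

```latex
\frac{4 \times 3.8\,\mu\mathrm{s}}{4.8\,\mathrm{ms}}
  = \frac{15.2\,\mu\mathrm{s}}{4800\,\mu\mathrm{s}} \approx 0.32\%,
\qquad
\frac{0.0094\,\mu\mathrm{s}}{4800\,\mu\mathrm{s}} \approx 2 \times 10^{-6} = 0.0002\%
```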
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
[2] https://pastebin.com/x84NEKaS
Changes in v6:
- Remove spurious blank line (Conorbot)
- Update copyrights (Paul)
- Wrap init_hwprobe_vdso_data() in CONFIG_MMU to fix nommu build break
(Conorbot)
Changes in v5:
- Added tags
- Fixed misuse of ISA_EXT_c as bitmap, changed to use
riscv_isa_extension_available() (Heiko, Conor)
- Document the alternatives approach in the commit message (Conor and
Heiko).
- Fix __init call warnings by making probe_vendor_features() and
thead_feature_probe_func() __init_or_module.
- Fixed compat vdso compilation failure (lkp).
Changes in v4:
- Used real types in syscall prototypes (Arnd)
- Fixed static line break in do_riscv_hwprobe() (Conor)
- Added newlines between documentation lists (Conor)
- Crispen up size types to size_t, and cpu indices to int (Joe)
- Fix copy_from_user() return logic bug (found via kselftests!)
- Add __user to SYSCALL_DEFINE() to fix warning
- More newlines in BASE_BEHAVIOR_IMA documentation (Conor)
- Add newlines to CPUPERF_0 documentation (Conor)
- Add UNSUPPORTED value (Conor)
- Switched from DT to alternatives-based probing (Rob)
- Crispen up cpu index type to always be int (Conor)
- Fixed selftests commit description, no more tiny libc (Mark Brown)
- Fixed selftest syscall prototype types to match v4.
- Added a prototype to fix -Wmissing-prototypes warning (lkp(a)intel.com)
- Fixed rv32 build failure (lkp(a)intel.com)
- Make vdso prototype match syscall types update
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_0 list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Documentation/riscv/hwprobe.rst | 86 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/errata/thead/errata.c | 10 +
arch/riscv/include/asm/alternative.h | 5 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/syscall.h | 4 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/asm/vdso/gettimeofday.h | 8 +
arch/riscv/include/uapi/asm/hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/alternative.c | 19 ++
arch/riscv/kernel/compat_vdso/Makefile | 2 +-
arch/riscv/kernel/cpu.c | 8 +-
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/smpboot.c | 1 +
arch/riscv/kernel/sys_riscv.c | 228 +++++++++++++++++-
arch/riscv/kernel/vdso.c | 6 -
arch/riscv/kernel/vdso/Makefile | 4 +
arch/riscv/kernel/vdso/hwprobe.c | 52 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 90 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
28 files changed, 712 insertions(+), 14 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in a way that leaves
things unchanged on failure, and uses the iommu_group_replace_domain() API
to allow the IOMMU driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However,
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series, which will:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
I have this on github:
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v3:
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 512 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 64 ++-
.../selftests/iommu/iommufd_fail_nth.c | 52 +-
tools/testing/selftests/iommu/iommufd_utils.h | 61 ++-
14 files changed, 804 insertions(+), 200 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
Resending... It was not clear to me if this file has a specific
maintainer. I chose to send it to the most recent committer.
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
When socket lookup is called from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by taking the incoming device's
VRF attachment into account when performing socket lookups from tc/xdp.
The first two patches are refactoring changes that facilitate this fix by
factoring out the tc helper's logic, which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling the mount_setattr selftests, I encountered the following errors:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors appear to be caused by linux/mount.h not being included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, if the test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test marked as failed.
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of / replacement of "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..caeae0dfd82b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -423,8 +423,51 @@ static void kunit_try_run_case(void *data)
kunit_run_case_cleanup(test, suite);
}
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
+ kunit_run_case_cleanup(test, suite);
+}
+
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
+ struct kunit_try_catch cleanup;
struct kunit_try_catch_context *ctx = data;
struct kunit *test = ctx->test;
struct kunit_suite *suite = ctx->suite;
@@ -451,9 +494,16 @@ static void kunit_catch_run_case(void *data)
/*
* Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
+ * whether it failed or not, we just need to clean up. Do this in a new
+ * try / catch context, in case it asserts, too.
*/
- kunit_run_case_cleanup(test, suite);
+ kunit_try_catch_init(&cleanup,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ ctx->test = test;
+ ctx->suite = suite;
+ kunit_try_catch_run(&cleanup, ctx);
}
/*
--
2.40.0.634.g4ca3ef3211-goog
*Changes in v15*
- Build fix
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers; flush the TLB only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that have been written to in a region of virtual memory.
This syscall is used in Windows applications, games, etc. It is currently
emulated quite slowly in userspace. Our purpose is to enhance the kernel
so that it can be translated efficiently.
Currently some out-of-tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
Gaming on Linux as a whole can benefit from this, which means this code
would have many users.
The CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* It is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, and only then can we start dumping
pages. The information about pages becomes more and more outdated
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages up to the moment of dumping them.
* The new interface should be much faster because basic page filtering
happens in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions, such as:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But with the soft-dirty flag we sometimes get extra pages
which weren't even written. They had become soft-dirty because of VMA merging
and the VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We
were able to bypass this shortcoming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc. mess up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed whether we could revert these patches, but we could not reach any
conclusion. So at this point, I made a couple of attempts to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had the potential to
cause regressions, so we left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The reply was not to increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty behind, considering it too delicate, and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on the userfaultfd wp feature, where the
kernel resolves the faults itself when the WP_ASYNC feature is used. It was
straightforward to add the WP_ASYNC feature to userfaultfd. Now only the pages
that are really written to are reported as dirty or written-to. (PS:
another userfaultfd feature, WP_UNPOPULATED, is also required to avoid
pre-faulting memory before write-protecting [9].)
All the different masks were added at the request of CRIU devs to make the
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As the
kernel already has a soft-dirty feature, which we have given up on using,
we use the written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on the pagemap file, can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get information on whether the pages have been written-to (PAGE_IS_WRITTEN),
are file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here [1]. This series adds features that weren't
present earlier:
- There is no atomic get-and-clear of the soft-dirty/written-to status in
the kernel.
- The pages which have been written-to cannot be found accurately.
(The kernel's soft-dirty PTE bit + soft-dirty VMA bit show more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this tracking and clear mechanism
for a region of memory while the process is running, to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from the v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. They are too delicate and wrong, as they show more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting this [2][3], as this is how the feature was written years ago,
and its behaviour shouldn't be changed now. Peter Xu has suggested
using the async version of the UFFD WP [4], as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to UFFD, which is an
asynchronous version of write protection. When this variant of
UFFD WP is used, page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading the
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to since the pages were
write-protected. This works just like the soft-dirty flag, without
showing any extra pages which aren't soft-dirty in reality.
Information about whether a page is file mapped, present or swapped is
required for the CRIU project [5][6]. The required mask, any mask, excluded
mask and return mask are also required for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specified
masks. The page addresses are returned in struct page_region in a compact
form. max_pages is needed to support a use case where the user only wants
to get a specific number of pages, so there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of pages has been found. max_pages is
optional. If max_pages is specified, it must be equal to or greater than
vec_size. This restriction is needed to handle the worst case, where each
page_region contains info about only one page and cannot be compacted.
This is needed to emulate the Windows GetWriteWatch() syscall.
The patch series includes a detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 481 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2105 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packets are captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz | 1...
.. |
abcd...xyz |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
.. |
This is because the sending buffer is buf[64K], and its content is a
repeating loop of a-z. But only 1472 bytes may be sent per call, or more if
UDP GSO is used. The message content does not necessarily end with z, but GRO
merges these packets, and the -v parameter verifies the entire GRO receive
buffer directly. So we do the validation after the data is split at the
receiving end, just as an application using this feature would.
If the sender does not use GSO, each individual segment starts at a and
ends somewhere in the alphabet. Using GSO has the same problem: the data
across segments is continuous during transmission, but GRO merges segments
in the order received, which is not necessarily the order of transmission.
Running both ends on the same host does not show the problem, because the
lo device does not use NAPI and does not perform GRO processing. Perhaps
supporting GRO there would be worthwhile, to reduce system calls.
bash# tcpdump -i lo host "$IP_self" &
bash# echo udp_gro_receive > /sys/kernel/debug/tracing/set_ftrace_filter
bash# echo function > /sys/kernel/debug/tracing/current_tracer
bash# udpgso_bench_rx -4 -G -S 1472 -v &
bash# udpgso_bench_tx -l 4 -4 -D "$IP_self"
The issue still exists when GRO is used with -G but -S is not used to
obtain the gso size. Therefore, a message has been added to remind users.
After this issue is resolved, another issue will be encountered, which will
be resolved in the next patch.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
v2:
- Fix confusing descriptions
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 40 +++++++++++++++++++++------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..6a2026494cdb 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,44 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize, to %s"
+ , "help guide split and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int gso_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > gso_size)
+ do_verify_udp(data, start, gso_size);
+ else
+ do_verify_udp(data, start, len - start);
+ start += gso_size;
}
}
@@ -264,16 +282,20 @@ static void do_flush_udp(int fd)
if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
error(1, 0, "recv: bad packet len, got %d,"
" expected %d\n", ret, cfg_expected_pkt_len);
+ if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+ error(1, 0, "recv: bad gso size, got %d, expected %d %s",
+ gso_size, cfg_expected_gso_size, "(-1 == no gso cmsg))\n");
if (len && cfg_verify) {
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
- if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
- error(1, 0, "recv: bad gso size, got %d, expected %d "
- "(-1 == no gso cmsg))\n", gso_size,
- cfg_expected_gso_size);
packets++;
bytes += ret;
--
2.15.2
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give up on using uffd_wp_range() and write new helpers; flush the TLB only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding the PAGEMAP_SCAN IOCTL is to emulate the Windows
GetWriteWatch() syscall [1]. GetWriteWatch() retrieves the addresses of
the pages that have been written to in a region of virtual memory.
This syscall is used by Windows applications and games. It is currently
emulated rather slowly in userspace; our goal is to extend the kernel so
that it can be translated efficiently.
Currently some out-of-tree hack patches are used to emulate it efficiently
in some kernels. We intend to replace those with these patches, so gaming
on Linux as a whole can benefit from this, which means this code would have
a large number of users.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei defines the following uses of this code:
* It is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. This means we must freeze the process to read information
about all its pages and reset dirty bits; only then can we start dumping
pages. The information about pages becomes more and more outdated
while we are processing them. The new interface solves both of these
downsides. First, it allows us to read PTE bits and clear the
soft-dirty bit atomically, so CRIU will not need to freeze processes
to pre-dump their memory. Second, it clears soft-dirty bits for a
specified region of memory, so CRIU will have up-to-date info about
pages at the moment it dumps them.
* The new interface should be much faster because basic page filtering
happens in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we felt that the kernel's soft-dirty
feature could be used under the hood with some additions, such as:
* reset the soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear the soft-dirty flag for a specific region atomically
So we decided to use an ioctl on the pagemap file to read and/or reset the
soft-dirty flag. But with the soft-dirty flag we sometimes get extra pages
which weren't even written. They had become soft-dirty because of VMA
merging and the VM_SOFTDIRTY flag, which breaks the definition of
GetWriteWatch(). We were able to bypass this shortcoming by ignoring
VM_SOFTDIRTY until David reported that mprotect() etc. mess up the
soft-dirty flag while VM_SOFTDIRTY is ignored [5]. This wasn't happening
until [6] was introduced. We discussed whether we could revert these
patches, but could not reach any conclusion. So at this point, I made a
couple of attempts to solve the whole VM_SOFTDIRTY issue by correcting the
soft-dirty implementation:
* [7] Correct the bug that was fixed wrongly back in 2014. It had the
potential to cause regressions, so we left it behind.
* [8] Keep a list of the soft-dirty parts of a VMA across splits and merges.
The feedback was not to increase the size of the VMA by 8 bytes.
At this point, we abandoned soft-dirty as too delicate, and userfaultfd [9]
seemed like the only way forward. From there on, we have based soft-dirty
emulation on the userfaultfd WP feature, where the kernel resolves the
faults itself when the WP_ASYNC feature is used. It was straightforward to
add the WP_ASYNC feature to userfaultfd. Now only the pages that were
actually written are reported as dirty or written-to. (PS: Another
userfaultfd feature, WP_UNPOPULATED, is required to avoid pre-faulting
memory before write-protecting [9].)
All the different masks were added at the request of the CRIU developers
to make the interface more generic.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
*Original cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
the kernel already has a soft-dirty feature, which we have given up using,
we use the written-to terminology when UFFD async WP is used under the
hood.
This IOCTL, PAGEMAP_SCAN on the pagemap file, can be used to get and/or
clear information about page table entries. The following operations are
supported by this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here [1]. This series adds features that weren't
present earlier:
- Atomically getting the soft-dirty/written-to status and clearing it in
the kernel.
- Accurately finding the pages which have been written-to. (The kernel's
soft-dirty PTE bit + soft-dirty VMA flag show more soft-dirty pages than
there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have a use case where we need to track the soft-dirty PTE bit for
only specific pages on demand. We need this tracking and clearing mechanism
for a region of memory while the process is running, to emulate the
GetWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of the soft-dirty feature to find pages which
have been written-to from the v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. They are too delicate and inaccurate, as they show more
soft-dirty pages than there actually are. There is no interest in
correcting this [2][3], as this is how the feature was written years ago
and its behaviour shouldn't be changed now. Peter Xu suggested using the
async version of UFFD WP [4], as it is inherently based on the PTEs.
So in this patch series, I've added a new mode to UFFD which is an
asynchronous version of write-protect. When this variant of UFFD WP is
used, the page faults are resolved automatically by the kernel. The pages
which have been written-to can be found by reading the pagemap file
(!PM_UFFD_WP). This feature can be used to find which pages have been
written to since the pages were write-protected. It works just like the
soft-dirty flag without showing any extra pages which aren't actually
soft-dirty.
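To make the !PM_UFFD_WP check concrete, here is a small C sketch that classifies a raw 64-bit pagemap entry. The bit positions (57 for uffd-wp, 63 for present) come from the kernel's pagemap documentation; the helper name and the choice to treat non-present pages as not written are assumptions of this sketch. Reading the entry itself (an 8-byte read at offset (vaddr / page size) * 8 of /proc/self/pagemap) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits of a /proc/<pid>/pagemap entry, per Documentation/admin-guide/mm/pagemap.rst. */
#define PM_UFFD_WP	(1ULL << 57)	/* pte is uffd-wp write-protected */
#define PM_PRESENT	(1ULL << 63)	/* page present */

/* Hypothetical helper. Assumes the whole range was write-protected with
 * async uffd-wp beforehand: the kernel clears the uffd-wp bit when it
 * resolves a write fault, so a present page without it has been written.
 * Non-present pages are treated as not written (a simplification). */
static bool pagemap_entry_written(uint64_t entry)
{
	return (entry & PM_PRESENT) && !(entry & PM_UFFD_WP);
}
```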
Information about whether a page is file-mapped, present or swapped is
required for the CRIU project [5][6]. The required, any, excluded and
return masks were also added for the CRIU project [5].
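One plausible reading of how the required, any, excluded and return masks combine, as a user-space C sketch: a page matches when it has every required bit, at least one "any" bit (if that mask is non-zero) and no excluded bit, and only the return-mask bits are reported. The bit values, struct scan_masks and the helper names are hypothetical; the series' actual semantics and UAPI layout should be taken from the include/uapi/linux/fs.h and pagemap.rst changes it adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Names follow the cover letter; the values are illustrative only. */
#define PAGE_IS_WRITTEN	(1ULL << 0)
#define PAGE_IS_FILE	(1ULL << 1)
#define PAGE_IS_PRESENT	(1ULL << 2)
#define PAGE_IS_SWAPPED	(1ULL << 3)

/* Hypothetical container for the four masks. */
struct scan_masks {
	uint64_t required;	/* every bit here must be set */
	uint64_t anyof;		/* at least one bit here must be set; 0 = no constraint */
	uint64_t excluded;	/* no bit here may be set */
	uint64_t ret;		/* bits reported back for matching pages */
};

static bool page_matches(const struct scan_masks *m, uint64_t flags)
{
	assert(m != NULL);
	if ((flags & m->required) != m->required)
		return false;
	if (m->anyof && !(flags & m->anyof))
		return false;
	return !(flags & m->excluded);
}

/* Flags reported for a matching page: only those selected by the return mask. */
static uint64_t reported_flags(const struct scan_masks *m, uint64_t flags)
{
	return flags & m->ret;
}
```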
The IOCTL returns the addresses of the pages which match the specified
masks. The page addresses are returned in struct page_region in a compact
form. max_pages supports the use case where the user only wants a specific
number of pages, so there is no need to find all the pages of interest in
the range when max_pages is specified: the IOCTL returns once the maximum
number of pages has been found. max_pages is optional. If max_pages is
specified, it must be equal to or greater than vec_size. This restriction
is needed to handle the worst case, in which each page_region contains info
about only one page and cannot be compacted. This is needed to emulate the
Windows GetWriteWatch() syscall.
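The compact output form and the early stop on max_pages can be illustrated with a small user-space C sketch. It does not reproduce the series' struct page_region layout: struct region, add_page() and scan_pages() are hypothetical, and per-page flags are assumed to be already computed, with 0 meaning "not of interest". Contiguous pages with identical flags are merged into one record, and the walk stops after max_pages matches or when the output vector is full.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical compact record: `len` consecutive pages from `start` sharing `flags`. */
struct region {
	uint64_t start;
	uint64_t len;	/* number of pages */
	uint64_t flags;
};

/* Add one page to vec[], extending the last region when the page is
 * contiguous with it and has identical flags. Returns false if a new
 * region is needed but vec[] is already full. */
static bool add_page(struct region *vec, size_t vec_size, size_t *nr,
		     uint64_t addr, uint64_t flags, uint64_t page_size)
{
	if (*nr > 0) {
		struct region *last = &vec[*nr - 1];

		if (last->flags == flags &&
		    last->start + last->len * page_size == addr) {
			last->len++;
			return true;
		}
	}
	if (*nr == vec_size)
		return false;
	vec[*nr] = (struct region){ .start = addr, .len = 1, .flags = flags };
	(*nr)++;
	return true;
}

/* Walk n pages starting at base. Stops after max_pages matching pages
 * (0 = no limit) or when vec[] runs out of space. Returns the number of
 * pages reported; *nr receives the number of regions used. */
static size_t scan_pages(const uint64_t *page_flags, size_t n, uint64_t base,
			 uint64_t page_size, struct region *vec,
			 size_t vec_size, size_t *nr, size_t max_pages)
{
	size_t found = 0;

	assert(page_size > 0);
	*nr = 0;
	for (size_t i = 0; i < n; i++) {
		if (!page_flags[i])
			continue;
		if (max_pages && found == max_pages)
			break;
		if (!add_page(vec, vec_size, nr, base + i * page_size,
			      page_flags[i], page_size))
			break;
		found++;
	}
	return found;
}
```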
The patch series includes a detailed selftest which can be used as an
example for the uffd async WP test and PAGEMAP_IOCTL. It shows the
interface usage as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 478 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2102 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: zhang yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
1. Fix verify exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 2944, a(97) != q(113)
This is because the sending buffers are not aligned to 26 bytes (the length
of the a-z pattern), GRO does not necessarily merge segments in order, and
the receiver does not account for either case. We therefore have the
receiver split the data and then validate each segment, just as an
application actually using this feature would.
2. Fix gsosize exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 7360
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 4416
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 11776
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 20608
recv: got one message len:1472, probably not an error.
recv: got one message len:1472, probably not an error.
This is because network conditions, NAPI, timers, etc. can result in only
one message being received. We believe this situation should be treated
as normal.
3. Fix packet number exception
bash# udpgso_bench_rx -4 -n 100
bash# udpgso_bench_tx -l 1 -4 -D "$DST"
udpgso_bench_rx: wrong packet number! got 0, expected 100
This is because the packets counter is reset after each print.
Zhang Yunkai (3):
selftests: net: udpgso_bench_rx: Fix verifty exceptions
selftests: net: udpgso_bench_rx: Fix gsosize exceptions
selftests: net: udpgso_bench_rx: Fix packet number exceptions
tools/testing/selftests/net/udpgso_bench_rx.c | 45 +++++++++++++++++++++------
1 file changed, 35 insertions(+), 10 deletions(-)
--
2.15.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Fix this by setting the active KUnit context for the duration of the
test shutdown procedure. When the test exits normally, this does
nothing. When run from the parent KUnit thread after an early exit, it
sets the context for the cleanup and then restores its previous value
(probably NULL) afterwards.
This should make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
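The save/set/restore pattern can be modelled in user space with a thread-local pointer standing in for current->kunit_test. This is only an analogy, not KUnit code: struct test_ctx, get_current_test(), cleanup_hook() and run_case_cleanup() are hypothetical, and the real change is the kunit_run_case_cleanup() hunk in lib/kunit/test.c in this patch.

```c
#include <assert.h>
#include <stddef.h>

struct test_ctx {
	const char *name;
};

/* Stand-in for current->kunit_test: one "current test" pointer per thread. */
static _Thread_local struct test_ctx *current_test;

static struct test_ctx *get_current_test(void)
{
	return current_test;
}

/* A cleanup hook that, like a cleanup action, is not handed the test and
 * has to look the context up itself. Records what it saw. */
static const char *seen_by_cleanup;

static void cleanup_hook(void)
{
	struct test_ctx *t = get_current_test();

	seen_by_cleanup = t ? t->name : NULL;
}

/* Save the thread's current context, install `test` for the duration of
 * the cleanup, then restore the saved value (NULL in a thread that was
 * not running a test). */
static void run_case_cleanup(struct test_ctx *test, void (*cleanup)(void))
{
	struct test_ctx *old = current_test;

	assert(test != NULL);
	current_test = test;
	cleanup();
	current_test = old;
}
```

Restoring the saved value, rather than unconditionally writing NULL, keeps the normal path (where the context is already the test) unchanged.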
Signed-off-by: David Gow <davidgow(a)google.com>
---
This becomes useful with the current kunit_add_action() implementation,
as actions do not get the KUnit context passed in by default:
https://lore.kernel.org/linux-kselftest/CABVgOSmjs0wLUa4=ErkB9tH8p6A1P6N33b…
I think it's probably correct anyway, though, so we should either do
this, or totally rule out using kunit_get_current_test() here at all, by
resetting current->kunit_test to NULL before running cleanup even in
the normal case.
I've only given this the most cursory testing so far (I'm not sure how
much of the executor innards I want to expose to be able to actually
write a proper test for it), so more eyes and/or suggestions are
welcome.
Cheers,
-- David
---
lib/kunit/test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..2d7cad249863 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -392,10 +392,21 @@ static void kunit_case_internal_cleanup(struct kunit *test)
static void kunit_run_case_cleanup(struct kunit *test,
struct kunit_suite *suite)
{
+ /*
+ * If we're no longer running from within the test kthread() because it failed
+ * or timed out, we still need the context to be okay when running exit and
+ * cleanup functions.
+ */
+ struct kunit *old_current = current->kunit_test;
+
+ current->kunit_test = test;
if (suite->exit)
suite->exit(test);
kunit_case_internal_cleanup(test);
+
+ /* Restore the thread's previous test context (probably NULL or test). */
+ current->kunit_test = old_current;
}
struct kunit_try_catch_context {
--
2.40.0.634.g4ca3ef3211-goog
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Add support for accessing variable length arrays of integer type,
and add a selftest to check it.
Feng Zhou (2):
bpf: support access variable length array of integer type
selftests/bpf: Add test to access integer type of variable array
kernel/bpf/btf.c | 8 +++++---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 20 +++++++++++++++++++
.../selftests/bpf/prog_tests/tracing_struct.c | 2 ++
.../selftests/bpf/progs/tracing_struct.c | 13 ++++++++++++
4 files changed, 40 insertions(+), 3 deletions(-)
--
2.20.1
Add stackprotector support for all remaining architectures except s390.
On s390 stack protectors are not supported in "global" mode, only in
"sysreg" mode, which is not supported by nolibc.
The series also contains a small optimization that reduces the strace
output during execution of nolibc-test.
This series is based on the 20230415-nolibc-updates-4a branch of the
nolibc tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
selftests/nolibc: reduce syscalls during space padding
tools/nolibc: riscv: add stackprotector support
tools/nolibc: aarch64: add stackprotector support
tools/nolibc: arm: add stackprotector support
tools/nolibc: loongarch: add stackprotector support
tools/nolibc: mips: add stackprotector support
tools/include/nolibc/arch-aarch64.h | 7 ++++++-
tools/include/nolibc/arch-arm.h | 7 ++++++-
tools/include/nolibc/arch-loongarch.h | 7 ++++++-
tools/include/nolibc/arch-mips.h | 8 +++++++-
tools/include/nolibc/arch-riscv.h | 7 ++++++-
tools/testing/selftests/nolibc/Makefile | 5 +++++
tools/testing/selftests/nolibc/nolibc-test.c | 15 +++++++++++----
7 files changed, 47 insertions(+), 9 deletions(-)
---
base-commit: e35214ea9db7477a02e67a8b412e8046534bb97c
change-id: 20230408-nolibc-stackprotector-archs-42244674616e
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
There was a report that the hardware breakpoints and watchpoints weren't
reporting the debug architecture version as expected; they were reporting
a version of 0, which is not defined in the architecture. This happens
when running in a KVM guest if the host has a debug architecture version
not supported by KVM, and it in turn confuses GDB, which rejects any debug
architecture version it does not know about.
Add a test that covers that situation and, while we're at it, reports the
debug architecture version and number of slots available to aid with
figuring out problems that may arise.
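For reference, the field decoding that the test performs can be pulled out into a small standalone helper. This is only a sketch: `decode_dbg_info()` and `struct hw_debug_info` are hypothetical names, and the bit layout (slot count in bits [7:0], debug architecture version in bits [11:8]) is taken from the comment in the test added by this patch rather than from the architecture documentation.

```c
#include <stdint.h>

/* Hypothetical helper type: decoded fields of user_hwdebug_state.dbg_info. */
struct hw_debug_info {
	unsigned int slots;	/* number of breakpoint/watchpoint slots */
	unsigned int arch;	/* debug architecture version; 0 is not valid */
};

/* Same masks and shifts as the test: the low 8 bits are the slot count,
 * the next 4 bits are the debug architecture version. */
static struct hw_debug_info decode_dbg_info(uint32_t dbg_info)
{
	struct hw_debug_info info;

	info.slots = dbg_info & 0xff;
	info.arch = (dbg_info >> 8) & 0xf;
	return info;
}
```

With a made-up value of 0x604 this yields version 6 with 4 slots; a guest affected by the reported problem would instead see `arch == 0`.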
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/abi/ptrace.c | 32 +++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/abi/ptrace.c b/tools/testing/selftests/arm64/abi/ptrace.c
index be952511af22..abe4d58d731d 100644
--- a/tools/testing/selftests/arm64/abi/ptrace.c
+++ b/tools/testing/selftests/arm64/abi/ptrace.c
@@ -20,7 +20,7 @@
#include "../../kselftest.h"
-#define EXPECTED_TESTS 7
+#define EXPECTED_TESTS 11
#define MAX_TPIDRS 2
@@ -132,6 +132,34 @@ static void test_tpidr(pid_t child)
}
}
+static void test_hw_debug(pid_t child, int type, const char *type_name)
+{
+ struct user_hwdebug_state state;
+ struct iovec iov;
+ int slots, arch, ret;
+
+ iov.iov_len = sizeof(state);
+ iov.iov_base = &state;
+
+ /* Should be able to read the values */
+ ret = ptrace(PTRACE_GETREGSET, child, type, &iov);
+ ksft_test_result(ret == 0, "read_%s\n", type_name);
+
+ if (ret == 0) {
+ /* Low 8 bits is the number of slots, next 4 bits the arch */
+ slots = state.dbg_info & 0xff;
+ arch = (state.dbg_info >> 8) & 0xf;
+
+ ksft_print_msg("%s version %d with %d slots\n", type_name,
+ arch, slots);
+
+ /* Zero is not currently architecturally valid */
+ ksft_test_result(arch, "%s_arch_set\n", type_name);
+ } else {
+ ksft_test_result_skip("%s_arch_set\n", type_name);
+ }
+}
+
static int do_child(void)
{
if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
@@ -207,6 +235,8 @@ static int do_parent(pid_t child)
ksft_print_msg("Parent is %d, child is %d\n", getpid(), child);
test_tpidr(child);
+ test_hw_debug(child, NT_ARM_HW_WATCH, "NT_ARM_HW_WATCH");
+ test_hw_debug(child, NT_ARM_HW_BREAK, "NT_ARM_HW_BREAK");
ret = EXIT_SUCCESS;
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230414-arm64-test-hw-breakpoint-83fe02f607fc
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is a follow-up to the kunit_defer() parts of 'KUnit device API
proposal'[1], with a number of changes suggested by Matti Vaittinen,
Maxime Ripard and Benjamin Berg.
Most notably, kunit_defer() has been renamed to kunit_add_action(), in
order to match the equivalent devres API[2]. Likewise:
kunit_defer_cancel() has become kunit_remove_action(), and
kunit_defer_trigger() has become kunit_release_action().
The _token() versions of these APIs remain, for the moment, even though
they're a bit more awkward and less useful, as they have two advantages:
1. They're faster, as the action doesn't need to be looked up.
2. They provide more flexibility in the ordering of actions in cases
where several identical actions are interleaved with other, different
actions.
Similarly, the internal_gfp argument remains for now, as this is useful
in implementing kunit_kmalloc() and similar.
The implementation now uses a single allocation for both the
kunit_resource and the kunit_action_ctx (previously kunit_defer_ctx).
The 'cancellation token' is now of type 'struct kunit_action_ctx',
instead of void*.
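The add/remove/release semantics and the token variants described above can be illustrated with a plain userspace model. This is a hypothetical sketch, not KUnit code: `struct deferred_action`, `defer_add()` and the other helpers are invented names, and a real in-kernel version would also have to handle locking and allocation flags. Each action is a single allocation that doubles as the cancellation token, so the token variants can unlink it in O(1), while the (fn, ctx) variant has to look it up first.

```c
#include <stdlib.h>

/* Hypothetical model of deferred test-exit actions; not the KUnit API. */
typedef void (*defer_fn)(void *ctx);

/* One allocation per action; its address doubles as the cancellation token. */
struct deferred_action {
	defer_fn fn;
	void *ctx;
	struct deferred_action *prev, *next;
};

struct defer_list {
	struct deferred_action *head;	/* newest action first */
};

/* Defer fn(ctx) until defer_run_all(); returns the token, or NULL. */
static struct deferred_action *defer_add(struct defer_list *l, defer_fn fn,
					 void *ctx)
{
	struct deferred_action *a = malloc(sizeof(*a));

	if (!a)
		return NULL;
	a->fn = fn;
	a->ctx = ctx;
	a->prev = NULL;
	a->next = l->head;
	if (l->head)
		l->head->prev = a;
	l->head = a;
	return a;
}

/* O(1): the token identifies the entry, so no lookup is needed. */
static void defer_unlink(struct defer_list *l, struct deferred_action *a)
{
	if (a->prev)
		a->prev->next = a->next;
	else
		l->head = a->next;
	if (a->next)
		a->next->prev = a->prev;
}

/* Token variants: cancel without running, or run immediately. */
static void defer_remove_token(struct defer_list *l, struct deferred_action *a)
{
	defer_unlink(l, a);
	free(a);
}

static void defer_release_token(struct defer_list *l, struct deferred_action *a)
{
	defer_unlink(l, a);
	a->fn(a->ctx);
	free(a);
}

/* Lookup variant: cancel the newest action matching (fn, ctx), if any. */
static void defer_remove(struct defer_list *l, defer_fn fn, void *ctx)
{
	struct deferred_action *a;

	for (a = l->head; a; a = a->next) {
		if (a->fn == fn && a->ctx == ctx) {
			defer_remove_token(l, a);
			return;
		}
	}
}

/* At test exit: run the remaining actions, newest first. */
static void defer_run_all(struct defer_list *l)
{
	while (l->head) {
		struct deferred_action *a = l->head;

		defer_unlink(l, a);
		a->fn(a->ctx);
		free(a);
	}
}

/* Example action used below: record the int that ctx points to. */
static int call_log[8];
static int call_log_len;

static void log_call(void *ctx)
{
	call_log[call_log_len++] = *(int *)ctx;
}
```

Running the remaining actions newest-first mirrors the reverse-order release that devres uses for device-managed resources.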
Tests have been added to the kunit-resource-test suite which exercise
this functionality. Similarly, the kunit executor tests and
kunit allocation functions have been updated to make use of this API.
I'd love to hear any further thoughts!
Cheers,
-- David
[1]: https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@g…
[2]: https://docs.kernel.org/driver-api/basics.html#c.devm_add_action
David Gow (3):
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
include/kunit/resource.h | 89 +++++++++++++++++++++++++++
lib/kunit/executor_test.c | 12 ++--
lib/kunit/kunit-test.c | 123 +++++++++++++++++++++++++++++++++++++-
lib/kunit/resource.c | 99 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 48 +++------------
5 files changed, 323 insertions(+), 48 deletions(-)
--
2.40.0.348.gf938b09366-goog
On 16.04.23 00:59, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prct.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test to ksm_tests
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add two functions for debugging merge outcome for ksm_tests
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm. The debug output is enabled with the
> -d option.
>
> 4) add KSM prctl test to ksm_functional_tests
>
> This adds a test to the ksm_functional_test that verifies that the
> prctl system call to enable / disable KSM works.
>
> 5) add KSM fork test to ksm_functional_test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
Thanks!
Acked-by: David Hildenbrand <david(a)redhat.com>
--
Thanks,
David / dhildenb
Thanks for moving the functional tests. Some more feedback on the ksm_functional_tests change. Writing tests in the
ksft testing framework can be a bit "special".
I'm seeing some weird test failures due to
prctl(PR_GET_MEMORY_MERGE, 0)
Apparently, these go away when using
prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)
to explicitly force the other values to 0. Most probably, we should do that
for PR_SET_MEMORY_MERGE as well (especially if we check for the arguments as
well).
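A minimal sketch of how a test could apply this consistently, using hypothetical helper names: both helpers pass an explicit zero for every unused argument. The fallback option values are an assumption and should be checked against include/uapi/linux/prctl.h. On failure `prctl()` returns -1 with `errno` set; kernels without this feature reject the option with EINVAL.

```c
#include <sys/prctl.h>

/* Assumed values; only used if the system headers predate the feature. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE	67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE	68
#endif

/* Hypothetical helpers: prctl() is variadic, so pass every unused argument
 * explicitly as an unsigned long zero rather than leaving it out. */
static int set_memory_merge(unsigned long enable)
{
	return prctl(PR_SET_MEMORY_MERGE, enable, 0UL, 0UL, 0UL);
}

/* Returns 1 or 0 for the current setting, or -1 with errno set on error. */
static int get_memory_merge(void)
{
	return prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);
}
```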
[...]
> @@ -15,8 +15,10 @@
> #include <errno.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> +#include <sys/prctl.h>
> #include <sys/syscall.h>
> #include <sys/ioctl.h>
> +#include <sys/wait.h>
> #include <linux/userfaultfd.h>
>
> #include "../kselftest.h"
> @@ -326,9 +328,80 @@ static void test_unmerge_uffd_wp(void)
> }
> #endif
>
> +/* Verify that KSM can be enabled / queried with prctl. */
> +static void test_ksm_prctl(void)
Maybe call this "test_prctl", because after all, these are all KSM tests.
> +{
> + bool ret = false;
> + int is_on;
> + int is_off;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (is_on && is_off)
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "prctl get / set\n");
The test fails if the kernel does not support PR_SET_MEMORY_MERGE.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL.
(2) distinguish for PR_GET_MEMORY_MERGE whether it returned an error or
whether it returned a wrong value. Feel free to keep that as is, whatever
you prefer.
(3) Exit early on every failure, so that you get exactly one expected skip/pass/fail
for the test, and use specific test failure messages.
(4) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl(void)
{
	int ret;

	ksft_print_msg("[RUN] %s\n", __func__);

	ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
	if (ret < 0 && errno == EINVAL) {
		ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
		return;
	} else if (ret) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
		return;
	}

	ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
	if (ret < 0) {
		ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
		return;
	} else if (ret != 1) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 not effective\n");
		return;
	}

	ret = prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0);
	if (ret) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
		return;
	}

	ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
	if (ret < 0) {
		ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
		return;
	} else if (ret != 0) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 not effective\n");
		return;
	}

	ksft_test_result_pass("Setting/clearing PR_SET_MEMORY_MERGE works\n");
}
> +}
> +
> +/* Verify that prctl ksm flag is inherited. */
> +static void test_ksm_fork(void)
Maybe call it "test_prctl_fork"
> +{
> + int status;
> + bool ret = false;
> + pid_t child_pid;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + ksft_test_result_fail("prctl failed\n");
> + goto out;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on =
> +
> + if (!is_on)
> + exit(-1);
> +
> + exit(0);
> + }
> +
> + if (child_pid < 0) {
> + ksft_test_result_fail("child pid < 0\n");
> + goto out;
> + }
> +
> + if (waitpid(child_pid, &status, 0) < 0 || WEXITSTATUS(status) != 0) {
> + ksft_test_result_fail("wait pid < 0\n");
> + goto out;
> + }
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0))
> + ksft_test_result_fail("prctl 2 failed\n");
> + else
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "ksm_flag is inherited\n");
> +}
Again, the test fails if kernel support is not available.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL just as in the other test.
(2) Use a simple exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)); in the child.
(3) Exit early on every failure, so that you get exactly one expected skip/pass/fail
for the test, and use specific test failure messages.
(4) Split up the waitpid() check to test what failed.
(5) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl_fork(void)
{
	int ret, status;
	pid_t child_pid;

	ksft_print_msg("[RUN] %s\n", __func__);

	ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
	if (ret < 0 && errno == EINVAL) {
		ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
		return;
	} else if (ret) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
		return;
	}

	child_pid = fork();
	if (!child_pid) {
		exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
	} else if (child_pid < 0) {
		ksft_test_result_fail("fork() failed\n");
		return;
	}

	if (waitpid(child_pid, &status, 0) < 0) {
		ksft_test_result_fail("waitpid() failed\n");
		return;
	} else if (WEXITSTATUS(status) != 1) {
		ksft_test_result_fail("unexpected PR_GET_MEMORY_MERGE result in child\n");
		return;
	}

	if (prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0)) {
		ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
		return;
	}

	ksft_test_result_pass("PR_SET_MEMORY_MERGE value is inherited\n");
}
> +
> int main(int argc, char **argv)
> {
> - unsigned int tests = 2;
> + unsigned int tests = 6;
Assuming you execute exactly one ksft_test_result_skip/fail/pass on every path of your two
tests, this would become "4".
> int err;
>
> #ifdef __NR_userfaultfd
> @@ -358,6 +431,8 @@ int main(int argc, char **argv)
> #ifdef __NR_userfaultfd
> test_unmerge_uffd_wp();
> #endif
> + test_ksm_prctl();
> + test_ksm_fork();
>
With the changes outlined above (feel free to integrate what you consider valuable),
on an older kernel I get:
$ sudo ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 # SKIP PR_SET_MEMORY_MERGE not supported
# [RUN] test_prctl_fork
ok 5 # SKIP PR_SET_MEMORY_MERGE not supported
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:2 error:0
On a kernel with your patch #1:
# ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 Setting/clearing PR_SET_MEMORY_MERGE works
# [RUN] test_prctl_fork
ok 5 PR_SET_MEMORY_MERGE value is inherited
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
> err = ksft_get_fail_cnt();
> if (err)
> diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
> index f9eb4d67e0dd..35b3828d44b4 100644
> --- a/tools/testing/selftests/mm/ksm_tests.c
> +++ b/tools/testing/selftests/mm/ksm_tests.c
> @@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0
[...]
Changes to ksm_tests mostly look good. Two comments:
> - if (ksm_merge_pages(map_ptr, page_size * page_count, start_time, timeout))
> + if (ksm_merge_pages(merge_type, map_ptr, page_size * page_count, start_time, timeout))
> goto err_out;
>
> /* verify that the right number of pages are merged */
> if (assert_ksm_pages_count(page_count)) {
> printf("OK\n");
> - munmap(map_ptr, page_size * page_count);
> + if (merge_type == KSM_MERGE_MADVISE)
> + munmap(map_ptr, page_size * page_count);
> + else if (merge_type == KSM_MERGE_PRCTL)
> + prctl(PR_SET_MEMORY_MERGE, 0);
Are you sure that we don't want to unmap here? I'd assume we want to unmap either way.
[...]
> + case 'd':
> + debug = 1;
> + break;
> case 's':
> size_MB = atoi(optarg);
> if (size_MB <= 0) {
> printf("Size must be greater than 0\n");
> return KSFT_FAIL;
> }
> + case 't':
> + {
> + int tmp = atoi(optarg);
> +
> + if (tmp < 0 || tmp > KSM_MERGE_LAST) {
> + printf("Invalid merge type\n");
> + return KSFT_FAIL;
> + }
> + merge_type = atoi(optarg);
You can simply reuse tmp
merge_type = tmp;
--
Thanks,
David / dhildenb
Patch 1 makes a function static because it is only used in one file.
Patch 2 adds info about the git trees we use to help occasional devs.
Patch 3 removes an unused variable.
Patch 4 removes duplicated entries from the help menu of a tool used in
MPTCP selftests.
Patch 5 removes some ShellCheck warnings in mptcp_join.sh selftest.
Only very minor improvements then.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Geliang Tang (1):
mptcp: make userspace_pm_append_new_local_addr static
Matthieu Baerts (4):
MAINTAINERS: add git trees for MPTCP
mptcp: remove unused 'remaining' variable
selftests: mptcp: remove duplicated entries in usage
selftests: mptcp: join: fix ShellCheck warnings
MAINTAINERS | 2 ++
net/mptcp/options.c | 7 ++-----
net/mptcp/pm_userspace.c | 4 ++--
net/mptcp/protocol.h | 2 --
tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 ++++----
tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++++++--
6 files changed, 18 insertions(+), 15 deletions(-)
---
base-commit: c11d2e718c792468e67389b506451eddf26c2dac
change-id: 20230414-upstream-net-next-20230414-mptcp-small-cleanups-1cba986990b1
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The existing selftest suite for openvswitch will work for regression
testing the datapath feature bits, but won't test things like adding
interfaces, or the upcall interface. Here, we add some additional
test facilities.
First, extend the ovs-dpctl.py python module to support the OVS_FLOW
and OVS_PACKET netlink families, with some associated messages. These
can be extended over time, but the initial support covers the better
known cases (output, userspace, and CT).
Next, extend the test suite to test upcalls by adding a datapath,
monitoring the upcall socket associated with the datapath, and then
dumping any upcalls that are received. The dump is compared with the
expected ARP upcall generated via arping.
Aaron Conole (3):
selftests: openvswitch: add interface support
selftests: openvswitch: add flow dump support
selftests: openvswitch: add support for upcall testing
.../selftests/net/openvswitch/openvswitch.sh | 89 +-
.../selftests/net/openvswitch/ovs-dpctl.py | 1276 ++++++++++++++++-
2 files changed, 1349 insertions(+), 16 deletions(-)
--
2.39.2
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
From: Jinrong Liang <cloudliang(a)tencent.com>
Hi,
This patch set adds some tests to ensure consistent PMU performance event
filter behavior. Specifically, the patches aim to improve KVM's PMU event
filter by strengthening the test coverage, adding documentation, and making
other small changes.
The first patch replaces int with uint32_t for nevents to ensure consistency
and readability in the code. The second patch adds fixed_counter_bitmap to
create_pmu_event_filter() to support the use of the same creator to control
the use of guest fixed counters. The third patch adds test cases for
unsupported input values in PMU filter, including unsupported "action"
values, unsupported "flags" values, and unsupported "nevents" values. It also
tests that setting non-existent fixed counters in the fixed bitmap doesn't
fail.
The fourth patch updates the documentation for KVM_SET_PMU_EVENT_FILTER ioctl
to include a detailed description of how fixed performance events are handled
in the pmu filter. The fifth patch adds tests to verify that pmu_event_filter
works as expected when applied to fixed performance counters, even if no
fixed counter exists. The sixth patch adds a test to ensure that setting
both generic and fixed performance event filters does not affect the consistency
of the fixed performance filter behavior in KVM. The seventh patch adds a test
to verify the behavior of the pmu event filter when an incomplete
kvm_pmu_event_filter structure is used.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases. These patches have been tested and verified to
function properly.
Thanks for your review and feedback.
Sincerely,
Jinrong Liang
Jinrong Liang (7):
KVM: selftests: Replace int with uint32_t for nevents
KVM: selftests: Apply create_pmu_event_filter() to fixed ctrs
KVM: selftests: Test unavailable event filters are rejected
KVM: x86/pmu: Add documentation for fixed ctr on PMU filter
KVM: selftests: Check if pmu_event_filter meets expectations on fixed
ctrs
KVM: selftests: Check gp event filters without affecting fixed event
filters
KVM: selftests: Test pmu event filter with incompatible
kvm_pmu_event_filter
Documentation/virt/kvm/api.rst | 21 ++
.../kvm/x86_64/pmu_event_filter_test.c | 239 ++++++++++++++++--
2 files changed, 243 insertions(+), 17 deletions(-)
base-commit: a25497a280bbd7bbcc08c87ddb2b3909affc8402
--
2.31.1
This series replaces the C99 compatibility patch. (See v1 link below).
After the discussion about supporting C99 and/or GNU89, I came to the
conclusion that supporting straight C89 is not very hard.
Instead of awkwardly validating both C99 and GNU89, only for somebody
to request true C89 support later, let's just target C89 directly.
Feel free to squash all the comment syntax patches together if you
prefer.
All changes in this series are cosmetic only.
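As a rough illustration of the constructs the series switches to (`__asm__`, `__inline__` and `/* */` comments), here is a hypothetical snippet, not taken from nolibc, written so that it should build with `-std=c89`. C89 has no `//` comments and no `inline` keyword, and it requires declarations to precede statements within a block. The empty `__asm__` with a `"memory"` clobber is the usual GCC/Clang compiler-barrier idiom.

```c
/* Compiler barrier: an empty asm statement with a "memory" clobber
 * (a GCC/Clang extension, spelled with the reserved __asm__ keyword). */
static __inline__ void compiler_barrier(void)
{
	__asm__ __volatile__("" : : : "memory");
}

/* Sums n ints; declarations come first, as C89 requires. */
static __inline__ int sum_array(const int *v, int n)
{
	int i;
	int total = 0;

	for (i = 0; i < n; i++) {
		total += v[i];
		compiler_barrier();
	}
	return total;
}
```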
To: Willy Tarreau <w(a)1wt.eu>
To: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
This series is based on the "dev" branch of the RCU tree.
---
Changes in v2:
- Target C89 instead of C99
- Link to v1: https://lore.kernel.org/r/20230328-nolibc-c99-v1-1-a8302fb19f19@weissschuh.…
---
Thomas Weißschuh (11):
tools/nolibc: use standard __asm__ statements
tools/nolibc: use __inline__ syntax
tools/nolibc: i386: use C89 comment syntax
tools/nolibc: x86_64: use C89 comment syntax
tools/nolibc: riscv: use C89 comment syntax
tools/nolibc: aarch64: use C89 comment syntax
tools/nolibc: arm: use C89 comment syntax
tools/nolibc: mips: use C89 comment syntax
tools/nolibc: loongarch: use C89 comment syntax
tools/nolibc: use C89 comment syntax
tools/nolibc: validate C89 compatibility
tools/include/nolibc/arch-aarch64.h | 32 ++++++++--------
tools/include/nolibc/arch-arm.h | 42 ++++++++++-----------
tools/include/nolibc/arch-i386.h | 40 ++++++++++----------
tools/include/nolibc/arch-loongarch.h | 38 +++++++++----------
tools/include/nolibc/arch-mips.h | 56 ++++++++++++++--------------
tools/include/nolibc/arch-riscv.h | 40 ++++++++++----------
tools/include/nolibc/arch-x86_64.h | 34 ++++++++---------
tools/include/nolibc/stackprotector.h | 4 +-
tools/include/nolibc/stdlib.h | 18 ++++-----
tools/include/nolibc/string.h | 4 +-
tools/include/nolibc/sys.h | 8 ++--
tools/testing/selftests/nolibc/Makefile | 2 +-
tools/testing/selftests/nolibc/nolibc-test.c | 14 +++----
13 files changed, 166 insertions(+), 166 deletions(-)
---
base-commit: bd5b341f0f69eb4c958ffd48699213c5b9af8145
change-id: 20230328-nolibc-c99-59f44ea45636
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Due to the lack of the SKIP directive in the output, if any of the
parameterized tests was skipped, the parser could not recognize that
correctly and marked the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
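The only difference between the two raw outputs is the directive on the skipped result line. Below is a minimal sketch of formatting such a line with a hypothetical helper; it is not the KUnit `kunit_print_ok_not_ok()` implementation. A skipped test still reports `ok`, and the `# SKIP` directive is what lets a parser classify it as skipped rather than passed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format one KTAP result line into buf.
 * Skipped tests are reported as "ok" plus a "# SKIP" directive and an
 * optional reason; other tests are "ok" or "not ok". */
static int ktap_format_result(char *buf, size_t len, int passed, int skipped,
			      unsigned int num, const char *desc,
			      const char *skip_reason)
{
	if (skipped)
		return snprintf(buf, len, "ok %u %s # SKIP%s%s", num, desc,
				skip_reason ? " " : "",
				skip_reason ? skip_reason : "");
	return snprintf(buf, len, "%s %u %s", passed ? "ok" : "not ok", num,
			desc);
}
```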
v2: better align with future support for arbitrary levels of testing
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update kunit_print_ok_not_ok function
include/kunit/test.h | 1 +
lib/kunit/kunit-example-test.c | 34 +++++++++++++++++++++++++++
lib/kunit/test.c | 43 ++++++++++++++++++++++------------
3 files changed, 63 insertions(+), 15 deletions(-)
--
2.25.1
Building bpf selftests with clang can trigger errors like the following:
CLNG-BPF [test_maps] bpf_iter_netlink.bpf.o
progs/bpf_iter_netlink.c:32:4: error: incompatible pointer types assigning to 'struct sock *' from 'struct sock___17 *' [-Werror,-Wincompatible-pointer-types]
s = &nlk->sk;
^ ~~~~~~~~
1 error generated.
This happens because bpftool emits duplicate data types with different
names in vmlinux.h (e.g., `struct sock` and `struct sock___17` in this
case), and these types, despite their different names, in fact represent
the same object.
Add -Wno-incompatible-pointer-types to CLANG_CFLAGS to prevent these
errors.
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
tools/testing/selftests/bpf/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index b677dcd0b77a..0d9ef819a065 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -356,7 +356,8 @@ BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(abspath $(OUTPUT)/../usr/include)
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types \
+ -Wno-incompatible-pointer-types
$(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
--
2.39.2
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6f4 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
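The calling convention can be modelled in a few lines. This is a hypothetical sketch, not the resctrl code: the names, `NUM_OF_RUNS` and the positive value chosen for `END_OF_TESTS` are illustrative. Once negative values mean errors, normal termination needs its own sentinel; otherwise the caller reports a failure at the end of every clean run.

```c
/* Illustrative values: END_OF_TESTS only needs to differ from 0 and from
 * every negative error value. */
#define END_OF_TESTS	1
#define NUM_OF_RUNS	5

struct run_params {
	int num_of_runs;
	int fail_on_run;	/* -1: never fail; otherwise inject an error */
};

/* Hypothetical ->setup(): 0 = run again, END_OF_TESTS = done, <0 = error. */
static int example_setup(struct run_params *p)
{
	if (p->num_of_runs >= NUM_OF_RUNS)
		return END_OF_TESTS;	/* returning -1 here would look like an error */
	if (p->num_of_runs == p->fail_on_run)
		return -1;
	p->num_of_runs++;
	return 0;
}

/* Returns 0 on normal completion, or the negative error from setup. */
static int run_measurements(struct run_params *p)
{
	int ret;

	for (;;) {
		ret = example_setup(p);
		if (ret == END_OF_TESTS)
			return 0;
		if (ret < 0)
			return ret;
		/* ... one measurement would be taken here ... */
	}
}
```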
Fixes: 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.int…
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
Hi Shuah,
Apologies, this error snuck in between the two series
merged into kselftest's next this week.
tools/testing/selftests/resctrl/mbm_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 146132fa986d..538d35a6485a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
--
2.34.1
Hello Reinette,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in the tests themselves.
[patch 1] Make write_schemata() run to set up schemata with 100% allocation
on the first run in the MBM test.
[patch 2] The MBA test result message is always output as "ok";
make the output message "not ok" if the MBA check fails.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] Whether an error occurs in the parent process or the child process,
the parent process always kills the child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] To clean up properly before the parent process exits when a
signal is received, commonize the signal handler registered for the
CMT/MBM/MBA tests and reuse it in CAT; also unregister the
signal handler at the end of each test.
[patch 6] Before each CMT/CAT/MBM/MBA test exits, the functions that clear
the test result files, cat/cmt/mbm/mba_test_cleanup(), are called
twice. Remove the duplicate call.
This patch series is based on the "next" branch of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Previous versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
[v8] https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
There is a spelling mistake in an error message string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/uffd-common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 61c6250adf93..54dfd92d86cf 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -311,7 +311,7 @@ int uffd_test_ctx_init(uint64_t features, const char **errmsg)
ret = userfaultfd_open(&features);
if (ret) {
if (errmsg)
- *errmsg = "possible lack of priviledge";
+ *errmsg = "possible lack of privilege";
return ret;
}
--
2.30.2
Due to the lack of the SKIP directive in the output, if any of the
parameterized tests was skipped, the parser could not recognize that
correctly and marked the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
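To make the difference concrete, here is a minimal userspace sketch of how a KTAP result line carrying a SKIP directive might be produced and classified. The helpers `ktap_format_result()` and `ktap_classify()` are hypothetical and are not part of the KUnit API. The classifier is deliberately simplified: any "not ok" line counts as a failure.

```c
#include <stdio.h>
#include <string.h>

/* Simplified result categories for this sketch. */
enum ktap_status { KTAP_PASS, KTAP_FAIL, KTAP_SKIP };

/*
 * Hypothetical helper: format one KTAP result line into buf.
 * A skipped test is still reported as "ok", followed by a
 * "# SKIP <reason>" directive so that a parser can tell it apart
 * from a real pass. Returns what snprintf() returns.
 */
static int ktap_format_result(char *buf, size_t len, int number,
			      const char *name, int skipped,
			      const char *reason)
{
	if (!skipped)
		return snprintf(buf, len, "ok %d %s", number, name);
	if (!reason || !*reason)
		return snprintf(buf, len, "ok %d %s # SKIP", number, name);
	return snprintf(buf, len, "ok %d %s # SKIP %s", number, name, reason);
}

/*
 * Simplified classifier: "not ok" is a failure; an "ok" line is a skip
 * if it carries the SKIP directive and a pass otherwise. Without the
 * directive, a skipped case is indistinguishable from a pass.
 */
static enum ktap_status ktap_classify(const char *line)
{
	if (strncmp(line, "not ok", 6) == 0)
		return KTAP_FAIL;
	if (strstr(line, "# SKIP") != NULL)
		return KTAP_SKIP;
	return KTAP_PASS;
}
```

Formatting case 3 as skipped with the reason "unsupported param value" yields the same line as the corrected raw output above: `ok 3 example value 0 # SKIP unsupported param value`.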
Cc: David Gow <davidgow(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update reporting function to support results from subtests
lib/kunit/kunit-example-test.c | 34 ++++++++++++++++++++++++++++++++++
lib/kunit/test.c | 26 +++++++++++++++++---------
2 files changed, 51 insertions(+), 9 deletions(-)
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to
NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before map is returned).
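A minimal sketch of the replacement pattern, using a hypothetical helper `alloc_aligned()` that is not part of the selftests. posix_memalign() reports failure through its return value: 0 on success, or a positive error number such as EINVAL or ENOMEM. The failure check therefore has to be `!= 0` rather than `< 0`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the selftests): allocate len bytes
 * aligned to align. posix_memalign() returns 0 on success or a positive
 * error number (e.g. EINVAL for a bad alignment, ENOMEM); it never
 * returns a negative value, so "ret < 0" would miss every failure.
 */
static void *alloc_aligned(size_t align, size_t len)
{
	void *p = NULL;	/* initialized, as the patch does for map */
	int ret = posix_memalign(&p, align, len);

	if (ret != 0) {
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		return NULL;
	}
	return p;
}
```

The alignment of the result can be checked with `((uintptr_t)p % align) == 0`. The memory is released with free().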
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
When we added fd-based file streams, we created references to STx_FILENO in
stdio.h, but these constants are declared in unistd.h, which is the last file
included by the top-level nolibc.h. As a result, those constants are not
defined when stdio.h is built, which causes programs using nolibc.h to fail
to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: 7d8214bba44c1aa6a75921a09a691945d26a8d43
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 12.04.23 05:16, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prctl.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows specifying whether it should use madvise or prctl ksm merging.
>
> 3) add KSM get merge type test
>
> This adds the -G flag to the ksm_tests program to query the KSM
> status with prctl after KSM has been enabled with prctl.
>
> 4) add KSM fork test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> 5) add two functions for debugging merge outcome
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm.
>
> The debugging can be enabled with the following command line:
> make -C tools/testing/selftests TARGETS="mm" --keep-going \
> EXTRA_CFLAGS=-DDEBUG=1
Would it make sense to instead have a "-D" (if still unused) runtime
option to print this data? Dead code that's not compiled is a bit
unfortunate, as it can easily bit-rot.
This patch essentially does two things:
1) Add the option to run all tests/benchmarks with the PRCTL instead of
MADVISE
2) Add some functional KSM tests for the new PRCTL (fork, enabling
works, disabling works).
The latter should rather go into ksm_functional_tests().
[...]
>
> -static int check_ksm_unmerge(int mapping, int prot, int timeout, size_t page_size)
> +/* Verify that prctl ksm flag is inherited. */
> +static int check_ksm_fork(void)
> +{
> + int rc = KSFT_FAIL;
> + pid_t child_pid;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl");
> + return KSFT_FAIL;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (!is_on)
> + exit(KSFT_FAIL);
> +
> + exit(KSFT_PASS);
> + }
> +
> + if (child_pid < 0)
> + goto out;
> +
> + if (waitpid(child_pid, &rc, 0) < 0)
> + rc = KSFT_FAIL;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl");
> + rc = KSFT_FAIL;
> + }
> +
> +out:
> + if (rc == KSFT_PASS)
> + printf("OK\n");
> + else
> + printf("Not OK\n");
> +
> + return rc;
> +}
> +
> +static int check_ksm_get_merge_type(void)
> +{
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (is_on && is_off) {
> + printf("OK\n");
> + return KSFT_PASS;
> + }
> +
> + printf("Not OK\n");
> + return KSFT_FAIL;
> +}
Yes, these two are better located in ksm_functional_tests() to just run
them both automatically when the test is executed.
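In check_ksm_fork() above, waitpid() writes an encoded wait status into rc, not the value the child passed to exit(). The comparison with KSFT_PASS only works because KSFT_PASS is 0. Below is a minimal sketch of decoding the status first, using a hypothetical helper `wait_child_result()`. The KSFT_* values mirror those in kselftest.h and are defined here only to keep the snippet self-contained.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Values mirror kselftest.h; defined here only to keep the sketch standalone. */
#ifndef KSFT_PASS
#define KSFT_PASS 0
#define KSFT_FAIL 1
#define KSFT_SKIP 4
#endif

/*
 * Hypothetical helper: wait for pid and return the KSFT_* code it passed
 * to exit(). waitpid() fills in an encoded status, so decode it with
 * WIFEXITED()/WEXITSTATUS() instead of comparing it directly.
 */
static int wait_child_result(pid_t pid)
{
	int status;

	if (waitpid(pid, &status, 0) < 0) {
		perror("waitpid");
		return KSFT_FAIL;
	}
	if (!WIFEXITED(status))
		return KSFT_FAIL;	/* e.g. the child was killed by a signal */
	return WEXITSTATUS(status);
}
```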
--
Thanks,
David / dhildenb
Hi Shuah and kselftest team,
There are a couple of resctrl selftest patches that are ready for inclusion. They have been percolating on the list for a while and no further feedback is expected. All have "Reviewed-by" tags from at least one reviewer. Could you please consider including them in the kselftest repo? There is one minor merge conflict between two of the series, for which the snippet below shows the resolution.
[PATCH v8 0/6] Some improvements of resctrl selftest
https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
[PATCH v2 0/9] selftests/resctrl: Fixes to error handling logic and cleanups
https://lore.kernel.org/lkml/20230215130605.31583-1-ilpo.jarvinen@linux.int…
[PATCH] selftests/resctrl: Use correct exit code when tests fail
https://lore.kernel.org/lkml/20230309145757.2280518-1-peternewman@google.co…
The snippet below shows resolution of the merge conflict between the
first and second series:
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 040ca1f9c173..775f9e542ff6 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
Thank you very much.
Reinette
Patch 1 avoids scheduling the MPTCP worker on a closed socket in some
edge cases. It fixes issues that can be seen since v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
seen since v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (1):
selftests: mptcp: userspace pm: uniform verify events
Paolo Abeni (3):
mptcp: use mptcp_schedule_work instead of open-coding it
mptcp: stricter state check in mptcp_worker
mptcp: fix NULL pointer dereference on fastopen early fallback
net/mptcp/fastopen.c | 11 +++++++++--
net/mptcp/options.c | 5 ++---
net/mptcp/protocol.c | 2 +-
net/mptcp/subflow.c | 18 ++++++------------
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++
5 files changed, 20 insertions(+), 18 deletions(-)
---
base-commit: a4506722dc39ca840593f14e3faa4c9ba9408211
change-id: 20230411-upstream-net-20230411-mptcp-fixes-db47f50c2688
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Nested translation is a hardware feature that is supported by many modern
IOMMU implementations. It uses two stages of address translation (stage-1
and stage-2) to reach the physical address. The stage-1 translation table
is owned by userspace (e.g. by a guest OS), while stage-2 is owned by the
kernel. Changes to the stage-1 translation table should be followed by an
IOTLB invalidation.
Taking Intel VT-d as an example, the stage-1 translation table is the I/O
page table. As the diagram below shows, the guest I/O page table pointer in
GPA (guest physical address) is passed to the host and used to perform the
stage-1 address translation. Along with it, modifications to present
mappings in the guest I/O page table should be followed by an IOTLB
invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
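To illustrate why stage-1 changes need an explicit IOTLB invalidation, here is a toy model of nested translation. Everything in it is a hypothetical simplification: `struct toy_nested`, flat single-level tables indexed by frame number, and a one-entry cache standing in for the IOTLB. Real VT-d page tables are multi-level structures walked by hardware, and this is not IOMMUFD code.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_FRAMES 16
#define TOY_INVALID UINT32_MAX

/* Hypothetical toy model; frame numbers stand in for addresses. */
struct toy_nested {
	uint32_t stage1[TOY_NR_FRAMES];	/* GIOVA -> GPA, owned by the guest */
	uint32_t stage2[TOY_NR_FRAMES];	/* GPA -> HPA, owned by the host kernel */
	bool tlb_valid;			/* one-entry stand-in for the IOTLB */
	uint32_t tlb_giova, tlb_hpa;
};

static void toy_init(struct toy_nested *n)
{
	for (int i = 0; i < TOY_NR_FRAMES; i++)
		n->stage1[i] = n->stage2[i] = TOY_INVALID;
	n->tlb_valid = false;
}

/* Walk stage-1, then stage-2, and cache the combined GIOVA -> HPA result. */
static bool toy_translate(struct toy_nested *n, uint32_t giova, uint32_t *hpa)
{
	uint32_t gpa;

	if (n->tlb_valid && n->tlb_giova == giova) {
		*hpa = n->tlb_hpa;	/* cache hit: no table walk */
		return true;
	}
	if (giova >= TOY_NR_FRAMES)
		return false;
	gpa = n->stage1[giova];
	if (gpa >= TOY_NR_FRAMES || n->stage2[gpa] == TOY_INVALID)
		return false;		/* stage-1 or stage-2 fault */
	*hpa = n->stage2[gpa];
	n->tlb_valid = true;
	n->tlb_giova = giova;
	n->tlb_hpa = *hpa;
	return true;
}

/* The guest edits its stage-1 table; the cached result is NOT updated. */
static void toy_stage1_write(struct toy_nested *n, uint32_t giova, uint32_t gpa)
{
	if (giova < TOY_NR_FRAMES)
		n->stage1[giova] = gpa;
}

/* Models the stage-1 IOTLB invalidation the guest has to request. */
static void toy_iotlb_invalidate(struct toy_nested *n, uint32_t giova)
{
	if (n->tlb_valid && n->tlb_giova == giova)
		n->tlb_valid = false;
}
```

In this model, a lookup issued after toy_stage1_write() but before toy_iotlb_invalidate() still returns the old host address. That stale result is what the invalidation step exists to prevent.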
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from the iommu driver. So in this
series, hw_pagetable and iommu_domain mean the same thing unless noted
otherwise.
IOMMUFD already supports allocating a hw_pagetable that is linked with an
IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetables
with driver-specific parameters, plus an interface to sync the stage-1
IOTLB, since the user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces a new iommu op for allocating domains with user data and an op
for syncing the stage-1 IOTLB. It then extends the IOMMUFD internal
infrastructure to accept user_data and a parent hwpt, and relays the data to
the iommu core to allocate an iommu_domain. After that, it extends the ioctl
IOMMU_HWPT_ALLOC to accept user data and a stage-2 hwpt ID to allocate a
hwpt. Along with it, the ioctl IOMMU_HWPT_INVALIDATE is added to invalidate
the stage-1 IOTLB, which is needed for user-managed hwpts. The ioctl
IOMMU_DEVICE_GET_HW_INFO is extended to report the supported hwpt types
bitmap to the user. A selftest is added as well to cover the new ioctls.
Complete code can be found in [2], and QEMU code can be found in [3].
Finally, this is teamwork together with Nicolin Chen and Lu Baolu. Thanks
to them for the help. ^_^ Looking forward to your feedback.
base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e
[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (5):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (5):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass parent hwpt and user_data to
iommufd_hw_pagetable_alloc()
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd/device: Report supported hwpt_types
drivers/iommu/iommufd/device.c | 9 +-
drivers/iommu/iommufd/hw_pagetable.c | 242 +++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 16 +-
drivers/iommu/iommufd/iommufd_test.h | 30 +++
drivers/iommu/iommufd/main.c | 7 +-
drivers/iommu/iommufd/selftest.c | 104 +++++++-
include/linux/iommu.h | 11 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd.c | 126 ++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 71 +++++
10 files changed, 654 insertions(+), 27 deletions(-)
--
2.34.1
vfprintf() is complex and has so far lacked proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- also provide and use fflush/fclose.
- reject fileno(NULL).
- provide compatibility with buffered streams from glibc.
- Link to v2: https://lore.kernel.org/r/20230328-nolibc-printf-test-v2-0-f72bdf210190@wei…
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
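The v2/v3 items about implementing FILE* in terms of integer pointers and rejecting fileno(NULL) can be illustrated with the sketch below. It uses hypothetical `toy_` names to avoid clashing with <stdio.h>. It illustrates the encoding idea and is not the nolibc code itself. The fd is bit-inverted before it is stored in the pointer, so fd 0 does not turn into NULL, and NULL remains free to mean "no stream".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque handle type; it is never dereferenced, only converted. */
typedef struct toy_file toy_FILE;

/* Hypothetical: encode fd as a non-NULL pointer (~0 == -1, so fd 0 is safe). */
static toy_FILE *toy_fdopen(int fd)
{
	if (fd < 0) {
		errno = EBADF;
		return NULL;
	}
	return (toy_FILE *)(intptr_t)~fd;
}

/* Hypothetical: decode the fd; NULL or a non-encoded value is rejected. */
static int toy_fileno(toy_FILE *stream)
{
	intptr_t i = (intptr_t)stream;

	if (i >= 0) {	/* NULL, or not produced by toy_fdopen() */
		errno = EBADF;
		return -1;
	}
	return (int)~i;
}
```

This design needs no allocation and no table of open streams. The trade-off is that an unbuffered, fd-backed stream has nowhere to keep per-stream state.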
---
Thomas Weißschuh (4):
tools/nolibc: add libc-test binary
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 95 ++++++++++++++++++++--------
tools/include/nolibc/sys.h | 23 +++++++
tools/testing/selftests/nolibc/.gitignore | 1 +
tools/testing/selftests/nolibc/Makefile | 5 ++
tools/testing/selftests/nolibc/nolibc-test.c | 86 +++++++++++++++++++++++++
5 files changed, 183 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before one_page is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
Here is a series with some fixes and cleanups to the resctrl selftests and
a rewrite of the CAT test into something that actually tests whether CAT is
working or not.
I know that this series will conflict with some of the patches from
Shaopeng Tan that so far have not made it into the kselftest tree. Due
to the CAT test rewrite done in this series, some of those patches would no
longer be relevant anyway, but some of them are still very valid (I've
not tried to reinvent the fixes in Shaopeng's series in this series).
Ilpo Järvinen (22):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Check also too low values for CBM bits
selftests/resctrl: Make span unsigned long everywhere
selftests/resctrl: Express span in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Split run_fill_buf() to alloc, work, and dealloc
helpers
selftests/resctrl: Remove start_buf local variable from buffer alloc
func
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Add flush_buffer() to fill_buf
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Refactor get_cbm_mask()
selftests/resctrl: Create cache_alloc_size() helper
selftests/resctrl: Replace count_bits with count_consecutive_bits()
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to func they are used
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Split measure_cache_vals() function
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 154 ++++++------
tools/testing/selftests/resctrl/cat_test.c | 221 +++++++++---------
tools/testing/selftests/resctrl/cmt_test.c | 60 +++--
tools/testing/selftests/resctrl/fill_buf.c | 107 +++++----
tools/testing/selftests/resctrl/mba_test.c | 8 +-
tools/testing/selftests/resctrl/mbm_test.c | 16 +-
tools/testing/selftests/resctrl/resctrl.h | 28 ++-
.../testing/selftests/resctrl/resctrl_tests.c | 34 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 4 +-
tools/testing/selftests/resctrl/resctrlfs.c | 160 ++++++++++---
11 files changed, 447 insertions(+), 347 deletions(-)
--
2.30.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling the selftests with target mount_setattr, I encountered some errors with the messages below:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors are likely caused by linux/mount.h not being included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree, as almost all of
them depend on each other.
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
Stephen Boyd (11):
of: Add KUnit test to confirm DTB is loaded
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 22 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 215 ++++++++
drivers/clk/.kunitconfig | 3 +
drivers/clk/Kconfig | 7 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 374 ++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 224 +++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 459 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 5 +
drivers/of/Kconfig | 19 +
drivers/of/Makefile | 4 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 125 +++++
drivers/of/of_test.c | 34 ++
drivers/of/overlay_test.c | 110 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
31 files changed, 2109 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/of_test.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
prerequisite-patch-id: 33517b96dd0768ab9c265f5721629786354ee320
prerequisite-patch-id: 909221815eeca0a2b0cdd385c76f57e185fb9e33
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v3:
- Integrate selftest into test_progs (Alexei)
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 ++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 ++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 153 +++++++++++++++++-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 ++++++++++++++
10 files changed, 432 insertions(+), 19 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and performing one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affected (loongarch was fixed recently as
well).
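As a rough illustration of the property the new mkdirty selftest checks, the toy model below marks a PTE dirty without granting write access unless the PTE was already writable. The bit names and layout are hypothetical. They are only loosely inspired by architectures such as sparc64 that keep software write/dirty bits alongside a hardware write-enable bit, and this is not the sparc64 implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this sketch only. */
#define TOY_PTE_SW_WRITE (1u << 0)	/* mapping permits writes */
#define TOY_PTE_SW_DIRTY (1u << 1)	/* page has been written */
#define TOY_PTE_HW_WRITE (1u << 2)	/* MMU lets stores through */

typedef uint32_t toy_pte_t;

/*
 * Marking a PTE dirty must not make it writable by itself: the hardware
 * write bit is only set if the PTE was already writable in software.
 */
static toy_pte_t toy_pte_mkdirty(toy_pte_t pte)
{
	pte |= TOY_PTE_SW_DIRTY;
	if (pte & TOY_PTE_SW_WRITE)
		pte |= TOY_PTE_HW_WRITE;
	return pte;
}

static bool toy_pte_hw_writable(toy_pte_t pte)
{
	return pte & TOY_PTE_HW_WRITE;
}
```

An implementation that set TOY_PTE_HW_WRITE unconditionally in toy_pte_mkdirty() would let a dirty but read-only mapping accept writes. That is the class of bug the selftests are meant to catch.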
Based on master for now. I don't expect surprises regarding the mm trees,
but I can rebase if there are any problems.
[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
David Hildenbrand (6):
selftests/mm: reuse read_pmd_pagesize() in COW selftest
selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs
without write permissions
sparc/mm: don't unconditionally set HW writable bit when setting PTE
dirty on 64bit
mm/migrate: revert "mm/migrate: fix wrongly apply write bit after
mkdirty on sparc64"
mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit
when thp splits on pmd""
mm/huge_memory: conditionally call maybe_mkwrite() and drop
pte_wrprotect() in __split_huge_pmd_locked()
arch/sparc/include/asm/pgtable_64.h | 116 +++---
mm/huge_memory.c | 16 +-
mm/migrate.c | 2 -
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/cow.c | 33 +-
tools/testing/selftests/mm/khugepaged.c | 4 +
tools/testing/selftests/mm/mkdirty.c | 379 ++++++++++++++++++
tools/testing/selftests/mm/soft-dirty.c | 3 +
.../selftests/mm/split_huge_page_test.c | 4 +
tools/testing/selftests/mm/vm_util.c | 4 +-
10 files changed, 468 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/mm/mkdirty.c
--
2.39.2
This patch series introduces a new "isolcpus" partition type to the
existing list of {member, root, isolated} types. The primary reason
for adding this new "isolcpus" partition is to facilitate the
distribution of isolated CPUs down the cgroup v2 hierarchy.
The other non-member partition types have the limitation that their
parents have to be valid partitions too. This makes it hard to create a
partition a few layers down the hierarchy.
It is relatively rare to have applications that require creation of
a separate scheduling domain (root). However, it is more common to
have applications that require the use of isolated CPUs (isolated),
e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command options
to get that statically. Of course, the "isolated" partition is another
way to achieve that dynamically.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers. If a container needs to use
isolated CPUs, it is hard to get those with the existing set of cpuset
partition types. With this patch series, a new "isolcpus" partition
can be created to hold a set of isolated CPUs that can be pulled into
other "isolated" partitions.
The "isolcpus" partition is special in that there can be at most one
instance of it in a system. It serves as a pool for isolated CPUs
and cannot hold tasks or sub-cpusets underneath it. It is also not
cpu-exclusive so that the isolated CPUs can be distributed down the
sibling hierarchies, though those isolated CPUs will not be usable
until the partition type becomes "isolated".
Once isolated CPUs are needed in a cgroup, the administrator can write
a list of isolated CPUs into its "cpuset.cpus" and change its partition
type to "isolated" to pull in those isolated CPUs from the "isolcpus"
partition and use them in that cgroup. That will make the distribution
of isolated CPUs to cgroups that need them much easier.
In the future, we may be able to extend this special "isolcpus" partition
type to support other isolation attributes like those that can be
specified with the "isolcpus" boot command line and related options.
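From user space, the workflow described above amounts to writing two cpuset interface files in a cgroup directory: first the CPU list into "cpuset.cpus", then the partition type into "cpuset.cpus.partition". Below is a minimal sketch, assuming cgroup v2 is mounted and the cpuset controller is enabled for the parent; the helper names, directory paths and CPU lists are hypothetical. A successful write does not guarantee a valid partition, so the result should be confirmed by reading "cpuset.cpus.partition" back.

```c
#include <stdio.h>

/* Write a single value into <cgdir>/<file>; returns 0 on success, -1 on error. */
static int cg_write(const char *cgdir, const char *file, const char *value)
{
	char path[4096];
	FILE *f;
	int ret = 0;
	int n = snprintf(path, sizeof(path), "%s/%s", cgdir, file);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(value, f) == EOF)
		ret = -1;
	/* Buffered data reaches the kernel on fclose(), so check it too. */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}

/*
 * Hypothetical helper: ask for the cgroup at cgdir to become a partition of
 * the given type ("isolcpus" or "isolated") holding the CPUs in cpus, e.g.
 * "2-3". cpuset.cpus is written first so that the partition type change
 * sees the requested CPU list.
 */
static int cg_make_partition(const char *cgdir, const char *cpus,
			     const char *type)
{
	if (cg_write(cgdir, "cpuset.cpus", cpus))
		return -1;
	return cg_write(cgdir, "cpuset.cpus.partition", type);
}
```

For example, an administrator might call `cg_make_partition("/sys/fs/cgroup/isolpool", "8-15", "isolcpus")` once and later `cg_make_partition("/sys/fs/cgroup/app/ctr", "8-9", "isolated")` for a container (both paths hypothetical); per the series, the second CPU list must also be within the parent's "cpuset.cpus".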
Waiman Long (5):
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Add a new "isolcpus" partition root state
cgroup/cpuset: Make isolated partition pull CPUs from isolcpus
partition
cgroup/cpuset: Documentation update for the new "isolcpus" partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition
Documentation/admin-guide/cgroup-v2.rst | 89 ++-
kernel/cgroup/cpuset.c | 548 +++++++++++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++----
3 files changed, 789 insertions(+), 224 deletions(-)
--
2.31.1
Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing is executed
and the test results are confusing.
There is no behavior change in the tests themselves.
[patch 1] Make write_schemata() run to set up schemata with 100% allocation
on the first run in the MBM test.
[patch 2] The MBA test result message is always output as "ok";
make it output "not ok" if the MBA check fails.
[patch 3] When a child process is created by fork(), the stdio buffer of
the parent process is also copied. Flush the buffer before
executing fork().
[patch 4] Whether an error occurs in the parent process or in the child
process, the parent process always kills the child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] To clean up properly before the parent process exits when a
signal is received, commonize the signal handler registered for the
CMT/MBM/MBA tests and reuse it in CAT; also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each CMT/CAT/MBM/MBA test, the functions
cat/cmt/mbm/mba_test_cleanup() that clear the test result files are
called twice. Remove the duplicate call.
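Patch 3 addresses a classic stdio pitfall: data still sitting in a user-space FILE buffer at fork() time is duplicated into the child, so it is written out twice once both processes flush their copies. The sketch below is not taken from the resctrl selftest; the helper name is hypothetical. It reproduces the effect with a fully buffered temporary file, which behaves like stdout redirected to a file or pipe, and counts how many copies of a message end up in the file with and without flushing before fork().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of copies of msg found in the file, or -1 on error. */
static int count_copies_after_fork(const char *msg, int flush_first)
{
	char line[256];
	int copies = 0;
	pid_t pid;
	FILE *f = tmpfile();

	if (!f)
		return -1;
	/* Fully buffered, like stdout redirected to a file or a pipe. */
	setvbuf(f, NULL, _IOFBF, BUFSIZ);
	fputs(msg, f);		/* stays in the user-space buffer for now */
	if (flush_first)
		fflush(f);	/* the fix: empty the buffer before fork() */

	pid = fork();
	if (pid < 0) {
		fclose(f);
		return -1;
	}
	if (pid == 0) {
		/* Child: fclose() flushes its inherited copy of the buffer;
		 * _exit() then skips flushing any other stdio streams. */
		fclose(f);
		_exit(0);
	}
	waitpid(pid, NULL, 0);

	fflush(f);		/* parent flushes whatever it still holds */
	rewind(f);
	while (fgets(line, sizeof(line), f))
		if (strcmp(line, msg) == 0)
			copies++;
	fclose(f);
	return copies;
}
```

Without the flush the message appears twice (once from each process); with it, once. The same reasoning applies to stdout in the selftests, where the duplicated output corrupts the TAP results.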
This patch series is based on Linux v6.2-rc7.
Difference from v7:
[patch 4]
- Fix commitlog.
[patch 5]
- Fix commitlog.
Previous versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
On Tue, Apr 11, 2023 at 08:16:46PM -0700, Stefan Roesch wrote:
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> +
> + if (!err)
> + ksm_add_vmas(me->mm);
In the last version of this patch, you reported the error. Now you
swallow the error. I have no idea which is correct, but you've
changed the behaviour without explaining it, so I assume it's wrong.
> + } else {
> + clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + }
> + mmap_write_unlock(me->mm);
> + break;
> + case PR_GET_MEMORY_MERGE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + break;
Why do we need a GET? Just for symmetry, or is there an actual need for
it?
This patch updates the cgroup-v2.rst file to include information about
the new "isolcpus" partition type.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------
1 file changed, 70 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..352a02849fa7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2225,7 +2225,8 @@ Cpuset Interface Files
========== =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
========== =====================================
The root cgroup is always a partition root and its state
@@ -2237,24 +2238,41 @@ Cpuset Interface Files
its descendants except those that are separate partition roots
themselves and their descendants.
+ When set to "isolcpus", the CPUs in that partition root will
+ be in an isolated state without any load balancing from the
+ scheduler. This partition root is special as there can be at
+ most one instance of it in a system and no task or child cpuset
+ is allowed in this cgroup. It acts as a pool of isolated CPUs to
+ be pulled into other "isolated" partitions. The "cpuset.cpus"
+ of an "isolcpus" partition root contains the list of isolated
+ CPUs it holds, while "cpuset.cpus.effective" contains the list
+ of freely available isolated CPUs that are ready to be pulled
+ into other "isolated" partitions.
+
When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
- individual CPUs for optimal performance.
-
- The value shown in "cpuset.cpus.effective" of a partition root
- is the CPUs that the partition root can dedicate to a potential
- new child partition root. The new child subtracts available
- CPUs from its parent "cpuset.cpus.effective".
-
- A partition root ("root" or "isolated") can be in one of the
- two possible states - valid or invalid. An invalid partition
- root is in a degraded state where some state information may
- be retained, but behaves more like a "member".
-
- All possible state transitions among "member", "root" and
- "isolated" are allowed.
+ individual CPUs for optimal performance. The isolated CPUs can
+ come from either the parent partition root or from an "isolcpus"
+ partition if the parent cannot satisfy its request.
+
+ The value shown in "cpuset.cpus.effective" of a partition root is
+ the CPUs that the partition root can dedicate to a potential new
+ child partition root. The new child partition subtracts available
+ CPUs from its parent "cpuset.cpus.effective". An exception is
+ an "isolated" partition that pulls its isolated CPUs from the
+ "isolcpus" partition root that is not its direct parent.
+
+ A partition root can be in one of the two possible states -
+ valid or invalid. An invalid partition root is in a degraded
+ state where some state information may be retained, but behaves
+ more like a "member".
+
+ All possible state transitions among "member", "root", "isolcpus"
+ and "isolated" are allowed. However, the partition root may
+ not be valid if the corresponding prerequisite conditions are
+ not met.
On read, the "cpuset.cpus.partition" file can show the following
values.
@@ -2262,16 +2280,18 @@ Cpuset Interface Files
============================= =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
"root invalid (<reason>)" Invalid partition root
+ "isolcpus invalid (<reason>)" Invalid isolcpus partition root
"isolated invalid (<reason>)" Invalid isolated partition root
============================= =====================================
In the case of an invalid partition root, a descriptive string on
- why the partition is invalid is included within parentheses.
+ why the partition is invalid may be included within parentheses.
- For a partition root to become valid, the following conditions
- must be met.
+ For a "root" partition root to become valid, the following
+ conditions must be met.
1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
are not shared by any of its siblings (exclusivity rule).
@@ -2281,6 +2301,37 @@ Cpuset Interface Files
4) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
+ A valid "isolcpus" partition root requires the following
+ conditions.
+
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus"
+ including an empty cpu list.
+ 3) There can be no more than one valid "isolcpus" partition.
+ 4) No task or child cpuset is allowed.
+
+ Note that an "isolcpus" partition is not exclusive and its
+ isolated CPUs can be distributed down sibling cgroups even
+ though they may not appear in their "cpuset.cpus.effective".
+
+ A valid "isolated" partition root can pull isolated CPUs from
+ either its parent partition or from the "isolcpus" partition.
+ It also requires the following conditions to be met.
+
+ 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
+ are not shared by any of its siblings (exclusivity rule).
+ 2) The "cpuset.cpus" is not empty and must be a subset of
+ parent's "cpuset.cpus".
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
+ no task associated with this partition.
+
+ If pulling isolated CPUs from the "isolcpus" partition,
+ the "cpuset.cpus" must also be a subset of "isolcpus"
+ partition's "cpuset.cpus" and all the requested CPUs must
+ be available for pulling, i.e. in "isolcpus" partition's
+ "cpuset.cpus.effective". In this case, its hierarchical parent
+ does not need to be a valid partition root.
+
External events like hotplug or changes to "cpuset.cpus" can
cause a valid partition root to become invalid and vice versa.
Note that a task cannot be moved to a cgroup with empty
--
2.31.1
With the addition of a new "isolcpus" partition in a previous patch,
this patch adds the capability for a privileged user to pull isolated
CPUs from the "isolcpus" partition to an "isolated" partition if its
parent cannot satisfy its request directly.
The following conditions must be true for the pulling of isolated CPUs
from "isolcpus" partition to be successful.
(1) The value of "cpuset.cpus" must still be a subset of its parent's
"cpuset.cpus" to ensure proper inheritance even though these CPUs
cannot be used until the cpuset becomes an "isolated" partition.
(2) All the CPUs in "cpuset.cpus" are freely available in the
"isolcpus" partition, i.e. in its "cpuset.cpus.effective" and not
yet claimed by other isolated partitions.
With this change, the CPUs in an "isolated" partition can either
come from the "isolcpus" partition or from its direct parent, but not
both. Now the parent of an isolated partition does not need to be a
partition root anymore.
Because of the cpu exclusive nature of an "isolated" partition, these
isolated CPUs cannot be distributed to other siblings of that isolated
partition.
Changes to "cpuset.cpus" of such an isolated partition are allowed as
long as all the newly requested CPUs can be granted from the "isolcpus"
partition. Otherwise, the partition will become invalid.
This makes the management and distribution of isolated CPUs to those
applications that require them much easier.
An "isolated" partition that pulls CPUs from the special "isolcpus"
partition can now have 2 parents - the "isolcpus" partition where
it gets its isolated CPUs and its hierarchical parent where it gets
all the other resources. However, such an "isolated" partition cannot
have subpartitions as all the CPUs from "isolcpus" must be in the same
isolated state.
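The two pulling conditions above, and the bookkeeping after a successful pull, can be modelled in user space with plain bit masks standing in for struct cpumask. This is a simplified illustration, not the kernel code: the type and function names are hypothetical, each CPU is one bit of a uint64_t, and the CAP_SYS_ADMIN check, callback_lock, the cpu_active_mask filtering and the hierarchy updates are all omitted. As in isolcpus_pull() in the diff, the pool's free CPUs are kept in its effective mask and CPUs handed out to isolated partitions are tracked in its subparts mask.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_mask_t;	/* bit n set => CPU n is in the mask */

struct toy_pool {		/* hypothetical stand-in for the isolcpus cpuset */
	toy_mask_t cpus_allowed;	/* all isolated CPUs in the pool */
	toy_mask_t effective;		/* CPUs still free to be pulled */
	toy_mask_t subparts;		/* CPUs pulled by isolated partitions */
};

static bool toy_subset(toy_mask_t a, toy_mask_t b)
{
	return (a & ~b) == 0;
}

/*
 * Try to pull the CPUs in request into an isolated partition whose parent
 * has parent_cpus in its cpuset.cpus. On success the pool is updated and
 * true is returned; otherwise the pool is left untouched.
 */
static bool toy_isolcpus_pull(struct toy_pool *pool, toy_mask_t request,
			      toy_mask_t parent_cpus)
{
	if (request == 0 ||				/* must not be empty */
	    !toy_subset(request, parent_cpus) ||	/* condition (1) */
	    !toy_subset(request, pool->effective))	/* condition (2) */
		return false;

	pool->effective &= ~request;	/* no longer free ... */
	pool->subparts |= request;	/* ... but owned by an isolated child */
	return true;
}

/* Return a partition's CPUs to the pool, keeping only the pool's own CPUs. */
static void toy_isolcpus_release(struct toy_pool *pool, toy_mask_t cpus)
{
	pool->subparts &= ~cpus;
	pool->effective |= cpus & pool->cpus_allowed;
}
```

Masking with cpus_allowed on release mirrors the comment in the diff that isolcpus may have shrunk its CPU list in the meantime, so not every returned CPU belongs in the pool.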
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 264 insertions(+), 18 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 444eae3a9a6b..a5bbd43ed46e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -101,6 +101,7 @@ enum prs_errcode {
PERR_ISOLCPUS,
PERR_ISOLTASK,
PERR_ISOLCHILD,
+ PERR_ISOPARENT,
};
static const char * const perr_strings[] = {
@@ -114,6 +115,7 @@ static const char * const perr_strings[] = {
[PERR_ISOLCPUS] = "An isolcpus partition is already present",
[PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
[PERR_ISOLCHILD] = "Isolcpus partition can't have children",
+ [PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition",
};
struct cpuset {
@@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
rebuild_sched_domains_locked();
}
+/*
+ * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus
+ * @cs: the cpuset to update
+ * @cmd: the command code (only partcmd_enable or partcmd_disable)
+ * Return: 1 if successful, 0 if error
+ *
+ * Note that pulling isolated cpus from isolcpus or cpus from parent does
+ * not require rebuilding sched domains. So we can change the flags directly.
+ */
+static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd)
+{
+ struct cpuset *parent = parent_cs(cs);
+
+ if (!isolcpus_cs)
+ return 0;
+
+ /*
+ * To enable pulling of isolated CPUs from isolcpus, cpus_allowed
+ * must be a subset of both its parent's cpus_allowed and isolcpus_cs's
+ * effective_cpus and the user has sysadmin privilege.
+ */
+ if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) &&
+ cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) &&
+ cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) {
+ /*
+ * Move cpus from effective_cpus to subparts_cpus & make
+ * cs a child of isolcpus partition.
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_copy(cs->effective_cpus, cs->cpus_allowed);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (cs->use_parent_ecpus) {
+ cs->use_parent_ecpus = false;
+ parent->child_ecpus_count--;
+ }
+ list_add(&cs->isol_sibling, &isol_children);
+ clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+
+ if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) {
+ /*
+ * This can be called after isolcpus shrinks its cpu list.
+ * So not all the cpus should be returned back to isolcpus.
+ */
+ WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED);
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->effective_cpus);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus,
+ isolcpus_cs->cpus_allowed);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (!cpumask_and(cs->effective_cpus, parent->effective_cpus,
+ cs->cpus_allowed)) {
+ cs->use_parent_ecpus = true;
+ parent->child_ecpus_count++;
+ cpumask_copy(cs->effective_cpus,
+ parent->effective_cpus);
+ }
+ list_del_init(&cs->isol_sibling);
+ cs->partition_root_state = PRS_INVALID_ISOLATED;
+ cs->prs_err = PERR_INVCPUS;
+
+ set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ clear_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+ return 0;
+}
+
+static void isolcpus_disable(void)
+{
+ struct cpuset *child, *next;
+
+ list_for_each_entry_safe(child, next, &isol_children, isol_sibling)
+ WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable));
+
+ isolcpus_cs = NULL;
+}
+
+/*
+ * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition
+ */
+static void isolcpus_cpus_update(struct cpuset *cs)
+{
+ struct cpuset *child, *next;
+
+ if (WARN_ON_ONCE(isolcpus_cs != cs))
+ return;
+
+ if (list_empty(&isol_children))
+ return;
+
+ /*
+ * Remove child isolated partitions that are not fully covered by
+ * subparts_cpus.
+ */
+ list_for_each_entry_safe(child, next, &isol_children,
+ isol_sibling) {
+ if (cpumask_subset(child->cpus_allowed,
+ cs->subparts_cpus))
+ continue;
+
+ isolcpus_pull(child, partcmd_disable);
+ }
+}
+
+/*
+ * isolated_cpus_update - cpuset.cpus change in isolated partition
+ *
+ * Return: 1 if no further action is needed, 0 otherwise
+ */
+static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask,
+ struct tmpmasks *tmp)
+{
+ struct cpumask *addmask = tmp->addmask;
+ struct cpumask *delmask = tmp->delmask;
+
+ if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) ||
+ list_empty(&cs->isol_sibling))
+ return 0;
+
+ if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) {
+ /*
+ * Check if isolcpus partition can provide the new CPUs
+ */
+ if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) ||
+ cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ /*
+ * Pull addmask isolated CPUs from isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, addmask);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, addmask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) {
+ /*
+ * Return isolated CPUs back to isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, delmask);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, delmask);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ spin_lock_irq(&callback_lock);
+ cpumask_copy(cs->cpus_allowed, newmask);
+ cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus);
+ cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+}
+
/**
* update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset
* @cs: The cpuset that requests change in partition root state
@@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
- isolcpus_cs = NULL;
+ isolcpus_disable();
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
struct cpuset *parent = parent_cs(cp);
bool update_parent = false;
+ /*
+ * Skip isolated cpusets that pull isolated CPUs from isolcpus
+ */
+ if (!list_empty(&cp->isol_sibling))
+ continue;
+
compute_effective_cpumask(tmp->new_cpus, cp, parent);
/*
@@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
- update_tasks_cpumask(cp, tmp->new_cpus);
+ update_tasks_cpumask(cp, cp->effective_cpus);
/*
* On legacy hierarchy, if the effective cpumask of any non-
@@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
return retval;
if (cs->partition_root_state) {
+ if (!list_empty(&cs->isol_sibling) &&
+ isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp))
+ goto update_hier; /* CPUs update done */
+
if (invalidate)
update_parent_subparts_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
@@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
}
spin_unlock_irq(&callback_lock);
+update_hier:
#ifdef CONFIG_CPUMASK_OFFSTACK
/* Now trialcs->cpus_allowed is available */
tmp.new_cpus = trialcs->cpus_allowed;
@@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/* effective_cpus will be updated here */
update_cpumasks_hier(cs, &tmp, false);
- if (cs->partition_root_state) {
- bool force = (cs->partition_root_state == PRS_ISOLCPUS);
+ if (cs->partition_root_state && list_empty(&cs->isol_sibling)) {
struct cpuset *parent = parent_cs(cs);
/*
@@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* cpusets if they use parent's effective_cpus or when
* the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count || force)
- update_sibling_cpumasks(parent, cs, &tmp, force);
+ if (cs->partition_root_state == PRS_ISOLCPUS) {
+ update_sibling_cpumasks(parent, cs, &tmp, true);
+ isolcpus_cpus_update(cs);
+ } else if (parent->child_ecpus_count) {
+ update_sibling_cpumasks(parent, cs, &tmp, false);
+ }
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
return err;
}
-/**
+/*
* update_prstate - update partition_root_state
* @cs: the cpuset to update
* @new_prs: new partition root state
@@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
/*
- * For a previously invalid partition root, leave it at being
- * invalid if new_prs is not "member".
+ * For a previously invalid partition root, treat it like a "member".
*/
- if (new_prs && is_prs_invalid(old_prs)) {
- cs->partition_root_state = -new_prs;
- return 0;
- }
+ if (new_prs && is_prs_invalid(old_prs))
+ old_prs = PRS_MEMBER;
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
@@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
}
+ /*
+ * A parent isolated partition that gets its isolated CPUs from
+ * isolcpus cannot have subpartition.
+ */
+ if (new_prs && !list_empty(&parent->isol_sibling)) {
+ err = PERR_ISOPARENT;
+ goto out;
+ }
+
+ if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) {
+ isolcpus_pull(cs, partcmd_disable);
+ old_prs = 0;
+ }
+ WARN_ON_ONCE(!list_empty(&cs->isol_sibling));
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
err = update_parent_subparts_cpumask(cs, partcmd_enable,
NULL, &tmpmask);
+ if (err && (new_prs == PRS_ISOLATED) &&
+ isolcpus_pull(cs, partcmd_enable))
+ err = 0; /* Successful isolcpus pull */
+
if (err)
goto out;
} else if (old_prs && new_prs) {
@@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (new_prs == PRS_ISOLCPUS)
isolcpus_cs = cs;
else if (cs == isolcpus_cs)
- isolcpus_cs = NULL;
+ isolcpus_disable();
/*
* Update child cpusets, if present.
@@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
}
parent = parent_cs(cs);
- compute_effective_cpumask(&new_cpus, cs, parent);
nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ /*
+ * In the special case of a valid isolated cpuset pulling isolated
+ * cpus from isolcpus, we just need to mask offline cpus from
+ * cpus_allowed unless all the isolated cpus are gone.
+ */
+ if (!list_empty(&cs->isol_sibling)) {
+ if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask))
+ isolcpus_pull(cs, partcmd_disable);
+ } else if ((cs->partition_root_state == PRS_ISOLCPUS) &&
+ cpumask_empty(cs->cpus_allowed)) {
+ /*
+ * For isolcpus with empty cpus_allowed, just update
+ * effective_mems and be done with it.
+ */
+ spin_lock_irq(&callback_lock);
+ if (nodes_empty(new_mems))
+ cs->effective_mems = parent->effective_mems;
+ else
+ cs->effective_mems = new_mems;
+ spin_unlock_irq(&callback_lock);
+ goto unlock;
+ } else {
+ compute_effective_cpumask(&new_cpus, cs, parent);
+ }
if (cs->nr_subparts_cpus)
/*
@@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
* the following conditions hold:
* 1) empty effective cpus but not valid empty partition.
* 2) parent is invalid or doesn't grant any cpus to child
- * partitions.
+ * partitions and not an isolated cpuset pulling cpus from
+ * isolcpus.
*/
- if (is_partition_valid(cs) && (!parent->nr_subparts_cpus ||
- (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
+ if (is_partition_valid(cs) &&
+ ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) ||
+ (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
int old_prs, parent_prs;
update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp);
--
2.31.1
One can use "cpuset.cpus.partition" to create multiple scheduling domains
or to produce a set of isolated CPUs where load balancing is disabled.
The former use case is less common, but the latter is frequently used,
especially in Telco use cases like DPDK.
The existing "isolated" partition can be used to produce isolated
CPUs if the applications have full control of a system. However, in a
containerized environment where all the apps are run in a container,
it is hard to distribute isolated CPUs from the root down given
the unified hierarchy nature of cgroup v2.
The container running on isolated CPUs can be several layers down from
the root. The current partition feature requires that all the ancestors
of a leaf partition root must be partition roots themselves. This can
be hard to manage.
This patch introduces a new special partition root state called
"isolcpus" that serves as a pool of isolated CPUs to be pulled into other
"isolated" partitions. At most one instance of the "isolcpus" partition
is allowed in a system, preferably as a child of the top cpuset.
In a valid "isolcpus" partition, "cpuset.cpus" contains the set of
isolated CPUs and "cpuset.cpus.effective" contains the set of freely
available isolated CPUs that have not yet been pulled into other
"isolated" cpusets.
The special "isolcpus" partition cannot have normal cpuset children, so
the cpuset controller cannot be enabled in its "cgroup.subtree_control"
file if it has children. Tasks are also not allowed in the "cgroup.procs"
of the "isolcpus" partition. Unlike other partition roots, empty
"cpuset.cpus" is allowed in the "isolcpus" partition as this special
cpuset is not designed to hold tasks.
The CPUs in the "isolcpus" partition are not exclusive so that those
isolated CPUs can be distributed down sibling hierarchies as usual even
though they will not show up in their "cpuset.cpus.effective".
Right now, an "isolcpus" partition only disables load balancing of
the isolated CPUs. In the near future, it may be extended to support
additional isolation attributes like those currently supported by the
"isolcpus" or related kernel boot command line options.
In a subsequent patch, a privileged user can change a "member" cpuset
to an "isolated" partition root by pulling isolated CPUs from the
"isolcpus" partition if its parent is not a partition root that can
directly satisfy the request.
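The partition root states introduced here are small integers, with each invalid state encoded as the negation of the corresponding valid one, and the new "isolcpus" state is deliberately not CPU-exclusive. A standalone sketch of those two rules follows. The PRS_* values are copied from the diff; the helper names are hypothetical. `toy_prs_cpu_exclusive()` restates the new test in update_partition_exclusive(), and `toy_prs_invalid()` assumes is_prs_invalid() checks for a negative value, consistent with the -1/-2/-3 encoding documented in the diff's comment block.

```c
#include <stdbool.h>

/* Values as defined in kernel/cgroup/cpuset.c by this patch. */
#define PRS_MEMBER		0
#define PRS_ROOT		1
#define PRS_ISOLATED		2
#define PRS_ISOLCPUS		3
#define PRS_INVALID_ROOT	-1
#define PRS_INVALID_ISOLATED	-2
#define PRS_INVALID_ISOLCPUS	-3

/* Invalid partition roots are represented by negative states. */
static bool toy_prs_invalid(int prs)
{
	return prs < 0;
}

/* A valid partition root becomes invalid by negating its state. */
static int toy_prs_invalidate(int prs)
{
	return prs > 0 ? -prs : prs;
}

/* Only "root" and "isolated" are CPU-exclusive; "isolcpus" is not. */
static bool toy_prs_cpu_exclusive(int new_prs)
{
	return new_prs == PRS_ROOT || new_prs == PRS_ISOLATED;
}
```

Keeping "isolcpus" non-exclusive is what lets its CPUs appear in sibling hierarchies' "cpuset.cpus", so that an "isolated" partition further down can later pull them.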
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 158 ++++++++++++++++++++++++++++++++++-------
1 file changed, 133 insertions(+), 25 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83a7193e0f2c..444eae3a9a6b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -98,6 +98,9 @@ enum prs_errcode {
PERR_NOCPUS,
PERR_HOTPLUG,
PERR_CPUSEMPTY,
+ PERR_ISOLCPUS,
+ PERR_ISOLTASK,
+ PERR_ISOLCHILD,
};
static const char * const perr_strings[] = {
@@ -108,6 +111,9 @@ static const char * const perr_strings[] = {
[PERR_NOCPUS] = "Parent unable to distribute cpu downstream",
[PERR_HOTPLUG] = "No cpu available due to hotplug",
[PERR_CPUSEMPTY] = "cpuset.cpus is empty",
+ [PERR_ISOLCPUS] = "An isolcpus partition is already present",
+ [PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
+ [PERR_ISOLCHILD] = "Isolcpus partition can't have children",
};
struct cpuset {
@@ -198,6 +204,9 @@ struct cpuset {
/* Handle for cpuset.cpus.partition */
struct cgroup_file partition_file;
+
+ /* siblings list anchored at isol_children */
+ struct list_head isol_sibling;
};
/*
@@ -206,14 +215,26 @@ struct cpuset {
* 0 - member (not a partition root)
* 1 - partition root
* 2 - partition root without load balancing (isolated)
+ * 3 - isolated cpu pool (isolcpus)
* -1 - invalid partition root
* -2 - invalid isolated partition root
+ * -3 - invalid isolated cpu pool
+ *
+ * An isolated cpu pool is a special isolated partition root. At most one
+ * instance of it is allowed in a system. It provides a pool of isolated
+ * cpus that a normal isolated partition root can pull from, if privileged,
+ * in case its parent cannot fulfill its request.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
#define PRS_ISOLATED 2
+#define PRS_ISOLCPUS 3
#define PRS_INVALID_ROOT -1
#define PRS_INVALID_ISOLATED -2
+#define PRS_INVALID_ISOLCPUS -3
+
+static struct cpuset *isolcpus_cs; /* System isolcpus partition root */
+static struct list_head isol_children; /* Children that pull isolated cpus */
static inline bool is_prs_invalid(int prs_state)
{
@@ -335,6 +356,7 @@ static struct cpuset top_cpuset = {
.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
(1 << CS_MEM_EXCLUSIVE)),
.partition_root_state = PRS_ROOT,
+ .isol_sibling = LIST_HEAD_INIT(top_cpuset.isol_sibling),
};
/**
@@ -1282,7 +1304,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
*/
static int update_partition_exclusive(struct cpuset *cs, int new_prs)
{
- bool exclusive = (new_prs > 0);
+ bool exclusive = (new_prs == PRS_ROOT) || (new_prs == PRS_ISOLATED);
if (exclusive && !is_cpu_exclusive(cs)) {
if (update_flag(CS_CPU_EXCLUSIVE, cs, 1))
@@ -1303,7 +1325,7 @@ static int update_partition_exclusive(struct cpuset *cs, int new_prs)
static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
{
int new_prs = cs->partition_root_state;
- bool new_lb = (new_prs != PRS_ISOLATED);
+ bool new_lb = (new_prs != PRS_ISOLATED) && (new_prs != PRS_ISOLCPUS);
if (new_lb != !!is_sched_load_balance(cs))
update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb);
@@ -1360,18 +1382,20 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
int part_error = PERR_NONE; /* Partition error? */
percpu_rwsem_assert_held(&cpuset_rwsem);
+ old_prs = new_prs = cs->partition_root_state;
/*
* The parent must be a partition root.
* The new cpumask, if present, or the current cpus_allowed must
- * not be empty.
+ * not be empty except for isolcpus partition.
*/
if (!is_partition_valid(parent)) {
return is_partition_invalid(parent)
? PERR_INVPARENT : PERR_NOTPART;
}
- if ((newmask && cpumask_empty(newmask)) ||
- (!newmask && cpumask_empty(cs->cpus_allowed)))
+ if ((new_prs != PRS_ISOLCPUS) &&
+ ((newmask && cpumask_empty(newmask)) ||
+ (!newmask && cpumask_empty(cs->cpus_allowed))))
return PERR_CPUSEMPTY;
/*
@@ -1379,7 +1403,6 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
* partcmd_invalidate commands.
*/
adding = deleting = false;
- old_prs = new_prs = cs->partition_root_state;
if (cmd == partcmd_enable) {
/*
* Enabling partition root is not allowed if cpus_allowed
@@ -1498,11 +1521,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
switch (cs->partition_root_state) {
case PRS_ROOT:
case PRS_ISOLATED:
+ case PRS_ISOLCPUS:
if (part_error)
new_prs = -old_prs;
break;
case PRS_INVALID_ROOT:
case PRS_INVALID_ISOLATED:
+ case PRS_INVALID_ISOLCPUS:
if (!part_error)
new_prs = -old_prs;
break;
@@ -1553,6 +1578,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
+ if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
+ isolcpus_cs = NULL;
+
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1640,7 +1668,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
*/
old_prs = new_prs = cp->partition_root_state;
if ((cp != cs) && old_prs) {
- switch (parent->partition_root_state) {
+ int parent_prs = parent->partition_root_state;
+
+ /*
+ * isolcpus partition parent can't have children
+ */
+ WARN_ON_ONCE(parent_prs == PRS_ISOLCPUS);
+
+ switch (parent_prs) {
case PRS_ROOT:
case PRS_ISOLATED:
update_parent = true;
@@ -1735,9 +1770,10 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
* @parent: Parent cpuset
* @cs: Current cpuset
* @tmp: Temp variables
+ * @force: Force update if set
*/
static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
- struct tmpmasks *tmp)
+ struct tmpmasks *tmp, bool force)
{
struct cpuset *sibling;
struct cgroup_subsys_state *pos_css;
@@ -1756,7 +1792,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
cpuset_for_each_child(sibling, pos_css, parent) {
if (sibling == cs)
continue;
- if (!sibling->use_parent_ecpus)
+ if (!sibling->use_parent_ecpus && !force)
continue;
if (!css_tryget_online(&sibling->css))
continue;
@@ -1893,14 +1929,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_cpumasks_hier(cs, &tmp, false);
if (cs->partition_root_state) {
+ bool force = (cs->partition_root_state == PRS_ISOLCPUS);
struct cpuset *parent = parent_cs(cs);
/*
* For partition root, update the cpumasks of sibling
- * cpusets if they use parent's effective_cpus.
+ * cpusets if they use parent's effective_cpus or when
+ * the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmp);
+ if (parent->child_ecpus_count || force)
+ update_sibling_cpumasks(parent, cs, &tmp, force);
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2298,6 +2336,41 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
+ /*
+ * Only one isolcpus partition is allowed and it can't have children
+ * or tasks in it. The isolcpus partition is also not exclusive so
+ * that the isolated but unused cpus can be distributed down the
+ * hierarchy.
+ */
+ if (new_prs == PRS_ISOLCPUS) {
+ if (isolcpus_cs)
+ err = PERR_ISOLCPUS;
+ else if (!list_empty(&cs->css.children))
+ err = PERR_ISOLCHILD;
+ else if (cs->css.cgroup->nr_populated_csets)
+ err = PERR_ISOLTASK;
+
+ if (err && old_prs) {
+ /*
+ * A previous valid partition root is now invalid
+ */
+ goto disable_partition;
+ } else if (err) {
+ goto out;
+ }
+
+ /*
+ * Unlike other partition types, an isolated cpu pool can
+ * be empty as it is essentially a placeholder for isolated
+ * CPUs.
+ */
+ if (!old_prs && cpumask_empty(cs->cpus_allowed)) {
+ /* Force effective_cpus to be empty too */
+ cpumask_clear(cs->effective_cpus);
+ goto out;
+ }
+ }
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2316,11 +2389,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (err)
goto out;
} else if (old_prs && new_prs) {
- /*
- * A change in load balance state only, no change in cpumasks.
- */
- goto out;
+ goto out; /* Skip cpuset and sibling task update */
} else {
+disable_partition:
/*
* Switching back to member is always allowed even if it
* disables child partitions.
@@ -2342,8 +2413,13 @@ static int update_prstate(struct cpuset *cs, int new_prs)
update_tasks_cpumask(parent, tmpmask.new_cpus);
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmpmask);
+ /*
+ * Since isolcpus partition is not exclusive, we have to update
+ * sibling hierarchies as well.
+ */
+ if ((new_prs == PRS_ISOLCPUS) || parent->child_ecpus_count)
+ update_sibling_cpumasks(parent, cs, &tmpmask,
+ new_prs == PRS_ISOLCPUS);
out:
/*
@@ -2363,6 +2439,14 @@ static int update_prstate(struct cpuset *cs, int new_prs)
/* Update sched domains and load balance flag */
update_partition_sd_lb(cs, old_prs);
+ /*
+ * Check isolcpus_cs state
+ */
+ if (new_prs == PRS_ISOLCPUS)
+ isolcpus_cs = cs;
+ else if (cs == isolcpus_cs)
+ isolcpus_cs = NULL;
+
/*
* Update child cpusets, if present.
* Force update if switching back to member.
@@ -2486,7 +2570,12 @@ static struct cpuset *cpuset_attach_old_cs;
*/
static int cpuset_can_attach_check(struct cpuset *cs)
{
+ /*
+ * A task cannot be moved to a cpuset that has empty effective
+ * cpus or is an isolcpus partition.
+ */
if (cpumask_empty(cs->effective_cpus) ||
+ (cs->partition_root_state == PRS_ISOLCPUS) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
return 0;
@@ -2902,24 +2991,30 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
+ int prs = cs->partition_root_state;
const char *err, *type = NULL;
- switch (cs->partition_root_state) {
+ switch (prs) {
case PRS_ROOT:
seq_puts(seq, "root\n");
break;
case PRS_ISOLATED:
seq_puts(seq, "isolated\n");
break;
+ case PRS_ISOLCPUS:
+ seq_puts(seq, "isolcpus\n");
+ break;
case PRS_MEMBER:
seq_puts(seq, "member\n");
break;
- case PRS_INVALID_ROOT:
- type = "root";
- fallthrough;
- case PRS_INVALID_ISOLATED:
- if (!type)
+ default:
+ if (prs == PRS_INVALID_ROOT)
+ type = "root";
+ else if (prs == PRS_INVALID_ISOLATED)
type = "isolated";
+ else
+ type = "isolcpus";
+
err = perr_strings[READ_ONCE(cs->prs_err)];
if (err)
seq_printf(seq, "%s invalid (%s)\n", type, err);
@@ -2948,6 +3043,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
val = PRS_MEMBER;
else if (!strcmp(buf, "isolated"))
val = PRS_ISOLATED;
+ else if (!strcmp(buf, "isolcpus"))
+ val = PRS_ISOLCPUS;
else
return -EINVAL;
@@ -3157,6 +3254,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
nodes_clear(cs->effective_mems);
fmeter_init(&cs->fmeter);
cs->relax_domain_level = -1;
+ INIT_LIST_HEAD(&cs->isol_sibling);
/* Set CS_MEMORY_MIGRATE for default hierarchy */
if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
@@ -3171,6 +3269,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
struct cpuset *parent = parent_cs(cs);
struct cpuset *tmp_cs;
struct cgroup_subsys_state *pos_css;
+ int err = 0;
if (!parent)
return 0;
@@ -3178,6 +3277,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpus_read_lock();
percpu_down_write(&cpuset_rwsem);
+ /*
+ * An isolcpus partition cannot have direct children.
+ */
+ if (parent->partition_root_state == PRS_ISOLCPUS) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
set_bit(CS_ONLINE, &cs->flags);
if (is_spread_page(parent))
set_bit(CS_SPREAD_PAGE, &cs->flags);
@@ -3229,7 +3336,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
out_unlock:
percpu_up_write(&cpuset_rwsem);
cpus_read_unlock();
- return 0;
+ return err;
}
/*
@@ -3434,6 +3541,7 @@ int __init cpuset_init(void)
fmeter_init(&top_cpuset.fmeter);
set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
top_cpuset.relax_domain_level = -1;
+ INIT_LIST_HEAD(&isol_children);
BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
--
2.31.1
I have been running hid-tools for a while, but it was in its own
separate repository for multiple reasons. Over the past few weeks,
I finally managed to get the kernel tests in that repo into a
state where we can merge them into the kernel tree directly:
- the tests run in ~2 to 3 minutes
- the tests are way more reliable than previously
- the tests are mostly self-contained now (with the exception
  of the Sony ones)
To be able to run the tests we need to use the latest release
of hid-tools, as this project still keeps the HID parsing logic
and is capable of generating the HID events.
The series also ensures we can run the tests with vmtest.sh,
allowing for quick development and testing in the tree itself.
This should allow us to require tests to be added to a series
when we see fit, and to keep them properly maintained instead of
having to deal with two repositories.
In Cc are all of the people who participated in the development
of those tests, so please send back a Signed-off-by for each
commit you are part of.
This series applies on top of the for-6.3/hid-bpf branch, which
is the one that added the tools/testing/selftests/hid directory.
Given that this series is unlikely to make the cut for 6.3,
we might just consider it to be based on top of the future
6.3-rc1.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
Benjamin Tissoires (11):
selftests: hid: make vmtest rely on make
selftests: hid: import hid-tools hid-core tests
selftests: hid: import hid-tools hid-gamepad tests
selftests: hid: import hid-tools hid-keyboards tests
selftests: hid: import hid-tools hid-mouse tests
selftests: hid: import hid-tools hid-multitouch and hid-tablets tests
selftests: hid: import hid-tools wacom tests
selftests: hid: import hid-tools hid-apple tests
selftests: hid: import hid-tools hid-ite tests
selftests: hid: import hid-tools hid-sony and hid-playstation tests
selftests: hid: import hid-tools usb-crash tests
tools/testing/selftests/hid/Makefile | 12 +
tools/testing/selftests/hid/config | 11 +
tools/testing/selftests/hid/hid-apple.sh | 7 +
tools/testing/selftests/hid/hid-core.sh | 7 +
tools/testing/selftests/hid/hid-gamepad.sh | 7 +
tools/testing/selftests/hid/hid-ite.sh | 7 +
tools/testing/selftests/hid/hid-keyboard.sh | 7 +
tools/testing/selftests/hid/hid-mouse.sh | 7 +
tools/testing/selftests/hid/hid-multitouch.sh | 7 +
tools/testing/selftests/hid/hid-sony.sh | 7 +
tools/testing/selftests/hid/hid-tablet.sh | 7 +
tools/testing/selftests/hid/hid-usb_crash.sh | 7 +
tools/testing/selftests/hid/hid-wacom.sh | 7 +
tools/testing/selftests/hid/run-hid-tools-tests.sh | 28 +
tools/testing/selftests/hid/settings | 3 +
tools/testing/selftests/hid/tests/__init__.py | 2 +
tools/testing/selftests/hid/tests/base.py | 345 ++++
tools/testing/selftests/hid/tests/conftest.py | 81 +
.../selftests/hid/tests/descriptors_wacom.py | 1360 +++++++++++++
.../selftests/hid/tests/test_apple_keyboard.py | 440 +++++
tools/testing/selftests/hid/tests/test_gamepad.py | 209 ++
tools/testing/selftests/hid/tests/test_hid_core.py | 154 ++
.../selftests/hid/tests/test_ite_keyboard.py | 166 ++
tools/testing/selftests/hid/tests/test_keyboard.py | 485 +++++
tools/testing/selftests/hid/tests/test_mouse.py | 977 +++++++++
.../testing/selftests/hid/tests/test_multitouch.py | 2088 ++++++++++++++++++++
tools/testing/selftests/hid/tests/test_sony.py | 282 +++
tools/testing/selftests/hid/tests/test_tablet.py | 872 ++++++++
.../testing/selftests/hid/tests/test_usb_crash.py | 103 +
.../selftests/hid/tests/test_wacom_generic.py | 844 ++++++++
tools/testing/selftests/hid/vmtest.sh | 25 +-
31 files changed, 8554 insertions(+), 10 deletions(-)
---
base-commit: 2f7f4efb9411770b4ad99eb314d6418e980248b4
change-id: 20230217-import-hid-tools-tests-dc0cd4f3c8a8
Best regards,
--
Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the pointer being used
uninitialized (which would be a false positive anyway, because the
return value is checked before one_page is used).
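To illustrate the calling convention these patches rely on, here is a small user-space sketch; the helper names `alloc_aligned()` and `is_aligned()` are hypothetical. It tests the return value for non-zero rather than negative, since posix_memalign() reports failure with a positive error number such as EINVAL or ENOMEM and does not set errno.

```c
#define _POSIX_C_SOURCE 200112L	/* expose posix_memalign() in <stdlib.h> */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* True if @p is a multiple of @align (which must be non-zero). */
static int is_aligned(const void *p, size_t align)
{
	assert(align != 0);
	return ((uintptr_t)p % align) == 0;
}

/*
 * Allocate @len bytes aligned to @align. posix_memalign() returns 0 on
 * success or a positive error number on failure, so the returned code
 * (not errno) is what gets reported.
 */
static void *alloc_aligned(size_t align, size_t len)
{
	void *p = NULL;	/* initialized, as the commit message suggests */
	int ret = posix_memalign(&p, align, len);

	if (ret != 0) {	/* never negative: "ret < 0" would miss failures */
		fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
		return NULL;
	}
	return p;
}
```

An alignment that is not a power of two (for example 3) makes posix_memalign() fail with EINVAL, which this helper reports and turns into a NULL return.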
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove the malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *s to NULL
to silence a warning about the pointer being used uninitialized
(which would be a false positive anyway, because the return value is
checked before s is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret) {
perror("memalign");
exit(1);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to
NULL to silence a warning about the pointer being used
uninitialized (which would be a false positive anyway, because the
return value is checked before map is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH, so attempting to execute test_sgx causes a false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the pointer being used
uninitialized (which would be a false positive anyway, because the
return value is checked before one_page is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..8f48f07bc821 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)(&one_page), pmd_pagesize, len);
+ if (ret) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0