memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to
NULL, to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before map is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret < 0)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
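A note on the error check above: posix_memalign() returns 0 on success and a positive error number on failure, and it does not report failure through a negative return value, so a "ret < 0" test would not catch a failed allocation. The following is a minimal sketch of the conversion, not the code from the patch; the helper name alloc_aligned() is hypothetical and only serves the illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the selftests): allocate len bytes
 * aligned to alignment, returning NULL on failure.
 */
void *alloc_aligned(size_t alignment, size_t len)
{
	void *p = NULL;	/* initialized, so no "maybe uninitialized" warning */
	int ret = posix_memalign(&p, alignment, len);

	/* posix_memalign() returns an error number (> 0), not -1. */
	if (ret != 0) {
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		return NULL;
	}
	return p;
}
```

Passing the address of a void * directly also avoids the (void **) cast on a char * variable; the result can then be assigned to map.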
When we added fd-based file streams we created references to STx_FILENO in
stdio.h, but these constants are declared in unistd.h, which is the last file
included by the top-level nolibc.h. As a result, those constants are not yet
defined when stdio.h is built, and programs using nolibc.h fail to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: 7d8214bba44c1aa6a75921a09a691945d26a8d43
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 12.04.23 05:16, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl APIs to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prctl.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add KSM get merge type test
>
> This adds the -G flag to the ksm_tests program to query the KSM
> status with prctl after KSM has been enabled with prctl.
>
> 4) add KSM fork test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> 5) add two functions for debugging merge outcome
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm.
>
> The debugging can be enabled with the following command line:
> make -C tools/testing/selftests TARGETS="mm" --keep-going \
> EXTRA_CFLAGS=-DDEBUG=1
Would it make sense to instead have a "-D" (if still unused) runtime
option to print this data? Dead code that's not compiled is a bit
unfortunate as it can easily bit-rot.
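As a rough sketch of that suggestion, the debug output could be gated by a runtime flag instead of a compile-time macro, so the code is always built. The function names below are hypothetical and "-D" is assumed to be a free option letter; this is not code from ksm_tests.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical: return true when "-D" is among the options. */
bool parse_debug_flag(int argc, char *const argv[])
{
	bool debug = false;
	int opt;

	optind = 1;	/* restart option scanning from the first argument */
	while ((opt = getopt(argc, argv, "D")) != -1) {
		if (opt == 'D')
			debug = true;
	}
	return debug;
}

/* Hypothetical: always compiled, prints only when debugging is on. */
void debug_print_stat(bool debug, const char *name, long value)
{
	if (!debug)
		return;
	printf("%s: %ld\n", name, value);
}
```

Since the printing helper is compiled in every build, it is less likely to bit-rot than code hidden behind -DDEBUG=1.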
This patch essentially does two things:
1) Add the option to run all tests/benchmarks with the PRCTL instead of
MADVISE
2) Add some functional KSM tests for the new PRCTL (fork, enabling
works, disabling works).
The latter should rather go into ksm_functional_tests().
[...]
>
> -static int check_ksm_unmerge(int mapping, int prot, int timeout, size_t page_size)
> +/* Verify that prctl ksm flag is inherited. */
> +static int check_ksm_fork(void)
> +{
> + int rc = KSFT_FAIL;
> + pid_t child_pid;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl");
> + return KSFT_FAIL;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (!is_on)
> + exit(KSFT_FAIL);
> +
> + exit(KSFT_PASS);
> + }
> +
> + if (child_pid < 0)
> + goto out;
> +
> + if (waitpid(child_pid, &rc, 0) < 0)
> + rc = KSFT_FAIL;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl");
> + rc = KSFT_FAIL;
> + }
> +
> +out:
> + if (rc == KSFT_PASS)
> + printf("OK\n");
> + else
> + printf("Not OK\n");
> +
> + return rc;
> +}
> +
> +static int check_ksm_get_merge_type(void)
> +{
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (is_on && is_off) {
> + printf("OK\n");
> + return KSFT_PASS;
> + }
> +
> + printf("Not OK\n");
> + return KSFT_FAIL;
> +}
Yes, these two are better located in ksm_functional_tests() to just run
them both automatically when the test is executed.
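One more detail for the fork test when it is moved: waitpid() stores an encoded wait status, not the child's exit code, so the value should be decoded with WIFEXITED()/WEXITSTATUS() before it is compared against KSFT_PASS or returned. Below is a minimal sketch under stated assumptions: the helper names are hypothetical, and the fallback prctl numbers (67 and 68) are assumed to match the uapi header that carries this series.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumed fallback values for headers that predate the new prctl. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Hypothetical child-side check: 0 if the merge flag is on, else 1. */
int merge_flag_is_on(void)
{
	return prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0) > 0 ? 0 : 1;
}

/* Run check() in a forked child; return its exit code, or -1 on error. */
int run_in_child(int (*check)(void))
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0)
		_exit(check());
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	/* Decode the wait status; a raw exit(1) shows up as 1 << 8 here. */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would enable merging with prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0) first and then treat run_in_child(merge_flag_is_on) == 0 as a pass. The second quoted test also looks like it wants "is_on && !is_off" rather than "is_on && is_off", since the flag is expected to read back as 0 after it has been disabled.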
--
Thanks,
David / dhildenb
Hi Shuah and kselftest team,
There are a couple of resctrl selftest patches that are ready for inclusion. They have been percolating on the list for a while without expecting more feedback. All have "Reviewed-by" tags from at least one reviewer. Could you please consider including them into the kselftest repo? There is one minor merge conflict between two of the series for which the snippet below shows resolution.
[PATCH v8 0/6] Some improvements of resctrl selftest
https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
[PATCH v2 0/9] selftests/resctrl: Fixes to error handling logic and cleanups
https://lore.kernel.org/lkml/20230215130605.31583-1-ilpo.jarvinen@linux.int…
[PATCH] selftests/resctrl: Use correct exit code when tests fail
https://lore.kernel.org/lkml/20230309145757.2280518-1-peternewman@google.co…
The snippet below shows resolution of the merge conflict between the
first and second series:
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 040ca1f9c173..775f9e542ff6 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
Thank you very much.
Reinette
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (1):
selftests: mptcp: userspace pm: uniform verify events
Paolo Abeni (3):
mptcp: use mptcp_schedule_work instead of open-coding it
mptcp: stricter state check in mptcp_worker
mptcp: fix NULL pointer dereference on fastopen early fallback
net/mptcp/fastopen.c | 11 +++++++++--
net/mptcp/options.c | 5 ++---
net/mptcp/protocol.c | 2 +-
net/mptcp/subflow.c | 18 ++++++------------
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++
5 files changed, 20 insertions(+), 18 deletions(-)
---
base-commit: a4506722dc39ca840593f14e3faa4c9ba9408211
change-id: 20230411-upstream-net-20230411-mptcp-fixes-db47f50c2688
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Nested translation is a hardware feature that is supported by many modern
IOMMUs. It uses two stages of address translation (stage-1 and stage-2)
to reach the physical address. The stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by the kernel. Changes
to the stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example: the stage-1 translation table is the I/O page
table. As the diagram below shows, the guest I/O page table pointer in GPA
(guest physical address) is passed to the host and used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed by an IOTLB invalidation.
    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest I/O page table      |
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --+
    '-------------'                        |
    |             |                        V
    |             |           I/O page table pointer in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .------------------------.
    |   pIOMMU    |  |  FS for GIOVA->GPA     |
    |             |  '------------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.----------------------------------.
    |             |   | SS for GPA->HPA, unmanaged domain|
    |             |   '----------------------------------'
    '-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from the iommu driver. So in this series
hw_pagetable and iommu_domain mean the same thing unless noted otherwise.
IOMMUFD already supports allocating a hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating a hw_pagetable
with driver-specific parameters and an interface to sync the stage-1 IOTLB, as
the user owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces a new iommu op for allocating domains with user data and an op
for syncing the stage-1 IOTLB, and then extends the IOMMUFD internal
infrastructure to accept user_data and a parent hwpt, then relays the data to
the iommu core to allocate an iommu_domain. After that, it extends the ioctl
IOMMU_HWPT_ALLOC to accept user data and a stage-2 hwpt ID to allocate a hwpt.
Along with it, the ioctl IOMMU_HWPT_INVALIDATE is added to invalidate the
stage-1 IOTLB. This is needed for user-managed hwpts. The ioctl
IOMMU_DEVICE_GET_HW_INFO is extended to report the supported hwpt types bitmap
to the user. A selftest is added as well to cover the new ioctls.
Complete code can be found in [2]; the QEMU code can be found in [3].
Lastly, this is team work together with Nicolin Chen and Lu Baolu. Thanks to
them for the help. ^_^ Looking forward to your feedback.
base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e
[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (5):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (5):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass parent hwpt and user_data to
iommufd_hw_pagetable_alloc()
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd/device: Report supported hwpt_types
drivers/iommu/iommufd/device.c | 9 +-
drivers/iommu/iommufd/hw_pagetable.c | 242 +++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 16 +-
drivers/iommu/iommufd/iommufd_test.h | 30 +++
drivers/iommu/iommufd/main.c | 7 +-
drivers/iommu/iommufd/selftest.c | 104 +++++++-
include/linux/iommu.h | 11 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd.c | 126 ++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 71 +++++
10 files changed, 654 insertions(+), 27 deletions(-)
--
2.34.1
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- also provide and use fflush/fclose.
- reject fileno(NULL).
- provide compatibility with buffered streams from glibc.
- Link to v2: https://lore.kernel.org/r/20230328-nolibc-printf-test-v2-0-f72bdf210190@wei…
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
---
Thomas Weißschuh (4):
tools/nolibc: add libc-test binary
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 95 ++++++++++++++++++++--------
tools/include/nolibc/sys.h | 23 +++++++
tools/testing/selftests/nolibc/.gitignore | 1 +
tools/testing/selftests/nolibc/Makefile | 5 ++
tools/testing/selftests/nolibc/nolibc-test.c | 86 +++++++++++++++++++++++++
5 files changed, 183 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before one_page is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
Here is a series with some fixes and cleanups to resctrl selftests and
a rewrite of the CAT test into something that really tests whether CAT is
working or not.
I know that this series will conflict with some of patches from
Shaopeng Tan that so far have not made it into the kselftest tree. Due
to CAT test rewrite done in this series, some of those patches would no
longer be relevant anyway but some of them are still very valid (I've
not tried to reinvent the fixes in Shaopeng's series in this series).
Ilpo Järvinen (22):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Check also too low values for CBM bits
selftests/resctrl: Make span unsigned long everywhere
selftests/resctrl: Express span in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Split run_fill_buf() to alloc, work, and dealloc
helpers
selftests/resctrl: Remove start_buf local variable from buffer alloc
func
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Add flush_buffer() to fill_buf
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Refactor get_cbm_mask()
selftests/resctrl: Create cache_alloc_size() helper
selftests/resctrl: Replace count_bits with count_consecutive_bits()
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to func they are used
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Split measure_cache_vals() function
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 154 ++++++------
tools/testing/selftests/resctrl/cat_test.c | 221 +++++++++---------
tools/testing/selftests/resctrl/cmt_test.c | 60 +++--
tools/testing/selftests/resctrl/fill_buf.c | 107 +++++----
tools/testing/selftests/resctrl/mba_test.c | 8 +-
tools/testing/selftests/resctrl/mbm_test.c | 16 +-
tools/testing/selftests/resctrl/resctrl.h | 28 ++-
.../testing/selftests/resctrl/resctrl_tests.c | 34 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 4 +-
tools/testing/selftests/resctrl/resctrlfs.c | 160 ++++++++++---
11 files changed, 447 insertions(+), 347 deletions(-)
--
2.30.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might occur because linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cc'ed everyone on all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they almost all
depend on each other.
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
Stephen Boyd (11):
of: Add KUnit test to confirm DTB is loaded
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 22 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 215 ++++++++
drivers/clk/.kunitconfig | 3 +
drivers/clk/Kconfig | 7 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 374 ++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 224 +++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 459 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 5 +
drivers/of/Kconfig | 19 +
drivers/of/Makefile | 4 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 125 +++++
drivers/of/of_test.c | 34 ++
drivers/of/overlay_test.c | 110 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
31 files changed, 2109 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/of_test.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
prerequisite-patch-id: 33517b96dd0768ab9c265f5721629786354ee320
prerequisite-patch-id: 909221815eeca0a2b0cdd385c76f57e185fb9e33
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/
https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v3:
- Integrate selftest into test_progs (Alexei)
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 ++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 ++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 153 +++++++++++++++++-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 ++++++++++++++
10 files changed, 432 insertions(+), 19 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and performing one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affected (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding the mm trees, but
I can rebase if there are any problems.
[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
David Hildenbrand (6):
selftests/mm: reuse read_pmd_pagesize() in COW selftest
selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs
without write permissions
sparc/mm: don't unconditionally set HW writable bit when setting PTE
dirty on 64bit
mm/migrate: revert "mm/migrate: fix wrongly apply write bit after
mkdirty on sparc64"
mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit
when thp splits on pmd""
mm/huge_memory: conditionally call maybe_mkwrite() and drop
pte_wrprotect() in __split_huge_pmd_locked()
arch/sparc/include/asm/pgtable_64.h | 116 +++---
mm/huge_memory.c | 16 +-
mm/migrate.c | 2 -
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/cow.c | 33 +-
tools/testing/selftests/mm/khugepaged.c | 4 +
tools/testing/selftests/mm/mkdirty.c | 379 ++++++++++++++++++
tools/testing/selftests/mm/soft-dirty.c | 3 +
.../selftests/mm/split_huge_page_test.c | 4 +
tools/testing/selftests/mm/vm_util.c | 4 +-
10 files changed, 468 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/mm/mkdirty.c
--
2.39.2
This patch series introduces a new "isolcpus" partition type to the
existing list of {member, root, isolated} types. The primary reason
for adding this new "isolcpus" partition is to facilitate the
distribution of isolated CPUs down the cgroup v2 hierarchy.
The other non-member partition types have the limitation that their
parents have to be valid partitions too. It will be hard to create a
partition a few layers down the hierarchy.
It is relatively rare to have applications that require creation of
a separate scheduling domain (root). However, it is more common to
have applications that require the use of isolated CPUs (isolated),
e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command options
to get that statically. Of course, the "isolated" partition is another
way to achieve that dynamically.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers. If a container needs to use
isolated CPUs, it is hard to get those with the existing set of cpuset
partition types. With this patch series, a new "isolcpus" partition
can be created to hold a set of isolated CPUs that can be pulled into
other "isolated" partitions.
The "isolcpus" partition is special in that there can be at most one
instance of it in a system. It serves as a pool for isolated CPUs
and cannot hold tasks or sub-cpusets underneath it. It is also not
cpu-exclusive so that the isolated CPUs can be distributed down the
sibling hierarchies, though those isolated CPUs will not be useable
until the partition type becomes "isolated".
Once isolated CPUs are needed in a cgroup, the administrator can write
a list of isolated CPUs into its "cpuset.cpus" and change its partition
type to "isolated" to pull in those isolated CPUs from the "isolcpus"
partition and use them in that cgroup. That will make the distribution
of isolated CPUs to cgroups that need them much easier.
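As a rough illustration of the administrator workflow described above, the two writes could be done from C as sketched below. This is only a sketch: cgroup_write() and make_isolated() are hypothetical helper names, the cgroup directory path is whatever the caller passes in, and pulling the CPUs from an "isolcpus" pool assumes a kernel with this series applied.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: write one value into a control file that lives
 * under the given cgroup directory. Returns 0 on success, -1 on failure.
 */
static int cgroup_write(const char *cgroup_dir, const char *file,
			const char *value)
{
	char path[4096];
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(value, f) == EOF) {
		fclose(f);
		return -1;
	}
	/* A value rejected by the kernel shows up when the buffer is flushed. */
	return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: request a list of CPUs for a cgroup and then
 * switch its partition type to "isolated".
 */
int make_isolated(const char *cgroup_dir, const char *cpulist)
{
	if (cgroup_write(cgroup_dir, "cpuset.cpus", cpulist))
		return -1;
	return cgroup_write(cgroup_dir, "cpuset.cpus.partition", "isolated");
}
```

A call such as make_isolated("/sys/fs/cgroup/some/container", "2-3") would then be followed by reading "cpuset.cpus.partition" back to check whether the partition actually became valid.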
In the future, we may be able to extend this special "isolcpus" partition
type to support other isolation attributes like those that can be
specified with the "isolcpus" boot command line and related options.
Waiman Long (5):
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Add a new "isolcpus" partition root state
cgroup/cpuset: Make isolated partition pull CPUs from isolcpus
partition
cgroup/cpuset: Documentation update for the new "isolcpus" partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition
Documentation/admin-guide/cgroup-v2.rst | 89 ++-
kernel/cgroup/cpuset.c | 548 +++++++++++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++----
3 files changed, 789 insertions(+), 224 deletions(-)
--
2.31.1
Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in the tests themselves.
[patch 1] Make write_schemata() run to set up schemata with 100% allocation
on the first run in the MBM test.
[patch 2] The MBA test result message is always output as "ok".
Output "not ok" when the MBA check fails.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] Whether an error occurs in the parent process or the child process,
the parent process always kills the child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] To clean up properly before exiting the parent process when a
signal is received, commonize the signal handler registered for
CMT/MBM/MBA tests and reuse it in CAT. Also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each of the CMT/CAT/MBM/MBA tests, the functions
that clear test result files, cat/cmt/mbm/mba_test_cleanup(), are
called twice. Remove one of the calls.
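The problem addressed by patch 3 can be reproduced outside the selftest with a small sketch. The function name lines_after_fork() and the use of tmpfile() are assumptions made for illustration; the resctrl tests deal with stdout rather than a temporary file. Data still sitting in a stdio buffer at fork() time is copied into the child, and both processes write it out later.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write one line to a fully buffered stream, fork a child that exits
 * immediately, and return how many lines ended up in the file.
 * Returns -1 if the temporary file or the child cannot be created.
 */
int lines_after_fork(int flush_before_fork)
{
	FILE *fp = tmpfile();
	pid_t pid;
	int c, lines = 0;

	if (!fp)
		return -1;

	/* The line stays in the stdio buffer until it is flushed. */
	fprintf(fp, "test result\n");

	if (flush_before_fork)
		fflush(fp);

	pid = fork();
	if (pid < 0) {
		fclose(fp);
		return -1;
	}
	if (pid == 0)
		exit(0);	/* exit() flushes the child's copy of the buffer */

	waitpid(pid, NULL, 0);
	fflush(fp);		/* the parent writes out its own copy, if any */

	rewind(fp);
	while ((c = fgetc(fp)) != EOF)
		if (c == '\n')
			lines++;
	fclose(fp);
	return lines;
}
```

With the flush, the file ends up with the single line that was written. Without it, the same line appears twice, which is the kind of confusing output the series is trying to avoid; assert(lines_after_fork(1) == 1) is a quick way to check this.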
This patch series is based on Linux v6.2-rc7.
Difference from v7:
[patch 4]
- Fix commitlog.
[patch 5]
- Fix commitlog.
Previous versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
On Tue, Apr 11, 2023 at 08:16:46PM -0700, Stefan Roesch wrote:
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> +
> + if (!err)
> + ksm_add_vmas(me->mm);
In the last version of this patch, you reported the error. Now you
swallow the error. I have no idea which is correct, but you've
changed the behaviour without explaining it, so I assume it's wrong.
> + } else {
> + clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + }
> + mmap_write_unlock(me->mm);
> + break;
> + case PR_GET_MEMORY_MERGE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + break;
Why do we need a GET? Just for symmetry, or is there an actual need for
it?
This patch updates the cgroup-v2.rst file to include information about
the new "isolcpus" partition type.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------
1 file changed, 70 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..352a02849fa7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2225,7 +2225,8 @@ Cpuset Interface Files
========== =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
========== =====================================
The root cgroup is always a partition root and its state
@@ -2237,24 +2238,41 @@ Cpuset Interface Files
its descendants except those that are separate partition roots
themselves and their descendants.
+ When set to "isolcpus", the CPUs in that partition root will
+ be in an isolated state without any load balancing from the
+ scheduler. This partition root is special as there can be at
+ most one instance of it in a system and no task or child cpuset
+ is allowed in this cgroup. It acts as a pool of isolated CPUs to
+ be pulled into other "isolated" partitions. The "cpuset.cpus"
+ of an "isolcpus" partition root contains the list of isolated
+ CPUs it holds, where "cpuset.cpus.effective" contains the list
+ of freely available isolated CPUs that are ready to be pulled
+ into other "isolated" partitions.
+
When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
- individual CPUs for optimal performance.
-
- The value shown in "cpuset.cpus.effective" of a partition root
- is the CPUs that the partition root can dedicate to a potential
- new child partition root. The new child subtracts available
- CPUs from its parent "cpuset.cpus.effective".
-
- A partition root ("root" or "isolated") can be in one of the
- two possible states - valid or invalid. An invalid partition
- root is in a degraded state where some state information may
- be retained, but behaves more like a "member".
-
- All possible state transitions among "member", "root" and
- "isolated" are allowed.
+ individual CPUs for optimal performance. The isolated CPUs can
+ come from either the parent partition root or from an "isolcpus"
+ partition if the parent cannot satisfy its request.
+
+ The value shown in "cpuset.cpus.effective" of a partition root is
+ the CPUs that the partition root can dedicate to a potential new
+ child partition root. The new child partition subtracts available
+ CPUs from its parent "cpuset.cpus.effective". An exception is
+ an "isolated" partition that pulls its isolated CPUs from the
+ "isolcpus" partition root that is not its direct parent.
+
+ A partition root can be in one of the two possible states -
+ valid or invalid. An invalid partition root is in a degraded
+ state where some state information may be retained, but behaves
+ more like a "member".
+
+ All possible state transitions among "member", "root", "isolcpus"
+ and "isolated" are allowed. However, the partition root may
+ not be valid if the corresponding prerequisite conditions are
+ not met.
On read, the "cpuset.cpus.partition" file can show the following
values.
@@ -2262,16 +2280,18 @@ Cpuset Interface Files
============================= =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
"root invalid (<reason>)" Invalid partition root
+ "isolcpus invalid (<reason>)" Invalid isolcpus partition root
"isolated invalid (<reason>)" Invalid isolated partition root
============================= =====================================
In the case of an invalid partition root, a descriptive string on
- why the partition is invalid is included within parentheses.
+ why the partition is invalid may be included within parentheses.
- For a partition root to become valid, the following conditions
- must be met.
+ For a "root" partition root to become valid, the following
+ conditions must be met.
1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
are not shared by any of its siblings (exclusivity rule).
@@ -2281,6 +2301,37 @@ Cpuset Interface Files
4) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
+ A valid "isolcpus" partition root requires the following
+ conditions.
+
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus"
+ including an empty cpu list.
+ 3) There can be no more than one valid "isolcpus" partition.
+ 4) No task or child cpuset is allowed.
+
+ Note that an "isolcpus" partition is not exclusive and its
+ isolated CPUs can be distributed down sibling cgroups even
+ though they may not appear in their "cpuset.cpus.effective".
+
+ A valid "isolated" partition root can pull isolated CPUs from
+ either its parent partition or from the "isolcpus" partition.
+ It also requires the following conditions to be met.
+
+ 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
+ are not shared by any of its siblings (exclusivity rule).
+ 2) The "cpuset.cpus" is not empty and must be a subset of
+ parent's "cpuset.cpus".
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
+ no task associated with this partition.
+
+ If pulling isolated CPUS from "isolcpus" partition,
+ the "cpuset.cpus" must also be a subset of "isolcpus"
+ partition's "cpuset.cpus" and all the requested CPUs must
+ be available for pulling, i.e. in "isolcpus" partition's
+ "cpuset.cpus.effective". In this case, its hierarchical parent
+ does not need to be a valid partition root.
+
External events like hotplug or changes to "cpuset.cpus" can
cause a valid partition root to become invalid and vice versa.
Note that a task cannot be moved to a cgroup with empty
--
2.31.1
With the addition of a new "isolcpus" partition in a previous patch,
this patch adds the capability for a privileged user to pull isolated
CPUs from the "isolcpus" partition to an "isolated" partition if its
parent cannot satisfy its request directly.
The following conditions must be true for the pulling of isolated CPUs
from "isolcpus" partition to be successful.
(1) The value of "cpuset.cpus" must still be a subset of its parent's
"cpuset.cpus" to ensure proper inheritance even though these CPUs
cannot be used until the cpuset becomes an "isolated" partition.
(2) All the CPUs in "cpuset.cpus" are freely available in the
"isolcpus" partition, i.e. in its "cpuset.cpus.effective" and not
yet claimed by other isolated partitions.
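The two conditions above can be modelled in userspace with plain bitmasks. This is a simplified sketch and not the kernel implementation: struct pool, pool_pull() and pool_release() are hypothetical names, a 64-bit integer stands in for struct cpumask, and locking, privilege checks and hotplug are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t mask_t;	/* bit n set means CPU n is in the mask */

/* Stands in for the "isolcpus" partition. */
struct pool {
	mask_t cpus_allowed;	/* "cpuset.cpus" of the pool */
	mask_t effective;	/* "cpuset.cpus.effective": still free */
	mask_t claimed;		/* already pulled by "isolated" partitions */
};

static bool subset(mask_t a, mask_t b)
{
	return (a & ~b) == 0;
}

/* Try to pull the CPUs in @want out of the pool. */
bool pool_pull(struct pool *p, mask_t parent_cpus, mask_t want)
{
	if (!want)				/* nothing requested */
		return false;
	if (!subset(want, parent_cpus))		/* condition (1) */
		return false;
	if (!subset(want, p->effective))	/* condition (2) */
		return false;

	p->effective &= ~want;
	p->claimed |= want;
	return true;
}

/* Hand previously pulled CPUs back to the pool. */
void pool_release(struct pool *p, mask_t held)
{
	p->claimed &= ~held;
	/* Only CPUs the pool still owns become available again. */
	p->effective |= held & p->cpus_allowed;
}
```

For example, with a pool holding CPUs 4-7 (0xF0), a request for CPUs 4-5 (0x30) succeeds once, and a second request for the same CPUs fails until the first holder releases them.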
With this change, the CPUs in an "isolated" partition can either
come from the "isolcpus" partition or from its direct parent, but not
both. Now the parent of an isolated partition does not need to be a
partition root anymore.
Because of the cpu exclusive nature of an "isolated" partition, these
isolated CPUs cannot be distributed to other siblings of that isolated
partition.
Changes to "cpuset.cpus" of such an isolated partition are allowed as
long as all the newly requested CPUs can be granted from the "isolcpus"
partition. Otherwise, the partition will become invalid.
This makes the management and distribution of isolated CPUs to those
applications that require them much easier.
An "isolated" partition that pulls CPUs from the special "isolcpus"
partition can now have 2 parents - the "isolcpus" partition where
it gets its isolated CPUs and its hierarchical parent where it gets
all the other resources. However, such an "isolated" partition cannot
have subpartitions as all the CPUs from "isolcpus" must be in the same
isolated state.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 264 insertions(+), 18 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 444eae3a9a6b..a5bbd43ed46e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -101,6 +101,7 @@ enum prs_errcode {
PERR_ISOLCPUS,
PERR_ISOLTASK,
PERR_ISOLCHILD,
+ PERR_ISOPARENT,
};
static const char * const perr_strings[] = {
@@ -114,6 +115,7 @@ static const char * const perr_strings[] = {
[PERR_ISOLCPUS] = "An isolcpus partition is already present",
[PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
[PERR_ISOLCHILD] = "Isolcpus partition can't have children",
+ [PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition",
};
struct cpuset {
@@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
rebuild_sched_domains_locked();
}
+/*
+ * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus
+ * @cs: the cpuset to update
+ * @cmd: the command code (only partcmd_enable or partcmd_disable)
+ * Return: 1 if successful, 0 if error
+ *
+ * Note that pulling isolated cpus from isolcpus or cpus from parent does
+ * not require rebuilding sched domains. So we can change the flags directly.
+ */
+static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd)
+{
+ struct cpuset *parent = parent_cs(cs);
+
+ if (!isolcpus_cs)
+ return 0;
+
+ /*
+ * To enable pulling of isolated CPUs from isolcpus, cpus_allowed
+ * must be a subset of both its parent's cpus_allowed and isolcpus_cs's
+ * effective_cpus and the user has sysadmin privilege.
+ */
+ if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) &&
+ cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) &&
+ cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) {
+ /*
+ * Move cpus from effective_cpus to subparts_cpus & make
+ * cs a child of isolcpus partition.
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_copy(cs->effective_cpus, cs->cpus_allowed);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (cs->use_parent_ecpus) {
+ cs->use_parent_ecpus = false;
+ parent->child_ecpus_count--;
+ }
+ list_add(&cs->isol_sibling, &isol_children);
+ clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+
+ if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) {
+ /*
+ * This can be called after isolcpus shrinks its cpu list.
+ * So not all the cpus should be returned back to isolcpus.
+ */
+ WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED);
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->effective_cpus);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus,
+ isolcpus_cs->cpus_allowed);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (!cpumask_and(cs->effective_cpus, parent->effective_cpus,
+ cs->cpus_allowed)) {
+ cs->use_parent_ecpus = true;
+ parent->child_ecpus_count++;
+ cpumask_copy(cs->effective_cpus,
+ parent->effective_cpus);
+ }
+ list_del_init(&cs->isol_sibling);
+ cs->partition_root_state = PRS_INVALID_ISOLATED;
+ cs->prs_err = PERR_INVCPUS;
+
+ set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ clear_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+ return 0;
+}
+
+static void isolcpus_disable(void)
+{
+ struct cpuset *child, *next;
+
+ list_for_each_entry_safe(child, next, &isol_children, isol_sibling)
+ WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable));
+
+ isolcpus_cs = NULL;
+}
+
+/*
+ * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition
+ */
+static void isolcpus_cpus_update(struct cpuset *cs)
+{
+ struct cpuset *child, *next;
+
+ if (WARN_ON_ONCE(isolcpus_cs != cs))
+ return;
+
+ if (list_empty(&isol_children))
+ return;
+
+ /*
+ * Remove child isolated partitions that are not fully covered by
+ * subparts_cpus.
+ */
+ list_for_each_entry_safe(child, next, &isol_children,
+ isol_sibling) {
+ if (cpumask_subset(child->cpus_allowed,
+ cs->subparts_cpus))
+ continue;
+
+ isolcpus_pull(child, partcmd_disable);
+ }
+}
+
+/*
+ * isolated_cpus_update - cpuset.cpus change in isolated partition
+ *
+ * Return: 1 if no further action needs, 0 otherwise
+ */
+static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask,
+ struct tmpmasks *tmp)
+{
+ struct cpumask *addmask = tmp->addmask;
+ struct cpumask *delmask = tmp->delmask;
+
+ if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) ||
+ list_empty(&cs->isol_sibling))
+ return 0;
+
+ if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) {
+ /*
+ * Check if isolcpus partition can provide the new CPUs
+ */
+ if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) ||
+ cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ /*
+ * Pull addmask isolated CPUs from isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, addmask);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, addmask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) {
+ /*
+ * Return isolated CPUs back to isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, delmask);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, delmask);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ spin_lock_irq(&callback_lock);
+ cpumask_copy(cs->cpus_allowed, newmask);
+ cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus);
+ cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+}
+
/**
* update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset
* @cs: The cpuset that requests change in partition root state
@@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
- isolcpus_cs = NULL;
+ isolcpus_disable();
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
struct cpuset *parent = parent_cs(cp);
bool update_parent = false;
+ /*
+ * Skip isolated cpuset that pull isolated CPUs from isolcpus
+ */
+ if (!list_empty(&cp->isol_sibling))
+ continue;
+
compute_effective_cpumask(tmp->new_cpus, cp, parent);
/*
@@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
- update_tasks_cpumask(cp, tmp->new_cpus);
+ update_tasks_cpumask(cp, cp->effective_cpus);
/*
* On legacy hierarchy, if the effective cpumask of any non-
@@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
return retval;
if (cs->partition_root_state) {
+ if (!list_empty(&cs->isol_sibling) &&
+ isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp))
+ goto update_hier; /* CPUs update done */
+
if (invalidate)
update_parent_subparts_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
@@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
}
spin_unlock_irq(&callback_lock);
+update_hier:
#ifdef CONFIG_CPUMASK_OFFSTACK
/* Now trialcs->cpus_allowed is available */
tmp.new_cpus = trialcs->cpus_allowed;
@@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/* effective_cpus will be updated here */
update_cpumasks_hier(cs, &tmp, false);
- if (cs->partition_root_state) {
- bool force = (cs->partition_root_state == PRS_ISOLCPUS);
+ if (cs->partition_root_state && list_empty(&cs->isol_sibling)) {
struct cpuset *parent = parent_cs(cs);
/*
@@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* cpusets if they use parent's effective_cpus or when
* the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count || force)
- update_sibling_cpumasks(parent, cs, &tmp, force);
+ if (cs->partition_root_state == PRS_ISOLCPUS) {
+ update_sibling_cpumasks(parent, cs, &tmp, true);
+ isolcpus_cpus_update(cs);
+ } else if (parent->child_ecpus_count) {
+ update_sibling_cpumasks(parent, cs, &tmp, false);
+ }
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
return err;
}
-/**
+/*
* update_prstate - update partition_root_state
* @cs: the cpuset to update
* @new_prs: new partition root state
@@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
/*
- * For a previously invalid partition root, leave it at being
- * invalid if new_prs is not "member".
+ * For a previously invalid partition root, treat it like a "member".
*/
- if (new_prs && is_prs_invalid(old_prs)) {
- cs->partition_root_state = -new_prs;
- return 0;
- }
+ if (new_prs && is_prs_invalid(old_prs))
+ old_prs = PRS_MEMBER;
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
@@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
}
+ /*
+ * A parent isolated partition that gets its isolated CPUs from
+ * isolcpus cannot have subpartition.
+ */
+ if (new_prs && !list_empty(&parent->isol_sibling)) {
+ err = PERR_ISOPARENT;
+ goto out;
+ }
+
+ if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) {
+ isolcpus_pull(cs, partcmd_disable);
+ old_prs = 0;
+ }
+ WARN_ON_ONCE(!list_empty(&cs->isol_sibling));
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
err = update_parent_subparts_cpumask(cs, partcmd_enable,
NULL, &tmpmask);
+ if (err && (new_prs == PRS_ISOLATED) &&
+ isolcpus_pull(cs, partcmd_enable))
+ err = 0; /* Successful isolcpus pull */
+
if (err)
goto out;
} else if (old_prs && new_prs) {
@@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (new_prs == PRS_ISOLCPUS)
isolcpus_cs = cs;
else if (cs == isolcpus_cs)
- isolcpus_cs = NULL;
+ isolcpus_disable();
/*
* Update child cpusets, if present.
@@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
}
parent = parent_cs(cs);
- compute_effective_cpumask(&new_cpus, cs, parent);
nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ /*
+ * In the special case of a valid isolated cpuset pulling isolated
+ * cpus from isolcpus. We just need to mask offline cpus from
+ * cpus_allowed unless all the isolated cpus are gone.
+ */
+ if (!list_empty(&cs->isol_sibling)) {
+ if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask))
+ isolcpus_pull(cs, partcmd_disable);
+ } else if ((cs->partition_root_state == PRS_ISOLCPUS) &&
+ cpumask_empty(cs->cpus_allowed)) {
+ /*
+ * For isolcpus with empty cpus_allowed, just update
+ * effective_mems and be done with it.
+ */
+ spin_lock_irq(&callback_lock);
+ if (nodes_empty(new_mems))
+ cs->effective_mems = parent->effective_mems;
+ else
+ cs->effective_mems = new_mems;
+ spin_unlock_irq(&callback_lock);
+ goto unlock;
+ } else {
+ compute_effective_cpumask(&new_cpus, cs, parent);
+ }
if (cs->nr_subparts_cpus)
/*
@@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
* the following conditions hold:
* 1) empty effective cpus but not valid empty partition.
* 2) parent is invalid or doesn't grant any cpus to child
- * partitions.
+ * partitions and not an isolated cpuset pulling cpus from
+ * isolcpus.
*/
- if (is_partition_valid(cs) && (!parent->nr_subparts_cpus ||
- (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
+ if (is_partition_valid(cs) &&
+ ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) ||
+ (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
int old_prs, parent_prs;
update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp);
--
2.31.1
One can use "cpuset.cpus.partition" to create multiple scheduling domains
or to produce a set of isolated CPUs where load balancing is disabled.
The former use case is less common but the latter one can be frequently
used especially for the Telco use cases like DPDK.
The existing "isolated" partition can be used to produce isolated
CPUs if the applications have full control of a system. However, in a
containerized environment where all the apps are run in a container,
it is hard to distribute out isolated CPUs from the root down given
the unified hierarchy nature of cgroup v2.
The container running on isolated CPUs can be several layers down from
the root. The current partition feature requires that all the ancestors
of a leaf partition root must be partition roots themselves. This can
be hard to manage.
This patch introduces a new special partition root state called
"isolcpus" that serves as a pool of isolated CPUs to be pulled into other
"isolated" partitions. At most one instance of the "isolcpus" partition
is allowed in a system, preferably as a child of the top cpuset.
In a valid "isolcpus" partition, "cpuset.cpus" contains the set of
isolated CPUs and "cpuset.cpus.effective" contains the set of freely
available isolated CPUs that have not yet been pulled into other
"isolated" cpusets.
The special "isolcpus" partition cannot have normal cpuset children. So
we are not allowed to enable child cpuset in its "cgroup.subtree_control"
file if it has children. Tasks are also not allowed in the "cgroup.procs"
of the "isolcpus" partition. Unlike other partition roots, empty
"cpuset.cpus" is allowed in the "isolcpus" partition as this special
cpuset is not designed to hold tasks.
The CPUs in the "isolcpus" partition are not exclusive so that those
isolated CPUs can be distributed down sibling hierarchies as usual even
though they will not show up in their "cpuset.cpus.effective".
Right now, an "isolcpus" partition only disables load balancing of
the isolated CPUs. In the near future, it may be extended to support
additional isolation attributes like those currently supported by the
"isolcpus" or related kernel boot command line options.
In a subsequent patch, a privileged user can change a "member" cpuset
to an "isolated" partition root by pulling isolated CPUs from the
"isolcpus" partition if its parent is not a partition root that can
directly satisfy the request.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 158 ++++++++++++++++++++++++++++++++++-------
1 file changed, 133 insertions(+), 25 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83a7193e0f2c..444eae3a9a6b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -98,6 +98,9 @@ enum prs_errcode {
PERR_NOCPUS,
PERR_HOTPLUG,
PERR_CPUSEMPTY,
+ PERR_ISOLCPUS,
+ PERR_ISOLTASK,
+ PERR_ISOLCHILD,
};
static const char * const perr_strings[] = {
@@ -108,6 +111,9 @@ static const char * const perr_strings[] = {
[PERR_NOCPUS] = "Parent unable to distribute cpu downstream",
[PERR_HOTPLUG] = "No cpu available due to hotplug",
[PERR_CPUSEMPTY] = "cpuset.cpus is empty",
+ [PERR_ISOLCPUS] = "An isolcpus partition is already present",
+ [PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
+ [PERR_ISOLCHILD] = "Isolcpus partition can't have children",
};
struct cpuset {
@@ -198,6 +204,9 @@ struct cpuset {
/* Handle for cpuset.cpus.partition */
struct cgroup_file partition_file;
+
+ /* siblings list anchored at isol_children */
+ struct list_head isol_sibling;
};
/*
@@ -206,14 +215,26 @@ struct cpuset {
* 0 - member (not a partition root)
* 1 - partition root
* 2 - partition root without load balancing (isolated)
+ * 3 - isolated cpu pool (isolcpus)
* -1 - invalid partition root
* -2 - invalid isolated partition root
+ * -3 - invalid isolated cpu pool
+ *
+ * An isolated cpu pool is a special isolated partition root. At most one
+ * instance of it is allowed in a system. It provides a pool of isolated
+ * cpus that a normal isolated partition root can pull from, if privileged,
+ * in case its parent cannot fulfill its request.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
#define PRS_ISOLATED 2
+#define PRS_ISOLCPUS 3
#define PRS_INVALID_ROOT -1
#define PRS_INVALID_ISOLATED -2
+#define PRS_INVALID_ISOLCPUS -3
+
+static struct cpuset *isolcpus_cs; /* System isolcpus partition root */
+static struct list_head isol_children; /* Children that pull isolated cpus */
static inline bool is_prs_invalid(int prs_state)
{
@@ -335,6 +356,7 @@ static struct cpuset top_cpuset = {
.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
(1 << CS_MEM_EXCLUSIVE)),
.partition_root_state = PRS_ROOT,
+ .isol_sibling = LIST_HEAD_INIT(top_cpuset.isol_sibling),
};
/**
@@ -1282,7 +1304,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
*/
static int update_partition_exclusive(struct cpuset *cs, int new_prs)
{
- bool exclusive = (new_prs > 0);
+ bool exclusive = (new_prs == PRS_ROOT) || (new_prs == PRS_ISOLATED);
if (exclusive && !is_cpu_exclusive(cs)) {
if (update_flag(CS_CPU_EXCLUSIVE, cs, 1))
@@ -1303,7 +1325,7 @@ static int update_partition_exclusive(struct cpuset *cs, int new_prs)
static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
{
int new_prs = cs->partition_root_state;
- bool new_lb = (new_prs != PRS_ISOLATED);
+ bool new_lb = (new_prs != PRS_ISOLATED) && (new_prs != PRS_ISOLCPUS);
if (new_lb != !!is_sched_load_balance(cs))
update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb);
@@ -1360,18 +1382,20 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
int part_error = PERR_NONE; /* Partition error? */
percpu_rwsem_assert_held(&cpuset_rwsem);
+ old_prs = new_prs = cs->partition_root_state;
/*
* The parent must be a partition root.
* The new cpumask, if present, or the current cpus_allowed must
- * not be empty.
+ * not be empty except for isolcpus partition.
*/
if (!is_partition_valid(parent)) {
return is_partition_invalid(parent)
? PERR_INVPARENT : PERR_NOTPART;
}
- if ((newmask && cpumask_empty(newmask)) ||
- (!newmask && cpumask_empty(cs->cpus_allowed)))
+ if ((new_prs != PRS_ISOLCPUS) &&
+ ((newmask && cpumask_empty(newmask)) ||
+ (!newmask && cpumask_empty(cs->cpus_allowed))))
return PERR_CPUSEMPTY;
/*
@@ -1379,7 +1403,6 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
* partcmd_invalidate commands.
*/
adding = deleting = false;
- old_prs = new_prs = cs->partition_root_state;
if (cmd == partcmd_enable) {
/*
* Enabling partition root is not allowed if cpus_allowed
@@ -1498,11 +1521,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
switch (cs->partition_root_state) {
case PRS_ROOT:
case PRS_ISOLATED:
+ case PRS_ISOLCPUS:
if (part_error)
new_prs = -old_prs;
break;
case PRS_INVALID_ROOT:
case PRS_INVALID_ISOLATED:
+ case PRS_INVALID_ISOLCPUS:
if (!part_error)
new_prs = -old_prs;
break;
@@ -1553,6 +1578,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
+ if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
+ isolcpus_cs = NULL;
+
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1640,7 +1668,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
*/
old_prs = new_prs = cp->partition_root_state;
if ((cp != cs) && old_prs) {
- switch (parent->partition_root_state) {
+ int parent_prs = parent->partition_root_state;
+
+ /*
+ * isolcpus partition parent can't have children
+ */
+ WARN_ON_ONCE(parent_prs == PRS_ISOLCPUS);
+
+ switch (parent_prs) {
case PRS_ROOT:
case PRS_ISOLATED:
update_parent = true;
@@ -1735,9 +1770,10 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
* @parent: Parent cpuset
* @cs: Current cpuset
* @tmp: Temp variables
+ * @force: Force update if set
*/
static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
- struct tmpmasks *tmp)
+ struct tmpmasks *tmp, bool force)
{
struct cpuset *sibling;
struct cgroup_subsys_state *pos_css;
@@ -1756,7 +1792,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
cpuset_for_each_child(sibling, pos_css, parent) {
if (sibling == cs)
continue;
- if (!sibling->use_parent_ecpus)
+ if (!sibling->use_parent_ecpus && !force)
continue;
if (!css_tryget_online(&sibling->css))
continue;
@@ -1893,14 +1929,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_cpumasks_hier(cs, &tmp, false);
if (cs->partition_root_state) {
+ bool force = (cs->partition_root_state == PRS_ISOLCPUS);
struct cpuset *parent = parent_cs(cs);
/*
* For partition root, update the cpumasks of sibling
- * cpusets if they use parent's effective_cpus.
+ * cpusets if they use parent's effective_cpus or when
+ * the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmp);
+ if (parent->child_ecpus_count || force)
+ update_sibling_cpumasks(parent, cs, &tmp, force);
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2298,6 +2336,41 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
+ /*
+ * Only one isolcpus partition is allowed and it can't have children
+ * or tasks in it. The isolcpus partition is also not exclusive so
+ * that the isolated but unused cpus can be distributed down the
+ * hierarchy.
+ */
+ if (new_prs == PRS_ISOLCPUS) {
+ if (isolcpus_cs)
+ err = PERR_ISOLCPUS;
+ else if (!list_empty(&cs->css.children))
+ err = PERR_ISOLCHILD;
+ else if (cs->css.cgroup->nr_populated_csets)
+ err = PERR_ISOLTASK;
+
+ if (err && old_prs) {
+ /*
+ * A previous valid partition root is now invalid
+ */
+ goto disable_partition;
+ } else if (err) {
+ goto out;
+ }
+
+ /*
+ * Unlike other partition types, an isolated cpu pool can
+ * be empty as it is essentially a place holder for isolated
+ * CPUs.
+ */
+ if (!old_prs && cpumask_empty(cs->cpus_allowed)) {
+ /* Force effective_cpus to be empty too */
+ cpumask_clear(cs->effective_cpus);
+ goto out;
+ }
+ }
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2316,11 +2389,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (err)
goto out;
} else if (old_prs && new_prs) {
- /*
- * A change in load balance state only, no change in cpumasks.
- */
- goto out;
+ goto out; /* Skip cpuset and sibling task update */
} else {
+disable_partition:
/*
* Switching back to member is always allowed even if it
* disables child partitions.
@@ -2342,8 +2413,13 @@ static int update_prstate(struct cpuset *cs, int new_prs)
update_tasks_cpumask(parent, tmpmask.new_cpus);
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmpmask);
+ /*
+ * Since isolcpus partition is not exclusive, we have to update
+ * sibling hierarchies as well.
+ */
+ if ((new_prs == PRS_ISOLCPUS) || parent->child_ecpus_count)
+ update_sibling_cpumasks(parent, cs, &tmpmask,
+ new_prs == PRS_ISOLCPUS);
out:
/*
@@ -2363,6 +2439,14 @@ static int update_prstate(struct cpuset *cs, int new_prs)
/* Update sched domains and load balance flag */
update_partition_sd_lb(cs, old_prs);
+ /*
+ * Check isolcpus_cs state
+ */
+ if (new_prs == PRS_ISOLCPUS)
+ isolcpus_cs = cs;
+ else if (cs == isolcpus_cs)
+ isolcpus_cs = NULL;
+
/*
* Update child cpusets, if present.
* Force update if switching back to member.
@@ -2486,7 +2570,12 @@ static struct cpuset *cpuset_attach_old_cs;
*/
static int cpuset_can_attach_check(struct cpuset *cs)
{
+ /*
+ * A task cannot be moved to a cpuset that has empty effective cpus
+ * or that is an isolcpus partition.
+ */
if (cpumask_empty(cs->effective_cpus) ||
+ (cs->partition_root_state == PRS_ISOLCPUS) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
return 0;
@@ -2902,24 +2991,30 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
+ int prs = cs->partition_root_state;
const char *err, *type = NULL;
- switch (cs->partition_root_state) {
+ switch (prs) {
case PRS_ROOT:
seq_puts(seq, "root\n");
break;
case PRS_ISOLATED:
seq_puts(seq, "isolated\n");
break;
+ case PRS_ISOLCPUS:
+ seq_puts(seq, "isolcpus\n");
+ break;
case PRS_MEMBER:
seq_puts(seq, "member\n");
break;
- case PRS_INVALID_ROOT:
- type = "root";
- fallthrough;
- case PRS_INVALID_ISOLATED:
- if (!type)
+ default:
+ if (prs == PRS_INVALID_ROOT)
+ type = "root";
+ else if (prs == PRS_INVALID_ISOLATED)
type = "isolated";
+ else
+ type = "isolcpus";
+
err = perr_strings[READ_ONCE(cs->prs_err)];
if (err)
seq_printf(seq, "%s invalid (%s)\n", type, err);
@@ -2948,6 +3043,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
val = PRS_MEMBER;
else if (!strcmp(buf, "isolated"))
val = PRS_ISOLATED;
+ else if (!strcmp(buf, "isolcpus"))
+ val = PRS_ISOLCPUS;
else
return -EINVAL;
@@ -3157,6 +3254,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
nodes_clear(cs->effective_mems);
fmeter_init(&cs->fmeter);
cs->relax_domain_level = -1;
+ INIT_LIST_HEAD(&cs->isol_sibling);
/* Set CS_MEMORY_MIGRATE for default hierarchy */
if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
@@ -3171,6 +3269,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
struct cpuset *parent = parent_cs(cs);
struct cpuset *tmp_cs;
struct cgroup_subsys_state *pos_css;
+ int err = 0;
if (!parent)
return 0;
@@ -3178,6 +3277,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpus_read_lock();
percpu_down_write(&cpuset_rwsem);
+ /*
+ * An isolcpus partition cannot have direct children.
+ */
+ if (parent->partition_root_state == PRS_ISOLCPUS) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
set_bit(CS_ONLINE, &cs->flags);
if (is_spread_page(parent))
set_bit(CS_SPREAD_PAGE, &cs->flags);
@@ -3229,7 +3336,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
out_unlock:
percpu_up_write(&cpuset_rwsem);
cpus_read_unlock();
- return 0;
+ return err;
}
/*
@@ -3434,6 +3541,7 @@ int __init cpuset_init(void)
fmeter_init(&top_cpuset.fmeter);
set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
top_cpuset.relax_domain_level = -1;
+ INIT_LIST_HEAD(&isol_children);
BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
--
2.31.1
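The partition states in this patch are encoded so that each invalid state is the arithmetic negation of its valid counterpart, which is why the switch in update_parent_subparts_cpumask() can flip validity with `new_prs = -old_prs`. A minimal standalone sketch of that encoding follows. The PRS_* values are copied from the patch; the helper names prs_is_invalid(), prs_next_state() and prs_type_name() are hypothetical and do not exist in the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Partition root states, values copied from the patch. */
#define PRS_MEMBER		 0
#define PRS_ROOT		 1
#define PRS_ISOLATED		 2
#define PRS_ISOLCPUS		 3
#define PRS_INVALID_ROOT	-1
#define PRS_INVALID_ISOLATED	-2
#define PRS_INVALID_ISOLCPUS	-3

/* Hypothetical helper: a negative state marks an invalid partition. */
static bool prs_is_invalid(int prs)
{
	return prs < 0;
}

/*
 * Hypothetical helper modelled on the validity switch in the patch:
 * a valid partition becomes invalid when an error is present, and an
 * invalid one becomes valid again once the error is gone.
 */
static int prs_next_state(int old_prs, bool part_error)
{
	assert(old_prs >= PRS_INVALID_ISOLCPUS && old_prs <= PRS_ISOLCPUS);

	if (old_prs > 0 && part_error)
		return -old_prs;	/* valid -> invalid */
	if (old_prs < 0 && !part_error)
		return -old_prs;	/* invalid -> valid */
	return old_prs;			/* member, or nothing changed */
}

/* Hypothetical helper: type name shared by a state and its negation. */
static const char *prs_type_name(int prs)
{
	switch (prs < 0 ? -prs : prs) {
	case PRS_ROOT:
		return "root";
	case PRS_ISOLATED:
		return "isolated";
	case PRS_ISOLCPUS:
		return "isolcpus";
	default:
		return "member";
	}
}
```

Because the sign carries validity and the magnitude carries the type, adding PRS_ISOLCPUS only needs one new pair of constants and one extra case label per switch.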
I have been running hid-tools for a while, but it was in its own
separate repository for multiple reasons. Over the past few weeks
I finally managed to get the kernel tests in that repo into a
state where we can merge them into the kernel tree directly:
- the tests run in ~2 to 3 minutes
- the tests are way more reliable than previously
- the tests are mostly self-contained now (with the exception
of the Sony ones)
To be able to run the tests we need to use the latest release
of hid-tools, as this project still keeps the HID parsing logic
and is capable of generating the HID events.
The series also ensures we can run the tests with vmtest.sh,
allowing for quick development and testing in the tree itself.
This should allow us to require tests to be added to a series
when we see fit and keep them alive properly instead of having
to deal with 2 repositories.
In Cc are all of the people who participated in the development
of those tests, so please send back a signed-off-by for each
commit you are part of.
This series applies on top of the for-6.3/hid-bpf branch, which
is the one that added the tools/testing/selftests/hid directory.
Given that it is unlikely this series will make the cut for
6.3, we might just consider this series to be based on top of
the future 6.3-rc1.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
Benjamin Tissoires (11):
selftests: hid: make vmtest rely on make
selftests: hid: import hid-tools hid-core tests
selftests: hid: import hid-tools hid-gamepad tests
selftests: hid: import hid-tools hid-keyboards tests
selftests: hid: import hid-tools hid-mouse tests
selftests: hid: import hid-tools hid-multitouch and hid-tablets tests
selftests: hid: import hid-tools wacom tests
selftests: hid: import hid-tools hid-apple tests
selftests: hid: import hid-tools hid-ite tests
selftests: hid: import hid-tools hid-sony and hid-playstation tests
selftests: hid: import hid-tools usb-crash tests
tools/testing/selftests/hid/Makefile | 12 +
tools/testing/selftests/hid/config | 11 +
tools/testing/selftests/hid/hid-apple.sh | 7 +
tools/testing/selftests/hid/hid-core.sh | 7 +
tools/testing/selftests/hid/hid-gamepad.sh | 7 +
tools/testing/selftests/hid/hid-ite.sh | 7 +
tools/testing/selftests/hid/hid-keyboard.sh | 7 +
tools/testing/selftests/hid/hid-mouse.sh | 7 +
tools/testing/selftests/hid/hid-multitouch.sh | 7 +
tools/testing/selftests/hid/hid-sony.sh | 7 +
tools/testing/selftests/hid/hid-tablet.sh | 7 +
tools/testing/selftests/hid/hid-usb_crash.sh | 7 +
tools/testing/selftests/hid/hid-wacom.sh | 7 +
tools/testing/selftests/hid/run-hid-tools-tests.sh | 28 +
tools/testing/selftests/hid/settings | 3 +
tools/testing/selftests/hid/tests/__init__.py | 2 +
tools/testing/selftests/hid/tests/base.py | 345 ++++
tools/testing/selftests/hid/tests/conftest.py | 81 +
.../selftests/hid/tests/descriptors_wacom.py | 1360 +++++++++++++
.../selftests/hid/tests/test_apple_keyboard.py | 440 +++++
tools/testing/selftests/hid/tests/test_gamepad.py | 209 ++
tools/testing/selftests/hid/tests/test_hid_core.py | 154 ++
.../selftests/hid/tests/test_ite_keyboard.py | 166 ++
tools/testing/selftests/hid/tests/test_keyboard.py | 485 +++++
tools/testing/selftests/hid/tests/test_mouse.py | 977 +++++++++
.../testing/selftests/hid/tests/test_multitouch.py | 2088 ++++++++++++++++++++
tools/testing/selftests/hid/tests/test_sony.py | 282 +++
tools/testing/selftests/hid/tests/test_tablet.py | 872 ++++++++
.../testing/selftests/hid/tests/test_usb_crash.py | 103 +
.../selftests/hid/tests/test_wacom_generic.py | 844 ++++++++
tools/testing/selftests/hid/vmtest.sh | 25 +-
31 files changed, 8554 insertions(+), 10 deletions(-)
---
base-commit: 2f7f4efb9411770b4ad99eb314d6418e980248b4
change-id: 20230217-import-hid-tools-tests-dc0cd4f3c8a8
Best regards,
--
Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before one_page is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
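For reference, posix_memalign() reports failure by returning a positive error number (EINVAL or ENOMEM); it does not set errno and never returns a negative value, so a plain nonzero check is the reliable test. The sketch below wraps that convention in a helper. The name alloc_aligned() is hypothetical and is not part of the selftests.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: allocate len bytes aligned to alignment.
 * Returns 0 on success or the positive error number reported by
 * posix_memalign().
 */
static int alloc_aligned(void **out, size_t alignment, size_t len)
{
	int ret;

	assert(out != NULL);
	*out = NULL;	/* defined value even if the allocation fails */

	ret = posix_memalign(out, alignment, len);
	if (ret) {
		/* errno is not set, so decode the return value directly */
		fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
		return ret;
	}
	return 0;
}
```

A caller would pass the PMD page size as the alignment, for example `alloc_aligned(&buf, pmd_pagesize, 4 * pmd_pagesize)`, and release the buffer with free().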
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *s to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before s is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret) {
perror("memalign");
exit(1);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to
NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before map is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes a false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *one_page to
NULL to silence a warning about the function's return value being used
as uninitialized (which is not valid anyway because the error is
properly checked before one_page is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..8f48f07bc821 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)(&one_page), pmd_pagesize, len);
+ if (ret) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before map is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
Good morning,
I have looked over your offer and am pleased to say that it attracts attention and invites further discussion.
I thought I might be able to contribute to your growth and help bring this offer to a wider audience. I do search engine optimization for websites, which helps them generate excellent web traffic.
Could we talk sometime soon?
Best regards,
Adam Charachuta
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(), initialize *map to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before map is used).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
On Thu, Apr 06, 2023 at 09:53:37AM -0700, Stefan Roesch wrote:
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> + if (err)
> + return err;
You'll return to userspace with the mutex held, no?
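To illustrate the concern, here is a small userspace sketch of the usual fix: record the error and fall through to a single unlock instead of returning early. Everything in it is a hypothetical stand-in (toy_lock() and toy_unlock() for the mm lock calls, toy_add_mm() for ksm_add_mm(), set_memory_merge() for the prctl case); it is not the code from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Toy lock: counts how many times it is currently held. */
static int lock_depth;

static int toy_lock(void)
{
	lock_depth++;
	return 0;
}

static void toy_unlock(void)
{
	assert(lock_depth > 0);
	lock_depth--;
}

/* Hypothetical stand-in for ksm_add_mm(); fails on request. */
static int toy_add_mm(int fail)
{
	return fail ? -ENOMEM : 0;
}

/* Hypothetical stand-in for the PR_SET_MEMORY_MERGE case. */
static int set_memory_merge(int enable, int fail)
{
	int err = 0;

	if (toy_lock())
		return -EINTR;

	if (enable)
		err = toy_add_mm(fail);

	/* Single exit path: the lock is dropped on success and on error. */
	toy_unlock();
	return err;
}
```

With this shape, the lock count is back to zero after both the failing and the succeeding call.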