So far KSM can only be enabled by calling madvise for memory regions. What is
required to enable KSM for more workloads is to enable / disable it at the
process / cgroup level.
Use case:
The madvise call is not available in the programming language. An example for
this are programs with forked workloads using a garbage collected language without
pointers. In such a language madvise cannot be made available.
In addition the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these type of
workloads.
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system command
This patch series adds two new options to the prctl system call. The first
one allows to enable KSM at the process level and the second one to query the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMA's and enable KSM for the eligible VMA's.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMA's where KSM has been enabled.
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows to
test madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by client processes.
Changes:
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (19):
mm: add new flag to enable ksm per process
mm: add flag to __ksm_enter
mm: add flag to __ksm_exit call
mm: invoke madvise for all vmas in scan_get_next_rmap_item
mm: support disabling of ksm for a process
mm: add new prctl option to get and set ksm for a process
mm: split off pages_volatile function
mm: expose general_profit metric
docs: document general_profit sysfs knob
mm: calculate ksm process profit metric
mm: add ksm_merge_type() function
mm: expose ksm process profit metric in ksm_stat
mm: expose ksm merge type in ksm_stat
docs: document new procfs ksm knobs
tools: add new prctl flags to prctl in tools dir
selftests/vm: add KSM prctl merge test
selftests/vm: add KSM get merge type test
selftests/vm: add KSM fork test
selftests/vm: add two functions for debugging merge outcome
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
fs/proc/base.c | 5 +
include/linux/ksm.h | 19 +-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/sys.c | 29 ++
mm/ksm.c | 114 +++++++-
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
11 files changed, 389 insertions(+), 56 deletions(-)
base-commit: 234a68e24b120b98875a8b6e17a9dead277be16a
--
2.30.2
Stack protectors need support from libc.
This support is not provided by nolibc which leads to compiler errors
when stack protectors are enabled by default in a compiler:
CC nolibc-test
/usr/bin/ld: /tmp/ccqbHEPk.o: in function `stat':
nolibc-test.c:(.text+0x1d1): undefined reference to `__stack_chk_fail'
/usr/bin/ld: /tmp/ccqbHEPk.o: in function `poll.constprop.0':
nolibc-test.c:(.text+0x37b): undefined reference to `__stack_chk_fail'
/usr/bin/ld: /tmp/ccqbHEPk.o: in function `vfprintf.constprop.0':
nolibc-test.c:(.text+0x712): undefined reference to `__stack_chk_fail'
/usr/bin/ld: /tmp/ccqbHEPk.o: in function `pad_spc.constprop.0':
nolibc-test.c:(.text+0x80d): undefined reference to `__stack_chk_fail'
/usr/bin/ld: /tmp/ccqbHEPk.o: in function `printf':
nolibc-test.c:(.text+0x8c4): undefined reference to `__stack_chk_fail'
/usr/bin/ld: /tmp/ccqbHEPk.o:nolibc-test.c:(.text+0x12d4): more undefined references to `__stack_chk_fail' follow
collect2: error: ld returned 1 exit status
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
tools/testing/selftests/nolibc/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index 22f1e1d73fa8..ec724e445b5a 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for nolibc tests
include ../../../scripts/Makefile.include
+# We need this for the "cc-option" macro.
+include ../../../build/Build.include
# we're in ".../tools/testing/selftests/nolibc"
ifeq ($(srctree),)
@@ -63,6 +65,7 @@ Q=@
endif
CFLAGS ?= -Os -fno-ident -fno-asynchronous-unwind-tables
+CFLAGS += $(call cc-option,-fno-stack-protector)
LDFLAGS := -s
help:
---
base-commit: 3f0b0903fde584a7398f82fc00bf4f8138610b87
change-id: 20230221-nolibc-no-stack-protector-5b54d2be4c61
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
From: Jiri Pirko <jiri(a)nvidia.com>
When devlink instance is put into network namespace and that network
namespace gets deleted, devlink instance is moved back into init_ns.
This is done as a part of cleanup_net() routine. Since cleanup_net()
is called asynchronously from workqueue, there is no guarantee that
the devlink instance move is done after "ip netns del" returns.
So fix this race by making sure that the devlink instance is present
before any other operation.
Reported-by: Amir Tzin <amirtz(a)nvidia.com>
Fixes: b74c37fd35a2 ("selftests: netdevsim: add tests for devlink reload with resources")
Signed-off-by: Jiri Pirko <jiri(a)nvidia.com>
---
.../selftests/drivers/net/netdevsim/devlink.sh | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index a08c02abde12..7f7d20f22207 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -17,6 +17,18 @@ SYSFS_NET_DIR=/sys/bus/netdevsim/devices/$DEV_NAME/net/
DEBUGFS_DIR=/sys/kernel/debug/netdevsim/$DEV_NAME/
DL_HANDLE=netdevsim/$DEV_NAME
+wait_for_devlink()
+{
+ "$@" | grep -q $DL_HANDLE
+}
+
+devlink_wait()
+{
+ local timeout=$1
+
+ busywait "$timeout" wait_for_devlink devlink dev
+}
+
fw_flash_test()
{
RET=0
@@ -256,6 +268,9 @@ netns_reload_test()
ip netns del testns2
ip netns del testns1
+ # Wait until netns async cleanup is done.
+ devlink_wait 2000
+
log_test "netns reload test"
}
@@ -348,6 +363,9 @@ resource_test()
ip netns del testns2
ip netns del testns1
+ # Wait until netns async cleanup is done.
+ devlink_wait 2000
+
log_test "resource test"
}
--
2.39.0
Usually when a subtest is executed, setup and cleanup functions
are linearly called at the beginning and end of it.
In some of them, `set -e` is used before executing commands.
If one of the commands returns a non zero code, the whole script exists
without cleaning up the resources allocated at setup.
This can affect the next tests that use the same resources,
leading to a chain of failures.
To be consistent with other tests, calling cleanup function when the
script exists fixes the issue.
Steps to reproduce it:
1. Build with CONFIG_IP_ROUTE_MULTIPATH disabled.
2. Run net kselftest suite
3. fib_tests:fib_unreg_multipath_test fails when executing
`ip -netns ns1 route add 203.0.113.0/24 nexthop via 198.51.100.2 dev
dummy0 nexthop via 192.0.2.2 dev dummy1` because
CONFIG_IP_ROUTE_MULTIPATH is disabled.
This results in resources allocated during setup (e.g namespace ns1)
not being cleaned up.
4. When icmp.sh tries to create namespace ns1 during its setup, it fails
with the following error:
Cannot create namespace file "/run/netns/ns1": File exists
Roxana Nicolescu (1):
selftest: fib_tests: Always cleanup before exit
tools/testing/selftests/net/fib_tests.sh | 2 ++
1 file changed, 2 insertions(+)
--
2.34.1
[
RESEND, get rid of: Linux x86-64 Mailing List <linux-x86_64(a)vger.kernel.org>
from the CC list. It bounces and makes noise. Sorry for the wrong address.
]
Hi,
This is an RFC v8. Based on the x86/cpu branch in the tip tree.
The 'syscall' instruction on the Intel FRED architecture does not
clobber %rcx and %r11. This behavior leads to an assertion failure in
the sysret_rip selftest because it asserts %r11 = %rflags.
In the previous discussion, we agreed that there are two cases for
'syscall':
A) 'syscall' in a FRED system preserves %rcx and %r11.
B) 'syscall' in a non-FRED system sets %rcx=%rip and %r11=%rflags.
This series fixes the selftest. Make it work on the Intel FRED
architecture. Also, add more tests to ensure the syscall behavior is
consistent. It must always be (A) or always be (B). Not a mix of them.
See the previous discussion here:
https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
## Changelog revision
v8:
- Stop using "+r"(rsp) to avoid the red zone problem because it
generates the wrong Assembly code (Ammar).
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799
- Update commit message (Ammar).
v7:
- Fix comment, REGS_ERROR no longer exists in the enum (Ammar).
- Update commit message (Ammar).
v6:
- Move the check-regs assertion in sigusr1() to check_regs_result() (HPA).
- Add a new test just like sigusr1(), but don't modify REG_RCX and
REG_R11. This is used to test SYSRET behavior consistency (HPA).
v5:
- Fix do_syscall() return value (Ammar).
v4:
- Fix the assertion condition inside the SIGUSR1 handler (Xin Li).
- Explain the purpose of patch #2 in the commit message (HPA).
- Update commit message (Ammar).
- Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
that the result is really consistent (Ammar).
v3:
- Test that we don't get a mix of REGS_SAVED and REGS_SYSRET, which
is a major part of the point (HPA).
v2:
- Use "+r"(rsp) as the right way to avoid redzone problems per
Andrew's comment (HPA).
Co-developed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
Ammar Faizi (3):
selftests/x86: sysret_rip: Handle syscall on the Intel FRED architecture
selftests/x86: sysret_rip: Add more tests to verify the 'syscall' behavior
selftests/x86: sysret_rip: Test SYSRET with a signal handler
tools/testing/selftests/x86/sysret_rip.c | 169 +++++++++++++++++++++--
1 file changed, 160 insertions(+), 9 deletions(-)
base-commit: e067248949e3de7fbeae812b0ccbbee7a401e7aa
--
Ammar Faizi
Hi,
This is an RFC v8. Based on the x86/cpu branch in the tip tree.
The 'syscall' instruction on the Intel FRED architecture does not
clobber %rcx and %r11. This behavior leads to an assertion failure in
the sysret_rip selftest because it asserts %r11 = %rflags.
In the previous discussion, we agreed that there are two cases for
'syscall':
A) 'syscall' in a FRED system preserves %rcx and %r11.
B) 'syscall' in a non-FRED system sets %rcx=%rip and %r11=%rflags.
This series fixes the selftest. Make it work on the Intel FRED
architecture. Also, add more tests to ensure the syscall behavior is
consistent. It must always be (A) or always be (B). Not a mix of them.
See the previous discussion here:
https://lore.kernel.org/lkml/5d4ad3e3-034f-c7da-d141-9c001c2343af@intel.com
## Changelog revision
v8:
- Stop using "+r"(rsp) to avoid the red zone problem because it
generates the wrong Assembly code (Ammar).
See: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108799
- Update commit message (Ammar).
v7:
- Fix comment, REGS_ERROR no longer exists in the enum (Ammar).
- Update commit message (Ammar).
v6:
- Move the check-regs assertion in sigusr1() to check_regs_result() (HPA).
- Add a new test just like sigusr1(), but don't modify REG_RCX and
REG_R11. This is used to test SYSRET behavior consistency (HPA).
v5:
- Fix do_syscall() return value (Ammar).
v4:
- Fix the assertion condition inside the SIGUSR1 handler (Xin Li).
- Explain the purpose of patch #2 in the commit message (HPA).
- Update commit message (Ammar).
- Repeat test_syscall_rcx_r11_consistent() 32 times to be more sure
that the result is really consistent (Ammar).
v3:
- Test that we don't get a mix of REGS_SAVED and REGS_SYSRET, which
is a major part of the point (HPA).
v2:
- Use "+r"(rsp) as the right way to avoid redzone problems per
Andrew's comment (HPA).
Co-developed-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Signed-off-by: H. Peter Anvin (Intel) <hpa(a)zytor.com>
Signed-off-by: Ammar Faizi <ammarfaizi2(a)gnuweeb.org>
---
Ammar Faizi (3):
selftests/x86: sysret_rip: Handle syscall on the Intel FRED architecture
selftests/x86: sysret_rip: Add more tests to verify the 'syscall' behavior
selftests/x86: sysret_rip: Test SYSRET with a signal handler
tools/testing/selftests/x86/sysret_rip.c | 169 +++++++++++++++++++++--
1 file changed, 160 insertions(+), 9 deletions(-)
base-commit: e067248949e3de7fbeae812b0ccbbee7a401e7aa
--
Ammar Faizi
AMX architecture involves several entities such as xstate, XCR0,
IA32_XFD. This series add several missing checks on top of the existing
amx_test.
v1 -> v2:
- Add a working xstate data structure suggested by seanjc.
- Split the checking of CR0.TS from the checking of XFD.
- Fix all the issues pointed by in review.
v1:
https://lore.kernel.org/all/20230110185823.1856951-1-mizhang@google.com/
Mingwei Zhang (7):
KVM: selftests: x86: Fix an error in comment of amx_test
KVM: selftests: x86: Add a working xstate data structure
KVM: selftests: x86: Add check of CR0.TS in the #NM handler in
amx_test
KVM: selftests: Add the XFD check to IA32_XFD in #NM handler
KVM: selftests: Fix the checks to XFD_ERR using and operation
KVM: selftests: x86: Enable checking on xcomp_bv in amx_test
KVM: selftests: x86: Repeat the checking of xheader when
IA32_XFD[XTILEDATA] is set in amx_test
.../selftests/kvm/include/x86_64/processor.h | 12 ++++
tools/testing/selftests/kvm/x86_64/amx_test.c | 59 ++++++++++---------
2 files changed, 43 insertions(+), 28 deletions(-)
--
2.39.1.581.gbfd45094c4-goog
commit 90091c367e74d5b58d9ebe979cc363f7468f58d3 upstream.
This patch fixes the stack-entropy.sh test to exit gracefully when the LKDTM is
not available. Test will hang otherwise as reported in [1].
Applicability of this fix to other LTS kernels:
- 4.14: No lkdtm selftest
- 4.19: No lkdtm selftest
- 5.4: No lkdtm selftests
- 5.10: Inital selftest version introduced in 46d1a0f03d661 ("selftests/lkdtm:
Add tests for LKDTM targets") is a single script which has the LKDTM
availability check
- 6.1: Fix applied
This patch applies cleanly to stable-5.15 tree. Updated test was executed in
Qemu VM with different kernels:
- CONFIG_LKDTM not enabled. Test finished with status SKIP.
- CONFIG_LKDTM enabled. Test failed (but not hanged) with error 'Stack entropy
is low'.
- CONFIG_LKDTM enabled and randomize_kstack_offset=on boot argument provided.
Test succeed.
[1] https://lore.kernel.org/lkml/2836f48a-d4e2-7f00-f06c-9f556fbd6332@linuxfoun…