0Day/LKP observed that the kselftest run blocks forever because one of the
pidfd_wait tests fails to terminate in 1 of 30 runs. After digging into
the source, we found that it blocks at:
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
The wait_states test has the following flow:
  CHILD                 PARENT
  ---------------+--------------
1 STOP itself
2                   WAIT for CHILD STOPPED
3                   SIGNAL CHILD to CONT
4 CONT
5 STOP itself
5'                  WAIT for CHILD CONT
6                   WAIT for CHILD STOPPED
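The problem is that the kernel cannot guarantee the order of steps 5 and
5'. If 5 runs first, the child is already stopped again by the time the
parent waits for WCONTINUED, so the wait blocks and the test fails.
We can reproduce it with: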
$ while true; do make run_tests -C pidfd; done
Introduce a blocking read in the child process so that the parent can
check WCONTINUED before the child stops itself again.
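The ordering fix can be illustrated outside the selftest harness. The following is only a minimal userspace sketch, assuming plain fork() and waitid(P_PID, ...) in place of the test's clone3()/pidfd calls; run_wait_states() is a hypothetical name and not part of the patch:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 if the parent observes
 * STOPPED, CONTINUED, STOPPED, EXITED in that order.
 */
static int run_wait_states(void)
{
	siginfo_t info = { .si_signo = 0 };
	int pfd[2];
	pid_t pid;
	char buf;

	if (pipe(pfd) < 0)
		return -1;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		close(pfd[1]);
		kill(getpid(), SIGSTOP);	/* step 1 */
		/* Block until the parent has seen the CONTINUED event. */
		if (read(pfd[0], &buf, 1) != 1)
			_exit(EXIT_FAILURE);
		close(pfd[0]);
		kill(getpid(), SIGSTOP);	/* step 5 */
		_exit(EXIT_SUCCESS);
	}

	close(pfd[0]);

	/* step 2 */
	if (waitid(P_PID, pid, &info, WSTOPPED) || info.si_code != CLD_STOPPED)
		goto fail;
	/* step 3 */
	if (kill(pid, SIGCONT))
		goto fail;
	/* step 5': the child cannot reach step 5 yet, it is blocked in read() */
	if (waitid(P_PID, pid, &info, WCONTINUED) || info.si_code != CLD_CONTINUED)
		goto fail;
	/* Only now let the child proceed to its second SIGSTOP. */
	if (write(pfd[1], "C", 1) != 1)
		goto fail;
	close(pfd[1]);
	/* step 6 */
	if (waitid(P_PID, pid, &info, WSTOPPED) || info.si_code != CLD_STOPPED)
		goto fail_closed;

	/* Resume the child and reap it. */
	if (kill(pid, SIGCONT))
		goto fail_closed;
	if (waitid(P_PID, pid, &info, WEXITED) || info.si_code != CLD_EXITED)
		return -1;
	return info.si_status == EXIT_SUCCESS ? 0 : -1;

fail:
	close(pfd[1]);
fail_closed:
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	return -1;
}
```

Without the read()/write() pair the child could reach its second SIGSTOP before the parent's WCONTINUED wait, which is the race described above.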
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
---
I had almost forgotten this patch, since the previous version was posted
over 6 months ago. This time I just rebased it and updated the comments.
V3: fix description and add review tag
V2: rewrite with pipe to avoid usleep
---
tools/testing/selftests/pidfd/pidfd_wait.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
index 070c1c876df1..c3e2a3041f55 100644
--- a/tools/testing/selftests/pidfd/pidfd_wait.c
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -95,20 +95,28 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
.flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
.exit_signal = SIGCHLD,
};
+ int pfd[2];
pid_t pid;
siginfo_t info = {
.si_signo = 0,
};
+ ASSERT_EQ(pipe(pfd), 0);
pid = sys_clone3(&args);
ASSERT_GE(pid, 0);
if (pid == 0) {
+ char buf[2];
+
+ close(pfd[1]);
kill(getpid(), SIGSTOP);
+ ASSERT_EQ(read(pfd[0], buf, 1), 1);
+ close(pfd[0]);
kill(getpid(), SIGSTOP);
exit(EXIT_SUCCESS);
}
+ close(pfd[0]);
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
ASSERT_EQ(info.si_signo, SIGCHLD);
ASSERT_EQ(info.si_code, CLD_STOPPED);
@@ -117,6 +125,8 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
+ ASSERT_EQ(write(pfd[1], "C", 1), 1);
+ close(pfd[1]);
ASSERT_EQ(info.si_signo, SIGCHLD);
ASSERT_EQ(info.si_code, CLD_CONTINUED);
ASSERT_EQ(info.si_pid, parent_tid);
--
1.8.3.1
This patch series is the result of a long debugging effort to find out
why guests with win11 secure boot sometimes failed during boot.
While writing a unit test I found another bug: on rsm emulation, if the
rsm instruction was executed in real or 32 bit mode, KVM would truncate
the restored RIP to 32 bits.
I also refactored the way we write SMRAM so that it is now easier to
understand what is going on.
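The first patch title ("bug: introduce ASSERT_STRUCT_OFFSET") suggests that the SMRAM structs are checked against fixed offsets at build time. The following is only a userspace sketch of that idea, assuming C11 _Static_assert; the macro name DEMO_ASSERT_STRUCT_OFFSET, struct demo_smram_state and all offsets are made up for illustration and are not the layout used by the series:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace analogue of a build-time offset check. */
#define DEMO_ASSERT_STRUCT_OFFSET(type, field, expected)		\
	_Static_assert(offsetof(type, field) == (expected),		\
		       "Offset of " #field " in " #type " has changed.")

/* Made-up state image; NOT the real SMRAM layout. */
struct demo_smram_state {
	uint32_t cr0;
	uint32_t cr3;
	uint64_t rip;		/* kept 64 bits wide, never truncated */
	uint8_t  int_shadow;
};

DEMO_ASSERT_STRUCT_OFFSET(struct demo_smram_state, cr0, 0x00);
DEMO_ASSERT_STRUCT_OFFSET(struct demo_smram_state, cr3, 0x04);
DEMO_ASSERT_STRUCT_OFFSET(struct demo_smram_state, rip, 0x08);
DEMO_ASSERT_STRUCT_OFFSET(struct demo_smram_state, int_shadow, 0x10);

/*
 * Read rip through a raw offset, the way open-coded accessors would.
 * The assertions above guarantee this matches the struct field.
 */
static uint64_t demo_read_rip_raw(const struct demo_smram_state *s)
{
	uint64_t rip;

	memcpy(&rip, (const unsigned char *)s + 0x08, sizeof(rip));
	return rip;
}
```

If a field is moved or resized, the build fails at the assertion instead of silently reading or writing the wrong bytes.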
The main bug fixed in this series is that we allowed #SMI to happen
during the STI interrupt shadow, and did nothing either to reset the
shadow on #SMI handler entry or to restore it on RSM.
V4:
- rebased on top of the patch series from Paolo which
allows SMM support to be disabled by a Kconfig option.
- addressed review feedback.
I included Paolo's patches in the series for reference.
Best regards,
Maxim Levitsky
Maxim Levitsky (15):
bug: introduce ASSERT_STRUCT_OFFSET
KVM: x86: emulator: em_sysexit should update ctxt->mode
KVM: x86: emulator: introduce emulator_recalc_and_set_mode
KVM: x86: emulator: update the emulation mode after rsm
KVM: x86: emulator: update the emulation mode after CR0 write
KVM: x86: smm: number of GPRs in the SMRAM image depends on the image
format
KVM: x86: smm: check for failures on smm entry
KVM: x86: smm: add structs for KVM's smram layout
KVM: x86: smm: use smram structs in the common code
KVM: x86: smm: use smram struct for 32 bit smram load/restore
KVM: x86: smm: use smram struct for 64 bit smram load/restore
KVM: svm: drop explicit return value of kvm_vcpu_map
KVM: x86: SVM: use smram structs
KVM: x86: SVM: don't save SVM state to SMRAM when VM is not long mode
capable
KVM: x86: smm: preserve interrupt shadow in SMRAM
Paolo Bonzini (8):
KVM: x86: start moving SMM-related functions to new files
KVM: x86: move SMM entry to a new file
KVM: x86: move SMM exit to a new file
KVM: x86: do not go through ctxt->ops when emulating rsm
KVM: allow compiling out SMM support
KVM: x86: compile out vendor-specific code if SMM is disabled
KVM: x86: remove SMRAM address space if SMM is not supported
KVM: x86: do not define KVM_REQ_SMI if SMM disabled
arch/x86/include/asm/kvm-x86-ops.h | 2 +
arch/x86/include/asm/kvm_host.h | 29 +-
arch/x86/kvm/Kconfig | 11 +
arch/x86/kvm/Makefile | 1 +
arch/x86/kvm/emulate.c | 458 +++----------
arch/x86/kvm/kvm_cache_regs.h | 5 -
arch/x86/kvm/kvm_emulate.h | 47 +-
arch/x86/kvm/lapic.c | 14 +-
arch/x86/kvm/lapic.h | 7 +-
arch/x86/kvm/mmu/mmu.c | 1 +
arch/x86/kvm/smm.c | 637 ++++++++++++++++++
arch/x86/kvm/smm.h | 160 +++++
arch/x86/kvm/svm/nested.c | 3 +
arch/x86/kvm/svm/svm.c | 43 +-
arch/x86/kvm/vmx/nested.c | 1 +
arch/x86/kvm/vmx/vmcs12.h | 5 +-
arch/x86/kvm/vmx/vmx.c | 11 +-
arch/x86/kvm/x86.c | 353 +---------
include/linux/build_bug.h | 9 +
tools/testing/selftests/kvm/x86_64/smm_test.c | 2 +
20 files changed, 1031 insertions(+), 768 deletions(-)
create mode 100644 arch/x86/kvm/smm.c
create mode 100644 arch/x86/kvm/smm.h
--
2.34.3
The XSAVE feature set supports saving and restoring xstate components,
and is used for process context switching. XSAVE components include the
x87 state for the FP execution environment, SSE state, AVX state and so on.
To ensure that XSAVE works correctly, add a basic test of the XSAVE
architecture functionality.
This patch set tests the "FP, SSE(XMM), AVX2(YMM), AVX512_OPMASK/
AVX512_ZMM_Hi256/AVX512_Hi16_ZMM and PKRU" xstates in the following cases:
1. The contents of these xstates in the process should not change after the
signal handling.
2. The contents of these xstates in the child process should be the same as
the contents of the xstate in the parent process after the fork syscall.
3. The contents of xstates in the parent process should not change after
the context switch.
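The changelog below mentions skipping the test when the CPU does not support xsave and using CPUID.(EAX=0DH, ECX=0H):EBX. The following is only a sketch of such a check, assuming the compiler's <cpuid.h> helpers rather than the __cpuid_count() helper in kselftest.h; xsave_usable() and xsave_area_size() are hypothetical names, not functions from the patches:

```c
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/*
 * Hypothetical helper: 1 if the CPU reports XSAVE (CPUID.1:ECX bit 26)
 * and the OS has enabled it (OSXSAVE, bit 27), 0 if not, -1 when not
 * built for x86.
 */
static int xsave_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ecx & (1u << 26)) && (ecx & (1u << 27));
#else
	return -1;
#endif
}

/*
 * Hypothetical helper: size in bytes of the XSAVE area for the
 * currently enabled features, from CPUID.(EAX=0DH, ECX=0H):EBX.
 * Returns 0 when XSAVE cannot be used.
 */
static uint32_t xsave_area_size(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (xsave_usable() != 1)
		return 0;
	if (!__get_cpuid_count(0xd, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return ebx;
#else
	return 0;
#endif
}
```

A test could call xsave_usable() first and report a skip instead of a failure when it does not return 1.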
As stated in the ABI (Application Binary Interface) specification:
https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf
xstate such as XMM is not preserved across function calls, so the fork()
function provided by libc cannot be used in the xsave test; the libc
function is replaced with an inline function written entirely in assembly.
To prevent GCC from generating any FP/SSE(XMM)/AVX/PKRU code by mistake, add
the "-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler arguments.
stdlib.h cannot be used because of the "-mno-sse" option.
Thanks to Dave Hansen for the above suggestion!
Thanks to Chen Yu, Shuah Khan, Chatre Reinette and Tony Luck for their comments!
Thanks to Bae, Chang Seok for a bunch of comments!
========
- Change from v12 to v13
- Improve the comments of CPUID.(EAX=0DH, ECX=0H):EBX.
- Change from v11 to v12
- Remove useless rbx register stuffing in assembly syscall functions.
(Zhang, Li)
- Change from v10 to v11
- Remove small functions such as cpu_has_pkru(), get_xstate_size() and so
on. (Shuah Khan)
- Unify xfeature_num type to uint32_t.
- Change from v9 to v10
- Remove small functions that are called only once when there is no good
reason to keep them. (Shuah Khan)
- Change from v8 to v9
- Use function pointers to make it more structured. (Hansen, Dave)
- Improve the function name: xstate_tested -> xstate_in_test. (Chang S. Bae)
- Break this test up into two pieces: keep the xstate key test steps with
"-mno-sse" and no stdlib.h, keep others in xstate.c file. (Hansen, Dave)
- Use kselftest infrastructure for xstate.c file. (Hansen, Dave)
- Use instruction back to populate fp xstate buffer. (Hansen, Dave)
- Skip the test if the CPU does not support xsave. (Chang S. Bae)
- Use __cpuid_count() helper in kselftest.h. (Reinette, Chatre)
- Change from v7 to v8
Many thanks to Bae, Chang Seok for a bunch of comments, as follows:
- Prepare the xstate buffer by filling it, and load the tested xstates
with the xrstor instruction.
- Remove useless dump_buffer, compare_buffer functions.
- Improve the struct of xstate_info.
- Added AVX512_ZMM_Hi256 and AVX512_Hi16_ZMM components in xstate test.
- Remove redundant xstate_info.xstate_mask, xstate_flag[], and
xfeature_test_mask, use xstate_info.mask instead.
- Check whether the xfeature is supported outside of fill_xstate_buf();
this is easier to read and understand.
- Remove useless wrpkru, only use filling all tested xstate buffer in
fill_xstates_buf().
- Improve a bunch of function names and variable names.
- Improve test steps flow for readability.
- Change from v6 to v7:
- Added the error number and error description of the reason for the
failure, thanks to Shuah Khan's suggestion.
- Added a description of what these tests are doing in the head comments.
- Added changes update in the head comments.
- Added a description of the purpose of each function. Thanks Shuah Khan.
- Change from v5 to v6:
- In order to prevent GCC from generating any FP code by mistake, the
"-mno-sse -mno-mmx -mno-sse2 -mno-avx -mno-pku" compiler parameters were
added, modelled on the parameters used for compiling the x86 kernel. Thanks
to Dave Hansen for the suggestion.
- Removed the use of "kselftest.h", because kselftest.h includes <stdlib.h>,
and "stdlib.h" would use SSE instructions in its libc, and this *XSAVE*
test needs to be compiled without libc SSE instructions (-mno-sse).
- Improved the description in the commit header, thanks to Chen Yu's suggestion.
- Because the test code could not use the builtin xsave64 in libc without SSE,
added an xsave function implemented with the instruction directly.
- No key test action uses libc (like printf), other than syscalls, until the
test has failed or finished. If it fails, it prints the reason for the failure.
- Used __cpuid_count() instead of native_cpuid(), because __cpuid_count()
is a macro with a single instruction in libc and does not
change xstate. Thanks Chatre Reinette, Shuah Khan.
https://lore.kernel.org/linux-sgx/8b7c98f4-f050-bc1c-5699-fa598ecc66a2@linu…
- Change from v4 to v5:
- Moved code files into tools/testing/selftests/x86.
- Deleted the xsave instruction test, because it's not related to the kernel.
- Improved case description.
- Added AVX512 opmask change and related XSAVE content verification.
- Added PKRU part xstate test into instruction and signal handling test.
- Added XSAVE process switch test for FPU, AVX2, AVX512 opmask and PKRU part.
- Change from v3 to v4:
- Improve the comment in patch 1.
- Change from v2 to v3:
- Improve the description of patch 2 git log.
- Change from v1 to v2:
- Improve the cover letter. Thanks to Dave Hansen for the suggestion.
Pengfei Xu (2):
selftests/x86/xstate: Add xstate signal handling test for XSAVE
feature
selftests/x86/xstate: Add xstate fork test for XSAVE feature
tools/testing/selftests/x86/.gitignore | 1 +
tools/testing/selftests/x86/Makefile | 11 +-
tools/testing/selftests/x86/xstate.c | 214 +++++++++++++++++
tools/testing/selftests/x86/xstate.h | 228 +++++++++++++++++++
tools/testing/selftests/x86/xstate_helpers.c | 209 +++++++++++++++++
tools/testing/selftests/x86/xstate_helpers.h | 9 +
6 files changed, 670 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/x86/xstate.c
create mode 100644 tools/testing/selftests/x86/xstate.h
create mode 100644 tools/testing/selftests/x86/xstate_helpers.c
create mode 100644 tools/testing/selftests/x86/xstate_helpers.h
--
2.31.1
Remove the repeated word "and" in comments.
Signed-off-by: Shaomin Deng <dengshaomin(a)cdjrlc.com>
---
tools/testing/selftests/core/close_range_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c
index 749239930ca8..4db5ec73d016 100644
--- a/tools/testing/selftests/core/close_range_test.c
+++ b/tools/testing/selftests/core/close_range_test.c
@@ -476,7 +476,7 @@ TEST(close_range_cloexec_unshare_syzbot)
/*
* Create a huge gap in the fd table. When we now call
- * CLOSE_RANGE_UNSHARE with a shared fd table and and with ~0U as upper
+ * CLOSE_RANGE_UNSHARE with a shared fd table and with ~0U as upper
* bound the kernel will only copy up to fd1 file descriptors into the
* new fd table. If the kernel is buggy and doesn't handle
* CLOSE_RANGE_CLOEXEC correctly it will not have copied all file
--
2.35.1
Paul and I got trapped a few times by not seeing the effects of
applying a patch to the nolibc source code until a "make clean" was
issued in the nolibc directory. It's particularly annoying when trying
to confirm that a proposed patch really solves a problem (or that
reverting it reintroduces the problem).
The reason for the sysroot not being rebuilt was that it can be quite
slow. But in fact it's only slow after a "make clean" issued at the
kernel's topdir, because it's the main "make headers" that can take
tens of seconds; as long as "usr/include" still contains headers, the
"headers_install" phase is only a quick "rsync", and rebuilding the
whole nolibc sysroot takes a bit less than one second, which is perfectly
acceptable for a test, even more so once the time lost to misleading
results is factored in.
This patch marks the sysroot target as phony and starts by clearing
the previous sysroot for the current architecture before reinstalling
it. Thanks to this, applying a patch to nolibc makes the effect
immediately visible to "make nolibc-test":
$ time make -j -C tools/testing/selftests/nolibc nolibc-test
make: Entering directory '/k/tools/testing/selftests/nolibc'
MKDIR sysroot/x86/include
make[1]: Entering directory '/k/tools/include/nolibc'
make[2]: Entering directory '/k'
make[2]: Leaving directory '/k'
make[2]: Entering directory '/k'
INSTALL /k/tools/testing/selftests/nolibc/sysroot/sysroot/include
make[2]: Leaving directory '/k'
make[1]: Leaving directory '/k/tools/include/nolibc'
CC nolibc-test
make: Leaving directory '/k/tools/testing/selftests/nolibc'
real 0m0.869s
user 0m0.716s
sys 0m0.149s
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Link: https://lore.kernel.org/all/20221021155645.GK5600@paulmck-ThinkPad-P17-Gen-…
Signed-off-by: Willy Tarreau <w(a)1wt.eu>
---
tools/testing/selftests/nolibc/Makefile | 3 +++
1 file changed, 3 insertions(+)
diff --git a/tools/testing/selftests/nolibc/Makefile b/tools/testing/selftests/nolibc/Makefile
index 69ea659caca9..22f1e1d73fa8 100644
--- a/tools/testing/selftests/nolibc/Makefile
+++ b/tools/testing/selftests/nolibc/Makefile
@@ -95,6 +95,7 @@ all: run
sysroot: sysroot/$(ARCH)/include
sysroot/$(ARCH)/include:
+ $(Q)rm -rf sysroot/$(ARCH) sysroot/sysroot
$(QUIET_MKDIR)mkdir -p sysroot
$(Q)$(MAKE) -C ../../../include/nolibc ARCH=$(ARCH) OUTPUT=$(CURDIR)/sysroot/ headers_standalone
$(Q)mv sysroot/sysroot sysroot/$(ARCH)
@@ -133,3 +134,5 @@ clean:
$(Q)rm -rf initramfs
$(call QUIET_CLEAN, run.out)
$(Q)rm -rf run.out
+
+.PHONY: sysroot/$(ARCH)/include
--
2.35.3
From: Roberto Sassu <roberto.sassu(a)huawei.com>
include/linux/lsm_hooks.h reports the result of the LSM infrastructure to
the callers, not what LSMs should return to the LSM infrastructure.
Clarify that and add that returning 1 from the LSMs means calling
__vm_enough_memory() with cap_sys_admin set, 0 without.
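The two return conventions can be told apart with a small model. The following is only a userspace sketch, assuming the infrastructure clears cap_sys_admin as soon as any LSM returns a value <= 0 and then forwards the flag to __vm_enough_memory(); every name here (model_*, hook_*, last_cap_sys_admin) is hypothetical and stands in for kernel code that is not shown in this patch:

```c
#include <stddef.h>

/* Hypothetical stand-in for one LSM's hook implementation. */
typedef int (*vm_enough_memory_hook)(long pages);

/* Records the flag passed to the stand-in for __vm_enough_memory(). */
static int last_cap_sys_admin = -1;

/* Stand-in for __vm_enough_memory(): always grants, records the flag. */
static int model_vm_enough_memory(long pages, int cap_sys_admin)
{
	(void)pages;
	last_cap_sys_admin = cap_sys_admin;
	return 0;	/* what the caller sees: 0 means permission granted */
}

/*
 * Model of the infrastructure side: LSMs return 1 to request
 * cap_sys_admin, anything <= 0 drops it for the final call.
 */
static int model_security_vm_enough_memory(const vm_enough_memory_hook *hooks,
					    size_t nr_hooks, long pages)
{
	int cap_sys_admin = 1;
	size_t i;

	for (i = 0; i < nr_hooks; i++) {
		if (hooks[i](pages) <= 0) {
			cap_sys_admin = 0;
			break;
		}
	}
	return model_vm_enough_memory(pages, cap_sys_admin);
}

/* Example LSM hooks: one asks for cap_sys_admin, one does not. */
static int hook_grants_admin(long pages) { (void)pages; return 1; }
static int hook_plain(long pages) { (void)pages; return 0; }
```

In this model the caller still sees 0 on success in both cases; only the flag handed to the memory accounting differs, which is the distinction the updated comment spells out.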
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
---
include/linux/lsm_hooks.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index 4ec80b96c22e..f40b82ca91e7 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1411,7 +1411,9 @@
* Check permissions for allocating a new virtual mapping.
* @mm contains the mm struct it is being added to.
* @pages contains the number of pages.
- * Return 0 if permission is granted.
+ * Return 0 if permission is granted by LSMs to the caller. LSMs should
+ * return 1 if __vm_enough_memory() should be called with
+ * cap_sys_admin set, 0 if not.
*
* @ismaclabel:
* Check if the extended attribute specified by @name
--
2.25.1