0Day/LKP observed that the kselftest blocks forever since one of the
pidfd_wait doesn't terminate in 1 of 30 runs. After digging into
the source, we found that it blocks at:
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
wait_states has below testing flow:
CHILD PARENT
---------------+--------------
1 STOP itself
2 WAIT for CHILD STOPPED
3 SIGNAL CHILD to CONT
4 CONT
5 STOP itself
5' WAIT for CHILD CONT
6 WAIT for CHILD STOPPED
The problem is that the kernel cannot ensure the order of 5 and 5', once
5 goes first, the test will fail.
we can reproduce it by:
$ while true; do make run_tests -C pidfd; done
Introduce a blocking read in child process to make sure the parent can
check its WCONTINUED.
CC: Philip Li <philip.li(a)intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
---
I have almost forgotten this patch since the former version post over 6 months
ago. This time I just do a rebase and update the comments.
V3: fixes description and add review tag
V2: rewrite with pipe to avoid usleep
---
tools/testing/selftests/pidfd/pidfd_wait.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/tools/testing/selftests/pidfd/pidfd_wait.c b/tools/testing/selftests/pidfd/pidfd_wait.c
index 070c1c876df1..c3e2a3041f55 100644
--- a/tools/testing/selftests/pidfd/pidfd_wait.c
+++ b/tools/testing/selftests/pidfd/pidfd_wait.c
@@ -95,20 +95,28 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
.flags = CLONE_PIDFD | CLONE_PARENT_SETTID,
.exit_signal = SIGCHLD,
};
+ int pfd[2];
pid_t pid;
siginfo_t info = {
.si_signo = 0,
};
+ ASSERT_EQ(pipe(pfd), 0);
pid = sys_clone3(&args);
ASSERT_GE(pid, 0);
if (pid == 0) {
+ char buf[2];
+
+ close(pfd[1]);
kill(getpid(), SIGSTOP);
+ ASSERT_EQ(read(pfd[0], buf, 1), 1);
+ close(pfd[0]);
kill(getpid(), SIGSTOP);
exit(EXIT_SUCCESS);
}
+ close(pfd[0]);
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WSTOPPED, NULL), 0);
ASSERT_EQ(info.si_signo, SIGCHLD);
ASSERT_EQ(info.si_code, CLD_STOPPED);
@@ -117,6 +125,8 @@ static int sys_waitid(int which, pid_t pid, siginfo_t *info, int options,
ASSERT_EQ(sys_pidfd_send_signal(pidfd, SIGCONT, NULL, 0), 0);
ASSERT_EQ(sys_waitid(P_PIDFD, pidfd, &info, WCONTINUED, NULL), 0);
+ ASSERT_EQ(write(pfd[1], "C", 1), 1);
+ close(pfd[1]);
ASSERT_EQ(info.si_signo, SIGCHLD);
ASSERT_EQ(info.si_code, CLD_CONTINUED);
ASSERT_EQ(info.si_pid, parent_tid);
--
1.8.3.1
Show for each node if every memory descriptor in that node has the
EFI_MEMORY_CPU_CRYPTO attribute.
fwupd project plans to use it as part of a check to see if the users
have properly configured memory hardware encryption
capabilities. fwupd's people have seen cases where it seems like there
is memory encryption because all the hardware is capable of doing it,
but on a closer look there is not, either because of system firmware
or because some component requires updating to enable the feature.
The MKTME/TME spec says that it will only encrypt those memory regions
which are flagged with the EFI_MEMORY_CPU_CRYPTO attribute.
If all nodes are capable of encryption and if the system have tme/sme
on we can pretty confidently say that the device is actively
encrypting all its memory.
It's planned to make this check part of an specification that can be
passed to people purchasing hardware
These checks will run at every boot. The specification is called Host
Security ID: https://fwupd.github.io/libfwupdplugin/hsi.html.
We choosed to do it a per-node basis because although an ABI that
shows that the whole system memory is capable of encryption would be
useful for the fwupd use case, doing it in a per-node basis would make
the path easier to give the capability to the user to target
allocations from applications to NUMA nodes which have encryption
capabilities in the future.
Changes since v8:
Add unit tests to e820_range_* functions
Changes since v7:
Less kerneldocs
Less verbosity in the e820 code
Changes since v6:
Fixes in __e820__handle_range_update
Const correctness in e820.c
Correct alignment in memblock.h
Rework memblock_overlaps_region
Changes since v5:
Refactor e820__range_{update, remove, set_crypto_capable} in order to
avoid code duplication.
Warn the user when a node has both encryptable and non-encryptable
regions.
Check that e820_table has enough size to store both current e820_table
and EFI memmap.
Changes since v4:
Add enum to represent the cryptographic capabilities in e820:
e820_crypto_capabilities.
Revert __e820__range_update, only adding the new argument for
__e820__range_add about crypto capabilities.
Add a function __e820__range_update_crypto similar to
__e820__range_update but to only update this new field.
Changes since v3:
Update date in Doc/ABI file.
More information about the fwupd usecase and the rationale behind
doing it in a per-NUMA-node.
Changes since v2:
e820__range_mark_crypto -> e820__range_mark_crypto_capable.
In e820__range_remove: Create a region with crypto capabilities
instead of creating one without it and then mark it.
Changes since v1:
Modify __e820__range_update to update the crypto capabilities of a
range; now this function will change the crypto capability of a range
if it's called with the same old_type and new_type. Rework
efi_mark_e820_regions_as_crypto_capable based on this.
Update do_add_efi_memmap to mark the regions as it creates them.
Change the type of crypto_capable in e820_entry from bool to u8.
Fix e820__update_table changes.
Remove memblock_add_crypto_capable. Now you have to add the region and
mark it then.
Better place for crypto_capable in pglist_data.
Martin Fernandez (9):
mm/memblock: Tag memblocks with crypto capabilities
mm/mmzone: Tag pg_data_t with crypto capabilities
x86/e820: Add infrastructure to refactor e820__range_{update,remove}
x86/e820: Refactor __e820__range_update
x86/e820: Refactor e820__range_remove
x86/e820: Tag e820_entry with crypto capabilities
x86/e820: Add unit tests for e820_range_* functions
x86/efi: Mark e820_entries as crypto capable from EFI memmap
drivers/node: Show in sysfs node's crypto capabilities
Documentation/ABI/testing/sysfs-devices-node | 10 +
arch/x86/Kconfig.debug | 10 +
arch/x86/include/asm/e820/api.h | 1 +
arch/x86/include/asm/e820/types.h | 12 +-
arch/x86/kernel/e820.c | 393 ++++++++++++++-----
arch/x86/kernel/e820_test.c | 249 ++++++++++++
arch/x86/platform/efi/efi.c | 37 ++
drivers/base/node.c | 10 +
include/linux/memblock.h | 5 +
include/linux/mmzone.h | 3 +
mm/memblock.c | 62 +++
mm/page_alloc.c | 1 +
12 files changed, 695 insertions(+), 98 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-devices-node
create mode 100644 arch/x86/kernel/e820_test.c
--
2.30.2
On top of mm-stable.
This is my current set of tests for testing COW handling of anonymous
memory, especially when interacting with GUP. I developed these tests
while working on PageAnonExclusive and managed to clean them up just now.
On current upstream Linux, all tests pass except the hugetlb tests that
rely on vmsplice -- these tests should pass as soon as vmsplice properly
uses FOLL_PIN instead of FOLL_GET.
I'm working on additional tests for COW handling in private mappings,
focusing on long-term R/O pinning e.g., of the shared zeropage, pagecache
pages and KSM pages. These tests, however, will go into a different file.
So this is everything I have regarding tests for anonymous memory.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Nadav Amit <namit(a)vmware.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Christoph von Recklinghausen <crecklin(a)redhat.com>
Cc: Don Dutile <ddutile(a)redhat.com>
David Hildenbrand (7):
selftests/vm: anon_cow: test COW handling of anonymous memory
selftests/vm: factor out pagemap_is_populated() into vm_util
selftests/vm: anon_cow: THP tests
selftests/vm: anon_cow: hugetlb tests
selftests/vm: anon_cow: add liburing test cases
mm/gup_test: start/stop/read functionality for PIN LONGTERM test
selftests/vm: anon_cow: add R/O longterm tests via gup_test
mm/gup_test.c | 140 +++
mm/gup_test.h | 12 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 25 +-
tools/testing/selftests/vm/anon_cow.c | 1126 ++++++++++++++++++++
tools/testing/selftests/vm/check_config.sh | 31 +
tools/testing/selftests/vm/madv_populate.c | 8 -
tools/testing/selftests/vm/run_vmtests.sh | 3 +
tools/testing/selftests/vm/vm_util.c | 15 +
tools/testing/selftests/vm/vm_util.h | 2 +
10 files changed, 1353 insertions(+), 10 deletions(-)
create mode 100644 tools/testing/selftests/vm/anon_cow.c
create mode 100644 tools/testing/selftests/vm/check_config.sh
--
2.37.3
This series cleans up and fixes break_ksm(). In summary, we no longer
use fake write faults to break COW but instead FAULT_FLAG_UNSHARE. Further,
we move away from using follow_page() [that we can hopefully remove
completely at one point] and use new walk_page_range_vma() instead.
Fortunately, we can get rid of VM_FAULT_WRITE and FOLL_MIGRATION in common
code now.
Add a selftest to measure MADV_UNMERGEABLE performance. In my setup
(AMD Ryzen 9 3900X), running the KSM selftest to test unmerge performance
on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in a
performance degradation of ~8% -- 9% (old: ~5250 MiB/s, new: ~4800 MiB/s).
I don't think we particularly care for now, but it's good to be aware
of the implication.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
David Hildenbrand (7):
selftests/vm: add test to measure MADV_UNMERGEABLE performance
mm/ksm: simplify break_ksm() to not rely on VM_FAULT_WRITE
mm: remove VM_FAULT_WRITE
mm/ksm: fix KSM COW breaking with userfaultfd-wp via
FAULT_FLAG_UNSHARE
mm/pagewalk: add walk_page_range_vma()
mm/ksm: convert break_ksm() to use walk_page_range_vma()
mm/gup: remove FOLL_MIGRATION
include/linux/mm.h | 1 -
include/linux/mm_types.h | 3 -
include/linux/pagewalk.h | 3 +
mm/gup.c | 55 ++-----------
mm/huge_memory.c | 2 +-
mm/ksm.c | 103 +++++++++++++++++++------
mm/memory.c | 9 +--
mm/pagewalk.c | 27 +++++++
tools/testing/selftests/vm/ksm_tests.c | 76 +++++++++++++++++-
9 files changed, 192 insertions(+), 87 deletions(-)
--
2.37.3
This change enables to extend CFLAGS and LDFLAGS from command line, e.g.
to extend compiler checks: make USERCFLAGS=-Werror USERLDFLAGS=-static
USERCFLAGS and USERLDFLAGS are documented in
Documentation/kbuild/makefiles.rst and Documentation/kbuild/kbuild.rst
This should be backported (down to 5.10) to improve previous kernel
versions testing as well.
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20220909103901.1503436-1-mic@digikod.net
---
tools/testing/selftests/lib.mk | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
index d44c72b3abe3..da47a0257165 100644
--- a/tools/testing/selftests/lib.mk
+++ b/tools/testing/selftests/lib.mk
@@ -119,6 +119,11 @@ endef
clean:
$(CLEAN)
+# Enables to extend CFLAGS and LDFLAGS from command line, e.g.
+# make USERCFLAGS=-Werror USERLDFLAGS=-static
+CFLAGS += $(USERCFLAGS)
+LDFLAGS += $(USERLDFLAGS)
+
# When make O= with kselftest target from main level
# the following aren't defined.
#
base-commit: 7e18e42e4b280c85b76967a9106a13ca61c16179
--
2.37.2
Hello,
This patch series implements a new ioctl on the pagemap proc fs file to
get, clear and perform both get and clear at the same time atomically on
the specified range of the memory.
Soft-dirty PTE bit of the memory pages can be viewed by using pagemap
procfs file. The soft-dirty PTE bit for the whole memory range of the
process can be cleared by writing to the clear_refs file. This series
adds features that weren't present earlier.
- There is no atomic get soft-dirty PTE bit status and clear operation
present.
- The soft-dirty PTE bit of only a part of memory cannot be cleared.
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The proc fs interface is enough for that as I think the process
is frozen. We have the use case where we need to track the soft-dirty
PTE bit for the running processes. We need this tracking and clear
mechanism of a region of memory while the process is running to emulate
the getWriteWatch() syscall of Windows. This syscall is used by games to
keep track of dirty pages and keep processing only the dirty pages. This
new ioctl can be used by the CRIU project and other applications which
require soft-dirty PTE bit information.
As in the current kernel there is no way to clear a part of memory (instead
of clearing the Soft-Dirty bits for the entire process) and get+clear
operation cannot be performed atomically, there are other methods to mimic
this information entirely in userspace with poor performance:
- The mprotect syscall and SIGSEGV handler for bookkeeping
- The userfaultfd syscall with the handler for bookkeeping
Some benchmarks can be seen [1].
This ioctl can be used by the CRIU project and other applications which
require soft-dirty PTE bit information. The following operations are
supported in this ioctl:
- Get the pages that are soft-dirty.
- Clear the pages which are soft-dirty.
- The optional flag to ignore the VM_SOFTDIRTY and only track per page
soft-dirty PTE bit
There are two decisions which have been taken about how to get the output
from the syscall.
- Return offsets of the pages from the start in the vec
- Stop execution when vec is filled with dirty pages
These two arguments doesn't follow the mincore() philosophy where the
output array corresponds to the address range in one to one fashion, hence
the output buffer length isn't passed and only a flag is set if the page
is present. This makes mincore() easy to use with less control. We are
passing the size of the output array and putting return data consecutively
which is offset of dirty pages from the start. The user can convert these
offsets back into the dirty page addresses easily. Suppose, the user want
to get first 10 dirty pages from a total memory of 100 pages. He'll
allocate output buffer of size 10 and the ioctl will abort after finding the
10 pages. This behaviour is needed to support Windows' getWriteWatch(). The
behaviour like mincore() can be achieved by passing output buffer of 100
size. This interface can be used for any desired behaviour.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: update functions to clear the soft-dirty PTE bit
fs/proc/task_mmu: Implement IOCTL to get and clear soft dirty PTE bit
selftests: vm: add pagemap ioctl tests
mm: add documentation of the new ioctl on pagemap
Documentation/admin-guide/mm/soft-dirty.rst | 42 +-
fs/proc/task_mmu.c | 342 ++++++++++-
include/uapi/linux/fs.h | 23 +
tools/include/uapi/linux/fs.h | 23 +
tools/testing/selftests/vm/.gitignore | 1 +
tools/testing/selftests/vm/Makefile | 2 +
tools/testing/selftests/vm/pagemap_ioctl.c | 649 ++++++++++++++++++++
7 files changed, 1050 insertions(+), 32 deletions(-)
create mode 100644 tools/testing/selftests/vm/pagemap_ioctl.c
--
2.30.2
This v3 series implements selftests targeting the feature floated by Chao
via:
https://lore.kernel.org/linux-mm/20220706082016.2603916-12-chao.p.peng@linu…
Below changes aim to test the fd based approach for guest private memory
in context of normal (non-confidential) VMs executing on non-confidential
platforms.
private_mem_test.c file adds selftest to access private memory from the
guest via private/shared accesses and checking if the contents can be
leaked to/accessed by vmm via shared memory view before/after conversions.
Updates in V3:
1) Series is based on v7 series from Chao
2) Changes are introduced in KVM to help execute private mem selftests
3) Selftests are executing from private memory
4) Test implementation is simplified to contain implicit/explicit memory
conversion paths according to feedback from Sean.
5) Addressed comments from Sean and Shuah.
This series has dependency on following patches:
1) V7 series patches from Chao mentioned above.
2) https://lore.kernel.org/lkml/20220810152033.946942-1-pgonda@google.com/T/#u
- Series posted by Peter containing patches from Michael and Sean.
Github link for the patches posted as part of this series:
https://github.com/vishals4gh/linux/commits/priv_memfd_selftests_rfc_v3
Vishal Annapurve (6):
kvm: x86: Add support for testing private memory
selftests: kvm: Add support for private memory
selftests: kvm: ucall: Allow querying ucall pool gpa
selftests: kvm: x86: Execute hypercall as per the cpu
selftests: kvm: x86: Execute VMs with private memory
sefltests: kvm: x86: Add selftest for private memory
arch/x86/include/uapi/asm/kvm_para.h | 2 +
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu/mmu.c | 19 ++
arch/x86/kvm/mmu/mmu_internal.h | 2 +-
arch/x86/kvm/x86.c | 67 +++-
include/linux/kvm_host.h | 12 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 2 +
.../selftests/kvm/include/kvm_util_base.h | 12 +-
.../selftests/kvm/include/ucall_common.h | 2 +
.../kvm/include/x86_64/private_mem.h | 51 +++
tools/testing/selftests/kvm/lib/kvm_util.c | 40 ++-
.../testing/selftests/kvm/lib/ucall_common.c | 12 +
.../selftests/kvm/lib/x86_64/private_mem.c | 297 ++++++++++++++++++
.../selftests/kvm/lib/x86_64/processor.c | 15 +-
.../selftests/kvm/x86_64/private_mem_test.c | 262 +++++++++++++++
virt/kvm/Kconfig | 9 +
virt/kvm/kvm_main.c | 90 +++++-
18 files changed, 887 insertions(+), 9 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/private_mem.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/private_mem.c
create mode 100644 tools/testing/selftests/kvm/x86_64/private_mem_test.c
--
2.37.1.595.g718a3a8f04-goog
Hi All,
Intel's Trust Domain Extensions (TDX) protect guest VMs from malicious
hosts and some physical attacks. VM guest with TDX support is called
as a TDX Guest.
In TDX guest, attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest. For example, a key server may request for attestation before
releasing the encryption keys to mount the encrypted rootfs or
secondary drive.
This patch set adds attestation support for the TDX guest. Details
about the TDX attestation process and the steps involved are explained
in Documentation/x86/tdx.rst (added by patch 2/3).
Following are the details of the patch set:
Patch 1/3 -> Preparatory patch for adding attestation support.
Patch 2/3 -> Adds user interface driver to support attestation.
Patch 3/3 -> Adds selftest support for TDREPORT feature.
Commit log history is maintained in the individual patches.
Kuppuswamy Sathyanarayanan (3):
x86/tdx: Make __tdx_module_call() usable in driver module
virt: Add TDX guest driver
selftests: tdx: Test TDX attestation GetReport support
Documentation/virt/coco/tdx-guest.rst | 42 +++++
Documentation/virt/index.rst | 1 +
Documentation/x86/tdx.rst | 43 +++++
arch/x86/coco/tdx/tdcall.S | 2 +
arch/x86/coco/tdx/tdx.c | 5 -
arch/x86/include/asm/tdx.h | 6 +
drivers/virt/Kconfig | 2 +
drivers/virt/Makefile | 1 +
drivers/virt/coco/tdx-guest/Kconfig | 10 ++
drivers/virt/coco/tdx-guest/Makefile | 2 +
drivers/virt/coco/tdx-guest/tdx-guest.c | 131 ++++++++++++++
include/uapi/linux/tdx-guest.h | 53 ++++++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/tdx/Makefile | 7 +
tools/testing/selftests/tdx/config | 1 +
tools/testing/selftests/tdx/tdx_guest_test.c | 175 +++++++++++++++++++
16 files changed, 477 insertions(+), 5 deletions(-)
create mode 100644 Documentation/virt/coco/tdx-guest.rst
create mode 100644 drivers/virt/coco/tdx-guest/Kconfig
create mode 100644 drivers/virt/coco/tdx-guest/Makefile
create mode 100644 drivers/virt/coco/tdx-guest/tdx-guest.c
create mode 100644 include/uapi/linux/tdx-guest.h
create mode 100644 tools/testing/selftests/tdx/Makefile
create mode 100644 tools/testing/selftests/tdx/config
create mode 100644 tools/testing/selftests/tdx/tdx_guest_test.c
--
2.34.1