From: Reinette Chatre <reinette.chatre(a)intel.com>
Removing a page from an initialized enclave involves three steps:
(1) the user requests changing the page type to PT_TRIM via the
SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()
(2) on success the ENCLU[EACCEPT] instruction is run from within
the enclave to accept the page removal
(3) the user initiates the actual removal of the page via the
SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl().
Remove a page that has never been accessed. This means that when the
first ioctl() requesting page removal arrives, there will be no page
table entry, yet a valid page table entry needs to exist for the
ENCLU[EACCEPT] function to succeed. In this test it is verified that
a page table entry can still be installed for a page that is in the
process of being removed.
Suggested-by: Haitao Huang <haitao.huang(a)intel.com>
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/main.c | 82 ++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index d132e7d32454..c691a4864db8 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1816,4 +1816,86 @@ TEST_F(enclave, remove_added_page_invalid_access_after_eaccept)
EXPECT_EQ(self->run.exception_addr, data_start);
}
+TEST_F(enclave, remove_untouched_page)
+{
+ struct sgx_enclave_remove_pages remove_ioc;
+ struct encl_op_eaccept eaccept_op;
+ struct sgx_enclave_modt modt_ioc;
+ struct sgx_secinfo secinfo;
+ unsigned long data_start;
+ int ret, errno_save;
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ /*
+ * Hardware (SGX2) and kernel support is needed for this test. Start
+ * with check that test has a chance of succeeding.
+ */
+ memset(&modt_ioc, 0, sizeof(modt_ioc));
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+ if (ret == -1) {
+ if (errno == ENOTTY)
+ SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+ else if (errno == ENODEV)
+ SKIP(return, "System does not support SGX2");
+ }
+
+ /*
+ * Invalid parameters were provided during sanity check,
+ * expect command to fail.
+ */
+ EXPECT_EQ(ret, -1);
+
+ /* SGX2 is supported by kernel and hardware, test can proceed. */
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ data_start = self->encl.encl_base +
+ encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+ memset(&modt_ioc, 0, sizeof(modt_ioc));
+ memset(&secinfo, 0, sizeof(secinfo));
+
+ secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+ modt_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ modt_ioc.length = PAGE_SIZE;
+ modt_ioc.secinfo = (unsigned long)&secinfo;
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(errno_save, 0);
+ EXPECT_EQ(modt_ioc.result, 0);
+ EXPECT_EQ(modt_ioc.count, 4096);
+
+ /*
+ * Enter enclave via TCS #1 and approve page removal by sending
+ * EACCEPT for removed page.
+ */
+
+ eaccept_op.epc_addr = data_start;
+ eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+ eaccept_op.ret = 0;
+ eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+ EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+ EXPECT_EQ(eaccept_op.ret, 0);
+
+ memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+ remove_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ remove_ioc.length = PAGE_SIZE;
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(errno_save, 0);
+ EXPECT_EQ(remove_ioc.count, 4096);
+}
+
TEST_HARNESS_MAIN
--
2.35.1
From: Reinette Chatre <reinette.chatre(a)intel.com>
Removing a page from an initialized enclave involves three steps:
(1) the user requests changing the page type to SGX_PAGE_TYPE_TRIM
via the SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl(), (2) on success the
ENCLU[EACCEPT] instruction is run from within the enclave to accept
the page removal, (3) the user initiates the actual removal of the
page via the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl().
Test two possible invalid accesses during the page removal flow:
* Test the behavior when a request to remove the page by changing its
type to SGX_PAGE_TYPE_TRIM completes successfully but instead of
executing ENCLU[EACCEPT] from within the enclave the enclave attempts
to read from the page. Even though the page is accessible from the
page table entries its type is SGX_PAGE_TYPE_TRIM and thus not
accessible according to SGX. The expected behavior is a page fault
with the SGX flag set in the error code.
* Test the behavior when the page type is changed successfully and
ENCLU[EACCEPT] was run from within the enclave. The final ioctl(),
SGX_IOC_ENCLAVE_REMOVE_PAGES, is omitted and replaced with an
attempt to access the page. Even though the page is accessible
from the page table entries its type is SGX_PAGE_TYPE_TRIM and
thus not accessible according to SGX. The expected behavior is
a page fault with the SGX flag set in the error code.
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/main.c | 247 +++++++++++++++++++++++++++++
1 file changed, 247 insertions(+)
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index 82902dab96bc..d132e7d32454 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1569,4 +1569,251 @@ TEST_F(enclave, remove_added_page_no_eaccept)
EXPECT_EQ(remove_ioc.count, 0);
}
+/*
+ * Request enclave page removal but instead of correctly following with
+ * EACCEPT a read attempt to page is made from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_invalid_access)
+{
+ struct encl_op_get_from_addr get_addr_op;
+ struct encl_op_put_to_addr put_addr_op;
+ struct sgx_enclave_modt ioc;
+ struct sgx_secinfo secinfo;
+ unsigned long data_start;
+ int ret, errno_save;
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ /*
+ * Hardware (SGX2) and kernel support is needed for this test. Start
+ * with check that test has a chance of succeeding.
+ */
+ memset(&ioc, 0, sizeof(ioc));
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+
+ if (ret == -1) {
+ if (errno == ENOTTY)
+ SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+ else if (errno == ENODEV)
+ SKIP(return, "System does not support SGX2");
+ }
+
+ /*
+ * Invalid parameters were provided during sanity check,
+ * expect command to fail.
+ */
+ EXPECT_EQ(ret, -1);
+
+ /*
+ * Page that will be removed is the second data page in the .data
+ * segment. This forms part of the local encl_buffer within the
+ * enclave.
+ */
+ data_start = self->encl.encl_base +
+ encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+ /*
+ * Sanity check that page at @data_start is writable before
+ * removing it.
+ *
+ * Start by writing MAGIC to test page.
+ */
+ put_addr_op.value = MAGIC;
+ put_addr_op.addr = data_start;
+ put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /*
+ * Read memory that was just written to, confirming that data
+ * previously written (MAGIC) is present.
+ */
+ get_addr_op.value = 0;
+ get_addr_op.addr = data_start;
+ get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(get_addr_op.value, MAGIC);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /* Start page removal by requesting change of page type to PT_TRIM. */
+ memset(&ioc, 0, sizeof(ioc));
+ memset(&secinfo, 0, sizeof(secinfo));
+
+ secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+ ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ ioc.length = PAGE_SIZE;
+ ioc.secinfo = (unsigned long)&secinfo;
+
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(errno_save, 0);
+ EXPECT_EQ(ioc.result, 0);
+ EXPECT_EQ(ioc.count, 4096);
+
+ /*
+ * Read from page that was just removed.
+ */
+ get_addr_op.value = 0;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ /*
+ * From kernel perspective the page is present but according to SGX the
+ * page should not be accessible so a #PF with SGX bit set is
+ * expected.
+ */
+
+ EXPECT_EQ(self->run.function, ERESUME);
+ EXPECT_EQ(self->run.exception_vector, 14);
+ EXPECT_EQ(self->run.exception_error_code, 0x8005);
+ EXPECT_EQ(self->run.exception_addr, data_start);
+}
+
+/*
+ * Request enclave page removal and correctly follow with
+ * EACCEPT but do not follow with removal ioctl() but instead a read attempt
+ * to removed page is made from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_invalid_access_after_eaccept)
+{
+ struct encl_op_get_from_addr get_addr_op;
+ struct encl_op_put_to_addr put_addr_op;
+ struct encl_op_eaccept eaccept_op;
+ struct sgx_enclave_modt ioc;
+ struct sgx_secinfo secinfo;
+ unsigned long data_start;
+ int ret, errno_save;
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ /*
+ * Hardware (SGX2) and kernel support is needed for this test. Start
+ * with check that test has a chance of succeeding.
+ */
+ memset(&ioc, 0, sizeof(ioc));
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+
+ if (ret == -1) {
+ if (errno == ENOTTY)
+ SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+ else if (errno == ENODEV)
+ SKIP(return, "System does not support SGX2");
+ }
+
+ /*
+ * Invalid parameters were provided during sanity check,
+ * expect command to fail.
+ */
+ EXPECT_EQ(ret, -1);
+
+ /*
+ * Page that will be removed is the second data page in the .data
+ * segment. This forms part of the local encl_buffer within the
+ * enclave.
+ */
+ data_start = self->encl.encl_base +
+ encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+ /*
+ * Sanity check that page at @data_start is writable before
+ * removing it.
+ *
+ * Start by writing MAGIC to test page.
+ */
+ put_addr_op.value = MAGIC;
+ put_addr_op.addr = data_start;
+ put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /*
+ * Read memory that was just written to, confirming that data
+ * previously written (MAGIC) is present.
+ */
+ get_addr_op.value = 0;
+ get_addr_op.addr = data_start;
+ get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(get_addr_op.value, MAGIC);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /* Start page removal by requesting change of page type to PT_TRIM. */
+ memset(&ioc, 0, sizeof(ioc));
+ memset(&secinfo, 0, sizeof(secinfo));
+
+ secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+ ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ ioc.length = PAGE_SIZE;
+ ioc.secinfo = (unsigned long)&secinfo;
+
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(errno_save, 0);
+ EXPECT_EQ(ioc.result, 0);
+ EXPECT_EQ(ioc.count, 4096);
+
+ eaccept_op.epc_addr = (unsigned long)data_start;
+ eaccept_op.ret = 0;
+ eaccept_op.flags = SGX_SECINFO_TRIM | SGX_SECINFO_MODIFIED;
+ eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+ EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+ EXPECT_EQ(eaccept_op.ret, 0);
+
+ /* Skip ioctl() to remove page. */
+
+ /*
+ * Read from page that was just removed.
+ */
+ get_addr_op.value = 0;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ /*
+ * From kernel perspective the page is present but according to SGX the
+ * page should not be accessible so a #PF with SGX bit set is
+ * expected.
+ */
+
+ EXPECT_EQ(self->run.function, ERESUME);
+ EXPECT_EQ(self->run.exception_vector, 14);
+ EXPECT_EQ(self->run.exception_error_code, 0x8005);
+ EXPECT_EQ(self->run.exception_addr, data_start);
+}
+
TEST_HARNESS_MAIN
--
2.35.1
From: Reinette Chatre <reinette.chatre(a)intel.com>
Removing a page from an initialized enclave involves three steps:
first the user requests changing the page type to SGX_PAGE_TYPE_TRIM
via an ioctl(), on success the ENCLU[EACCEPT] instruction needs to be
run from within the enclave to accept the page removal, finally the
user requests page removal to be completed via an ioctl(). Only after
acceptance (ENCLU[EACCEPT]) from within the enclave can the kernel
remove the page from a running enclave.
Test the behavior when the user's request to change the page type
succeeds, but the ENCLU[EACCEPT] instruction is not run before the
ioctl() requesting page removal is run. This should not be permitted.
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/main.c | 116 +++++++++++++++++++++++++++++
1 file changed, 116 insertions(+)
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index f9872c6746a3..82902dab96bc 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -1453,4 +1453,120 @@ TEST_F(enclave, tcs_create)
munmap(addr, 3 * PAGE_SIZE);
}
+/*
+ * Ensure sane behavior if user requests page removal, does not run
+ * EACCEPT from within enclave but still attempts to finalize page removal
+ * with the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl(). The latter should fail
+ * because the removal was not EACCEPTed from within the enclave.
+ */
+TEST_F(enclave, remove_added_page_no_eaccept)
+{
+ struct sgx_enclave_remove_pages remove_ioc;
+ struct encl_op_get_from_addr get_addr_op;
+ struct encl_op_put_to_addr put_addr_op;
+ struct sgx_enclave_modt modt_ioc;
+ struct sgx_secinfo secinfo;
+ unsigned long data_start;
+ int ret, errno_save;
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ /*
+ * Hardware (SGX2) and kernel support is needed for this test. Start
+ * with check that test has a chance of succeeding.
+ */
+ memset(&modt_ioc, 0, sizeof(modt_ioc));
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+
+ if (ret == -1) {
+ if (errno == ENOTTY)
+ SKIP(return, "Kernel does not support SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl()");
+ else if (errno == ENODEV)
+ SKIP(return, "System does not support SGX2");
+ }
+
+ /*
+ * Invalid parameters were provided during sanity check,
+ * expect command to fail.
+ */
+ EXPECT_EQ(ret, -1);
+
+ /*
+ * Page that will be removed is the second data page in the .data
+ * segment. This forms part of the local encl_buffer within the
+ * enclave.
+ */
+ data_start = self->encl.encl_base +
+ encl_get_data_offset(&self->encl) + PAGE_SIZE;
+
+ /*
+ * Sanity check that page at @data_start is writable before
+ * removing it.
+ *
+ * Start by writing MAGIC to test page.
+ */
+ put_addr_op.value = MAGIC;
+ put_addr_op.addr = data_start;
+ put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /*
+ * Read memory that was just written to, confirming that data
+ * previously written (MAGIC) is present.
+ */
+ get_addr_op.value = 0;
+ get_addr_op.addr = data_start;
+ get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(get_addr_op.value, MAGIC);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /* Start page removal by requesting change of page type to PT_TRIM */
+ memset(&modt_ioc, 0, sizeof(modt_ioc));
+ memset(&secinfo, 0, sizeof(secinfo));
+
+ secinfo.flags = SGX_PAGE_TYPE_TRIM << 8;
+ modt_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ modt_ioc.length = PAGE_SIZE;
+ modt_ioc.secinfo = (unsigned long)&secinfo;
+
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_MODIFY_TYPE, &modt_ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ EXPECT_EQ(ret, 0);
+ EXPECT_EQ(errno_save, 0);
+ EXPECT_EQ(modt_ioc.result, 0);
+ EXPECT_EQ(modt_ioc.count, 4096);
+
+ /* Skip EACCEPT */
+
+ /* Send final ioctl() to complete page removal */
+ memset(&remove_ioc, 0, sizeof(remove_ioc));
+
+ remove_ioc.offset = encl_get_data_offset(&self->encl) + PAGE_SIZE;
+ remove_ioc.length = PAGE_SIZE;
+
+ ret = ioctl(self->encl.fd, SGX_IOC_ENCLAVE_REMOVE_PAGES, &remove_ioc);
+ errno_save = ret == -1 ? errno : 0;
+
+ /* Operation not permitted since EACCEPT was omitted. */
+ EXPECT_EQ(ret, -1);
+ EXPECT_EQ(errno_save, EPERM);
+ EXPECT_EQ(remove_ioc.count, 0);
+}
+
TEST_HARNESS_MAIN
--
2.35.1
From: Reinette Chatre <reinette.chatre(a)intel.com>
The Thread Control Structure (TCS) contains meta-data used by the
hardware to save and restore thread specific information when
entering/exiting the enclave. A TCS can be added to an initialized
enclave by first adding a new regular enclave page, initializing the
content of the new page from within the enclave, and then changing that
page's type to a TCS.
Support the initialization of a TCS from within the enclave.
The variable information needed that should be provided from outside
the enclave is the address of the TCS, address of the State Save Area
(SSA), and the entry point that the thread should use to enter the
enclave. With this information provided all needed fields of a TCS
can be initialized.
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/defines.h | 8 +++++++
tools/testing/selftests/sgx/test_encl.c | 30 +++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/tools/testing/selftests/sgx/defines.h b/tools/testing/selftests/sgx/defines.h
index b638eb98c80c..d8587c971941 100644
--- a/tools/testing/selftests/sgx/defines.h
+++ b/tools/testing/selftests/sgx/defines.h
@@ -26,6 +26,7 @@ enum encl_op_type {
ENCL_OP_NOP,
ENCL_OP_EACCEPT,
ENCL_OP_EMODPE,
+ ENCL_OP_INIT_TCS_PAGE,
ENCL_OP_MAX,
};
@@ -68,4 +69,11 @@ struct encl_op_emodpe {
uint64_t flags;
};
+struct encl_op_init_tcs_page {
+ struct encl_op_header header;
+ uint64_t tcs_page;
+ uint64_t ssa;
+ uint64_t entry;
+};
+
#endif /* DEFINES_H */
diff --git a/tools/testing/selftests/sgx/test_encl.c b/tools/testing/selftests/sgx/test_encl.c
index 5b6c65331527..c0d6397295e3 100644
--- a/tools/testing/selftests/sgx/test_encl.c
+++ b/tools/testing/selftests/sgx/test_encl.c
@@ -57,6 +57,35 @@ static void *memcpy(void *dest, const void *src, size_t n)
return dest;
}
+static void *memset(void *dest, int c, size_t n)
+{
+ size_t i;
+
+ for (i = 0; i < n; i++)
+ ((char *)dest)[i] = c;
+
+ return dest;
+}
+
+static void do_encl_init_tcs_page(void *_op)
+{
+ struct encl_op_init_tcs_page *op = _op;
+ void *tcs = (void *)op->tcs_page;
+ uint32_t val_32;
+
+ memset(tcs, 0, 16); /* STATE and FLAGS */
+ memcpy(tcs + 16, &op->ssa, 8); /* OSSA */
+ memset(tcs + 24, 0, 4); /* CSSA */
+ val_32 = 1;
+ memcpy(tcs + 28, &val_32, 4); /* NSSA */
+ memcpy(tcs + 32, &op->entry, 8); /* OENTRY */
+ memset(tcs + 40, 0, 24); /* AEP, OFSBASE, OGSBASE */
+ val_32 = 0xFFFFFFFF;
+ memcpy(tcs + 64, &val_32, 4); /* FSLIMIT */
+ memcpy(tcs + 68, &val_32, 4); /* GSLIMIT */
+ memset(tcs + 72, 0, 4024); /* Reserved */
+}
+
static void do_encl_op_put_to_buf(void *op)
{
struct encl_op_put_to_buf *op2 = op;
@@ -100,6 +129,7 @@ void encl_body(void *rdi, void *rsi)
do_encl_op_nop,
do_encl_eaccept,
do_encl_emodpe,
+ do_encl_init_tcs_page,
};
struct encl_op_header *op = (struct encl_op_header *)rdi;
--
2.35.1
From: Reinette Chatre <reinette.chatre(a)intel.com>
The test enclave (test_encl.elf) is built with two initialized
Thread Control Structures (TCS) included in the binary. Both TCS are
initialized with the same entry point, encl_entry, that correctly
computes the absolute address of the stack based on the stack of each
TCS that is also built into the binary.
A new TCS can be added dynamically to the enclave and requires to be
initialized with an entry point used to enter the enclave. Since the
existing entry point, encl_entry, assumes that the TCS and its stack
exists at particular offsets within the binary it is not able to handle
a dynamically added TCS and its stack.
Introduce a new entry point, encl_dyn_entry, that initializes the
absolute address of that thread's stack to the address immediately
preceding the TCS itself. It is now possible to dynamically add a
contiguous memory region to the enclave with the new stack preceding
the new TCS. With the new TCS initialized with encl_dyn_entry as entry
point the absolute address of the stack is computed correctly on entry.
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/test_encl_bootstrap.S | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/sgx/test_encl_bootstrap.S b/tools/testing/selftests/sgx/test_encl_bootstrap.S
index 82fb0dfcbd23..03ae0f57e29d 100644
--- a/tools/testing/selftests/sgx/test_encl_bootstrap.S
+++ b/tools/testing/selftests/sgx/test_encl_bootstrap.S
@@ -45,6 +45,12 @@ encl_entry:
# TCS #2. By adding the value of encl_stack to it, we get
# the absolute address for the stack.
lea (encl_stack)(%rbx), %rax
+ jmp encl_entry_core
+encl_dyn_entry:
+ # Entry point for dynamically created TCS page expected to follow
+ # its stack directly.
+ lea -1(%rbx), %rax
+encl_entry_core:
xchg %rsp, %rax
push %rax
--
2.35.1
From: Reinette Chatre <reinette.chatre(a)intel.com>
Enclave pages can be added to an initialized enclave when an address
belonging to the enclave but without a backing page is accessed from
within the enclave.
Accessing memory without a backing enclave page from within an enclave
can be in different ways:
1) Pre-emptively run ENCLU[EACCEPT]. Since the addition of a page
always needs to be accepted by the enclave via ENCLU[EACCEPT] this
flow is efficient since the first execution of ENCLU[EACCEPT]
triggers the addition of the page and when execution returns to the
same instruction the second execution would be successful as an
acceptance of the page.
2) A direct read or write. The flow where a direct read or write
triggers the page addition execution cannot resume from the
instruction (read/write) that triggered the fault but instead
the enclave needs to be entered at a different entry point to
run needed ENCLU[EACCEPT] before execution can return to the
original entry point and the read/write instruction that faulted.
Add tests for both flows.
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
tools/testing/selftests/sgx/main.c | 243 +++++++++++++++++++++++++++++
1 file changed, 243 insertions(+)
diff --git a/tools/testing/selftests/sgx/main.c b/tools/testing/selftests/sgx/main.c
index ea5f2e064687..13542c5de66f 100644
--- a/tools/testing/selftests/sgx/main.c
+++ b/tools/testing/selftests/sgx/main.c
@@ -86,6 +86,15 @@ static bool vdso_get_symtab(void *addr, struct vdso_symtab *symtab)
return true;
}
+static inline int sgx2_supported(void)
+{
+ unsigned int eax, ebx, ecx, edx;
+
+ __cpuid_count(SGX_CPUID, 0x0, eax, ebx, ecx, edx);
+
+ return eax & 0x2;
+}
+
static unsigned long elf_sym_hash(const char *name)
{
unsigned long h = 0, high;
@@ -863,4 +872,238 @@ TEST_F(enclave, epcm_permissions)
EXPECT_EQ(self->run.exception_addr, 0);
}
+/*
+ * Test the addition of pages to an initialized enclave via writing to
+ * a page belonging to the enclave's address space but was not added
+ * during enclave creation.
+ */
+TEST_F(enclave, augment)
+{
+ struct encl_op_get_from_addr get_addr_op;
+ struct encl_op_put_to_addr put_addr_op;
+ struct encl_op_eaccept eaccept_op;
+ size_t total_size = 0;
+ void *addr;
+ int i;
+
+ if (!sgx2_supported())
+ SKIP(return, "SGX2 not supported");
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ for (i = 0; i < self->encl.nr_segments; i++) {
+ struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+ total_size += seg->size;
+ }
+
+ /*
+ * Actual enclave size is expected to be larger than the loaded
+ * test enclave since enclave size must be a power of 2 in bytes
+ * and test_encl does not consume it all.
+ */
+ EXPECT_LT(total_size + PAGE_SIZE, self->encl.encl_size);
+
+ /*
+ * Create memory mapping for the page that will be added. New
+ * memory mapping is for one page right after all existing
+ * mappings.
+ */
+ addr = mmap((void *)self->encl.encl_base + total_size, PAGE_SIZE,
+ PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_SHARED | MAP_FIXED, self->encl.fd, 0);
+ EXPECT_NE(addr, MAP_FAILED);
+
+ self->run.exception_vector = 0;
+ self->run.exception_error_code = 0;
+ self->run.exception_addr = 0;
+
+ /*
+ * Attempt to write to the new page from within enclave.
+ * Expected to fail since page is not (yet) part of the enclave.
+ * The first #PF will trigger the addition of the page to the
+ * enclave, but since the new page needs an EACCEPT from within the
+ * enclave before it can be used it would not be possible
+ * to successfully return to the failing instruction. This is the
+ * cause of the second #PF captured here having the SGX bit set,
+ * it is from hardware preventing the page from being used.
+ */
+ put_addr_op.value = MAGIC;
+ put_addr_op.addr = (unsigned long)addr;
+ put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(self->run.function, ERESUME);
+ EXPECT_EQ(self->run.exception_vector, 14);
+ EXPECT_EQ(self->run.exception_addr, (unsigned long)addr);
+
+ if (self->run.exception_error_code == 0x6) {
+ munmap(addr, PAGE_SIZE);
+ SKIP(return, "Kernel does not support adding pages to initialized enclave");
+ }
+
+ EXPECT_EQ(self->run.exception_error_code, 0x8007);
+
+ self->run.exception_vector = 0;
+ self->run.exception_error_code = 0;
+ self->run.exception_addr = 0;
+
+ /* Handle AEX by running EACCEPT from new entry point. */
+ self->run.tcs = self->encl.encl_base + PAGE_SIZE;
+
+ eaccept_op.epc_addr = self->encl.encl_base + total_size;
+ eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+ eaccept_op.ret = 0;
+ eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+ EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+ EXPECT_EQ(eaccept_op.ret, 0);
+
+ /* Can now return to main TCS to resume execution. */
+ self->run.tcs = self->encl.encl_base;
+
+ EXPECT_EQ(vdso_sgx_enter_enclave((unsigned long)&put_addr_op, 0, 0,
+ ERESUME, 0, 0,
+ &self->run),
+ 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /*
+ * Read memory from newly added page that was just written to,
+ * confirming that data previously written (MAGIC) is present.
+ */
+ get_addr_op.value = 0;
+ get_addr_op.addr = (unsigned long)addr;
+ get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(get_addr_op.value, MAGIC);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ munmap(addr, PAGE_SIZE);
+}
+
+/*
+ * Test for the addition of pages to an initialized enclave via a
+ * pre-emptive run of EACCEPT on page to be added.
+ */
+TEST_F(enclave, augment_via_eaccept)
+{
+ struct encl_op_get_from_addr get_addr_op;
+ struct encl_op_put_to_addr put_addr_op;
+ struct encl_op_eaccept eaccept_op;
+ size_t total_size = 0;
+ void *addr;
+ int i;
+
+ if (!sgx2_supported())
+ SKIP(return, "SGX2 not supported");
+
+ ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
+
+ memset(&self->run, 0, sizeof(self->run));
+ self->run.tcs = self->encl.encl_base;
+
+ for (i = 0; i < self->encl.nr_segments; i++) {
+ struct encl_segment *seg = &self->encl.segment_tbl[i];
+
+ total_size += seg->size;
+ }
+
+ /*
+ * Actual enclave size is expected to be larger than the loaded
+ * test enclave since enclave size must be a power of 2 in bytes while
+ * test_encl does not consume it all.
+ */
+ EXPECT_LT(total_size + PAGE_SIZE, self->encl.encl_size);
+
+ /*
+ * mmap() a page at end of existing enclave to be used for dynamic
+ * EPC page.
+ */
+
+ addr = mmap((void *)self->encl.encl_base + total_size, PAGE_SIZE,
+ PROT_READ | PROT_WRITE | PROT_EXEC, MAP_SHARED | MAP_FIXED,
+ self->encl.fd, 0);
+ EXPECT_NE(addr, MAP_FAILED);
+
+ self->run.exception_vector = 0;
+ self->run.exception_error_code = 0;
+ self->run.exception_addr = 0;
+
+ /*
+ * Run EACCEPT on new page to trigger the #PF->EAUG->EACCEPT(again
+ * without a #PF). All should be transparent to userspace.
+ */
+ eaccept_op.epc_addr = self->encl.encl_base + total_size;
+ eaccept_op.flags = SGX_SECINFO_R | SGX_SECINFO_W | SGX_SECINFO_REG | SGX_SECINFO_PENDING;
+ eaccept_op.ret = 0;
+ eaccept_op.header.type = ENCL_OP_EACCEPT;
+
+ EXPECT_EQ(ENCL_CALL(&eaccept_op, &self->run, true), 0);
+
+ if (self->run.exception_vector == 14 &&
+ self->run.exception_error_code == 4 &&
+ self->run.exception_addr == self->encl.encl_base + total_size) {
+ munmap(addr, PAGE_SIZE);
+ SKIP(return, "Kernel does not support adding pages to initialized enclave");
+ }
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+ EXPECT_EQ(eaccept_op.ret, 0);
+
+ /*
+ * New page should be accessible from within enclave - attempt to
+ * write to it.
+ */
+ put_addr_op.value = MAGIC;
+ put_addr_op.addr = (unsigned long)addr;
+ put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
+
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ /*
+ * Read memory from newly added page that was just written to,
+ * confirming that data previously written (MAGIC) is present.
+ */
+ get_addr_op.value = 0;
+ get_addr_op.addr = (unsigned long)addr;
+ get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
+
+ EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
+
+ EXPECT_EQ(get_addr_op.value, MAGIC);
+ EXPECT_EEXIT(&self->run);
+ EXPECT_EQ(self->run.exception_vector, 0);
+ EXPECT_EQ(self->run.exception_error_code, 0);
+ EXPECT_EQ(self->run.exception_addr, 0);
+
+ munmap(addr, PAGE_SIZE);
+}
+
TEST_HARNESS_MAIN
--
2.35.1
fix following coccicheck warning:
tools/testing/selftests/vm/userfaultfd.c:556:23-24:
WARNING this kind of initialization is deprecated
`unsigned long page_nr = *(&page_nr)` has the same form of
uninitialized_var() macro. I remove the redundant assignement. It has
been tested with gcc (Debian 8.3.0-6) 8.3.0.
The patch which removed uninitialized_var() is:
https://lore.kernel.org/all/20121028102007.GA7547@gmail.com/
And there is very few "/* GCC */" comments in the Linux kernel code now.
Signed-off-by: Guo Zhengkui <guozhengkui(a)vivo.com>
---
tools/testing/selftests/vm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index fe404398c65a..203c4a2c2109 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -553,7 +553,7 @@ static void continue_range(int ufd, __u64 start, __u64 len)
static void *locking_thread(void *arg)
{
unsigned long cpu = (unsigned long) arg;
- unsigned long page_nr = *(&(page_nr)); /* uninitialized warning */
+ unsigned long page_nr;
unsigned long long count;
if (!(bounces & BOUNCE_RANDOM)) {
--
2.20.1
When building the vm selftests using clang, some errors are seen due to
having headers in the compilation command:
clang -Wall -I ../../../../usr/include -no-pie gup_test.c ../../../../mm/gup_test.h -lrt -lpthread -o .../tools/testing/selftests/vm/gup_test
clang: error: cannot specify -o when generating multiple output files
make[1]: *** [../lib.mk:146: .../tools/testing/selftests/vm/gup_test] Error 1
Rework to add the header files to LOCAL_HDRS before including ../lib.mk,
since the dependency is evaluated in '$(OUTPUT)/%:%.c $(LOCAL_HDRS)' in
file lib.mk.
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
---
This patch was inspired by:
https://lore.kernel.org/lkml/20211105162530.3307666-1-anders.roxell@linaro.…
---
tools/testing/selftests/vm/Makefile | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
index 1607322a112c..a14b5b800897 100644
--- a/tools/testing/selftests/vm/Makefile
+++ b/tools/testing/selftests/vm/Makefile
@@ -1,6 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for vm selftests
+LOCAL_HDRS += $(selfdir)/vm/local_config.h $(top_srcdir)/mm/gup_test.h
+
include local_config.mk
uname_M := $(shell uname -m 2>/dev/null || echo not)
@@ -140,10 +142,6 @@ endif
$(OUTPUT)/mlock-random-test $(OUTPUT)/memfd_secret: LDLIBS += -lcap
-$(OUTPUT)/gup_test: ../../../../mm/gup_test.h
-
-$(OUTPUT)/hmm-tests: local_config.h
-
# HMM_EXTRA_LIBS may get set in local_config.mk, or it may be left empty.
$(OUTPUT)/hmm-tests: LDLIBS += $(HMM_EXTRA_LIBS)
--
2.35.1.616.g0bdcbb4464-goog
I ran into the same problem when running:
make -C tools/testing/selftests TARGETS=vm
I was going to send a fix but I found this one deep in the mailing list.
Reviewed-by: Yosry Ahmed <yosryahmed(a)google.com>
Add kselftest_install directory to the .gitignore which is created while
creation of tar ball of objects:
make -C tools/testing/selftests gen_tar
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes in V2:
Break up the patch in individual test patches
Remove changes related to net selftest
---
tools/testing/selftests/.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/.gitignore b/tools/testing/selftests/.gitignore
index 055a5019b13c..cb24124ac5b9 100644
--- a/tools/testing/selftests/.gitignore
+++ b/tools/testing/selftests/.gitignore
@@ -3,6 +3,7 @@ gpiogpio-event-mon
gpiogpio-hammer
gpioinclude/
gpiolsgpio
+kselftest_install/
tpm2/SpaceTest.log
# Python bytecode and cache
--
2.30.2
Build of bpf and tc-testing selftests fails when the relative path of
the build directory is specified.
make -C tools/testing/selftests O=build0
make[1]: Entering directory '/linux_mainline/tools/testing/selftests/bpf'
../../../scripts/Makefile.include:4: *** O=build0 does not exist. Stop.
make[1]: Entering directory '/linux_mainline/tools/testing/selftests/tc-testing'
../../../scripts/Makefile.include:4: *** O=build0 does not exist. Stop.
Makefiles of bpf and tc-testing include scripts/Makefile.include file.
This file has sanity checking inside it which checks the output path.
The output path is not relative to the bpf or tc-testing. The sanity
check fails. Expand the output path to get rid of this error. The fix is
the same as mentioned in commit 150a27328b68 ("bpf, preload: Fix build
when $(O) points to a relative path").
Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
---
Changes in V2:
Add more explaination to the commit message.
Support make install as well.
---
tools/testing/selftests/Makefile | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 4eda7c7c15694..6a5c25fcc9cfc 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -178,6 +178,7 @@ all: khdr
BUILD_TARGET=$$BUILD/$$TARGET; \
mkdir $$BUILD_TARGET -p; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET \
+ O=$(abs_objtree) \
$(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
@@ -185,7 +186,8 @@ all: khdr
run_tests: all
@for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
- $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests;\
+ $(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET run_tests \
+ O=$(abs_objtree); \
done;
hotplug:
@@ -236,6 +238,7 @@ ifdef INSTALL_PATH
for TARGET in $(TARGETS); do \
BUILD_TARGET=$$BUILD/$$TARGET; \
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET INSTALL_PATH=$(INSTALL_PATH)/$$TARGET install \
+ O=$(abs_objtree) \
$(if $(FORCE_TARGETS),|| exit); \
ret=$$((ret * $$?)); \
done; exit $$ret;
--
2.30.2
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
[ Upstream commit 6fec1ab67f8d60704cc7de64abcfd389ab131542 ]
The PREEMPT_RT patchset does not use do_softirq() function thus trying
to filter for do_softirq fails for such kernel:
echo do_softirq
ftracetest: 81: echo: echo: I/O error
Choose some other visible function for the test. The function does not
have to be actually executed during the test, because it is only testing
filter API interface.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)canonical.com>
Reviewed-by: Shuah Khan <skhan(a)linuxfoundation.org>
Acked-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
.../selftests/ftrace/test.d/ftrace/func_set_ftrace_file.tc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_set_ftrace_file.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_set_ftrace_file.tc
index 68e7a48f5828e..412e5c1f13ca6 100644
--- a/tools/testing/selftests/ftrace/test.d/ftrace/func_set_ftrace_file.tc
+++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_set_ftrace_file.tc
@@ -33,7 +33,7 @@ do_reset
FILTER=set_ftrace_filter
FUNC1="schedule"
-FUNC2="do_softirq"
+FUNC2="scheduler_tick"
ALL_FUNCS="#### all functions enabled ####"
--
2.34.1
On Wed, Mar 02, 2022 at 10:12:09AM +0000, Marc Zyngier wrote:
> On Tue, 01 Mar 2022 22:56:41 +0000,
> Qian Cai <quic_qiancai(a)quicinc.com> wrote:
> >
> > On Mon, Feb 07, 2022 at 03:20:32PM +0000, Mark Brown wrote:
> > > Since all the fields in the main ID registers are 4 bits wide we have up
> > > until now not bothered specifying the width in the code. Since we now
> > > wish to use this mechanism to enumerate features from the floating point
> > > feature registers which do not follow this pattern add a width to the
> > > table. This means updating all the existing table entries but makes it
> > > less likely that we run into issues in future due to implicitly assuming
> > > a 4 bit width.
> > >
> > > Signed-off-by: Mark Brown <broonie(a)kernel.org>
> >
> > Do we leave this one alone on purpose?
> >
> > .desc = "GIC system register CPU interface",
> > .capability = ARM64_HAS_SYSREG_GIC_CPUIF,
> > .type = ARM64_CPUCAP_STRICT_BOOT_CPU_FEATURE,
> > .matches = has_useable_gicv3_cpuif,
> > .sys_reg = SYS_ID_AA64PFR0_EL1,
> > .field_pos = ID_AA64PFR0_GIC_SHIFT,
> > .sign = FTR_UNSIGNED,
> > .min_field_value = 1,
> >
> > Since width == 0, it will generate an undefined behavior.
>
> I don't think that's on purpose, and we should definitely address
> this. Maybe we should have a warning if we spot an occurrence of
> .width being 0.
We should indeed have a check. Alternatively, assume the default to be 4
and convert all 0s to 4 during boot (less patch churn).
--
Catalin
This series provides initial support for the ARMv9 Scalable Matrix
Extension (SME). SME takes the approach used for vectors in SVE and
extends this to provide architectural support for matrix operations. A
more detailed overview can be found in [1].
For the kernel SME can be thought of as a series of features which are
intended to be used together by applications but operate mostly
orthogonally:
- The ZA matrix register.
- Streaming mode, in which ZA can be accessed and a subset of SVE
features are available.
- A second vector length, used for streaming mode SVE and ZA and
controlled using a similar interface to that for SVE.
- TPIDR2, a new userspace controllable system register intended for use
by the C library for storing context related to the ZA ABI.
A substantial part of the series is dedicated to refactoring the
existing SVE support so that we don't need to duplicate code for
handling vector lengths and the SVE registers, this involves creating an
array of vector types and making the users take the vector type as a
parameter. I'm not 100% happy with this but wasn't able to come up with
anything better, duplicating code definitely felt like a bad idea so
this felt like the least bad thing. If this approach makes sense to
people it might make sense to split this off into a separate series
and/or merge it while the rest is pending review to try to make things a
little more digestable, the series is very large so it'd probably make
things easier to digest if some of the preparatory refactoring could be
merged before the rest is ready.
One feature of the architecture of particular note is that switching
to and from streaming mode may change the size of and invalidate the
contents of the SVE registers, and when in streaming mode the FFR is not
accessible. This complicates aspects of the ABI like signal handling
and ptrace.
This initial implementation is mainly intended to get the ABI in place,
there are several areas which will be worked on going forwards - some of
these will be blockers, others could be handled in followup serieses:
- SME is currently not supported for KVM guests, this will be done as a
followup series. A host system can use SME and run KVM guests but
SME is not available in the guests.
- The KVM host support is done in a very simplistic way, were anyone to
attempt to use it in production there would be performance impacts on
hosts with SME support. As part of this we also add enumeration of
fine grained traps.
- There is not currently ptrace or signal support TPIDR2, this will be
done as a followup series.
- No support is currently provided for scheduler control of SME or SME
applications, given the size of the SME register state the context
switch overhead may be noticable so this may be needed especially for
real time applications. Similar concerns already exist for larger
SVE vector lengths but are amplified for SME, particularly as the
vector length increases.
- There has been no work on optimising the performance of anything the
kernel does.
It is not expected that any systems will be encountered that support SME
but not SVE, SME is an ARMv9 feature and SVE is mandatory for ARMv9.
The code attempts to handle any such systems that are encountered but
this hasn't been tested extensively.
v11:
- Rebase onto v5.17-rc3.
- Provide a sme-inst.h to collect manual encodings in kselftest.
v10:
- Actually do the rebase of fixups from the previous version into
relevant patches.
v9:
- Remove defensive programming around IS_ENABLED() and FGT in KVM code.
- Fix naming of TPIDR2 FGT register bit.
- Add patches making handling of floating point register bits more
consistent (also sent as separate series).
- Drop now unused enumeration of fine grained traps.
v8:
- Rebase onto v5.17-rc1.
- Support interoperation with KVM, SME is disabled for KVM guests with
minimal handling for cleaning up SME state when entering and leaving
the guest.
- Document and implement that signal handlers are invoked with ZA and
streaming mode disabled.
- Use the RDSVL instruction introduced in EAC2 of the architecture to
obtain the streaming mode vector length during enumeration, ZA state
loading/saving and in test programs.
- Store a pointer to SVCR in fpsimd_last_state and use it in fpsimd_save()
for interoperation with KVM.
- Add a test case sme_trap_no_sm checking that we generate a SIGILL
when using an instruction that requires streaming mode without
enabling it.
- Add basic ZA context form validation to testcases helper library.
- Move signal tests over to validating streaming VL from ZA information.
- Pulled in patch removing ARRAY_SIZE() so that kselftest builds
cleanly and to avoid trivial conflicts.
v7:
- Rebase onto v5.16-rc3.
- Reduce indentation when supporting custom triggers for signal tests
as suggested by Catalin.
- Change to specifying a width for all CPU features rather than adding
single bit specific infrastructure.
- Don't require zeroing of non-shared SVE state during syscalls.
v6:
- Rebase onto v5.16-rc1.
- Return to disabling TIF_SVE on kernel entry even if we have SME
state, this avoids the need for KVM to handle the case where TIF_SVE
is set on guest entry.
- Add syscall-abi.h to SME updates to syscall-abi, mistakenly omitted
from commit.
v5:
- Rebase onto currently merged SVE and kselftest patches.
- Add support for the FA64 option, introduced in the recently published
EAC1 update to the specification.
- Pull in test program for the syscall ABI previously sent separately
with some revisions and add coverage for the SME ABI.
- Fix checking for options with 1 bit fields in ID_AA64SMFR0_EL1.
- Minor fixes and clarifications to the ABI documentation.
v4:
- Rebase onto merged patches.
- Remove an uneeded NULL check in vec_proc_do_default_vl().
- Include patch to factor out utility routines in kselftests written in
assembler.
- Specify -ffreestanding when building TPIDR2 test.
v3:
- Skip FFR rather than predicate registers in sve_flush_live().
- Don't assume a bool is all zeros in sve_flush_live() as per AAPCS.
- Don't redundantly specify a zero index when clearing FFR.
v2:
- Fix several issues with !SME and !SVE configurations.
- Preserve TPIDR2 when creating a new thread/process unless
CLONE_SETTLS is set.
- Report traps due to using features in an invalid mode as SIGILL.
- Spell out streaming mode behaviour in SVE ABI documentation more
directly.
- Document TPIDR2 in the ABI document.
- Use SMSTART and SMSTOP rather than read/modify/write sequences.
- Rework logic for exiting streaming mode on syscall.
- Don't needlessly initialise SVCR on access trap.
- Always restore SME VL for userspace if SME traps are disabled.
- Only yield to encourage preemption every 128 iterations in za-test,
otherwise do a getpid(), and validate SVCR after syscall.
- Leave streaming mode disabled except when reading the vector length
in za-test, and disable ZA after detecting a mismatch.
- Add SME support to vlset.
- Clarifications and typo fixes in comments.
- Move sme_alloc() forward declaration back a patch.
[1] https://community.arm.com/developer/ip-products/processors/b/processors-ip-…
Mark Brown (40):
arm64: Define CPACR_EL1_FPEN similarly to other floating point
controls
arm64: Always use individual bits in CPACR floating point enables
arm64: cpufeature: Always specify and use a field width for
capabilities
kselftest/arm64: Remove local ARRAY_SIZE() definitions
kselftest/arm64: signal: Allow tests to be incompatible with features
arm64/sme: Provide ABI documentation for SME
arm64/sme: System register and exception syndrome definitions
arm64/sme: Manually encode SME instructions
arm64/sme: Early CPU setup for SME
arm64/sme: Basic enumeration support
arm64/sme: Identify supported SME vector lengths at boot
arm64/sme: Implement sysctl to set the default vector length
arm64/sme: Implement vector length configuration prctl()s
arm64/sme: Implement support for TPIDR2
arm64/sme: Implement SVCR context switching
arm64/sme: Implement streaming SVE context switching
arm64/sme: Implement ZA context switching
arm64/sme: Implement traps and syscall handling for SME
arm64/sme: Disable ZA and streaming mode when handling signals
arm64/sme: Implement streaming SVE signal handling
arm64/sme: Implement ZA signal handling
arm64/sme: Implement ptrace support for streaming mode SVE registers
arm64/sme: Add ptrace support for ZA
arm64/sme: Disable streaming mode and ZA when flushing CPU state
arm64/sme: Save and restore streaming mode over EFI runtime calls
KVM: arm64: Hide SME system registers from guests
KVM: arm64: Trap SME usage in guest
KVM: arm64: Handle SME host state when running guests
arm64/sme: Provide Kconfig for SME
kselftest/arm64: Add manual encodings for SME instructions
kselftest/arm64: sme: Add SME support to vlset
kselftest/arm64: Add tests for TPIDR2
kselftest/arm64: Extend vector configuration API tests to cover SME
kselftest/arm64: sme: Provide streaming mode SVE stress test
kselftest/arm64: signal: Handle ZA signal context in core code
kselftest/arm64: Add stress test for SME ZA context switching
kselftest/arm64: signal: Add SME signal handling tests
kselftest/arm64: Add streaming SVE to SVE ptrace tests
kselftest/arm64: Add coverage for the ZA ptrace interface
kselftest/arm64: Add SME support to syscall ABI test
Documentation/arm64/elf_hwcaps.rst | 33 +
Documentation/arm64/index.rst | 1 +
Documentation/arm64/sme.rst | 432 +++++++++++++
Documentation/arm64/sve.rst | 70 ++-
arch/arm64/Kconfig | 11 +
arch/arm64/include/asm/cpu.h | 4 +
arch/arm64/include/asm/cpufeature.h | 25 +
arch/arm64/include/asm/el2_setup.h | 64 +-
arch/arm64/include/asm/esr.h | 13 +-
arch/arm64/include/asm/exception.h | 1 +
arch/arm64/include/asm/fpsimd.h | 110 +++-
arch/arm64/include/asm/fpsimdmacros.h | 86 +++
arch/arm64/include/asm/hwcap.h | 8 +
arch/arm64/include/asm/kvm_arm.h | 5 +-
arch/arm64/include/asm/kvm_host.h | 4 +
arch/arm64/include/asm/processor.h | 18 +-
arch/arm64/include/asm/sysreg.h | 67 +-
arch/arm64/include/asm/thread_info.h | 2 +
arch/arm64/include/uapi/asm/hwcap.h | 8 +
arch/arm64/include/uapi/asm/ptrace.h | 69 ++-
arch/arm64/include/uapi/asm/sigcontext.h | 55 +-
arch/arm64/kernel/cpufeature.c | 273 ++++++--
arch/arm64/kernel/cpuinfo.c | 13 +
arch/arm64/kernel/entry-common.c | 11 +
arch/arm64/kernel/entry-fpsimd.S | 36 ++
arch/arm64/kernel/fpsimd.c | 585 ++++++++++++++++--
arch/arm64/kernel/process.c | 28 +-
arch/arm64/kernel/ptrace.c | 356 +++++++++--
arch/arm64/kernel/signal.c | 194 +++++-
arch/arm64/kernel/syscall.c | 34 +-
arch/arm64/kernel/traps.c | 1 +
arch/arm64/kvm/fpsimd.c | 43 +-
arch/arm64/kvm/hyp/include/hyp/switch.h | 4 +-
arch/arm64/kvm/hyp/nvhe/switch.c | 30 +
arch/arm64/kvm/hyp/vhe/switch.c | 15 +-
arch/arm64/kvm/sys_regs.c | 9 +-
arch/arm64/tools/cpucaps | 2 +
include/uapi/linux/elf.h | 2 +
include/uapi/linux/prctl.h | 9 +
kernel/sys.c | 12 +
tools/testing/selftests/arm64/abi/.gitignore | 1 +
tools/testing/selftests/arm64/abi/Makefile | 9 +-
.../selftests/arm64/abi/syscall-abi-asm.S | 69 ++-
.../testing/selftests/arm64/abi/syscall-abi.c | 205 +++++-
.../testing/selftests/arm64/abi/syscall-abi.h | 15 +
tools/testing/selftests/arm64/abi/tpidr2.c | 298 +++++++++
tools/testing/selftests/arm64/fp/.gitignore | 4 +
tools/testing/selftests/arm64/fp/Makefile | 12 +-
tools/testing/selftests/arm64/fp/rdvl-sme.c | 14 +
tools/testing/selftests/arm64/fp/rdvl.S | 10 +
tools/testing/selftests/arm64/fp/rdvl.h | 1 +
tools/testing/selftests/arm64/fp/sme-inst.h | 51 ++
tools/testing/selftests/arm64/fp/ssve-stress | 59 ++
tools/testing/selftests/arm64/fp/sve-ptrace.c | 13 +-
tools/testing/selftests/arm64/fp/sve-test.S | 20 +
tools/testing/selftests/arm64/fp/vec-syscfg.c | 10 +
tools/testing/selftests/arm64/fp/vlset.c | 10 +-
tools/testing/selftests/arm64/fp/za-ptrace.c | 354 +++++++++++
tools/testing/selftests/arm64/fp/za-stress | 59 ++
tools/testing/selftests/arm64/fp/za-test.S | 388 ++++++++++++
.../testing/selftests/arm64/signal/.gitignore | 2 +
.../selftests/arm64/signal/test_signals.h | 5 +
.../arm64/signal/test_signals_utils.c | 40 +-
.../arm64/signal/test_signals_utils.h | 2 +
.../testcases/fake_sigreturn_sme_change_vl.c | 92 +++
.../arm64/signal/testcases/sme_trap_no_sm.c | 38 ++
.../signal/testcases/sme_trap_non_streaming.c | 45 ++
.../arm64/signal/testcases/sme_trap_za.c | 36 ++
.../selftests/arm64/signal/testcases/sme_vl.c | 68 ++
.../arm64/signal/testcases/ssve_regs.c | 129 ++++
.../arm64/signal/testcases/testcases.c | 36 ++
.../arm64/signal/testcases/testcases.h | 3 +-
72 files changed, 4590 insertions(+), 251 deletions(-)
create mode 100644 Documentation/arm64/sme.rst
create mode 100644 tools/testing/selftests/arm64/abi/syscall-abi.h
create mode 100644 tools/testing/selftests/arm64/abi/tpidr2.c
create mode 100644 tools/testing/selftests/arm64/fp/rdvl-sme.c
create mode 100644 tools/testing/selftests/arm64/fp/sme-inst.h
create mode 100644 tools/testing/selftests/arm64/fp/ssve-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-ptrace.c
create mode 100644 tools/testing/selftests/arm64/fp/za-stress
create mode 100644 tools/testing/selftests/arm64/fp/za-test.S
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_sme_change_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_no_sm.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_non_streaming.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_trap_za.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/sme_vl.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/ssve_regs.c
base-commit: dfd42facf1e4ada021b939b4e19c935dcdd55566
--
2.30.2
From: Mike Kravetz <mike.kravetz(a)oracle.com>
[ Upstream commit fda153c89af344d21df281009a9d046cf587ea0f ]
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/memfd/memfd_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 26546892cd545..faab09215c88b 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -373,6 +373,7 @@ static void mfd_fail_write(int fd)
printf("mmap()+mprotect() didn't fail as expected\n");
abort();
}
+ munmap(p, mfd_def_size);
}
/* verify PUNCH_HOLE fails */
--
2.34.1
From: Mike Kravetz <mike.kravetz(a)oracle.com>
[ Upstream commit fda153c89af344d21df281009a9d046cf587ea0f ]
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/memfd/memfd_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 845e5f67b6f02..cf4c5276eb06a 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -416,6 +416,7 @@ static void mfd_fail_write(int fd)
printf("mmap()+mprotect() didn't fail as expected\n");
abort();
}
+ munmap(p, mfd_def_size);
}
/* verify PUNCH_HOLE fails */
--
2.34.1
From: Mike Kravetz <mike.kravetz(a)oracle.com>
[ Upstream commit fda153c89af344d21df281009a9d046cf587ea0f ]
Running the memfd script ./run_hugetlbfs_test.sh will often end in error
as follows:
memfd-hugetlb: CREATE
memfd-hugetlb: BASIC
memfd-hugetlb: SEAL-WRITE
memfd-hugetlb: SEAL-FUTURE-WRITE
memfd-hugetlb: SEAL-SHRINK
fallocate(ALLOC) failed: No space left on device
./run_hugetlbfs_test.sh: line 60: 166855 Aborted (core dumped) ./memfd_test hugetlbfs
opening: ./mnt/memfd
fuse: DONE
If no hugetlb pages have been preallocated, run_hugetlbfs_test.sh will
allocate 'just enough' pages to run the test. In the SEAL-FUTURE-WRITE
test the mfd_fail_write routine maps the file, but does not unmap. As a
result, two hugetlb pages remain reserved for the mapping. When the
fallocate call in the SEAL-SHRINK test attempts allocate all hugetlb
pages, it is short by the two reserved pages.
Fix by making sure to unmap in mfd_fail_write.
Link: https://lkml.kernel.org/r/20220219004340.56478-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/memfd/memfd_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 10baa1652fc2a..a4e520b94e431 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -386,6 +386,7 @@ static void mfd_fail_write(int fd)
printf("mmap()+mprotect() didn't fail as expected\n");
abort();
}
+ munmap(p, mfd_def_size);
}
/* verify PUNCH_HOLE fails */
--
2.34.1
Hello,
The aim of this series is to make resctrl_tests run by using
kselftest framework.
- I modify resctrl_test Makefile and kselftest Makefile,
to enable build/run resctrl_tests by using kselftest framework.
Of course, users can also build/run resctrl_tests without
using framework as before.
- I change the default limited time for resctrl_tests to 120 seconds, to
ensure the resctrl_tests finish in limited time on different environments.
- When resctrl file system is not supported by environment or
resctrl_tests is not run as root, return skip code of kselftest framework.
- If resctrl_tests does not finish in limited time, terminate it as
same as executing ctrl+c that kills parent process and child process.
Difference from v2:
- I reworte changelog of this patch series.
- I added how to use framework to run resctrl to README. [PATCH v3 2/5]
- License has no dependencies on this patch series, I separated from it this patch series to another patch.
https://lore.kernel.org/lkml/20211213100154.180599-1-tan.shaopeng@jp.fujits…
With regard to the limited time, I think 120s is not a problem since some tests have a longer
timeout (e.g. net test is 300s). Please let me know if this is wrong.
Thanks,
Shaopeng Tan (5):
selftests/resctrl: Kill child process before parent process terminates
if SIGTERM is received
selftests/resctrl: Make resctrl_tests run using kselftest framework
selftests/resctrl: Update README about using kselftest framework to
build/run resctrl_tests
selftests/resctrl: Change the default limited time to 120 seconds
selftests/resctrl: Fix resctrl_tests' return code to work with
selftest framework
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/resctrl/Makefile | 20 ++++-------
tools/testing/selftests/resctrl/README | 34 +++++++++++++++++++
.../testing/selftests/resctrl/resctrl_tests.c | 4 +--
tools/testing/selftests/resctrl/resctrl_val.c | 1 +
tools/testing/selftests/resctrl/settings | 1 +
6 files changed, 45 insertions(+), 16 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/settings
--
2.27.0
Hi there,
This series introduces support of eBPF for HID devices.
I have several use cases where eBPF could be interesting for those
input devices:
- simple fixup of report descriptor:
In the HID tree, we have half of the drivers that are "simple" and
that just fix one key or one byte in the report descriptor.
Currently, for users of such devices, the process of fixing them
is long and painful.
With eBPF, we could externalize those fixups in one external repo,
ship various CoRe bpf programs and have those programs loaded at boot
time without having to install a new kernel (and wait 6 months for the
fix to land in the distro kernel)
- Universal Stylus Interface (or any other new fancy feature that
requires a new kernel API)
See [0].
Basically, USI pens are requiring a new kernel API because there are
some channels of communication our HID and input stack are not capable
of. Instead of using hidraw or creating new sysfs or ioctls, we can rely
on eBPF to have the kernel API controlled by the consumer and to not
impact the performances by waking up userspace every time there is an
event.
- Surface Dial
This device is a "puck" from Microsoft, basically a rotary dial with a
push button. The kernel already exports it as such but doesn't handle
the haptic feedback we can get out of it.
Furthermore, that device is not recognized by userspace and so it's a
nice paperwight in the end.
With eBPF, we can morph that device into a mouse, and convert the dial
events into wheel events. Also, we can set/unset the haptic feedback
from userspace. The convenient part of BPF makes it that the kernel
doesn't make any choice that would need to be reverted because that
specific userspace doesn't handle it properly or because that other
one expects it to be different.
- firewall
What if we want to prevent other users to access a specific feature of a
device? (think a possibly bonker firmware update entry popint)
With eBPF, we can intercept any HID command emitted to the device and
validate it or not.
This also allows to sync the state between the userspace and the
kernel/bpf program because we can intercept any incoming command.
- tracing
The last usage I have in mind is tracing events and all the fun we can
do we BPF to summarize and analyze events.
Right now, tracing relies on hidraw. It works well except for a couple
of issues:
1. if the driver doesn't export a hidraw node, we can't trace anything
(eBPF will be a "god-mode" there, so it might raise some eyebrows)
2. hidraw doesn't catch the other process requests to the device, which
means that we have cases where we need to add printks to the kernel
to understand what is happening.
With that long introduction, here is the v1 of the support of eBPF in
HID.
I have targeted bpf-next here because the parts that will have the most
conflicts are in bpf. There might be a trivial minor conflict in
include/linux/hid.h with an other series I have pending[1].
I am relatively new to bpf, so having some feedback would be most very
welcome.
A couple of notes though:
- The series is missing a SEC("hid/driver_event") which would allow to
intercept incoming requests to the device from anybody. I left it
outside because it's not critical to have it from day one (we are more
interested right now by the USI case above)
- I am still wondering how to integrate the tracing part:
right now, if a bpf program is loaded before we start the tracer, we
will see *modified* events in the tracer. However, it might be
interesting to decide to see either unmodified (raw events from the
device) or modified events.
I think a flag might be able to solve that. The flag will control
whether we add the new program at the beginning of the list or at the
tail, but I am not sure if this is common practice in eBPF or if
there is a better way.
Cheers,
Benjamin
[0] https://lore.kernel.org/linux-input/20211215134220.1735144-1-tero.kristo@li…
[1] https://lore.kernel.org/linux-input/20220203143226.4023622-1-benjamin.tisso…
Benjamin Tissoires (6):
HID: initial BPF implementation
HID: bpf: allow to change the report descriptor from an eBPF program
HID: bpf: add hid_{get|set}_data helpers
HID: bpf: add new BPF type to trigger commands from userspace
HID: bpf: tests: rely on uhid event to know if a test device is ready
HID: bpf: add bpf_hid_raw_request helper function
drivers/hid/Makefile | 1 +
drivers/hid/hid-bpf.c | 327 +++++++++
drivers/hid/hid-core.c | 31 +-
include/linux/bpf-hid.h | 98 +++
include/linux/bpf_types.h | 4 +
include/linux/hid.h | 25 +
include/uapi/linux/bpf.h | 33 +
include/uapi/linux/bpf_hid.h | 56 ++
kernel/bpf/Makefile | 3 +
kernel/bpf/hid.c | 653 ++++++++++++++++++
kernel/bpf/syscall.c | 12 +
samples/bpf/.gitignore | 1 +
samples/bpf/Makefile | 4 +
samples/bpf/hid_mouse_kern.c | 91 +++
samples/bpf/hid_mouse_user.c | 129 ++++
tools/include/uapi/linux/bpf.h | 33 +
tools/lib/bpf/libbpf.c | 9 +
tools/lib/bpf/libbpf.h | 2 +
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/prog_tests/hid.c | 685 +++++++++++++++++++
tools/testing/selftests/bpf/progs/hid.c | 149 ++++
21 files changed, 2339 insertions(+), 8 deletions(-)
create mode 100644 drivers/hid/hid-bpf.c
create mode 100644 include/linux/bpf-hid.h
create mode 100644 include/uapi/linux/bpf_hid.h
create mode 100644 kernel/bpf/hid.c
create mode 100644 samples/bpf/hid_mouse_kern.c
create mode 100644 samples/bpf/hid_mouse_user.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/hid.c
create mode 100644 tools/testing/selftests/bpf/progs/hid.c
--
2.35.1
Changes from Previous Version (v2)
==================================
Compared to the v2 of this patchset
(https://lore.kernel.org/linux-mm/20220225130712.12682-1-sj@kernel.org/), this
version contains below changes.
- Put real details in the ABI document (Greg KH)
- Update 'Date:' in ABI document from Feb 2022 to Mar 2022 (Greg KH)
Introduction
============
DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far.
However, it unnecessarily depends on debugfs, while DAMON is not aimed to be
used for only debugging. Also, the interface receives multiple values via one
file. For example, schemes file receives 18 values. As a result, it is
inefficient, hard to be used, and difficult to be extended. Especially,
keeping backward compatibility of user space tools is getting only challenging.
It would be better to implement another reliable and flexible interface and
deprecate DAMON_DBGFS in long term.
For the reason, this patchset introduces a sysfs-based new user interface of
DAMON. The idea of the new interface is, using directory hierarchies and
having one dedicated file for each value. For a short example, users can do
the virtual address monitoring via the interface as below:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
A brief representation of the files hierarchy of DAMON sysfs interface is as
below. Childs are represented with indentation, directories are having '/'
suffix, and files in each directory are separated by comma.
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Detailed usage of the files will be described in the final Documentation patch
of this patchset.
Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------
At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One
important difference between them is their exclusiveness. DAMON_DBGFS works in
an exclusive manner, so that no DAMON worker thread (kdamond) in the system can
run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users
to construct all monitoring contexts and start them at once. It's not a big
problem but makes the operation a little bit complex and unflexible.
For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any
possible interference to the admins and work in a non-exclusive manner. That
is, users can configure and start contexts one by one. Note that DAMON
respects both exclusive groups and non-exclusive groups of contexts, in a
manner similar to that of reader-writer locks. That is, if any exclusive
monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running,
DAMON_SYSFS does not start new contexts, and vice versa.
Future Plan of DAMON_DBGFS Deprecation
======================================
Once this patchset is merged, DAMON_DBGFS development will be frozen. That is,
we will maintain it to work as is now so that no users will be break. But, it
will not be extended to provide any new feature of DAMON. The support will be
continued only until next LTS release. After that, we will drop DAMON_DBGFS.
User-space Tooling Compatibility
--------------------------------
As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can
move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until next LTS
kernel release, user space tools would have enough time to move to DAMON_SYSFS.
The official user space tool, damo[1], is already supporting both DAMON_SYSFS
and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON
using DAMON_SYSFS also passed.
[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf
Complete Git Tree
=================
You can get the complete git tree from
https://git.kernel.org/sj/h/damon/sysfs/patches/v2.
Sequence of Patches
===================
First two patches (patches 1-2) make core changes for DAMON_SYSFS. The first
one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work
in non-exclusive mode, while the second one (patch 2) adds size of DAMON enum
types so that DAMON API users can safely iterate the enums.
Third patch (patch 3) implements basic sysfs stub for virtual address spaces
monitoring. Note that this implements only sysfs files and DAMON is not
linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so that users
can control DAMON using the sysfs files.
Following six patches (patches 5-10) implements other DAMON features that
DAMON_DBGFS supports one by one (physical address space monitoring, DAMON-based
operation schemes, schemes quotas, schemes prioritization weights, schemes
watermarks, and schemes stats).
Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the
final one (patch 12) documents DAMON_SYSFS.
Patch History
=============
Changes from v2
(https://lore.kernel.org/linux-mm/20220225130712.12682-1-sj@kernel.org/)
- Put real details in the ABI document (Greg KH)
- Update 'Date:' in ABI document from Feb 2022 to Mar 2022 (Greg KH)
Changes from v1
(https://lore.kernel.org/linux-mm/20220223152051.22936-1-sj@kernel.org/)
- Use __ATTR_R{O,W}_MODE() instead of __ATTR() (Greg KH)
- Change some file names for using __ATTR_R{O,W}_MODE() (Greg KH)
- Add ABI document (Greg KH)
Chages from RFC
(https://lore.kernel.org/linux-mm/20220217161938.8874-1-sj@kernel.org/)
- Implement all DAMON debugfs interface providing features
- Writeup documents
- Add more selftests
SeongJae Park (13):
mm/damon/core: Allow non-exclusive DAMON start/stop
mm/damon/core: Add number of each enum type values
mm/damon: Implement a minimal stub for sysfs-based DAMON interface
mm/damon/sysfs: Link DAMON for virtual address spaces monitoring
mm/damon/sysfs: Support the physical address space monitoring
mm/damon/sysfs: Support DAMON-based Operation Schemes
mm/damon/sysfs: Support DAMOS quotas
mm/damon/sysfs: Support schemes prioritization
mm/damon/sysfs: Support DAMOS watermarks
mm/damon/sysfs: Support DAMOS stats
selftests/damon: Add a test for DAMON sysfs interface
Docs/admin-guide/mm/damon/usage: Document DAMON sysfs interface
Docs/ABI/testing: Add DAMON sysfs interface ABI document
.../ABI/testing/sysfs-kernel-mm-damon | 274 ++
Documentation/admin-guide/mm/damon/usage.rst | 350 ++-
MAINTAINERS | 1 +
include/linux/damon.h | 6 +-
mm/damon/Kconfig | 7 +
mm/damon/Makefile | 1 +
mm/damon/core.c | 23 +-
mm/damon/dbgfs.c | 2 +-
mm/damon/reclaim.c | 2 +-
mm/damon/sysfs.c | 2594 +++++++++++++++++
tools/testing/selftests/damon/Makefile | 1 +
tools/testing/selftests/damon/sysfs.sh | 306 ++
12 files changed, 3550 insertions(+), 17 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-damon
create mode 100644 mm/damon/sysfs.c
create mode 100755 tools/testing/selftests/damon/sysfs.sh
--
2.17.1
Extend the interoperability with IMA, to give wider flexibility for the
implementation of integrity-focused LSMs based on eBPF.
Patch 1 fixes some style issues.
Patches 2-4 gives the ability to eBPF-based LSMs to take advantage of the
measurement capability of IMA without needing to setup a policy in IMA
(those LSMs might implement the policy capability themselves).
Patches 5-6 allows eBPF-based LSMs to evaluate files read by the kernel.
Changelog
v1:
- Modify ima_file_hash() only and allow the usage of the function with the
modified behavior by eBPF-based LSMs through the new function
bpf_ima_file_hash() (suggested by Mimi)
- Make bpf_lsm_kernel_read_file() sleepable so that bpf_ima_inode_hash()
and bpf_ima_file_hash() can be called inside the implementation of
eBPF-based LSMs for this hook
Roberto Sassu (6):
ima: Fix documentation-related warnings in ima_main.c
ima: Always return a file measurement in ima_file_hash()
bpf-lsm: Introduce new helper bpf_ima_file_hash()
selftests/bpf: Add test for bpf_ima_file_hash()
bpf-lsm: Make bpf_lsm_kernel_read_file() as sleepable
selftests/bpf: Add test for bpf_lsm_kernel_read_file()
include/uapi/linux/bpf.h | 11 +++++
kernel/bpf/bpf_lsm.c | 21 +++++++++
security/integrity/ima/ima_main.c | 47 ++++++++++++-------
tools/include/uapi/linux/bpf.h | 11 +++++
tools/testing/selftests/bpf/ima_setup.sh | 2 +
.../selftests/bpf/prog_tests/test_ima.c | 30 ++++++++++--
tools/testing/selftests/bpf/progs/ima.c | 34 ++++++++++++--
7 files changed, 132 insertions(+), 24 deletions(-)
--
2.32.0
Before, our help output contained lines like
--kconfig_add KCONFIG_ADD
--qemu_config qemu_config
--jobs jobs
They're not very helpful.
The former kind come from the automatic 'metavar' we get from argparse,
the uppsercase version of the flag name.
The latter are where we manually specified metavar as the flag name.
After:
--build_dir DIR
--make_options X=Y
--kunitconfig KUNITCONFIG
--kconfig_add CONFIG_X=Y
--arch ARCH
--cross_compile PREFIX
--qemu_config FILE
--jobs N
--timeout SECONDS
--raw_output [{all,kunit}]
--json [FILE]
This patch tries to make the code more clear by specifying the _type_ of
input we expect, e.g. --build_dir is a DIR, --qemu_config is a FILE.
I also switched it to uppercase since it looked more clearly like
placeholder text that way.
This patch also changes --raw_output to specify `choices` to make it
more clear what the options are, and this way argparse can validate it
for us, as shown by the added test case.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
---
tools/testing/kunit/kunit.py | 26 ++++++++++++--------------
tools/testing/kunit/kunit_tool_test.py | 5 +++++
2 files changed, 17 insertions(+), 14 deletions(-)
diff --git a/tools/testing/kunit/kunit.py b/tools/testing/kunit/kunit.py
index 9274c6355809..566404f5e42a 100755
--- a/tools/testing/kunit/kunit.py
+++ b/tools/testing/kunit/kunit.py
@@ -206,8 +206,6 @@ def parse_tests(request: KunitParseRequest, input_data: Iterable[str]) -> Tuple[
pass
elif request.raw_output == 'kunit':
output = kunit_parser.extract_tap_lines(output)
- else:
- print(f'Unknown --raw_output option "{request.raw_output}"', file=sys.stderr)
for line in output:
print(line.rstrip())
@@ -281,10 +279,10 @@ def add_common_opts(parser) -> None:
parser.add_argument('--build_dir',
help='As in the make command, it specifies the build '
'directory.',
- type=str, default='.kunit', metavar='build_dir')
+ type=str, default='.kunit', metavar='DIR')
parser.add_argument('--make_options',
help='X=Y make option, can be repeated.',
- action='append')
+ action='append', metavar='X=Y')
parser.add_argument('--alltests',
help='Run all KUnit tests through allyesconfig',
action='store_true')
@@ -292,11 +290,11 @@ def add_common_opts(parser) -> None:
help='Path to Kconfig fragment that enables KUnit tests.'
' If given a directory, (e.g. lib/kunit), "/.kunitconfig" '
'will get automatically appended.',
- metavar='kunitconfig')
+ metavar='KUNITCONFIG')
parser.add_argument('--kconfig_add',
help='Additional Kconfig options to append to the '
'.kunitconfig, e.g. CONFIG_KASAN=y. Can be repeated.',
- action='append')
+ action='append', metavar='CONFIG_X=Y')
parser.add_argument('--arch',
help=('Specifies the architecture to run tests under. '
@@ -304,7 +302,7 @@ def add_common_opts(parser) -> None:
'string passed to the ARCH make param, '
'e.g. i386, x86_64, arm, um, etc. Non-UML '
'architectures run on QEMU.'),
- type=str, default='um', metavar='arch')
+ type=str, default='um', metavar='ARCH')
parser.add_argument('--cross_compile',
help=('Sets make\'s CROSS_COMPILE variable; it should '
@@ -316,18 +314,18 @@ def add_common_opts(parser) -> None:
'if you have downloaded the microblaze toolchain '
'from the 0-day website to a directory in your '
'home directory called `toolchains`).'),
- metavar='cross_compile')
+ metavar='PREFIX')
parser.add_argument('--qemu_config',
help=('Takes a path to a path to a file containing '
'a QemuArchParams object.'),
- type=str, metavar='qemu_config')
+ type=str, metavar='FILE')
def add_build_opts(parser) -> None:
parser.add_argument('--jobs',
help='As in the make command, "Specifies the number of '
'jobs (commands) to run simultaneously."',
- type=int, default=get_default_jobs(), metavar='jobs')
+ type=int, default=get_default_jobs(), metavar='N')
def add_exec_opts(parser) -> None:
parser.add_argument('--timeout',
@@ -336,7 +334,7 @@ def add_exec_opts(parser) -> None:
'tests.',
type=int,
default=300,
- metavar='timeout')
+ metavar='SECONDS')
parser.add_argument('filter_glob',
help='Filter which KUnit test suites/tests run at '
'boot-time, e.g. list* or list*.*del_test',
@@ -346,7 +344,7 @@ def add_exec_opts(parser) -> None:
metavar='filter_glob')
parser.add_argument('--kernel_args',
help='Kernel command-line parameters. Maybe be repeated',
- action='append')
+ action='append', metavar='')
parser.add_argument('--run_isolated', help='If set, boot the kernel for each '
'individual suite/test. This is can be useful for debugging '
'a non-hermetic test, one that might pass/fail based on '
@@ -357,13 +355,13 @@ def add_exec_opts(parser) -> None:
def add_parse_opts(parser) -> None:
parser.add_argument('--raw_output', help='If set don\'t format output from kernel. '
'If set to --raw_output=kunit, filters to just KUnit output.',
- type=str, nargs='?', const='all', default=None)
+ type=str, nargs='?', const='all', default=None, choices=['all', 'kunit'])
parser.add_argument('--json',
nargs='?',
help='Stores test results in a JSON, and either '
'prints to stdout or saves to file if a '
'filename is specified',
- type=str, const='stdout', default=None)
+ type=str, const='stdout', default=None, metavar='FILE')
def main(argv, linux=None):
parser = argparse.ArgumentParser(
diff --git a/tools/testing/kunit/kunit_tool_test.py b/tools/testing/kunit/kunit_tool_test.py
index 352369dffbd9..eb2011d12c78 100755
--- a/tools/testing/kunit/kunit_tool_test.py
+++ b/tools/testing/kunit/kunit_tool_test.py
@@ -595,6 +595,11 @@ class KUnitMainTest(unittest.TestCase):
self.assertNotEqual(call, mock.call(StrContains('Testing complete.')))
self.assertNotEqual(call, mock.call(StrContains(' 0 tests run')))
+ def test_run_raw_output_invalid(self):
+ self.linux_source_mock.run_kernel = mock.Mock(return_value=[])
+ with self.assertRaises(SystemExit) as e:
+ kunit.main(['run', '--raw_output=invalid'], self.linux_source_mock)
+
def test_run_raw_output_does_not_take_positional_args(self):
# --raw_output is a string flag, but we don't want it to consume
# any positional arguments, only ones after an '='
base-commit: 5debe5bfa02c4c8922bd2d0f82c9c3a70bec8944
--
2.35.1.574.g5d30c73bfb-goog
Some problems with reading the RTC time may happen rarely, for example
while the RTC is updating. So read the RTC many times to catch these
problems. For example, a previous attempt for my
commit ea6fa4961aab ("rtc: mc146818-lib: fix RTC presence check")
was incorrect and would have triggered this selftest.
To avoid the risk of damaging the hardware, wait 11ms before consecutive
reads.
In rtc_time_to_timestamp I copied values manually instead of casting -
just to be on the safe side. The 11ms wait period was chosen so that it is
not a divisor of 1000ms.
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Cc: Alessandro Zummo <a.zummo(a)towertech.it>
Cc: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Cc: Shuah Khan <shuah(a)kernel.org>
---
Also, before
commit cdedc45c579f ("rtc: cmos: avoid UIP when reading alarm time")
reading the RTC alarm time during RTC update produced incorrect results
on many Intel platforms. Preparing a similar selftest for this case
would be more difficult, though, because the RTC alarm time is cached by
the kernel. Direct access would have to be exposed somehow, for example
in debugfs. I may prepare a patch for it in the future.
---
tools/testing/selftests/rtc/rtctest.c | 66 +++++++++++++++++++++++++++
tools/testing/selftests/rtc/settings | 2 +-
2 files changed, 67 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/rtc/rtctest.c b/tools/testing/selftests/rtc/rtctest.c
index 66af608fb4c6..2b9d929a24ed 100644
--- a/tools/testing/selftests/rtc/rtctest.c
+++ b/tools/testing/selftests/rtc/rtctest.c
@@ -20,6 +20,8 @@
#define NUM_UIE 3
#define ALARM_DELTA 3
+#define READ_LOOP_DURATION_SEC 30
+#define READ_LOOP_SLEEP_MS 11
static char *rtc_file = "/dev/rtc0";
@@ -49,6 +51,70 @@ TEST_F(rtc, date_read) {
rtc_tm.tm_hour, rtc_tm.tm_min, rtc_tm.tm_sec);
}
+static time_t rtc_time_to_timestamp(struct rtc_time *rtc_time)
+{
+ struct tm tm_time = {
+ .tm_sec = rtc_time->tm_sec,
+ .tm_min = rtc_time->tm_min,
+ .tm_hour = rtc_time->tm_hour,
+ .tm_mday = rtc_time->tm_mday,
+ .tm_mon = rtc_time->tm_mon,
+ .tm_year = rtc_time->tm_year,
+ };
+
+ return mktime(&tm_time);
+}
+
+static void nanosleep_with_retries(long ns)
+{
+ struct timespec req = {
+ .tv_sec = 0,
+ .tv_nsec = ns,
+ };
+ struct timespec rem;
+
+ while (nanosleep(&req, &rem) != 0) {
+ req.tv_sec = rem.tv_sec;
+ req.tv_nsec = rem.tv_nsec;
+ }
+}
+
+TEST_F_TIMEOUT(rtc, date_read_loop, READ_LOOP_DURATION_SEC + 2) {
+ int rc;
+ long iter_count = 0;
+ struct rtc_time rtc_tm;
+ time_t start_rtc_read, prev_rtc_read;
+
+ TH_LOG("Continuously reading RTC time for %ds (with %dms breaks after every read).",
+ READ_LOOP_DURATION_SEC, READ_LOOP_SLEEP_MS);
+
+ rc = ioctl(self->fd, RTC_RD_TIME, &rtc_tm);
+ ASSERT_NE(-1, rc);
+ start_rtc_read = rtc_time_to_timestamp(&rtc_tm);
+ prev_rtc_read = start_rtc_read;
+
+ do {
+ time_t rtc_read;
+
+ rc = ioctl(self->fd, RTC_RD_TIME, &rtc_tm);
+ ASSERT_NE(-1, rc);
+
+ rtc_read = rtc_time_to_timestamp(&rtc_tm);
+ /* Time should not go backwards */
+ ASSERT_LE(prev_rtc_read, rtc_read);
+ /* Time should not increase more then 1s at a time */
+ ASSERT_GE(prev_rtc_read + 1, rtc_read);
+
+ /* Sleep 11ms to avoid killing / overheating the RTC */
+ nanosleep_with_retries(READ_LOOP_SLEEP_MS * 1000000);
+
+ prev_rtc_read = rtc_read;
+ iter_count++;
+ } while (prev_rtc_read <= start_rtc_read + READ_LOOP_DURATION_SEC);
+
+ TH_LOG("Performed %ld RTC time reads.", iter_count);
+}
+
TEST_F_TIMEOUT(rtc, uie_read, NUM_UIE + 2) {
int i, rc, irq = 0;
unsigned long data;
diff --git a/tools/testing/selftests/rtc/settings b/tools/testing/selftests/rtc/settings
index a953c96aa16e..0c1a2075d5f3 100644
--- a/tools/testing/selftests/rtc/settings
+++ b/tools/testing/selftests/rtc/settings
@@ -1 +1 @@
-timeout=180
+timeout=210
--
2.25.1
Changes from Previous Version (v1)
==================================
Compared to the v1 of this patchset
(https://lore.kernel.org/linux-mm/20220223152051.22936-1-sj@kernel.org/), this
version contains below changes.
- Use __ATTR_R{O,W}_MODE() instead of __ATTR() (Greg KH)
- Change some file names for using __ATTR_R{O,W}_MODE() (Greg KH)
- Add ABI document (Greg KH)
Introduction
============
DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far.
However, it unnecessarily depends on debugfs, while DAMON is not aimed to be
used for only debugging. Also, the interface receives multiple values via one
file. For example, schemes file receives 18 values. As a result, it is
inefficient, hard to be used, and difficult to be extended. Especially,
keeping backward compatibility of user space tools is getting only challenging.
It would be better to implement another reliable and flexible interface and
deprecate DAMON_DBGFS in long term.
For the reason, this patchset introduces a sysfs-based new user interface of
DAMON. The idea of the new interface is, using directory hierarchies and
having one dedicated file for each value. For a short example, users can do
the virtual address monitoring via the interface as below:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr_kdamonds
# echo 1 > kdamonds/0/contexts/nr_contexts
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr_targets
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid_target
# echo on > kdamonds/0/state
A brief representation of the files hierarchy of DAMON sysfs interface is as
below. Childs are represented with indentation, directories are having '/'
suffix, and files in each directory are separated by comma.
/sys/kernel/mm/damon/admin
│ kdamonds/nr_kdamonds
│ │ 0/state,pid
│ │ │ contexts/nr_contexts
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr_targets
│ │ │ │ │ │ 0/pid_target
│ │ │ │ │ │ │ regions/nr_regions
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr_schemes
│ │ │ │ │ │ 0/action
│ │ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ │ ...
│ │ │ │ ...
│ │ ...
Detailed usage of the files will be described in the final Documentation patch
of this patchset.
Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------
At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One
important difference between them is their exclusiveness. DAMON_DBGFS works in
an exclusive manner, so that no DAMON worker thread (kdamond) in the system can
run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users
to construct all monitoring contexts and start them at once. It's not a big
problem but makes the operation a little bit complex and unflexible.
For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any
possible interference to the admins and work in a non-exclusive manner. That
is, users can configure and start contexts one by one. Note that DAMON
respects both exclusive groups and non-exclusive groups of contexts, in a
manner similar to that of reader-writer locks. That is, if any exclusive
monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running,
DAMON_SYSFS does not start new contexts, and vice versa.
Future Plan of DAMON_DBGFS Deprecation
======================================
Once this patchset is merged, DAMON_DBGFS development will be frozen. That is,
we will maintain it to work as is now so that no users will be break. But, it
will not be extended to provide any new feature of DAMON. The support will be
continued only until next LTS release. After that, we will drop DAMON_DBGFS.
User-space Tooling Compatibility
--------------------------------
As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can
move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until next LTS
kernel release, user space tools would have enough time to move to DAMON_SYSFS.
The official user space tool, damo[1], is already supporting both DAMON_SYSFS
and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON
using DAMON_SYSFS also passed.
[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf
Complete Git Tree
=================
You can get the complete git tree from
https://git.kernel.org/sj/h/damon/sysfs/patches/v2.
Sequence of Patches
===================
First two patches (patches 1-2) make core changes for DAMON_SYSFS. The first
one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work
in non-exclusive mode, while the second one (patch 2) adds size of DAMON enum
types so that DAMON API users can safely iterate the enums.
Third patch (patch 3) implements basic sysfs stub for virtual address spaces
monitoring. Note that this implements only sysfs files and DAMON is not
linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so that users
can control DAMON using the sysfs files.
Following six patches (patches 5-10) implements other DAMON features that
DAMON_DBGFS supports one by one (physical address space monitoring, DAMON-based
operation schemes, schemes quotas, schemes prioritization weights, schemes
watermarks, and schemes stats).
Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the
final one (patch 12) documents DAMON_SYSFS.
Patch History
=============
Changes from Previous Version (v1)
==================================
Changes from v1
(https://lore.kernel.org/linux-mm/20220223152051.22936-1-sj@kernel.org/)
- Use __ATTR_R{O,W}_MODE() instead of __ATTR() (Greg KH)
- Change some file names for using __ATTR_R{O,W}_MODE() (Greg KH)
- Add ABI document (Greg KH)
Chages from RFC
(https://lore.kernel.org/linux-mm/20220217161938.8874-1-sj@kernel.org/)
- Implement all DAMON debugfs interface providing features
- Writeup documents
- Add more selftests
SeongJae Park (13):
mm/damon/core: Allow non-exclusive DAMON start/stop
mm/damon/core: Add number of each enum type values
mm/damon: Implement a minimal stub for sysfs-based DAMON interface
mm/damon/sysfs: Link DAMON for virtual address spaces monitoring
mm/damon/sysfs: Support the physical address space monitoring
mm/damon/sysfs: Support DAMON-based Operation Schemes
mm/damon/sysfs: Support DAMOS quotas
mm/damon/sysfs: Support schemes prioritization
mm/damon/sysfs: Support DAMOS watermarks
mm/damon/sysfs: Support DAMOS stats
selftests/damon: Add a test for DAMON sysfs interface
Docs/admin-guide/mm/damon/usage: Document DAMON sysfs interface
Docs/ABI/testing: Add DAMON sysfs interface ABI document
.../ABI/testing/sysfs-kernel-mm-damon | 276 ++
Documentation/admin-guide/mm/damon/usage.rst | 350 ++-
MAINTAINERS | 1 +
include/linux/damon.h | 6 +-
mm/damon/Kconfig | 7 +
mm/damon/Makefile | 1 +
mm/damon/core.c | 23 +-
mm/damon/dbgfs.c | 2 +-
mm/damon/reclaim.c | 2 +-
mm/damon/sysfs.c | 2594 +++++++++++++++++
tools/testing/selftests/damon/Makefile | 1 +
tools/testing/selftests/damon/sysfs.sh | 306 ++
12 files changed, 3552 insertions(+), 17 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-kernel-mm-damon
create mode 100644 mm/damon/sysfs.c
create mode 100755 tools/testing/selftests/damon/sysfs.sh
--
2.17.1
Chages from Previous Version (RFC)
==================================
Compared to the RFC version of this patchset
(https://lore.kernel.org/linux-mm/20220217161938.8874-1-sj@kernel.org/), this
version contains below changes.
- Implement all DAMON debugfs interface providing features
- Writeup documents
- Add more selftests
Introduction
============
DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far.
However, it unnecessarily depends on debugfs, while DAMON is not aimed to be
used for only debugging. Also, the interface receives multiple values via one
file. For example, schemes file receives 18 values. As a result, it is
inefficient, hard to be used, and difficult to be extended. Especially,
keeping backward compatibility of user space tools is getting only challenging.
It would be better to implement another reliable and flexible interface and
deprecate DAMON_DBGFS in long term.
For the reason, this patchset introduces a sysfs-based new user interface of
DAMON. The idea of the new interface is, using directory hierarchies and
having one dedicated file for each value. For a short example, users can do
the virtual address monitoring via the interface as below:
# cd /sys/kernel/mm/damon/admin/
# echo 1 > kdamonds/nr
# echo 1 > kdamonds/0/contexts/nr
# echo vaddr > kdamonds/0/contexts/0/operations
# echo 1 > kdamonds/0/contexts/0/targets/nr
# echo $(pidof <workload>) > kdamonds/0/contexts/0/targets/0/pid
# echo on > kdamonds/0/state
A brief representation of the files hierarchy of DAMON sysfs interface is as
below. Childs are represented with indentation, directories are having '/'
suffix, and files in each directory are separated by comma.
/sys/kernel/mm/damon/admin
│ kdamonds/nr
│ │ 0/state,pid
│ │ │ contexts/nr
│ │ │ │ 0/operations
│ │ │ │ │ monitoring_attrs/
│ │ │ │ │ │ intervals/sample_us,aggr_us,update_us
│ │ │ │ │ │ nr_regions/min,max
│ │ │ │ │ targets/nr
│ │ │ │ │ │ 0/pid
│ │ │ │ │ │ │ regions/nr
│ │ │ │ │ │ │ │ 0/start,end
│ │ │ │ │ │ │ │ ...
│ │ │ │ │ │ ...
│ │ │ │ │ schemes/nr
│ │ │ │ │ 0/action
│ │ │ │ │ │ access_pattern/
│ │ │ │ │ │ │ sz/min,max
│ │ │ │ │ │ │ nr_accesses/min,max
│ │ │ │ │ │ │ age/min,max
│ │ │ │ │ │ quotas/ms,sz,reset_interval_ms
│ │ │ │ │ │ │ weights/sz,nr_accesses,age
│ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low
│ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds
│ │ │ │ │ ...
│ │ ...
Detailed usage of the files will be described in the final Documentation patch
of this patchset.
Main Difference Between DAMON_DBGFS and DAMON_SYSFS
---------------------------------------------------
At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One
important difference between them is their exclusiveness. DAMON_DBGFS works in
an exclusive manner, so that no DAMON worker thread (kdamond) in the system can
run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users
to construct all monitoring contexts and start them at once. It's not a big
problem but makes the operation a little bit complex and unflexible.
For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any
possible interference to the admins and work in a non-exclusive manner. That
is, users can configure and start contexts one by one. Note that DAMON
respects both exclusive groups and non-exclusive groups of contexts, in a
manner similar to that of reader-writer locks. That is, if any exclusive
monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running,
DAMON_SYSFS does not start new contexts, and vice versa.
Future Plan of DAMON_DBGFS Deprecation
======================================
Once this patchset is merged, DAMON_DBGFS development will be frozen. That is,
we will maintain it to work as is now so that no users will be break. But, it
will not be extended to provide any new feature of DAMON. The support will be
continued only until next LTS release. After that, we will drop DAMON_DBGFS.
User-space Tooling Compatibility
--------------------------------
As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can
move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until next LTS
kernel release, user space tools would have enough time to move to DAMON_SYSFS.
The official user space tool, damo[1], is already supporting both DAMON_SYSFS
and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON
using DAMON_SYSFS also passed.
[1] https://github.com/awslabs/damo
[2] https://github.com/awslabs/damon-tests/tree/master/corr
[3] https://github.com/awslabs/damon-tests/tree/master/perf
Complete Git Tree
=================
You can get the complete git tree from
https://git.kernel.org/sj/h/damon/sysfs/patches/v1.
Sequence of Patches
===================
First two patches (patches 1-2) make core changes for DAMON_SYSFS. The first
one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work
in non-exclusive mode, while the second one (patch 2) adds size of DAMON enum
types so that DAMON API users can safely iterate the enums.
Third patch (patch 3) implements basic sysfs stub for virtual address spaces
monitoring. Note that this implements only sysfs files and DAMON is not
linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so that users
can control DAMON using the sysfs files.
Following six patches (patches 5-10) implements other DAMON features that
DAMON_DBGFS supports one by one (physical address space monitoring, DAMON-based
operation schemes, schemes quotas, schemes prioritization weights, schemes
watermarks, and schemes stats).
Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the
final one (patch 12) documents DAMON_SYSFS.
SeongJae Park (12):
mm/damon/core: Allow non-exclusive DAMON start/stop
mm/damon/core: Add number of each enum type values
mm/damon: Implement a minimal stub for sysfs-based DAMON interface
mm/damon/sysfs: Link DAMON for virtual address spaces monitoring
mm/damon/sysfs: Support physical address space monitoring
mm/damon/sysfs: Support DAMON-based Operation Schemes
mm/damon/sysfs: Support DAMOS quotas
mm/damon/sysfs: Support schemes prioritization weights
mm/damon/sysfs: Support DAMOS watermarks
mm/damon/sysfs: Support DAMOS stats
selftests/damon: Add a test for DAMON sysfs interface
Docs/admin-guide/mm/damon/usage: Document DAMON sysfs interface
Documentation/admin-guide/mm/damon/usage.rst | 349 ++-
include/linux/damon.h | 6 +-
mm/damon/Kconfig | 7 +
mm/damon/Makefile | 1 +
mm/damon/core.c | 23 +-
mm/damon/dbgfs.c | 2 +-
mm/damon/reclaim.c | 2 +-
mm/damon/sysfs.c | 2684 ++++++++++++++++++
tools/testing/selftests/damon/Makefile | 1 +
tools/testing/selftests/damon/sysfs.sh | 306 ++
10 files changed, 3364 insertions(+), 17 deletions(-)
create mode 100644 mm/damon/sysfs.c
create mode 100755 tools/testing/selftests/damon/sysfs.sh
--
2.17.1
Dzień dobry,
dostrzegam możliwość współpracy z Państwa firmą.
Świadczymy kompleksową obsługę inwestycji w fotowoltaikę, która obniża koszty energii elektrycznej nawet o 90%.
Czy są Państwo zainteresowani weryfikacją wstępnych propozycji?
Pozdrawiam,
Jakub Daroch