Currently if opening /dev/null fails to open then file pointer fp
is null and further access to fp via fprintf will cause a null
pointer dereference. Fix this by returning a negative error value
when a null fp is detected.
Detected using cppcheck static analysis:
tools/testing/selftests/resctrl/fill_buf.c:124:6: note: Assuming
that condition '!fp' is not redundant
if (!fp)
^
tools/testing/selftests/resctrl/fill_buf.c:126:10: note: Null
pointer dereference
fprintf(fp, "Sum: %d ", ret);
Fixes: a2561b12fe39 ("selftests/resctrl: Add built in benchmark")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
V2: Add cppcheck analysis information
---
tools/testing/selftests/resctrl/fill_buf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 51e5cf22632f..56ccbeae0638 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -121,8 +121,10 @@ static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
/* Consume read result so that reading memory is not optimized out. */
fp = fopen("/dev/null", "w");
- if (!fp)
+ if (!fp) {
perror("Unable to write to /dev/null");
+ return -1;
+ }
fprintf(fp, "Sum: %d ", ret);
fclose(fp);
--
2.35.1
Hello,
The aim of this series is to print a message to let users know a possible
cause of failure, if the result of MBM&CMT tests is failed on Intel CPU.
In order to detect Intel vendor, I extended AMD vendor detect function.
Difference from v4:
- Fixed the typos.
- Changed "get_vendor() != ARCH_AMD" to "get_vendor() == ARCH_INTEL".
- Reorder the declarations based on line length from longest to shortest.
https://lore.kernel.org/lkml/20220316055940.292550-1-tan.shaopeng@jp.fujits… [PATCH v4]
This patch series is based on v5.17.
Shaopeng Tan (2):
selftests/resctrl: Extend CPU vendor detection
selftests/resctrl: Print a message if the result of MBM&CMT tests is
failed on Intel CPU
tools/testing/selftests/resctrl/cat_test.c | 2 +-
tools/testing/selftests/resctrl/resctrl.h | 5 ++-
.../testing/selftests/resctrl/resctrl_tests.c | 45 +++++++++++++------
tools/testing/selftests/resctrl/resctrlfs.c | 2 +-
4 files changed, 37 insertions(+), 17 deletions(-)
--
2.27.0
Now that the discussions surrounding the support for SGX2 is settling,
the kselftest audience is added to the discussion for the first time
to consider the testing of the new features.
V3: https://lore.kernel.org/lkml/cover.1648847675.git.reinette.chatre@intel.com/
Changes since V3 that directly impact user space:
- SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS ioctl()'s struct
sgx_enclave_restrict_permissions no longer provides entire secinfo,
just the new permissions in new "permissions" struct member. (Jarkko)
- Rename SGX_IOC_ENCLAVE_MODIFY_TYPE ioctl() to
SGX_IOC_ENCLAVE_MODIFY_TYPES. (Jarkko)
- SGX_IOC_ENCLAVE_MODIFY_TYPES ioctl()'s struct sgx_enclave_modify_type
no longer provides entire secinfo, just the new page type in new
"page_type" struct member. (Jarkko)
Details about changes since V3 that do not directly impact user space:
- Add new patch to enable VA pages to be added without invoking reclaimer
directly if no EPC pages are available, failing instead. This enables
VA pages to be added with enclave's mutex held. Fixes an issue
encountered by Haitao. More details in new patch "x86/sgx: Support VA page
allocation without reclaiming".
- While refactoring, change existing code to consistently use
IS_ALIGNED(). (Jarkko)
- Many patches received a tag from Jarkko.
- Many smaller changes, please refer to individual patches.
V2: https://lore.kernel.org/lkml/cover.1644274683.git.reinette.chatre@intel.com/
Changes since V2 that directly impact user space:
- Maximum allowed permissions of dynamically added pages is RWX,
previously limited to RW. (Jarkko)
Dynamically added pages are initially created with architecturally
limited EPCM permissions of RW. mmap() and mprotect() of these pages
with RWX permissions would no longer be blocked by SGX driver. PROT_EXEC
on dynamically added pages will be possible after running ENCLU[EMODPE]
from within the enclave with appropriate VMA permissions.
- The kernel no longer attempts to track the EPCM runtime permissions. (Jarkko)
Consequences are:
- Kernel does not modify PTEs to follow EPCM permissions. User space
will receive #PF with SGX error code in cases where the V2
implementation would have resulted in regular (non-SGX) page fault
error code.
- SGX_IOC_ENCLAVE_RELAX_PERMISSIONS is removed. This ioctl() was used
to clear PTEs after permissions were modified from within the enclave
and ensure correct PTEs are installed. Since PTEs no longer track
EPCM permissions the changes in EPCM permissions would not impact PTEs.
As long as new permissions are within the maximum vetted permissions
(vm_max_prot_bits) only ENCLU[EMODPE] from within enclave is needed,
as accompanied by appropriate VMA permissions.
- struct sgx_enclave_restrict_perm renamed to
sgx_enclave_restrict_permissions (Jarkko)
- struct sgx_enclave_modt renamed to struct sgx_enclave_modify_type
to be consistent with the verbose naming of other SGX uapi structs.
Details about changes since V2 that do not directly impact user space:
- Kernel no longer tracks the runtime EPCM permissions with the aim of
installing accurate PTEs. (Jarkko)
- In support of this change the following patches were removed:
Documentation/x86: Document SGX permission details
x86/sgx: Support VMA permissions more relaxed than enclave permissions
x86/sgx: Add pfn_mkwrite() handler for present PTEs
x86/sgx: Add sgx_encl_page->vm_run_prot_bits for dynamic permission changes
x86/sgx: Support relaxing of enclave page permissions
- No more handling of scenarios where VMA permissions may be more
relaxed than what the EPCM allows. Enclaves are not prevented
from accessing such pages and the EPCM permissions are entrusted
to control access as supported by the SGX error code in page faults.
- No more explicit setting of protection bits in page fault handler.
Protection bits are inherited from VMA similar to SGX1 support.
- Selftest patches are moved to the end of the series. (Jarkko)
- New patch contributed by Jarkko to avoid duplicated code:
x86/sgx: Export sgx_encl_page_alloc()
- New patch separating changes from existing patch. (Jarkko)
x86/sgx: Export sgx_encl_{grow,shrink}()
- New patch to keep one required benefit from the (now removed) kernel
EPCM permission tracking:
x86/sgx: Support loading enclave page without VMA permissions check
- Updated cover letter to reflect architecture changes.
- Many smaller changes, please refer to individual patches.
V1: https://lore.kernel.org/linux-sgx/cover.1638381245.git.reinette.chatre@inte…
Changes since V1 that directly impact user space:
- SGX2 permission changes changed from a single ioctl() named
SGX_IOC_PAGE_MODP to two new ioctl()s:
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS, supported by two different
parameter structures (SGX_IOC_ENCLAVE_RELAX_PERMISSIONS does
not support a result output parameter) (Jarkko).
User space flow impact: After user space runs ENCLU[EMODPE] it
needs to call SGX_IOC_ENCLAVE_RELAX_PERMISSIONS to have PTEs
updated. Previously running SGX_IOC_PAGE_MODP in this scenario
resulted in EPCM.PR being set but calling
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS will not result in EPCM.PR
being set anymore and thus no need for an additional
ENCLU[EACCEPT].
- SGX_IOC_ENCLAVE_RELAX_PERMISSIONS and
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
obtain new permissions from secinfo as parameter instead of
the permissions directly (Jarkko).
- ioctl() supporting SGX2 page type change is renamed from
SGX_IOC_PAGE_MODT to SGX_IOC_ENCLAVE_MODIFY_TYPE (Jarkko).
- SGX_IOC_ENCLAVE_MODIFY_TYPE obtains new page type from secinfo
as parameter instead of the page type directly (Jarkko).
- ioctl() supporting SGX2 page removal is renamed from
SGX_IOC_PAGE_REMOVE to SGX_IOC_ENCLAVE_REMOVE_PAGES (Jarkko).
- All ioctl() parameter structures have been renamed as a result of the
ioctl() renaming:
SGX_IOC_ENCLAVE_RELAX_PERMISSIONS => struct sgx_enclave_relax_perm
SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS => struct sgx_enclave_restrict_perm
SGX_IOC_ENCLAVE_MODIFY_TYPE => struct sgx_enclave_modt
SGX_IOC_ENCLAVE_REMOVE_PAGES => struct sgx_enclave_remove_pages
Changes since V1 that do not directly impact user space:
- Number of patches in series increased from 25 to 32 primarily because
of splitting the original submission:
- Wrappers for the new SGX2 functions are introduced in three separate
patches replacing the original "x86/sgx: Add wrappers for SGX2
functions"
(Jarkko).
- Moving and renaming sgx_encl_ewb_cpumask() is done with two patches
replacing the original "x86/sgx: Use more generic name for enclave
cpumask function" (Jarkko).
- Support for SGX2 EPCM permission changes is split into two ioctls(),
one for relaxing and one for restricting permissions, each introduced
by a new patch replacing the original "x86/sgx: Support enclave page
permission changes" (Jarkko).
- Extracted code used by existing ioctls() for usage by new ioctl()s
into a new utility in new patch "x86/sgx: Create utility to validate
user provided offset and length" (Dave did not specifically ask for
this but it addresses his review feedback).
- Two new Documentation patches to support the SGX2 work
("Documentation/x86: Introduce enclave runtime management") and
a dedicated section on the enclave permission management
("Documentation/x86: Document SGX permission details") (Andy).
- Most patches were reworked to improve the language by:
* aiming to refer to exact item instead of English rephrasing (Jarkko).
* use ioctl() instead of ioctl throughout (Dave).
* Use "relaxed" instead of "exceed" when referring to permissions
(Dave).
- Improved documentation with several additions to
Documentation/x86/sgx.rst.
- Many smaller changes, please refer to individual patches.
Hi Everybody,
The current Linux kernel support for SGX includes support for SGX1 that
requires that an enclave be created with properties that accommodate all
usages over its (the enclave's) lifetime. This includes properties such
as permissions of enclave pages, the number of enclave pages, and the
number of threads supported by the enclave.
Consequences of this requirement to have the enclave be created to
accommodate all usages include:
* pages needing to support relocated code are required to have RWX
permissions for their entire lifetime,
* an enclave needs to be created with the maximum stack and heap
projected to be needed during the enclave's entire lifetime which
can be longer than the processes running within it,
* an enclave needs to be created with support for the maximum number
of threads projected to run in the enclave.
Since SGX1 a few more functions were introduced, collectively called
SGX2, that support modifications to an initialized enclave. Hardware
supporting these functions are already available as listed on
https://github.com/ayeks/SGX-hardware
This series adds support for SGX2, also referred to as Enclave Dynamic
Memory Management (EDMM). This includes:
* Support modifying EPCM permissions of regular enclave pages belonging
to an initialized enclave. Only permission restriction is supported
via a new ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS. Relaxing of
EPCM permissions can only be done from within the enclave with the
SGX instruction ENCLU[EMODPE].
* Support dynamic addition of regular enclave pages to an initialized
enclave. At creation new pages are architecturally limited to RW EPCM
permissions but will be accessible with PROT_EXEC after the enclave
runs ENCLU[EMODPE] to relax EPCM permissions to RWX.
Pages are dynamically added to an initialized enclave from the SGX
page fault handler.
* Support expanding an initialized enclave to accommodate more threads.
More threads can be accommodated by an enclave with the addition of
Thread Control Structure (TCS) pages that is done by changing the
type of regular enclave pages to TCS pages using a new ioctl()
SGX_IOC_ENCLAVE_MODIFY_TYPES.
* Support removing regular and TCS pages from an initialized enclave.
Removing pages is accomplished in two stages as supported by two new
ioctl()s SGX_IOC_ENCLAVE_MODIFY_TYPES (same ioctl() as mentioned in
previous bullet) and SGX_IOC_ENCLAVE_REMOVE_PAGES.
* Tests covering all the new flows, some edge cases, and one
comprehensive stress scenario.
No additional work is needed to support SGX2 in a virtualized
environment. All tests included in this series passed when run from
a guest as tested with the recent QEMU release based on 6.2.0
that supports SGX.
Patches 1 through 14 prepare the existing code for SGX2 support by
introducing the SGX2 functions, refactoring code, and tracking enclave
page types.
Patches 15 through 21 enable the SGX2 features and include a
Documentation patch.
Patches 22 through 31 test several scenarios of all the enabled
SGX2 features.
This series is based on v5.18-rc2.
Your feedback will be greatly appreciated.
Regards,
Reinette
Jarkko Sakkinen (1):
x86/sgx: Export sgx_encl_page_alloc()
Reinette Chatre (30):
x86/sgx: Add short descriptions to ENCLS wrappers
x86/sgx: Add wrapper for SGX2 EMODPR function
x86/sgx: Add wrapper for SGX2 EMODT function
x86/sgx: Add wrapper for SGX2 EAUG function
x86/sgx: Support loading enclave page without VMA permissions check
x86/sgx: Export sgx_encl_ewb_cpumask()
x86/sgx: Rename sgx_encl_ewb_cpumask() as sgx_encl_cpumask()
x86/sgx: Move PTE zap code to new sgx_zap_enclave_ptes()
x86/sgx: Make sgx_ipi_cb() available internally
x86/sgx: Create utility to validate user provided offset and length
x86/sgx: Keep record of SGX page type
x86/sgx: Export sgx_encl_{grow,shrink}()
x86/sgx: Support VA page allocation without reclaiming
x86/sgx: Support restricting of enclave page permissions
x86/sgx: Support adding of pages to an initialized enclave
x86/sgx: Tighten accessible memory range after enclave initialization
x86/sgx: Support modifying SGX page type
x86/sgx: Support complete page removal
x86/sgx: Free up EPC pages directly to support large page ranges
Documentation/x86: Introduce enclave runtime management section
selftests/sgx: Add test for EPCM permission changes
selftests/sgx: Add test for TCS page permission changes
selftests/sgx: Test two different SGX2 EAUG flows
selftests/sgx: Introduce dynamic entry point
selftests/sgx: Introduce TCS initialization enclave operation
selftests/sgx: Test complete changing of page type flow
selftests/sgx: Test faulty enclave behavior
selftests/sgx: Test invalid access to removed enclave page
selftests/sgx: Test reclaiming of untouched page
selftests/sgx: Page removal stress test
Documentation/x86/sgx.rst | 15 +
arch/x86/include/asm/sgx.h | 8 +
arch/x86/include/uapi/asm/sgx.h | 61 +
arch/x86/kernel/cpu/sgx/encl.c | 329 +++-
arch/x86/kernel/cpu/sgx/encl.h | 15 +-
arch/x86/kernel/cpu/sgx/encls.h | 33 +
arch/x86/kernel/cpu/sgx/ioctl.c | 640 +++++++-
arch/x86/kernel/cpu/sgx/main.c | 75 +-
arch/x86/kernel/cpu/sgx/sgx.h | 3 +
tools/testing/selftests/sgx/defines.h | 23 +
tools/testing/selftests/sgx/load.c | 41 +
tools/testing/selftests/sgx/main.c | 1435 +++++++++++++++++
tools/testing/selftests/sgx/main.h | 1 +
tools/testing/selftests/sgx/test_encl.c | 68 +
.../selftests/sgx/test_encl_bootstrap.S | 6 +
15 files changed, 2625 insertions(+), 128 deletions(-)
base-commit: ce522ba9ef7e2d9fb22a39eb3371c0c64e2a433e
--
2.25.1
Hello,
The aim of this series is to make resctrl_tests run by using
kselftest framework.
- I modify resctrl_test Makefile and kselftest Makefile,
to enable build/run resctrl_tests by using kselftest framework.
Of course, users can also build/run resctrl_tests without
using framework as before.
- I change the default limited time for resctrl_tests to 120 seconds, to
ensure the resctrl_tests finish in limited time on different environments.
- When resctrl file system is not supported by environment or
resctrl_tests is not run as root, return skip code of kselftest framework.
- If resctrl_tests does not finish in limited time, terminate it as
same as executing ctrl+c that kills parent process and child process.
Difference from v6:
- Fixed the typos.
https://lore.kernel.org/lkml/20220318075807.2921063-1-tan.shaopeng@jp.fujit… [PATCH v6]
This patch series is based on 'next' branch of linux-kselftest.
Note that Patch [4/6] uses KHDR_INCLUDES which is introduced by a patch
on 'next' branch of linux-kselftest (not merged in mainline yet)
linux-kselftest: git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Shaopeng Tan (6):
selftests/resctrl: Kill child process before parent process terminates
if SIGTERM is received
selftests/resctrl: Change the default limited time to 120 seconds
selftests/resctrl: Fix resctrl_tests' return code to work with
selftest framework
selftests/resctrl: Make resctrl_tests run using kselftest framework
selftests/resctrl: Update README about using kselftest framework to
build/run resctrl_tests
selftests/resctrl: Add missing SPDX license to Makefile
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/resctrl/Makefile | 19 +++------
tools/testing/selftests/resctrl/README | 39 +++++++++++++++----
.../testing/selftests/resctrl/resctrl_tests.c | 4 +-
tools/testing/selftests/resctrl/resctrl_val.c | 1 +
tools/testing/selftests/resctrl/settings | 3 ++
6 files changed, 45 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/resctrl/settings
--
2.27.0
Changes since V2:
- V2: https://lore.kernel.org/lkml/cover.1647360971.git.reinette.chatre@intel.com/
- Rebased against v5.18-rc4, no functional changes.
- Add text in cover letter and first patch to highlight that
the __cpuid_count() macro provided is not a new implementation but
copied from gcc.
Changes since V1:
- V1: https://lore.kernel.org/lkml/cover.1644000145.git.reinette.chatre@intel.com/
- Change solution to not use __cpuid_count() from compiler's
cpuid.h but instead use a local define of __cpuid_count()
provided in kselftest.h to ensure tests continue working
in all supported environments. (Shuah)
- Rewrite cover letter and changelogs to reflect new solution.
A few tests that require running CPUID do so with a private
implementation of a wrapper for CPUID. This duplication of
the CPUID wrapper should be avoided.
Both gcc and clang/LLVM provide wrappers for CPUID but
the wrappers are not available in the minimal required
version of gcc, v3.2, that the selftests need to be used
in. __cpuid_count() was added to gcc in v4.4, which is ok for
kernels after v4.19 when the gcc minimal required version
was changed to v4.6.
Copy gcc's __cpuid_count() to provide a local define of
__cpuid_count() to kselftest.h to ensure that selftests can
still work in environments with older stable kernels (v4.9
and v4.14 that have the minimal required version of gcc of
v3.2). Update tests with private CPUID wrappers to use the
new macro.
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Sandipan Das <sandipan(a)linux.ibm.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn(a)linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Michal Suchanek <msuchanek(a)suse.de>
Cc: linux-mm(a)kvack.org
Cc: Chang S. Bae <chang.seok.bae(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: x86(a)kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Reinette Chatre (4):
selftests: Provide local define of __cpuid_count()
selftests/vm/pkeys: Use provided __cpuid_count() macro
selftests/x86/amx: Use provided __cpuid_count() macro
selftests/x86/corrupt_xstate_header: Use provided __cpuid_count()
macro
tools/testing/selftests/kselftest.h | 15 ++++++++++++
tools/testing/selftests/vm/pkey-x86.h | 21 ++--------------
tools/testing/selftests/x86/amx.c | 24 ++++++-------------
.../selftests/x86/corrupt_xstate_header.c | 16 ++-----------
4 files changed, 26 insertions(+), 50 deletions(-)
base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
--
2.25.1
Hi,
this is a revised patch set for RFC I posted some time ago (*).
Since the ZSTD usage became much more popular now, it makes sense to
have the consistent (de)compression support in the kernel, also for
the firmware files. This patch set adds the support for ZSTD-
compressed firmware files as well as the extension of selftests, in
addition to a couple of relevant fixes in selftests.
(*) https://lore.kernel.org/r/20210127154939.13288-1-tiwai@suse.de
Takashi
===
Takashi Iwai (5):
firmware: Add the support for ZSTD-compressed firmware files
selftests: firmware: Use smaller dictionary for XZ compression
selftests: firmware: Fix the request_firmware_into_buf() test for XZ
format
selftests: firmware: Simplify test patterns
selftests: firmware: Add ZSTD compressed file tests
drivers/base/firmware_loader/Kconfig | 24 ++-
drivers/base/firmware_loader/main.c | 76 +++++++-
.../selftests/firmware/fw_filesystem.sh | 170 +++++++++---------
tools/testing/selftests/firmware/fw_lib.sh | 12 +-
4 files changed, 182 insertions(+), 100 deletions(-)
--
2.31.1
Currently if opening /dev/null fails to open then file pointer fp
is null and further access to fp via fprintf will cause a null
pointer dereference. Fix this by returning a negative error value
when a null fp is detected.
Fixes: a2561b12fe39 ("selftests/resctrl: Add built in benchmark")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/resctrl/fill_buf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/fill_buf.c b/tools/testing/selftests/resctrl/fill_buf.c
index 51e5cf22632f..56ccbeae0638 100644
--- a/tools/testing/selftests/resctrl/fill_buf.c
+++ b/tools/testing/selftests/resctrl/fill_buf.c
@@ -121,8 +121,10 @@ static int fill_cache_read(unsigned char *start_ptr, unsigned char *end_ptr,
/* Consume read result so that reading memory is not optimized out. */
fp = fopen("/dev/null", "w");
- if (!fp)
+ if (!fp) {
perror("Unable to write to /dev/null");
+ return -1;
+ }
fprintf(fp, "Sum: %d ", ret);
fclose(fp);
--
2.35.1
Currently the binderfs test says what failure it encountered
without saying why it may occurred when it fails to mount
binderfs. So, Warn about enabling CONFIG_ANDROID_BINDERFS in the
running kernel.
Signed-off-by: Karthik Alapati <mail(a)karthek.com>
---
tools/testing/selftests/filesystems/binderfs/binderfs_test.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
index 0315955ff0f4..bc1c407651fc 100644
--- a/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
+++ b/tools/testing/selftests/filesystems/binderfs/binderfs_test.c
@@ -412,7 +412,8 @@ TEST(binderfs_stress)
ret = mount(NULL, binderfs_mntpt, "binder", 0, 0);
ASSERT_EQ(ret, 0) {
- TH_LOG("%s - Failed to mount binderfs", strerror(errno));
+ TH_LOG("%s - Failed to mount binderfs, check if CONFIG_ANDROID_BINDERFS is enabled in the running kernel",
+ strerror(errno));
}
for (int i = 0; i < ARRAY_SIZE(fds); i++) {
--
2.35.2
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.
---
Changes in V4:
mm/memcontrol.c:
- Return -EINTR on signal_pending().
- On the final retry, drain percpu lru caches hoping that it might
introduce some evictable pages for reclaim.
- Simplified the retry loop as suggested by Dan Schatzberg.
selftests:
- Always return -errno on failure from cg_write() (whether open() or
write() fail), also update cg_read() and read_text() to return -errno
as well for consistency. Also make sure to correctly check that the
whole buffer was written in cg_write().
- Added a maximum number of retries for the reclaim selftest.
Changes in V3:
- Fix cg_write() (in patch 2) to properly return -1 if open() fails
and not fail if len == errno.
- Remove debug printf() in patch 3.
Changes in V2:
- Add the interface to root as well.
- Added a selftest.
- Documented the interface as a nested-keyed interface, which makes
adding optional arguments in the future easier (see doc updates in the
first patch).
- Modified the commit message to reflect changes and added a timeout
argument as a suggested possible extension
- Return -EAGAIN if the kernel fails to reclaim the full requested
amount.
---
Shakeel Butt (1):
memcg: introduce per-memcg reclaim interface
Yosry Ahmed (3):
selftests: cgroup: return -errno from cg_read()/cg_write() on failure
selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
selftests: cgroup: add a selftest for memory.reclaim
Documentation/admin-guide/cgroup-v2.rst | 21 +++++
mm/memcontrol.c | 44 +++++++++
tools/testing/selftests/cgroup/cgroup_util.c | 44 ++++-----
.../selftests/cgroup/test_memcontrol.c | 94 ++++++++++++++++++-
4 files changed, 176 insertions(+), 27 deletions(-)
--
2.36.0.rc2.479.g8af0fa9b8e-goog
This series is just a set of minor tweaks and improvements for the MTE
tests that I did while working on the asymmetric mode support for
userspace which seemed like they might be worth keeping even though the
prctl() for asymmetric mode got removed.
v2:
- Rebase onto v5.18-rc3
Mark Brown (4):
kselftest/arm64: Handle more kselftest result codes in MTE helpers
kselftest/arm64: Log unexpected asynchronous MTE faults
kselftest/arm64: Refactor parameter checking in mte_switch_mode()
kselftest/arm64: Add simple test for MTE prctl
tools/testing/selftests/arm64/mte/.gitignore | 1 +
.../testing/selftests/arm64/mte/check_prctl.c | 119 ++++++++++++++++++
.../selftests/arm64/mte/mte_common_util.c | 19 ++-
.../selftests/arm64/mte/mte_common_util.h | 15 ++-
4 files changed, 149 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/arm64/mte/check_prctl.c
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.30.2
This series has a couple of minor fixes and cleanups for sve-ptrace plus
the addition of a new test which validates that we can write using the
FPSIMD regset and then read matching data back using the SVE regset -
previously we only validated writing SVE and reading FPSIMD data.
v2
- Rebase onto v5.18-rc1
Mark Brown (3):
kselftest/arm64: Fix comment for ptrace_sve_get_fpsimd_data()
kselftest/arm64: Remove assumption that tasks start FPSIMD only
kselftest/arm64: Validate setting via FPSIMD and read via SVE regsets
tools/testing/selftests/arm64/fp/sve-ptrace.c | 164 +++++++++++++++---
1 file changed, 139 insertions(+), 25 deletions(-)
base-commit: 3123109284176b1532874591f7c81f3837bbdc17
--
2.30.2
The first patch of this series is an improvement to the existing
syncookie BPF helper. The second patch is a documentation fix.
The third patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two last patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The fourth patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fifth patch adds a selftest, which consists of a script, an XDP
program and a userspace control application. The XDP program uses
socketless SYN cookie helpers and queries conntrack status instead of
socket status. The userspace control application allows to tune
parameters of the XDP program. This program also serves as a minimal
example of usage of the new functionality.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
[1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/
[2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/
[3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP
Maxim Mikityanskiy (6):
bpf: Use ipv6_only_sock in bpf_tcp_gen_syncookie
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf: Allow helpers to accept pointers with a fixed size
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Add selftests for raw syncookie helpers
bpf: Allow the new syncookie helpers to work with SKBs
include/linux/bpf.h | 10 +
include/net/tcp.h | 1 +
include/uapi/linux/bpf.h | 100 ++-
kernel/bpf/verifier.c | 26 +-
net/core/filter.c | 136 ++-
net/ipv4/tcp_input.c | 3 +-
scripts/bpf_doc.py | 4 +
tools/include/uapi/linux/bpf.h | 100 ++-
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/xdp_synproxy.c | 144 +++
.../selftests/bpf/progs/xdp_synproxy_kern.c | 819 ++++++++++++++++++
tools/testing/selftests/bpf/xdp_synproxy.c | 466 ++++++++++
13 files changed, 1790 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c
--
2.30.2
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.
Changelog:
v5:
Rebase on top of v5.18-rc3
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.
Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.
Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.
Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.
Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.
Support all strings for gpucg_register_bucket instead of just string
literals.
Enforce globally unique gpucg_bucket names.
Constrain gpucg_bucket name lengths to 64 bytes.
Append "-heap" to gpucg_bucket names from dmabuf-heaps.
Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy@kernel.org/
Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size
Tests:
Support both binder_fd_array_object and binder_fd_object. This is
necessary because new versions of Android will use binder_fd_object
instead of binder_fd_array_object, and we need to support both.
Tests for both binder_fd_array_object and binder_fd_object.
For binder_utils return error codes instead of
struct binder{fs}_ctx.
Use ifdef __ANDROID__ to choose platform-dependent temp path instead
of a runtime fallback.
Ensure binderfs_mntpt ends with a trailing '/' character instead of
prepending it where used.
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (3):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
binder: Add flags to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
selftests: Add binder cgroup gpu memory transfer tests
Documentation/gpu/rfc/gpu-cgroup.rst | 190 +++++++
Documentation/gpu/rfc/index.rst | 4 +
drivers/android/binder.c | 27 +-
drivers/dma-buf/dma-buf.c | 80 ++-
drivers/dma-buf/dma-heap.c | 39 ++
drivers/dma-buf/heaps/system_heap.c | 28 +-
include/linux/cgroup_gpu.h | 137 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 49 +-
include/linux/dma-heap.h | 15 +
include/uapi/linux/android/binder.h | 23 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 386 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 250 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 526 ++++++++++++++++++
19 files changed, 1787 insertions(+), 23 deletions(-)
create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.36.0.rc0.470.gd361397f0d-goog
I am getting the following compilation error for prog_tests/uprobe_autoattach.c
tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c: In function ‘test_uprobe_autoattach’:
./test_progs.h:209:26: error: pointer ‘mem’ may be used after ‘free’ [-Werror=use-after-free]
mem variable is now used in one of the asserts so it shouldn't be freed right
away. Move free(mem) after the assert block.
Fixes: 1717e248014c ("selftests/bpf: Uprobe tests should verify param/return values")
Signed-off-by: Artem Savkov <asavkov(a)redhat.com>
---
tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
index d6003dc8cc99..35b87c7ba5be 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_autoattach.c
@@ -34,7 +34,6 @@ void test_uprobe_autoattach(void)
/* trigger & validate shared library u[ret]probes attached by name */
mem = malloc(malloc_sz);
- free(mem);
ASSERT_EQ(skel->bss->uprobe_byname_parm1, trigger_val, "check_uprobe_byname_parm1");
ASSERT_EQ(skel->bss->uprobe_byname_ran, 1, "check_uprobe_byname_ran");
@@ -44,6 +43,8 @@ void test_uprobe_autoattach(void)
ASSERT_EQ(skel->bss->uprobe_byname2_ran, 3, "check_uprobe_byname2_ran");
ASSERT_EQ(skel->bss->uretprobe_byname2_rc, mem, "check_uretprobe_byname2_rc");
ASSERT_EQ(skel->bss->uretprobe_byname2_ran, 4, "check_uretprobe_byname2_ran");
+
+ free(mem);
cleanup:
test_uprobe_autoattach__destroy(skel);
}
--
2.35.1
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam serdecznie,
Mikołaj Rudzik
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.
---
Changes in V3:
- Fix cg_write() (in patch 2) to properly return -1 if open() fails
and not fail if len == errno.
- Remove debug printf() in patch 3.
Changes in V2:
- Add the interface to root as well.
- Added a selftest.
- Documented the interface as a nested-keyed interface, which makes
adding optional arguments in the future easier (see doc updates in the
first patch).
- Modified the commit message to reflect changes and add a timeout
argument as a suggested possible extension
- Return -EAGAIN if the kernel fails to reclaim the full requested
amount.
---
Shakeel Butt (1):
memcg: introduce per-memcg reclaim interface
Yosry Ahmed (3):
selftests: cgroup: return the errno of write() in cg_write() on
failure
selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
selftests: cgroup: add a selftest for memory.reclaim
Documentation/admin-guide/cgroup-v2.rst | 21 +++++
mm/memcontrol.c | 37 ++++++++
tools/testing/selftests/cgroup/cgroup_util.c | 32 ++++---
.../selftests/cgroup/test_memcontrol.c | 93 ++++++++++++++++++-
4 files changed, 166 insertions(+), 17 deletions(-)
--
2.35.1.1178.g4f1659d476-goog
This patch series is motivated by Shuah's suggestion here:
https://lore.kernel.org/kvm/d576d8f7-980f-3bc6-87ad-5a6ae45609b8@linuxfound…
Many s390x KVM selftests do not output any information about which
tests have been run, so it's hard to say whether a test binary
contains a certain sub-test or not. To improve this situation let's
add some TAP output via the kselftest.h interface to these tests,
so that it easier to understand what has been executed or not.
Thomas Huth (4):
KVM: s390: selftests: Use TAP interface in the memop test
KVM: s390: selftests: Use TAP interface in the sync_regs test
KVM: s390: selftests: Use TAP interface in the tprot test
KVM: s390: selftests: Use TAP interface in the reset test
tools/testing/selftests/kvm/s390x/memop.c | 90 +++++++++++++++----
tools/testing/selftests/kvm/s390x/resets.c | 38 ++++++--
.../selftests/kvm/s390x/sync_regs_test.c | 86 +++++++++++++-----
tools/testing/selftests/kvm/s390x/tprot.c | 12 ++-
4 files changed, 179 insertions(+), 47 deletions(-)
--
2.27.0
Changes since V1:
- V1: https://lore.kernel.org/lkml/cover.1644000145.git.reinette.chatre@intel.com/
- Change solution to not use __cpuid_count() from compiler's
cpuid.h but instead use a local define of __cpuid_count()
provided in kselftest.h to ensure tests continue working
in all supported environments. (Shuah)
- Rewrite cover letter and changelogs to reflect new solution.
A few tests that require running CPUID do so with a private
implementation of a wrapper for CPUID. This duplication of
the CPUID wrapper should be avoided.
Both gcc and clang/LLVM provide wrappers for CPUID but
the wrappers are not available in the minimal required
version of gcc, v3.2, that the selftests need to be used
in. __cpuid_count() was added to gcc in v4.4, which is ok for
kernels after v4.19 when the gcc minimal required version
was changed to v4.6.
Add a local define of __cpuid_count() to kselftest.h to
ensure that selftests can still work in environments with
older stable kernels (v4.9 and v4.14 that have the minimal
required version of gcc of v3.2). Update tests with private
CPUID wrappers to use the new macro.
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Sandipan Das <sandipan(a)linux.ibm.com>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn(a)linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Thiago Jung Bauermann <bauerman(a)linux.ibm.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Michal Suchanek <msuchanek(a)suse.de>
Cc: linux-mm(a)kvack.org
Cc: Chang S. Bae <chang.seok.bae(a)intel.com>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: x86(a)kernel.org
Cc: Andy Lutomirski <luto(a)kernel.org>
Reinette Chatre (4):
selftests: Provide local define of __cpuid_count()
selftests/vm/pkeys: Use provided __cpuid_count() macro
selftests/x86/amx: Use provided __cpuid_count() macro
selftests/x86/corrupt_xstate_header: Use provided __cpuid_count()
macro
tools/testing/selftests/kselftest.h | 15 ++++++++++++
tools/testing/selftests/vm/pkey-x86.h | 21 ++--------------
tools/testing/selftests/x86/amx.c | 24 ++++++-------------
.../selftests/x86/corrupt_xstate_header.c | 16 ++-----------
4 files changed, 26 insertions(+), 50 deletions(-)
base-commit: 09688c0166e76ce2fb85e86b9d99be8b0084cdf9
--
2.25.1
The mremap test currently segfaults because mremap
does not have a NOREPLACE flag which will fail if the
remap destination address collides with an existing mapping.
The segfault is caused by the mremap call destorying the
text region mapping of the program. This patch series fixes
the segfault by sanitizing the mremap destination address
and introduces other minor fixes to the test case.
Sidhartha Kumar (4):
selftest/vm: verify mmap addr in mremap_test
selftest/vm: verify remap destination address in mremap_test
selftest/vm: support xfail in mremap_test
selftest/vm: add skip support to mremap_test
tools/testing/selftests/vm/mremap_test.c | 79 ++++++++++++++++++++++-
tools/testing/selftests/vm/run_vmtests.sh | 11 +++-
2 files changed, 85 insertions(+), 5 deletions(-)
--
2.27.0
This series is just a set of minor tweaks and improvements for the MTE
tests that I did while working on the asymmetric mode support for
userspace which seemed like they might be worth keeping even though the
prctl() for asymmetric mode got removed.
v2:
- Rebase onto v5.18-rc3
Mark Brown (4):
kselftest/arm64: Handle more kselftest result codes in MTE helpers
kselftest/arm64: Log unexpected asynchronous MTE faults
kselftest/arm64: Refactor parameter checking in mte_switch_mode()
kselftest/arm64: Add simple test for MTE prctl
tools/testing/selftests/arm64/mte/.gitignore | 1 +
.../testing/selftests/arm64/mte/check_prctl.c | 119 ++++++++++++++++++
.../selftests/arm64/mte/mte_common_util.c | 19 ++-
.../selftests/arm64/mte/mte_common_util.h | 15 ++-
4 files changed, 149 insertions(+), 5 deletions(-)
create mode 100644 tools/testing/selftests/arm64/mte/check_prctl.c
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.30.2
Add support for a new kind of kunit_suite registration macro called
kunit_test_init_section_suite(); this new registration macro allows the
registration of kunit_suites that reference functions marked __init and
data marked __initdata.
Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
Tested-by: Martin Fernandez <martin.fernandez(a)eclypsium.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Reviewed-by: David Gow <davidgow(a)google.com>
---
Changes since last version:
Renamed the new kunit_suite registration macro for init functions to a
more readable name.
---
include/kunit/test.h | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 00b9ff7783ab..5a870f2d81f4 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -380,6 +380,34 @@ static inline int kunit_run_all_tests(void)
#define kunit_test_suite(suite) kunit_test_suites(&suite)
+/**
+ * kunit_test_init_section_suites() - used to register one or more &struct
+ * kunit_suite containing init functions or
+ * init data.
+ *
+ * @__suites: a statically allocated list of &struct kunit_suite.
+ *
+ * This functions identically as &kunit_test_suites() except that it suppresses
+ * modpost warnings for referencing functions marked __init or data marked
+ * __initdata; this is OK because currently KUnit only runs tests upon boot
+ * during the init phase or upon loading a module during the init phase.
+ *
+ * NOTE TO KUNIT DEVS: If we ever allow KUnit tests to be run after boot, these
+ * tests must be excluded.
+ *
+ * The only thing this macro does that's different from kunit_test_suites is
+ * that it suffixes the array and suite declarations it makes with _probe;
+ * modpost suppresses warnings about referencing init data for symbols named in
+ * this manner.
+ */
+#define kunit_test_init_section_suites(__suites...) \
+ __kunit_test_suites(CONCATENATE(__UNIQUE_ID(array), _probe), \
+ CONCATENATE(__UNIQUE_ID(suites), _probe), \
+ ##__suites)
+
+#define kunit_test_init_section_suite(suite) \
+ kunit_test_init_section_suites(&suite)
+
#define kunit_suite_for_each_test_case(suite, test_case) \
for (test_case = suite->test_cases; test_case->run_case; test_case++)
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.36.0.rc0.470.gd361397f0d-goog
Historically, it has been shown that intercepting kernel faults with
userfaultfd (thereby forcing the kernel to wait for an arbitrary amount
of time) can be exploited, or at least can make some kinds of exploits
easier. So, in 37cd0575b8 "userfaultfd: add UFFD_USER_MODE_ONLY" we
changed things so, in order for kernel faults to be handled by
userfaultfd, either the process needs CAP_SYS_PTRACE, or this sysctl
must be configured so that any unprivileged user can do it.
In a typical implementation of a hypervisor with live migration (take
QEMU/KVM as one such example), we do indeed need to be able to handle
kernel faults. But, both options above are less than ideal:
- Toggling the sysctl increases attack surface by allowing any
unprivileged user to do it.
- Granting the live migration process CAP_SYS_PTRACE gives it this
ability, but *also* the ability to "observe and control the
execution of another process [...], and examine and change [its]
memory and registers" (from ptrace(2)). This isn't something we need
or want to be able to do, so granting this permission violates the
"principle of least privilege".
This is all a long winded way to say: we want a more fine-grained way to
grant access to userfaultfd, without granting other additional
permissions at the same time.
To achieve this, add a /dev/userfaultfd misc device. This device
provides an alternative to the userfaultfd(2) syscall for the creation
of new userfaultfds. The idea is, any userfaultfds created this way will
be able to handle kernel faults, without the caller having any special
capabilities. Access to this mechanism is instead restricted using e.g.
standard filesystem permissions.
Signed-off-by: Axel Rasmussen <axelrasmussen(a)google.com>
---
fs/userfaultfd.c | 79 ++++++++++++++++++++++++++------
include/uapi/linux/userfaultfd.h | 4 ++
2 files changed, 69 insertions(+), 14 deletions(-)
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index aa0c47cb0d16..16d7573ab41a 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -29,6 +29,7 @@
#include <linux/ioctl.h>
#include <linux/security.h>
#include <linux/hugetlb.h>
+#include <linux/miscdevice.h>
int sysctl_unprivileged_userfaultfd __read_mostly;
@@ -65,6 +66,8 @@ struct userfaultfd_ctx {
unsigned int flags;
/* features requested from the userspace */
unsigned int features;
+ /* whether or not to handle kernel faults */
+ bool handle_kernel_faults;
/* released */
bool released;
/* memory mappings are changing because of non-cooperative event */
@@ -410,13 +413,8 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason)
if (ctx->features & UFFD_FEATURE_SIGBUS)
goto out;
- if ((vmf->flags & FAULT_FLAG_USER) == 0 &&
- ctx->flags & UFFD_USER_MODE_ONLY) {
- printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
- "sysctl knob to 1 if kernel faults must be handled "
- "without obtaining CAP_SYS_PTRACE capability\n");
+ if (!(vmf->flags & FAULT_FLAG_USER) && !ctx->handle_kernel_faults)
goto out;
- }
/*
* If it's already released don't get it. This avoids to loop
@@ -2064,19 +2062,33 @@ static void init_once_userfaultfd_ctx(void *mem)
seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
}
-SYSCALL_DEFINE1(userfaultfd, int, flags)
+static inline bool userfaultfd_allowed(bool is_syscall, int flags)
+{
+ bool kernel_faults = !(flags & UFFD_USER_MODE_ONLY);
+ bool allow_unprivileged = sysctl_unprivileged_userfaultfd;
+
+ /* userfaultfd(2) access is controlled by sysctl + capability. */
+ if (is_syscall && kernel_faults) {
+ if (!allow_unprivileged && !capable(CAP_SYS_PTRACE))
+ return false;
+ }
+
+ /*
+ * For /dev/userfaultfd, access is to be controlled using e.g.
+ * permissions on the device node. We assume this is correctly
+ * configured by userspace, so we simply allow access here.
+ */
+
+ return true;
+}
+
+static int new_userfaultfd(bool is_syscall, int flags)
{
struct userfaultfd_ctx *ctx;
int fd;
- if (!sysctl_unprivileged_userfaultfd &&
- (flags & UFFD_USER_MODE_ONLY) == 0 &&
- !capable(CAP_SYS_PTRACE)) {
- printk_once(KERN_WARNING "uffd: Set unprivileged_userfaultfd "
- "sysctl knob to 1 if kernel faults must be handled "
- "without obtaining CAP_SYS_PTRACE capability\n");
+ if (!userfaultfd_allowed(is_syscall, flags))
return -EPERM;
- }
BUG_ON(!current->mm);
@@ -2095,6 +2107,11 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
refcount_set(&ctx->refcount, 1);
ctx->flags = flags;
ctx->features = 0;
+ /*
+ * If UFFD_USER_MODE_ONLY is not set, then userfaultfd_allowed() above
+ * decided that kernel faults were allowed and should be handled.
+ */
+ ctx->handle_kernel_faults = !(flags & UFFD_USER_MODE_ONLY);
ctx->released = false;
atomic_set(&ctx->mmap_changing, 0);
ctx->mm = current->mm;
@@ -2110,8 +2127,42 @@ SYSCALL_DEFINE1(userfaultfd, int, flags)
return fd;
}
+SYSCALL_DEFINE1(userfaultfd, int, flags)
+{
+ return new_userfaultfd(true, flags);
+}
+
+static int userfaultfd_dev_open(struct inode *inode, struct file *file)
+{
+ return 0;
+}
+
+static long userfaultfd_dev_ioctl(struct file *file, unsigned int cmd, unsigned long flags)
+{
+ if (cmd != USERFAULTFD_IOC_NEW)
+ return -EINVAL;
+
+ return new_userfaultfd(false, flags);
+}
+
+static const struct file_operations userfaultfd_dev_fops = {
+ .open = userfaultfd_dev_open,
+ .unlocked_ioctl = userfaultfd_dev_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
+ .owner = THIS_MODULE,
+ .llseek = noop_llseek,
+};
+
+static struct miscdevice userfaultfd_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "userfaultfd",
+ .fops = &userfaultfd_dev_fops
+};
+
static int __init userfaultfd_init(void)
{
+ WARN_ON(misc_register(&userfaultfd_misc));
+
userfaultfd_ctx_cachep = kmem_cache_create("userfaultfd_ctx_cache",
sizeof(struct userfaultfd_ctx),
0,
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index ef739054cb1c..032a35b3bbd2 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -12,6 +12,10 @@
#include <linux/types.h>
+/* ioctls for /dev/userfaultfd */
+#define USERFAULTFD_IOC 0xAA
+#define USERFAULTFD_IOC_NEW _IOWR(USERFAULTFD_IOC, 0x00, int)
+
/*
* If the UFFDIO_API is upgraded someday, the UFFDIO_UNREGISTER and
* UFFDIO_WAKE ioctls should be defined as _IOW and not as _IOR. In
--
2.35.1.1178.g4f1659d476-goog