From: Zi Yan <ziy(a)nvidia.com>
Hi all,
File folios support any order, and people would like flexible orders for
anonymous folios [1] too. Currently, split_huge_page() only splits a huge
page into order-0 pages, but splitting to orders higher than 0 is also useful.
This patchset adds support for splitting a huge page into pages of any lower
order and uses it during folio truncate operations.
The patchset is on top of mm-everything-2023-03-27-21-20.
Changelog from v1
===
1. Changed the split_page_memcg() and split_page_owner() parameters to use order
2. Used folio_test_pmd_mappable() in place of the equivalent code
Details
===
* Patch 1 changes split_page_memcg() to use order instead of nr_pages
* Patch 2 changes split_page_owner() to use order instead of nr_pages
* Patches 3 and 4 add a new_order parameter to split_page_memcg() and
split_page_owner() to prepare for the upcoming changes.
* Patch 5 adds split_huge_page_to_list_to_order() to split a huge page
to any lower order. The original split_huge_page_to_list() calls
split_huge_page_to_list_to_order() with new_order = 0.
* Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio
truncation instead of splitting the large folio all the way down to order-0.
* Patch 7 adds a test API to debugfs and test cases in
split_huge_page_test selftests.
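As a rough illustration of the arithmetic behind splitting to a non-zero
order, here is a minimal userspace sketch. It is not code from the patchset:
the helper names pages_in_order() and nr_pages_after_split() are hypothetical
and exist only for this example.

```c
#include <assert.h>

/* Number of base (order-0) pages covered by one page of the given order. */
static inline unsigned long pages_in_order(unsigned int order)
{
	return 1UL << order;
}

/*
 * Number of new_order pages produced by splitting one old_order page.
 * Hypothetical helper for illustration only; not part of the patchset.
 */
static inline unsigned long nr_pages_after_split(unsigned int old_order,
						 unsigned int new_order)
{
	/* A split can only go to the same or a lower order. */
	assert(new_order <= old_order);
	return 1UL << (old_order - new_order);
}
```

For example, splitting an order-9 page with new_order = 0 yields 512 order-0
pages, while new_order = 2 yields 128 pages that each cover 4 base pages.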
Comments and/or suggestions are welcome.
[1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/
Zi Yan (7):
mm/memcg: use order instead of nr in split_page_memcg()
mm/page_owner: use order instead of nr in split_page_owner()
mm: memcg: make memcg huge page split support any order split.
mm: page_owner: add support for splitting to any order in split
page_owner.
mm: thp: split huge page to any lower order pages.
mm: truncate: split huge page cache page to a non-zero order if
possible.
mm: huge_memory: enable debugfs to split huge pages to any order.
include/linux/huge_mm.h | 10 +-
include/linux/memcontrol.h | 4 +-
include/linux/page_owner.h | 10 +-
mm/huge_memory.c | 137 ++++++++---
mm/memcontrol.c | 10 +-
mm/page_alloc.c | 8 +-
mm/page_owner.c | 10 +-
mm/truncate.c | 21 +-
.../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++-
9 files changed, 366 insertions(+), 69 deletions(-)
--
2.39.2
Hi all,
the following patch tries to work around an error on ppc64le, where the
zram01.sh LTP test (there is also the kernel selftest
tools/testing/selftests/zram/zram01.sh, but the LTP test has received
further updates) often sees mem_used_total at 0 although zram is already
filled. The patch repeatedly reads /sys/block/zram*/mm_stat for up to 1 sec,
waiting for mem_used_total > 0. The question is whether this behavior is
expected and should be worked around, or a bug which should be fixed.
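The polling idea can be sketched in C as follows. This is only an
illustration under assumptions, not the shell code from the patch: the
function names are hypothetical, and it relies on mem_used_total being the
third whitespace-separated field of mm_stat, as described in the kernel zram
documentation.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for nanosleep() */
#endif

#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Parse mem_used_total (assumed to be the third field) from one mm_stat line.
 * Returns 0 on success, -1 if fewer than three numeric fields were found.
 */
static inline int parse_mem_used_total(const char *line,
				       unsigned long long *mem_used_total)
{
	unsigned long long orig_data_size, compr_data_size;

	assert(line && mem_used_total);
	if (sscanf(line, "%llu %llu %llu", &orig_data_size, &compr_data_size,
		   mem_used_total) != 3)
		return -1;
	return 0;
}

/*
 * Re-read the file at path up to attempts times, sleeping delay_ms between
 * reads, until mem_used_total is greater than 0.
 * Returns 0 once a non-zero value was read, -1 if it never showed up.
 */
static inline int wait_for_mem_used(const char *path, int attempts,
				    long delay_ms,
				    unsigned long long *mem_used_total)
{
	struct timespec delay = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	char line[256];
	int i;

	for (i = 0; i < attempts; i++) {
		FILE *f = fopen(path, "r");

		if (f) {
			int ok = fgets(line, sizeof(line), f) &&
				 parse_mem_used_total(line, mem_used_total) == 0 &&
				 *mem_used_total > 0;

			fclose(f);
			if (ok)
				return 0;
		}
		nanosleep(&delay, NULL);
	}
	return -1;
}
```

A call such as wait_for_mem_used("/sys/block/zram0/mm_stat", 10, 100, &used)
would poll for roughly 1 sec; zram0 is just an example device name.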
REPRODUCE THE ISSUE
Quickest way to install only zram tests and their dependencies:
make autotools && ./configure && for i in testcases/lib/ testcases/kernel/device-drivers/zram/; do cd $i && make -j$(getconf _NPROCESSORS_ONLN) && make install && cd -; done
Run the test (only on vfat):
PATH="/opt/ltp/testcases/bin:$PATH" LTP_SINGLE_FS_TYPE=vfat zram01.sh
Petr Vorel (1):
zram01.sh: Workaround division by 0 on vfat on ppc64le
.../kernel/device-drivers/zram/zram01.sh | 27 +++++++++++++++++--
1 file changed, 25 insertions(+), 2 deletions(-)
--
2.38.0
I checked and the Landlock ptrace test failed because Yama is enabled,
which is expected. You can check that with
/proc/sys/kernel/yama/ptrace_scope
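For reference, a small C sketch of that check. The helper names are
hypothetical; the meaning of each value follows the Yama documentation
(0 = classic ptrace permissions, 1 = restricted ptrace, 2 = admin-only
attach, 3 = no attach). Any value other than 0 means Yama adds restrictions
on top of the classic ptrace checks.

```c
#include <assert.h>
#include <stdio.h>

/* Map a ptrace_scope value to its name in the Yama documentation. */
static inline const char *yama_scope_name(int scope)
{
	switch (scope) {
	case 0:
		return "classic ptrace permissions";
	case 1:
		return "restricted ptrace";
	case 2:
		return "admin-only attach";
	case 3:
		return "no attach";
	default:
		return "unknown";
	}
}

/*
 * Read the scope from the given path (normally
 * /proc/sys/kernel/yama/ptrace_scope).
 * Returns the value, or -1 if the file is missing or unreadable,
 * for example when Yama is not enabled.
 */
static inline int read_yama_ptrace_scope(const char *path)
{
	FILE *f;
	int scope = -1;

	assert(path);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &scope) != 1)
		scope = -1;
	fclose(f);
	return scope;
}
```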
Jeff Xu sent a patch to fix this case but it is not ready yet:
https://lore.kernel.org/r/20220628222941.2642917-1-jeffxu@google.com
Jeff, could you please send a new patch and add Limin in Cc?
On 29/11/2022 12:26, limin wrote:
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.1.0-next-20221116
> root=UUID=a65b3a79-dc02-4728-8a0c-5cf24f4ae08b ro
> systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all
>
>
> config
>
> #
> # Automatically generated file; DO NOT EDIT.
> # Linux/x86 6.1.0-rc6 Kernel Configuration
> #
[...]
> CONFIG_SECURITY_YAMA=y
[...]
> CONFIG_LSM="landlock,lockdown,yama,integrity,apparmor"
[...]
>
> On 2022/11/29 19:03, Mickaël Salaün wrote:
>> I tested with next-20221116 and all tests are OK. Could you share your
>> kernel configuration with a link? What is the content of /proc/cmdline?
>>
>> On 29/11/2022 02:42, limin wrote:
>>> I ran the test on Linux ubuntu2204 6.1.0-next-20221116
>>>
>>> I didn't use Yama.
>>>
>>> You can reproduce it with these steps:
>>>
>>> cd kernel_src
>>>
>>> cd tools/testing/selftests/landlock/
>>> make
>>> ./ptrace_test
>>>
>>>
>>>
>>>
>>> On 2022/11/29 3:44, Mickaël Salaün wrote:
>>>> This patch changes the test semantics, so it cannot work in my test
>>>> environment. On which kernel did you run the test? Do you use Yama or
>>>> something similar?
>>>>
>>>> On 28/11/2022 03:04, limin wrote:
>>>>> Test PTRACE_ATTACH and PTRACE_MODE_READ on the parent:
>>>>> tracing the parent returns -1 when child == 0.
>>>>> How to reproduce the warning:
>>>>> $ make -C tools/testing/selftests TARGETS=landlock run_tests
>>>>>
>>>>> Signed-off-by: limin <limin100(a)huawei.com>
>>>>> ---
>>>>> tools/testing/selftests/landlock/ptrace_test.c | 5 ++---
>>>>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> index c28ef98ff3ac..88c4dc63eea0 100644
>>>>> --- a/tools/testing/selftests/landlock/ptrace_test.c
>>>>> +++ b/tools/testing/selftests/landlock/ptrace_test.c
>>>>> @@ -267,12 +267,11 @@ TEST_F(hierarchy, trace)
>>>>> /* Tests PTRACE_ATTACH and PTRACE_MODE_READ on the
>>>>> parent. */
>>>>> err_proc_read = test_ptrace_read(parent);
>>>>> ret = ptrace(PTRACE_ATTACH, parent, NULL, 0);
>>>>> + EXPECT_EQ(-1, ret);
>>>>> + EXPECT_EQ(EPERM, errno);
>>>>> if (variant->domain_child) {
>>>>> - EXPECT_EQ(-1, ret);
>>>>> - EXPECT_EQ(EPERM, errno);
>>>>> EXPECT_EQ(EACCES, err_proc_read);
>>>>> } else {
>>>>> - EXPECT_EQ(0, ret);
>>>>> EXPECT_EQ(0, err_proc_read);
>>>>> }
>>>>> if (ret == 0) {
From: Jeff Xu <jeffxu(a)google.com>
Since Linux introduced the memfd feature, memfds have always had their
execute bit set, and the memfd_create() syscall doesn't allow setting
it differently.
However, in a secure-by-default system such as ChromeOS (where all
executables should come from the rootfs, which is protected by Verified
Boot), the executable nature of memfds opens a door for NoExec bypass
and enables a "confused deputy attack". E.g., in VRP bug [1], the cros_vm
process created a memfd to share content with an external process;
however, the memfd was overwritten and used for executing arbitrary code
and root escalation. [2] lists more VRP bugs of this kind.
On the other hand, executable memfds have legitimate uses: runc uses memfd's
seal and executable features to copy the contents of the binary and then
execute them. For such systems, we need a solution to differentiate runc's
use of executable memfds from an attacker's [3].
To address the issues above, this set of patches adds the following:
1> Let memfd_create() set the X bit at creation time.
2> Let a memfd be sealed against modifying the X bit.
3> A new pid namespace sysctl: vm.memfd_noexec to control the behavior of
the X bit. For example, if a container has vm.memfd_noexec=2, then
memfd_create() without MFD_NOEXEC_SEAL will be rejected.
4> A new security hook in memfd_create(). This makes it possible for a new
LSM to reject or allow executable memfds based on its security policy.
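The interaction between the sysctl and the creation flags can be modeled
roughly as below. This is a userspace model for illustration, not the kernel
implementation. Only the vm.memfd_noexec=2 rule comes from the text above;
the defaults chosen for levels 0 and 1 and the rejection of both flags
together are assumptions. The MODEL_* constants and model_memfd_create() are
hypothetical names, and the flag bits are not the uapi values.

```c
#include <assert.h>

/* Illustrative flag bits for this model only; NOT the uapi values. */
enum {
	MODEL_MFD_NOEXEC_SEAL = 1 << 0,
	MODEL_MFD_EXEC = 1 << 1,
};

enum model_result {
	MODEL_REJECTED,
	MODEL_EXECUTABLE,
	MODEL_NOEXEC_SEALED,
};

/*
 * Model of how a memfd_create() call could be resolved for a given
 * vm.memfd_noexec level (0, 1 or 2).
 */
static inline enum model_result model_memfd_create(int memfd_noexec,
						   unsigned int flags)
{
	int want_exec = !!(flags & MODEL_MFD_EXEC);
	int want_noexec = !!(flags & MODEL_MFD_NOEXEC_SEAL);

	assert(memfd_noexec >= 0 && memfd_noexec <= 2);

	/* Assumption: the two flags are mutually exclusive. */
	if (want_exec && want_noexec)
		return MODEL_REJECTED;

	/* From the cover letter: level 2 rejects calls without NOEXEC_SEAL. */
	if (memfd_noexec == 2 && !want_noexec)
		return MODEL_REJECTED;

	/* Assumption: with neither flag set, the sysctl picks the default. */
	if (!want_exec && !want_noexec) {
		if (memfd_noexec == 0)
			want_exec = 1;
		else
			want_noexec = 1;
	}

	return want_noexec ? MODEL_NOEXEC_SEALED : MODEL_EXECUTABLE;
}
```

Under this model, a container with vm.memfd_noexec=2 can only create memfds
that are non-executable and sealed against becoming executable.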
Change history:
v8:
- Update ref bug in cover letter.
- Add Reviewed-by field.
- Remove security hook (security_memfd_create) patch, which will have
its own patch set in future.
v7:
- patch 2/6: remove #ifdef and MAX_PATH (memfd_test.c).
- patch 3/6: check capability (CAP_SYS_ADMIN) from userns instead of
global ns (pid_sysctl.h). Add a tab (pid_namespace.h).
- patch 5/6: remove #ifdef (memfd_test.c)
- patch 6/6: remove unneeded security_move_mount(security.c).
v6: https://lore.kernel.org/lkml/20221206150233.1963717-1-jeffxu@google.com/
- Address comment and move "#ifdef CONFIG_" from .c file to pid_sysctl.h
v5: https://lore.kernel.org/lkml/20221206152358.1966099-1-jeffxu@google.com/
- Pass vm.memfd_noexec from current ns to child ns.
- Fix build issue detected by kernel test robot.
- Add missing security.c
v3: https://lore.kernel.org/lkml/20221202013404.163143-1-jeffxu@google.com/
- Address API design comments in v2.
- Let memfd_create() set the X bit at creation time.
- A new pid namespace sysctl: vm.memfd_noexec to control the behavior of the X bit.
- A new security hook in memfd_create().
v2: https://lore.kernel.org/lkml/20220805222126.142525-1-jeffxu@google.com/
- Address comments in v1.
- Add sysctl (vm.mfd_noexec) to set the default file permissions of
memfd_create() to be non-executable.
v1: https://lwn.net/Articles/890096/
[1] https://crbug.com/1305267
[2] https://bugs.chromium.org/p/chromium/issues/list?q=type%3Dbug-security%20me…
[3] https://lwn.net/Articles/781013/
Daniel Verkamp (2):
mm/memfd: add F_SEAL_EXEC
selftests/memfd: add tests for F_SEAL_EXEC
Jeff Xu (3):
mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC
mm/memfd: Add write seals when apply SEAL_EXEC to executable memfd
selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC
include/linux/pid_namespace.h | 19 ++
include/uapi/linux/fcntl.h | 1 +
include/uapi/linux/memfd.h | 4 +
kernel/pid_namespace.c | 5 +
kernel/pid_sysctl.h | 59 ++++
mm/memfd.c | 56 +++-
mm/shmem.c | 6 +
tools/testing/selftests/memfd/fuse_test.c | 1 +
tools/testing/selftests/memfd/memfd_test.c | 341 ++++++++++++++++++++-
9 files changed, 489 insertions(+), 3 deletions(-)
create mode 100644 kernel/pid_sysctl.h
base-commit: eb7081409f94a9a8608593d0fb63a1aa3d6f95d8
--
2.39.0.rc1.256.g54fd8350bd-goog
The default timeout for kselftests is 45 seconds, but pcm-test can take
longer than that to run depending on the number of PCMs present on a
device.
As a data point, running pcm-test on mt8192-asurada-spherion takes about
1m15s.
Set the timeout to 10 minutes, which should give enough slack to run the
test even on devices with many PCMs.
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
tools/testing/selftests/alsa/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/alsa/settings
diff --git a/tools/testing/selftests/alsa/settings b/tools/testing/selftests/alsa/settings
new file mode 100644
index 000000000000..a62d2fa1275c
--- /dev/null
+++ b/tools/testing/selftests/alsa/settings
@@ -0,0 +1 @@
+timeout=600
--
2.39.0