October 2023 - Linux-kselftest-mirror

[syzbot] WARNING in btrfs_free_reserved_data_space_noquota

by syzbot

Hello, syzbot found the following issue on: HEAD commit: b7b275e60bcd Linux 6.1-rc7 git tree: upstream console+strace: https://syzkaller.appspot.com/x/log.txt?x=158a7b73880000 kernel config: https://syzkaller.appspot.com/x/.config?x=2325e409a9a893e1 dashboard link: https://syzkaller.appspot.com/bug?extid=adec8406ad17413d4c06 compiler: Debian clang version 13.0.1-++20220126092033+75e33f71c2da-1~exp1~20220126212112.63, GNU ld (GNU Binutils for Debian) 2.35.2 syz repro: https://syzkaller.appspot.com/x/repro.syz?x=169ccb75880000 C reproducer: https://syzkaller.appspot.com/x/repro.c?x=17bf7153880000 Downloadable assets: disk image: https://storage.googleapis.com/syzbot-assets/525233126d34/disk-b7b275e6.raw… vmlinux: https://storage.googleapis.com/syzbot-assets/e8299bf41400/vmlinux-b7b275e6.… kernel image: https://storage.googleapis.com/syzbot-assets/eebf691dbf6f/bzImage-b7b275e6.… mounted in repro: https://storage.googleapis.com/syzbot-assets/5423c2d2ad62/mount_0.gz The issue was bisected to: commit c814bf958926ff45a9c1e899bd001006ab6cfbae Author: ye xingchen <ye.xingchen(a)zte.com.cn> Date: Tue Aug 16 10:51:06 2022 +0000 powerpc/selftests: Use timersub() for gettimeofday() bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=118c3d03880000 final oops: https://syzkaller.appspot.com/x/report.txt?x=138c3d03880000 console output: https://syzkaller.appspot.com/x/log.txt?x=158c3d03880000 IMPORTANT: if you fix the issue, please add the following tag to the commit: Reported-by: syzbot+adec8406ad17413d4c06(a)syzkaller.appspotmail.com Fixes: c814bf958926 ("powerpc/selftests: Use timersub() for gettimeofday()") RDX: 0000000000000001 RSI: 0000000020000280 RDI: 0000000000000005 RBP: 00007ffd32e91c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000008000000 R11: 0000000000000246 R12: 0000000000000006 R13: 00007ffd32e91cb0 R14: 00007ffd32e91c90 R15: 0000000000000006 </TASK> ------------[ cut here ]------------ WARNING: CPU: 1 PID: 3764 at fs/btrfs/space-info.h:122 btrfs_space_info_free_bytes_may_use fs/btrfs/space-info.h:154 [inline] WARNING: CPU: 1 PID: 3764 at fs/btrfs/space-info.h:122 btrfs_free_reserved_data_space_noquota+0x219/0x2b0 fs/btrfs/delalloc-space.c:179 Modules linked in: CPU: 1 PID: 3764 Comm: syz-executor759 Not tainted 6.1.0-rc7-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 RIP: 0010:btrfs_space_info_update_bytes_may_use fs/btrfs/space-info.h:122 [inline] RIP: 0010:btrfs_space_info_free_bytes_may_use fs/btrfs/space-info.h:154 [inline] RIP: 0010:btrfs_free_reserved_data_space_noquota+0x219/0x2b0 fs/btrfs/delalloc-space.c:179 Code: 2f 00 74 08 4c 89 ef e8 b5 98 32 fe 49 8b 5d 00 48 89 df 4c 8b 74 24 08 4c 89 f6 e8 21 81 de fd 4c 39 f3 73 16 e8 d7 7e de fd <0f> 0b 31 db 4c 8b 34 24 41 80 3c 2f 00 75 8c eb 92 e8 c1 7e de fd RSP: 0018:ffffc9000443f410 EFLAGS: 00010293 RAX: ffffffff83ac1919 RBX: 00000000005cb000 RCX: ffff888027989d40 RDX: 0000000000000000 RSI: 0000000000800000 RDI: 00000000005cb000 RBP: dffffc0000000000 R08: ffffffff83ac190f R09: fffffbfff1cebe0e R10: fffffbfff1cebe0e R11: 1ffffffff1cebe0d R12: ffff8880774f3800 R13: ffff8880774f3860 R14: 0000000000800000 R15: 1ffff1100ee9e70c FS: 0000555555aaa300(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0d98f20140 CR3: 0000000025ccf000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> btrfs_free_reserved_data_space+0x9d/0xd0 fs/btrfs/delalloc-space.c:199 btrfs_dio_iomap_begin+0x8f7/0x1070 fs/btrfs/inode.c:7762 iomap_iter+0x606/0x8a0 fs/iomap/iter.c:74 __iomap_dio_rw+0xd91/0x20d0 fs/iomap/direct-io.c:601 btrfs_dio_write+0x9c/0xe0 fs/btrfs/inode.c:8094 btrfs_direct_write fs/btrfs/file.c:1835 [inline] btrfs_do_write_iter+0x871/0x1260 fs/btrfs/file.c:1980 do_iter_write+0x6c2/0xc20 fs/read_write.c:861 vfs_writev fs/read_write.c:934 [inline] do_pwritev+0x200/0x350 fs/read_write.c:1031 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f0d98ea8ea9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd32e91c38 EFLAGS: 00000246 ORIG_RAX: 0000000000000148 RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f0d98ea8ea9 RDX: 0000000000000001 RSI: 0000000020000280 RDI: 0000000000000005 RBP: 00007ffd32e91c70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000008000000 R11: 0000000000000246 R12: 0000000000000006 R13: 00007ffd32e91cb0 R14: 00007ffd32e91c90 R15: 0000000000000006 </TASK> --- This report is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkaller(a)googlegroups.com. syzbot will keep track of this issue. See: https://goo.gl/tpsmEJ#status for how to communicate with syzbot. For information about bisection process see: https://goo.gl/tpsmEJ#bisection syzbot can test patches for this issue, for details see: https://goo.gl/tpsmEJ#testing-patches

3 days, 11 hours

3
3
0 0

[PATCH v6] lib: add basic KUnit test for lib/math

by Daniel Latypov

Add basic test coverage for files that don't require any config options: * part of math.h (what seem to be the most commonly used macros) * gcd.c * lcm.c * int_sqrt.c * reciprocal_div.c (Ignored int_pow.c since it's a simple textbook algorithm.) These tests aren't particularly interesting, but they * provide short and simple examples of parameterized tests * provide a place to add tests for any new files in this dir * are written so adding new test cases to cover edge cases should be easy * looking at code coverage, we hit all the branches in the .c files Signed-off-by: Daniel Latypov <dlatypov(a)google.com> Reviewed-by: David Gow <davidgow(a)google.com> --- Changes since v5: * add in test cases for roundup/rounddown * address misc comments from David Changes since v4: * add in test cases for some math.h macros (abs, round_up/round_down, div_round_down/closest) * use parameterized testing less to keep things terser Changes since v3: * fix `checkpatch.pl --strict` warnings * add test cases for gcd(0,0) and lcm(0,0) * minor: don't test both gcd(a,b) and gcd(b,a) when a == b Changes since v2: mv math_test.c => math_kunit.c Changes since v1: * Rebase and rewrite to use the new parameterized testing support. * misc: fix overflow in literal and inline int_sqrt format string. * related: commit 1f0e943df68a ("Documentation: kunit: provide guidance for testing many inputs") was merged explaining the patterns shown here. * there's an in-flight patch to update it for parameterized testing. --- lib/math/Kconfig | 12 ++ lib/math/Makefile | 2 + lib/math/math_kunit.c | 291 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 305 insertions(+) create mode 100644 lib/math/math_kunit.c diff --git a/lib/math/Kconfig b/lib/math/Kconfig index f19bc9734fa7..a974d4db0f9c 100644 --- a/lib/math/Kconfig +++ b/lib/math/Kconfig @@ -15,3 +15,15 @@ config PRIME_NUMBERS config RATIONAL bool + +config MATH_KUNIT_TEST + tristate "KUnit test for lib/math and math.h" if !KUNIT_ALL_TESTS + depends on KUNIT + default KUNIT_ALL_TESTS + help + This builds unit tests for lib/math and math.h. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation in Documentation/dev-tools/kunit/. + + If unsure, say N. diff --git a/lib/math/Makefile b/lib/math/Makefile index be6909e943bd..30abb7a8d564 100644 --- a/lib/math/Makefile +++ b/lib/math/Makefile @@ -4,3 +4,5 @@ obj-y += div64.o gcd.o lcm.o int_pow.o int_sqrt.o reciprocal_div.o obj-$(CONFIG_CORDIC) += cordic.o obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o obj-$(CONFIG_RATIONAL) += rational.o + +obj-$(CONFIG_MATH_KUNIT_TEST) += math_kunit.o diff --git a/lib/math/math_kunit.c b/lib/math/math_kunit.c new file mode 100644 index 000000000000..556c23b17c3c --- /dev/null +++ b/lib/math/math_kunit.c @@ -0,0 +1,291 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Simple KUnit suite for math helper funcs that are always enabled. + * + * Copyright (C) 2020, Google LLC. + * Author: Daniel Latypov <dlatypov(a)google.com> + */ + +#include <kunit/test.h> +#include <linux/gcd.h> +#include <linux/kernel.h> +#include <linux/lcm.h> +#include <linux/reciprocal_div.h> + +static void abs_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, abs((char)0), (char)0); + KUNIT_EXPECT_EQ(test, abs((char)42), (char)42); + KUNIT_EXPECT_EQ(test, abs((char)-42), (char)42); + + /* The expression in the macro is actually promoted to an int. */ + KUNIT_EXPECT_EQ(test, abs((short)0), 0); + KUNIT_EXPECT_EQ(test, abs((short)42), 42); + KUNIT_EXPECT_EQ(test, abs((short)-42), 42); + + KUNIT_EXPECT_EQ(test, abs(0), 0); + KUNIT_EXPECT_EQ(test, abs(42), 42); + KUNIT_EXPECT_EQ(test, abs(-42), 42); + + KUNIT_EXPECT_EQ(test, abs(0L), 0L); + KUNIT_EXPECT_EQ(test, abs(42L), 42L); + KUNIT_EXPECT_EQ(test, abs(-42L), 42L); + + KUNIT_EXPECT_EQ(test, abs(0LL), 0LL); + KUNIT_EXPECT_EQ(test, abs(42LL), 42LL); + KUNIT_EXPECT_EQ(test, abs(-42LL), 42LL); + + /* Unsigned types get casted to signed. */ + KUNIT_EXPECT_EQ(test, abs(0ULL), 0LL); + KUNIT_EXPECT_EQ(test, abs(42ULL), 42LL); +} + +static void int_sqrt_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, int_sqrt(0UL), 0UL); + KUNIT_EXPECT_EQ(test, int_sqrt(1UL), 1UL); + KUNIT_EXPECT_EQ(test, int_sqrt(4UL), 2UL); + KUNIT_EXPECT_EQ(test, int_sqrt(5UL), 2UL); + KUNIT_EXPECT_EQ(test, int_sqrt(8UL), 2UL); + KUNIT_EXPECT_EQ(test, int_sqrt(1UL << 30), 1UL << 15); +} + +static void round_up_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, round_up(0, 1), 0); + KUNIT_EXPECT_EQ(test, round_up(1, 2), 2); + KUNIT_EXPECT_EQ(test, round_up(3, 2), 4); + KUNIT_EXPECT_EQ(test, round_up((1 << 30) - 1, 2), 1 << 30); + KUNIT_EXPECT_EQ(test, round_up((1 << 30) - 1, 1 << 29), 1 << 30); +} + +static void round_down_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, round_down(0, 1), 0); + KUNIT_EXPECT_EQ(test, round_down(1, 2), 0); + KUNIT_EXPECT_EQ(test, round_down(3, 2), 2); + KUNIT_EXPECT_EQ(test, round_down((1 << 30) - 1, 2), (1 << 30) - 2); + KUNIT_EXPECT_EQ(test, round_down((1 << 30) - 1, 1 << 29), 1 << 29); +} + +/* These versions can round to numbers that aren't a power of two */ +static void roundup_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, roundup(0, 1), 0); + KUNIT_EXPECT_EQ(test, roundup(1, 2), 2); + KUNIT_EXPECT_EQ(test, roundup(3, 2), 4); + KUNIT_EXPECT_EQ(test, roundup((1 << 30) - 1, 2), 1 << 30); + KUNIT_EXPECT_EQ(test, roundup((1 << 30) - 1, 1 << 29), 1 << 30); + + KUNIT_EXPECT_EQ(test, roundup(3, 2), 4); + KUNIT_EXPECT_EQ(test, roundup(4, 3), 6); +} + +static void rounddown_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, rounddown(0, 1), 0); + KUNIT_EXPECT_EQ(test, rounddown(1, 2), 0); + KUNIT_EXPECT_EQ(test, rounddown(3, 2), 2); + KUNIT_EXPECT_EQ(test, rounddown((1 << 30) - 1, 2), (1 << 30) - 2); + KUNIT_EXPECT_EQ(test, rounddown((1 << 30) - 1, 1 << 29), 1 << 29); + + KUNIT_EXPECT_EQ(test, rounddown(3, 2), 2); + KUNIT_EXPECT_EQ(test, rounddown(4, 3), 3); +} + +static void div_round_up_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, DIV_ROUND_UP(0, 1), 0); + KUNIT_EXPECT_EQ(test, DIV_ROUND_UP(20, 10), 2); + KUNIT_EXPECT_EQ(test, DIV_ROUND_UP(21, 10), 3); + KUNIT_EXPECT_EQ(test, DIV_ROUND_UP(21, 20), 2); + KUNIT_EXPECT_EQ(test, DIV_ROUND_UP(21, 99), 1); +} + +static void div_round_closest_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, DIV_ROUND_CLOSEST(0, 1), 0); + KUNIT_EXPECT_EQ(test, DIV_ROUND_CLOSEST(20, 10), 2); + KUNIT_EXPECT_EQ(test, DIV_ROUND_CLOSEST(21, 10), 2); + KUNIT_EXPECT_EQ(test, DIV_ROUND_CLOSEST(25, 10), 3); +} + +/* Generic test case for unsigned long inputs. */ +struct test_case { + unsigned long a, b; + unsigned long result; +}; + +static struct test_case gcd_cases[] = { + { + .a = 0, .b = 0, + .result = 0, + }, + { + .a = 0, .b = 1, + .result = 1, + }, + { + .a = 2, .b = 2, + .result = 2, + }, + { + .a = 2, .b = 4, + .result = 2, + }, + { + .a = 3, .b = 5, + .result = 1, + }, + { + .a = 3 * 9, .b = 3 * 5, + .result = 3, + }, + { + .a = 3 * 5 * 7, .b = 3 * 5 * 11, + .result = 15, + }, + { + .a = 1 << 21, + .b = (1 << 21) - 1, + .result = 1, + }, +}; + +KUNIT_ARRAY_PARAM(gcd, gcd_cases, NULL); + +static void gcd_test(struct kunit *test) +{ + const char *message_fmt = "gcd(%lu, %lu)"; + const struct test_case *test_param = test->param_value; + + KUNIT_EXPECT_EQ_MSG(test, test_param->result, + gcd(test_param->a, test_param->b), + message_fmt, test_param->a, + test_param->b); + + if (test_param->a == test_param->b) + return; + + /* gcd(a,b) == gcd(b,a) */ + KUNIT_EXPECT_EQ_MSG(test, test_param->result, + gcd(test_param->b, test_param->a), + message_fmt, test_param->b, + test_param->a); +} + +static struct test_case lcm_cases[] = { + { + .a = 0, .b = 0, + .result = 0, + }, + { + .a = 0, .b = 1, + .result = 0, + }, + { + .a = 1, .b = 2, + .result = 2, + }, + { + .a = 2, .b = 2, + .result = 2, + }, + { + .a = 3 * 5, .b = 3 * 7, + .result = 3 * 5 * 7, + }, +}; + +KUNIT_ARRAY_PARAM(lcm, lcm_cases, NULL); + +static void lcm_test(struct kunit *test) +{ + const char *message_fmt = "lcm(%lu, %lu)"; + const struct test_case *test_param = test->param_value; + + KUNIT_EXPECT_EQ_MSG(test, test_param->result, + lcm(test_param->a, test_param->b), + message_fmt, test_param->a, + test_param->b); + + if (test_param->a == test_param->b) + return; + + /* lcm(a,b) == lcm(b,a) */ + KUNIT_EXPECT_EQ_MSG(test, test_param->result, + lcm(test_param->b, test_param->a), + message_fmt, test_param->b, + test_param->a); +} + +struct u32_test_case { + u32 a, b; + u32 result; +}; + +static struct u32_test_case reciprocal_div_cases[] = { + { + .a = 0, .b = 1, + .result = 0, + }, + { + .a = 42, .b = 20, + .result = 2, + }, + { + .a = 42, .b = 9999, + .result = 0, + }, + { + .a = (1 << 16), .b = (1 << 14), + .result = 1 << 2, + }, +}; + +KUNIT_ARRAY_PARAM(reciprocal_div, reciprocal_div_cases, NULL); + +static void reciprocal_div_test(struct kunit *test) +{ + const struct u32_test_case *test_param = test->param_value; + struct reciprocal_value rv = reciprocal_value(test_param->b); + + KUNIT_EXPECT_EQ_MSG(test, test_param->result, + reciprocal_divide(test_param->a, rv), + "reciprocal_divide(%u, %u)", + test_param->a, test_param->b); +} + +static void reciprocal_scale_test(struct kunit *test) +{ + KUNIT_EXPECT_EQ(test, reciprocal_scale(0u, 100), 0u); + KUNIT_EXPECT_EQ(test, reciprocal_scale(1u, 100), 0u); + KUNIT_EXPECT_EQ(test, reciprocal_scale(1u << 4, 1 << 28), 1u); + KUNIT_EXPECT_EQ(test, reciprocal_scale(1u << 16, 1 << 28), 1u << 12); + KUNIT_EXPECT_EQ(test, reciprocal_scale(~0u, 1 << 28), (1u << 28) - 1); +} + +static struct kunit_case math_test_cases[] = { + KUNIT_CASE(abs_test), + KUNIT_CASE(int_sqrt_test), + KUNIT_CASE(round_up_test), + KUNIT_CASE(round_down_test), + KUNIT_CASE(roundup_test), + KUNIT_CASE(rounddown_test), + KUNIT_CASE(div_round_up_test), + KUNIT_CASE(div_round_closest_test), + KUNIT_CASE_PARAM(gcd_test, gcd_gen_params), + KUNIT_CASE_PARAM(lcm_test, lcm_gen_params), + KUNIT_CASE_PARAM(reciprocal_div_test, reciprocal_div_gen_params), + KUNIT_CASE(reciprocal_scale_test), + {} +}; + +static struct kunit_suite math_test_suite = { + .name = "lib-math", + .test_cases = math_test_cases, +}; + +kunit_test_suites(&math_test_suite); + +MODULE_LICENSE("GPL v2"); base-commit: 7e25f40eab52c57ff6772d27d2aef3640a3237d7 -- 2.31.1.368.gbe11c130af-goog

3 days, 11 hours

5
5
0 0

[PATCH v6 1/2] posix-timers: Prefer delivery of signals to the current thread

by Marco Elver

From: Dmitry Vyukov <dvyukov(a)google.com> POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main thread of a thread group for signal delivery. However, this has a significant downside: it requires waking up a potentially idle thread. Instead, prefer to deliver signals to the current thread (in the same thread group) if SIGEV_THREAD_ID is not set by the user. This does not change guaranteed semantics, since POSIX process CPU time timers have never guaranteed that signal delivery is to a specific thread (without SIGEV_THREAD_ID set). The effect is that we no longer wake up potentially idle threads, and the kernel is no longer biased towards delivering the timer signal to any particular thread (which better distributes the timer signals esp. when multiple timers fire concurrently). Signed-off-by: Dmitry Vyukov <dvyukov(a)google.com> Suggested-by: Oleg Nesterov <oleg(a)redhat.com> Reviewed-by: Oleg Nesterov <oleg(a)redhat.com> Signed-off-by: Marco Elver <elver(a)google.com> --- v6: - Split test from this patch. - Update wording on what this patch aims to improve. v5: - Rebased onto v6.2. v4: - Restructured checks in send_sigqueue() as suggested. v3: - Switched to the completely different implementation (much simpler) based on the Oleg's idea. RFC v2: - Added additional Cc as Thomas asked. --- kernel/signal.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 8cb28f1df294..605445fa27d4 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type) /* * Now find a thread we can wake up to take the signal off the queue. * - * If the main thread wants the signal, it gets first crack. - * Probably the least surprising to the average bear. + * Try the suggested task first (may or may not be the main thread). */ if (wants_signal(sig, p)) t = p; @@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) ret = -1; rcu_read_lock(); + /* + * This function is used by POSIX timers to deliver a timer signal. + * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID + * set), the signal must be delivered to the specific thread (queues + * into t->pending). + * + * Where type is not PIDTYPE_PID, signals must just be delivered to the + * current process. In this case, prefer to deliver to current if it is + * in the same thread group as the target, as it avoids unnecessarily + * waking up a potentially idle task. + */ t = pid_task(pid, type); - if (!t || !likely(lock_task_sighand(t, &flags))) + if (!t) + goto ret; + if (type != PIDTYPE_PID && same_thread_group(t, current)) + t = current; + if (!likely(lock_task_sighand(t, &flags))) goto ret; ret = 1; /* the signal is ignored */ @@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type) q->info.si_overrun = 0; signalfd_notify(t, sig); + /* + * If the type is not PIDTYPE_PID, we just use shared_pending, which + * won't guarantee that the specified task will receive the signal, but + * is sufficient if t==current in the common case. + */ pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending; list_add_tail(&q->list, &pending->list); sigaddset(&pending->signal, sig); -- 2.40.0.rc1.284.g88254d51c5-goog

1 month

9
50
0 0

[PATCH v3] selftests/ftrace: traceonoff_triggers: strip off names

by Yipeng Zou

The func_traceonoff_triggers.tc sometimes goes to fail on my board, Kunpeng-920. [root@localhost]# ./ftracetest ./test.d/ftrace/func_traceonoff_triggers.tc -l fail.log === Ftrace unit tests === [1] ftrace - test for function traceon/off triggers [FAIL] [2] (instance) ftrace - test for function traceon/off triggers [UNSUPPORTED] I look up the log, and it shows that the md5sum is different between csum1 and csum2. ++ cnt=611 ++ sleep .1 +++ cnt_trace +++ grep -v '^#' trace +++ wc -l ++ cnt2=611 ++ '[' 611 -ne 611 ']' +++ cat tracing_on ++ on=0 ++ '[' 0 '!=' 0 ']' +++ md5sum trace ++ csum1='76896aa74362fff66a6a5f3cf8a8a500 trace' ++ sleep .1 +++ md5sum trace ++ csum2='ee8625a21c058818fc26e45c1ed3f6de trace' ++ '[' '76896aa74362fff66a6a5f3cf8a8a500 trace' '!=' 'ee8625a21c058818fc26e45c1ed3f6de trace' ']' ++ fail 'Tracing file is still changing' ++ echo Tracing file is still changing Tracing file is still changing ++ exit_fail ++ exit 1 So I directly dump the trace file before md5sum, the diff shows that: [root@localhost]# diff trace_1.log trace_2.log -y --suppress-common-lines dockerd-12285 [036] d.... 18385.510290: sched_stat | <...>-12285 [036] d.... 18385.510290: sched_stat dockerd-12285 [036] d.... 18385.510291: sched_swit | <...>-12285 [036] d.... 18385.510291: sched_swit <...>-740 [044] d.... 18385.602859: sched_stat | kworker/44:1-740 [044] d.... 18385.602859: sched_stat <...>-740 [044] d.... 18385.602860: sched_swit | kworker/44:1-740 [044] d.... 18385.602860: sched_swit And we can see that <...> filed be filled with names. We can strip off the names there to fix that. After strip off the names: kworker/u257:0-12 [019] d..2. 2528.758910: sched_stat | -12 [019] d..2. 2528.758910: sched_stat_runtime: comm=k kworker/u257:0-12 [019] d..2. 2528.758912: sched_swit | -12 [019] d..2. 2528.758912: sched_switch: prev_comm=kw <idle>-0 [000] d.s5. 2528.762318: sched_waki | -0 [000] d.s5. 2528.762318: sched_waking: comm=sshd pi <idle>-0 [037] dNh2. 2528.762326: sched_wake | -0 [037] dNh2. 2528.762326: sched_wakeup: comm=sshd pi <idle>-0 [037] d..2. 2528.762334: sched_swit | -0 [037] d..2. 2528.762334: sched_switch: prev_comm=sw Fixes: d87b29179aa0 ("selftests: ftrace: Use md5sum to take less time of checking logs") Suggested-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> Signed-off-by: Yipeng Zou <zouyipeng(a)huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- .../ftrace/test.d/ftrace/func_traceonoff_triggers.tc | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/ftrace/test.d/ftrace/func_traceonoff_triggers.tc b/tools/testing/selftests/ftrace/test.d/ftrace/func_traceonoff_triggers.tc index aee22289536b..1b57771dbfdf 100644 --- a/tools/testing/selftests/ftrace/test.d/ftrace/func_traceonoff_triggers.tc +++ b/tools/testing/selftests/ftrace/test.d/ftrace/func_traceonoff_triggers.tc @@ -90,9 +90,10 @@ if [ $on != "0" ]; then fail "Tracing is not off" fi -csum1=`md5sum trace` +# Cannot rely on names being around as they are only cached, strip them +csum1=`cat trace | sed -e 's/^ *[^ ]*$-[0-9][0-9]*$/\1/' | md5sum` sleep $SLEEP_TIME -csum2=`md5sum trace` +csum2=`cat trace | sed -e 's/^ *[^ ]*$-[0-9][0-9]*$/\1/' | md5sum` if [ "$csum1" != "$csum2" ]; then fail "Tracing file is still changing" -- 2.34.1

1 month, 1 week

2
2
0 0

[PATCH] KVM: selftests: Use TAP in the steal_time test

by Thomas Huth

For easier use of the tests in automation and for having some status information for the user while the test is running, let's provide some TAP output in this test. Signed-off-by: Thomas Huth <thuth(a)redhat.com> --- NB: This patch does not use the interface from kselftest_harness.h since it is not very suitable for the for-loop in this patch. tools/testing/selftests/kvm/steal_time.c | 46 ++++++++++++------------ 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/tools/testing/selftests/kvm/steal_time.c b/tools/testing/selftests/kvm/steal_time.c index 171adfb2a6cb..aa6149eb9ea1 100644 --- a/tools/testing/selftests/kvm/steal_time.c +++ b/tools/testing/selftests/kvm/steal_time.c @@ -81,20 +81,18 @@ static void steal_time_init(struct kvm_vcpu *vcpu, uint32_t i) static void steal_time_dump(struct kvm_vm *vm, uint32_t vcpu_idx) { struct kvm_steal_time *st = addr_gva2hva(vm, (ulong)st_gva[vcpu_idx]); - int i; - pr_info("VCPU%d:\n", vcpu_idx); - pr_info(" steal: %lld\n", st->steal); - pr_info(" version: %d\n", st->version); - pr_info(" flags: %d\n", st->flags); - pr_info(" preempted: %d\n", st->preempted); - pr_info(" u8_pad: "); - for (i = 0; i < 3; ++i) - pr_info("%d", st->u8_pad[i]); - pr_info("\n pad: "); - for (i = 0; i < 11; ++i) - pr_info("%d", st->pad[i]); - pr_info("\n"); + ksft_print_msg("VCPU%d:\n", vcpu_idx); + ksft_print_msg(" steal: %lld\n", st->steal); + ksft_print_msg(" version: %d\n", st->version); + ksft_print_msg(" flags: %d\n", st->flags); + ksft_print_msg(" preempted: %d\n", st->preempted); + ksft_print_msg(" u8_pad: %d %d %d\n", + st->u8_pad[0], st->u8_pad[1], st->u8_pad[2]); + ksft_print_msg(" pad: %d %d %d %d %d %d %d %d %d %d %d\n", + st->pad[0], st->pad[1], st->pad[2], st->pad[3], + st->pad[4], st->pad[5], st->pad[6], st->pad[7], + st->pad[8], st->pad[9], st->pad[10]); } #elif defined(__aarch64__) @@ -197,10 +195,10 @@ static void steal_time_dump(struct kvm_vm *vm, uint32_t vcpu_idx) { struct st_time *st = addr_gva2hva(vm, (ulong)st_gva[vcpu_idx]); - pr_info("VCPU%d:\n", vcpu_idx); - pr_info(" rev: %d\n", st->rev); - pr_info(" attr: %d\n", st->attr); - pr_info(" st_time: %ld\n", st->st_time); + ksft_print_msg("VCPU%d:\n", vcpu_idx); + ksft_print_msg(" rev: %d\n", st->rev); + ksft_print_msg(" attr: %d\n", st->attr); + ksft_print_msg(" st_time: %ld\n", st->st_time); } #endif @@ -267,7 +265,9 @@ int main(int ac, char **av) vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, ST_GPA_BASE, 1, gpages, 0); virt_map(vm, ST_GPA_BASE, ST_GPA_BASE, gpages); + ksft_print_header(); TEST_REQUIRE(is_steal_time_supported(vcpus[0])); + ksft_set_plan(NR_VCPUS); /* Run test on each VCPU */ for (i = 0; i < NR_VCPUS; ++i) { @@ -308,14 +308,14 @@ int main(int ac, char **av) run_delay, stolen_time); if (verbose) { - pr_info("VCPU%d: total-stolen-time=%ld test-stolen-time=%ld", i, - guest_stolen_time[i], stolen_time); - if (stolen_time == run_delay) - pr_info(" (BONUS: guest test-stolen-time even exactly matches test-run_delay)"); - pr_info("\n"); + ksft_print_msg("VCPU%d: total-stolen-time=%ld test-stolen-time=%ld%s\n", + i, guest_stolen_time[i], stolen_time, + stolen_time == run_delay ? + " (BONUS: guest test-stolen-time even exactly matches test-run_delay)" : ""); steal_time_dump(vm, i); } + ksft_test_result_pass("vcpu%d\n", i); } - return 0; + ksft_finished(); /* Print results and exit() accordingly */ } -- 2.41.0

1 month, 1 week

5
6
0 0

[PATCH v3 0/7] Split a folio to any lower order folios

by Zi Yan

From: Zi Yan <ziy(a)nvidia.com> Hi all, File folio supports any order and people would like to support flexible orders for anonymous folio[1] too. Currently, split_huge_page() only splits a huge page to order-0 pages, but splitting to orders higher than 0 is also useful. This patchset adds support for splitting a huge page to any lower order pages and uses it during file folio truncate operations. The patchset is on top of mm-everything-2023-03-27-21-20. Changelog === Since v2 --- 1. Fixed an issue in __split_page_owner() introduced during my rebase Since v1 --- 1. Changed split_page_memcg() and split_page_owner() parameter to use order 2. Used folio_test_pmd_mappable() in place of the equivalent code Details === * Patch 1 changes split_page_memcg() to use order instead of nr_pages * Patch 2 changes split_page_owner() to use order instead of nr_pages * Patch 3 and 4 add new_order parameter split_page_memcg() and split_page_owner() and prepare for upcoming changes. * Patch 5 adds split_huge_page_to_list_to_order() to split a huge page to any lower order. The original split_huge_page_to_list() calls split_huge_page_to_list_to_order() with new_order = 0. * Patch 6 uses split_huge_page_to_list_to_order() in large pagecache folio truncation instead of split the large folio all the way down to order-0. * Patch 7 adds a test API to debugfs and test cases in split_huge_page_test selftests. Comments and/or suggestions are welcome. [1] https://lore.kernel.org/linux-mm/Y%2FblF0GIunm+pRIC@casper.infradead.org/ Zi Yan (7): mm/memcg: use order instead of nr in split_page_memcg() mm/page_owner: use order instead of nr in split_page_owner() mm: memcg: make memcg huge page split support any order split. mm: page_owner: add support for splitting to any order in split page_owner. mm: thp: split huge page to any lower order pages. mm: truncate: split huge page cache page to a non-zero order if possible. mm: huge_memory: enable debugfs to split huge pages to any order. include/linux/huge_mm.h | 10 +- include/linux/memcontrol.h | 4 +- include/linux/page_owner.h | 10 +- mm/huge_memory.c | 137 ++++++++--- mm/memcontrol.c | 10 +- mm/page_alloc.c | 8 +- mm/page_owner.c | 8 +- mm/truncate.c | 21 +- .../selftests/mm/split_huge_page_test.c | 225 +++++++++++++++++- 9 files changed, 365 insertions(+), 68 deletions(-) -- 2.39.2

3 months

6
22
0 0

[PATCH v2 0/7] Use TAP in some more x86 KVM selftests

by Thomas Huth

Here's a follow-up from my RFC series last year: https://lore.kernel.org/lkml/20221004093131.40392-1-thuth@redhat.com/T/ and from v1 earlier this year: https://lore.kernel.org/kvm/20230712075910.22480-1-thuth@redhat.com/ Basic idea of this series is now to use the kselftest_harness.h framework to get TAP output in the tests, so that it is easier for the user to see what is going on, and e.g. to be able to detect whether a certain test is part of the test binary or not (which is useful when tests get extended in the course of time). v2: - Dropped the "Rename the ASSERT_EQ macro" patch (already merged) - Split the fixes in the sync_regs_test into separate patches (see the first two patches) - Introduce the KVM_ONE_VCPU_TEST_SUITE() macro as suggested by Sean (see third patch) and use it in the following patches - Add a new patch to convert vmx_pmu_caps_test.c, too Thomas Huth (7): KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate KVM: selftests: x86: sync_regs_test: Get regs structure before modifying it KVM: selftests: Add a macro to define a test with one vcpu KVM: selftests: x86: Use TAP interface in the sync_regs test KVM: selftests: x86: Use TAP interface in the fix_hypercall test KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test .../selftests/kvm/include/kvm_test_harness.h | 35 +++++ .../selftests/kvm/x86_64/fix_hypercall_test.c | 27 ++-- .../selftests/kvm/x86_64/sync_regs_test.c | 121 +++++++++++++----- .../kvm/x86_64/userspace_msr_exit_test.c | 19 +-- .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 50 ++------ 5 files changed, 160 insertions(+), 92 deletions(-) create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h -- 2.41.0

3 months, 1 week

2
9
0 0

[PATCH v12] exec: Fix dead-lock in de_thread with ptrace_attach

by Bernd Edlinger

This introduces signal->exec_bprm, which is used to fix the case when at least one of the sibling threads is traced, and therefore the trace process may dead-lock in ptrace_attach, but de_thread will need to wait for the tracer to continue execution. The solution is to detect this situation and allow ptrace_attach to continue by temporarily releasing the cred_guard_mutex, while de_thread() is still waiting for traced zombies to be eventually released by the tracer. In the case of the thread group leader we only have to wait for the thread to become a zombie, which may also need co-operation from the tracer due to PTRACE_O_TRACEEXIT. When a tracer wants to ptrace_attach a task that already is in execve, we simply retry the ptrace_may_access check while temporarily installing the new credentials and dumpability which are about to be used after execve completes. If the ptrace_attach happens on a thread that is a sibling-thread of the thread doing execve, it is sufficient to check against the old credentials, as this thread will be waited for, before the new credentials are installed. Other threads die quickly since the cred_guard_mutex is released, but a deadly signal is already pending. In case the mutex_lock_killable misses the signal, the non-zero current->signal->exec_bprm makes sure they release the mutex immediately and return with -ERESTARTNOINTR. This means there is no API change, unlike the previous version of this patch which was discussed here: https://lore.kernel.org/lkml/b6537ae6-31b1-5c50-f32b-8b8332ace882@hotmail.d… See tools/testing/selftests/ptrace/vmaccess.c for a test case that gets fixed by this change. Note that since the test case was originally designed to test the ptrace_attach returning an error in this situation, the test expectation needed to be adjusted, to allow the API to succeed at the first attempt. Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de> --- fs/exec.c | 69 ++++++++++++++++------- fs/proc/base.c | 6 ++ include/linux/cred.h | 1 + include/linux/sched/signal.h | 18 ++++++ kernel/cred.c | 28 +++++++-- kernel/ptrace.c | 32 +++++++++++ kernel/seccomp.c | 12 +++- tools/testing/selftests/ptrace/vmaccess.c | 23 +++++--- 8 files changed, 155 insertions(+), 34 deletions(-) v10: Changes to previous version, make the PTRACE_ATTACH retun -EAGAIN, instead of execve return -ERESTARTSYS. Added some lessions learned to the description. v11: Check old and new credentials in PTRACE_ATTACH again without changing the API. Note: I got actually one response from an automatic checker to the v11 patch, https://lore.kernel.org/lkml/202107121344.wu68hEPF-lkp@intel.com/ which is complaining about: >> kernel/ptrace.c:425:26: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct cred const *old_cred @@ got struct cred const [noderef] __rcu *real_cred @@ 417 struct linux_binprm *bprm = task->signal->exec_bprm; 418 const struct cred *old_cred; 419 struct mm_struct *old_mm; 420 421 retval = down_write_killable(&task->signal->exec_update_lock); 422 if (retval) 423 goto unlock_creds; 424 task_lock(task); > 425 old_cred = task->real_cred; v12: Essentially identical to v11. - Fixed a minor merge conflict in linux v5.17, and fixed the above mentioned nit by adding __rcu to the declaration. - re-tested the patch with all linux versions from v5.11 to v6.6 v10 was an alternative approach which did imply an API change. But I would prefer to avoid such an API change. The difficult part is getting the right dumpability flags assigned before de_thread starts, hope you like this version. If not, the v10 is of course also acceptable. Thanks Bernd. diff --git a/fs/exec.c b/fs/exec.c index 2f2b0acec4f0..902d3b230485 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1041,11 +1041,13 @@ static int exec_mmap(struct mm_struct *mm) return 0; } -static int de_thread(struct task_struct *tsk) +static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm) { struct signal_struct *sig = tsk->signal; struct sighand_struct *oldsighand = tsk->sighand; spinlock_t *lock = &oldsighand->siglock; + struct task_struct *t = tsk; + bool unsafe_execve_in_progress = false; if (thread_group_empty(tsk)) goto no_thread_group; @@ -1068,6 +1070,19 @@ static int de_thread(struct task_struct *tsk) if (!thread_group_leader(tsk)) sig->notify_count--; + while_each_thread(tsk, t) { + if (unlikely(t->ptrace) + && (t != tsk->group_leader || !t->exit_state)) + unsafe_execve_in_progress = true; + } + + if (unlikely(unsafe_execve_in_progress)) { + spin_unlock_irq(lock); + sig->exec_bprm = bprm; + mutex_unlock(&sig->cred_guard_mutex); + spin_lock_irq(lock); + } + while (sig->notify_count) { __set_current_state(TASK_KILLABLE); spin_unlock_irq(lock); @@ -1158,6 +1173,11 @@ static int de_thread(struct task_struct *tsk) release_task(leader); } + if (unlikely(unsafe_execve_in_progress)) { + mutex_lock(&sig->cred_guard_mutex); + sig->exec_bprm = NULL; + } + sig->group_exec_task = NULL; sig->notify_count = 0; @@ -1169,6 +1189,11 @@ static int de_thread(struct task_struct *tsk) return 0; killed: + if (unlikely(unsafe_execve_in_progress)) { + mutex_lock(&sig->cred_guard_mutex); + sig->exec_bprm = NULL; + } + /* protects against exit_notify() and __exit_signal() */ read_lock(&tasklist_lock); sig->group_exec_task = NULL; @@ -1253,6 +1278,24 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) return retval; + /* If the binary is not readable then enforce mm->dumpable=0 */ + would_dump(bprm, bprm->file); + if (bprm->have_execfd) + would_dump(bprm, bprm->executable); + + /* + * Figure out dumpability. Note that this checking only of current + * is wrong, but userspace depends on it. This should be testing + * bprm->secureexec instead. + */ + if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || + is_dumpability_changed(current_cred(), bprm->cred) || + !(uid_eq(current_euid(), current_uid()) && + gid_eq(current_egid(), current_gid()))) + set_dumpable(bprm->mm, suid_dumpable); + else + set_dumpable(bprm->mm, SUID_DUMP_USER); + /* * Ensure all future errors are fatal. */ @@ -1261,7 +1304,7 @@ int begin_new_exec(struct linux_binprm * bprm) /* * Make this the only thread in the thread group. */ - retval = de_thread(me); + retval = de_thread(me, bprm); if (retval) goto out; @@ -1284,11 +1327,6 @@ int begin_new_exec(struct linux_binprm * bprm) if (retval) goto out; - /* If the binary is not readable then enforce mm->dumpable=0 */ - would_dump(bprm, bprm->file); - if (bprm->have_execfd) - would_dump(bprm, bprm->executable); - /* * Release all of the old mmap stuff */ @@ -1350,18 +1388,6 @@ int begin_new_exec(struct linux_binprm * bprm) me->sas_ss_sp = me->sas_ss_size = 0; - /* - * Figure out dumpability. Note that this checking only of current - * is wrong, but userspace depends on it. This should be testing - * bprm->secureexec instead. - */ - if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || - !(uid_eq(current_euid(), current_uid()) && - gid_eq(current_egid(), current_gid()))) - set_dumpable(current->mm, suid_dumpable); - else - set_dumpable(current->mm, SUID_DUMP_USER); - perf_event_exec(); __set_task_comm(me, kbasename(bprm->filename), true); @@ -1480,6 +1506,11 @@ static int prepare_bprm_creds(struct linux_binprm *bprm) if (mutex_lock_interruptible(&current->signal->cred_guard_mutex)) return -ERESTARTNOINTR; + if (unlikely(current->signal->exec_bprm)) { + mutex_unlock(&current->signal->cred_guard_mutex); + return -ERESTARTNOINTR; + } + bprm->cred = prepare_exec_creds(); if (likely(bprm->cred)) return 0; diff --git a/fs/proc/base.c b/fs/proc/base.c index ffd54617c354..0da9adfadb48 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -2788,6 +2788,12 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf, if (rv < 0) goto out_free; + if (unlikely(current->signal->exec_bprm)) { + mutex_unlock(&current->signal->cred_guard_mutex); + rv = -ERESTARTNOINTR; + goto out_free; + } + rv = security_setprocattr(PROC_I(inode)->op.lsm, file->f_path.dentry->d_name.name, page, count); diff --git a/include/linux/cred.h b/include/linux/cred.h index f923528d5cc4..b01e309f5686 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -159,6 +159,7 @@ extern const struct cred *get_task_cred(struct task_struct *); extern struct cred *cred_alloc_blank(void); extern struct cred *prepare_creds(void); extern struct cred *prepare_exec_creds(void); +extern bool is_dumpability_changed(const struct cred *, const struct cred *); extern int commit_creds(struct cred *); extern void abort_creds(struct cred *); extern const struct cred *override_creds(const struct cred *); diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 0014d3adaf84..14df7073a0a8 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -234,9 +234,27 @@ struct signal_struct { struct mm_struct *oom_mm; /* recorded mm when the thread group got * killed by the oom killer */ + struct linux_binprm *exec_bprm; /* Used to check ptrace_may_access + * against new credentials while + * de_thread is waiting for other + * traced threads to terminate. + * Set while de_thread is executing. + * The cred_guard_mutex is released + * after de_thread() has called + * zap_other_threads(), therefore + * a fatal signal is guaranteed to be + * already pending in the unlikely + * event, that + * current->signal->exec_bprm happens + * to be non-zero after the + * cred_guard_mutex was acquired. + */ + struct mutex cred_guard_mutex; /* guard against foreign influences on * credential calculations * (notably. ptrace) + * Held while execve runs, except when + * a sibling thread is being traced. * Deprecated do not use in new code. * Use exec_update_lock instead. */ diff --git a/kernel/cred.c b/kernel/cred.c index 98cb4eca23fb..586cb6c7cf6b 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -433,6 +433,28 @@ static bool cred_cap_issubset(const struct cred *set, const struct cred *subset) return false; } +/** + * is_dumpability_changed - Will changing creds from old to new + * affect the dumpability in commit_creds? + * + * Return: false - dumpability will not be changed in commit_creds. + * Return: true - dumpability will be changed to non-dumpable. + * + * @old: The old credentials + * @new: The new credentials + */ +bool is_dumpability_changed(const struct cred *old, const struct cred *new) +{ + if (!uid_eq(old->euid, new->euid) || + !gid_eq(old->egid, new->egid) || + !uid_eq(old->fsuid, new->fsuid) || + !gid_eq(old->fsgid, new->fsgid) || + !cred_cap_issubset(old, new)) + return true; + + return false; +} + /** * commit_creds - Install new credentials upon the current task * @new: The credentials to be assigned @@ -467,11 +489,7 @@ int commit_creds(struct cred *new) get_cred(new); /* we will require a ref for the subj creds too */ /* dumpability changes */ - if (!uid_eq(old->euid, new->euid) || - !gid_eq(old->egid, new->egid) || - !uid_eq(old->fsuid, new->fsuid) || - !gid_eq(old->fsgid, new->fsgid) || - !cred_cap_issubset(old, new)) { + if (is_dumpability_changed(old, new)) { if (task->mm) set_dumpable(task->mm, suid_dumpable); task->pdeath_signal = 0; diff --git a/kernel/ptrace.c b/kernel/ptrace.c index 443057bee87c..eb1c450bb7d7 100644 --- a/kernel/ptrace.c +++ b/kernel/ptrace.c @@ -20,6 +20,7 @@ #include <linux/pagemap.h> #include <linux/ptrace.h> #include <linux/security.h> +#include <linux/binfmts.h> #include <linux/signal.h> #include <linux/uio.h> #include <linux/audit.h> @@ -435,6 +436,28 @@ static int ptrace_attach(struct task_struct *task, long request, if (retval) goto unlock_creds; + if (unlikely(task->in_execve)) { + struct linux_binprm *bprm = task->signal->exec_bprm; + const struct cred __rcu *old_cred; + struct mm_struct *old_mm; + + retval = down_write_killable(&task->signal->exec_update_lock); + if (retval) + goto unlock_creds; + task_lock(task); + old_cred = task->real_cred; + old_mm = task->mm; + rcu_assign_pointer(task->real_cred, bprm->cred); + task->mm = bprm->mm; + retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS); + rcu_assign_pointer(task->real_cred, old_cred); + task->mm = old_mm; + task_unlock(task); + up_write(&task->signal->exec_update_lock); + if (retval) + goto unlock_creds; + } + write_lock_irq(&tasklist_lock); retval = -EPERM; if (unlikely(task->exit_state)) @@ -508,6 +531,14 @@ static int ptrace_traceme(void) { int ret = -EPERM; + if (mutex_lock_interruptible(&current->signal->cred_guard_mutex)) + return -ERESTARTNOINTR; + + if (unlikely(current->signal->exec_bprm)) { + mutex_unlock(&current->signal->cred_guard_mutex); + return -ERESTARTNOINTR; + } + write_lock_irq(&tasklist_lock); /* Are we already being traced? */ if (!current->ptrace) { @@ -523,6 +554,7 @@ static int ptrace_traceme(void) } } write_unlock_irq(&tasklist_lock); + mutex_unlock(&current->signal->cred_guard_mutex); return ret; } diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 255999ba9190..b29bbfa0b044 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -1955,9 +1955,15 @@ static long seccomp_set_mode_filter(unsigned int flags, * Make sure we cannot change seccomp or nnp state via TSYNC * while another thread is in the middle of calling exec. */ - if (flags & SECCOMP_FILTER_FLAG_TSYNC && - mutex_lock_killable(&current->signal->cred_guard_mutex)) - goto out_put_fd; + if (flags & SECCOMP_FILTER_FLAG_TSYNC) { + if (mutex_lock_killable(&current->signal->cred_guard_mutex)) + goto out_put_fd; + + if (unlikely(current->signal->exec_bprm)) { + mutex_unlock(&current->signal->cred_guard_mutex); + goto out_put_fd; + } + } spin_lock_irq(&current->sighand->siglock); diff --git a/tools/testing/selftests/ptrace/vmaccess.c b/tools/testing/selftests/ptrace/vmaccess.c index 4db327b44586..3b7d81fb99bb 100644 --- a/tools/testing/selftests/ptrace/vmaccess.c +++ b/tools/testing/selftests/ptrace/vmaccess.c @@ -39,8 +39,15 @@ TEST(vmaccess) f = open(mm, O_RDONLY); ASSERT_GE(f, 0); close(f); - f = kill(pid, SIGCONT); - ASSERT_EQ(f, 0); + f = waitpid(-1, NULL, 0); + ASSERT_NE(f, -1); + ASSERT_NE(f, 0); + ASSERT_NE(f, pid); + f = waitpid(-1, NULL, 0); + ASSERT_EQ(f, pid); + f = waitpid(-1, NULL, 0); + ASSERT_EQ(f, -1); + ASSERT_EQ(errno, ECHILD); } TEST(attach) @@ -57,22 +64,24 @@ TEST(attach) sleep(1); k = ptrace(PTRACE_ATTACH, pid, 0L, 0L); - ASSERT_EQ(errno, EAGAIN); - ASSERT_EQ(k, -1); + ASSERT_EQ(k, 0); k = waitpid(-1, &s, WNOHANG); ASSERT_NE(k, -1); ASSERT_NE(k, 0); ASSERT_NE(k, pid); ASSERT_EQ(WIFEXITED(s), 1); ASSERT_EQ(WEXITSTATUS(s), 0); - sleep(1); - k = ptrace(PTRACE_ATTACH, pid, 0L, 0L); + k = waitpid(-1, &s, 0); + ASSERT_EQ(k, pid); + ASSERT_EQ(WIFSTOPPED(s), 1); + ASSERT_EQ(WSTOPSIG(s), SIGTRAP); + k = ptrace(PTRACE_CONT, pid, 0L, 0L); ASSERT_EQ(k, 0); k = waitpid(-1, &s, 0); ASSERT_EQ(k, pid); ASSERT_EQ(WIFSTOPPED(s), 1); ASSERT_EQ(WSTOPSIG(s), SIGSTOP); - k = ptrace(PTRACE_DETACH, pid, 0L, 0L); + k = ptrace(PTRACE_CONT, pid, 0L, 0L); ASSERT_EQ(k, 0); k = waitpid(-1, &s, 0); ASSERT_EQ(k, pid); -- 2.39.2

3 months, 3 weeks

5
14
0 0

[PATCH] selftests: core: include linux/close_range.h for CLOSE_RANGE_* macros

by Muhammad Usama Anjum

Correct header file is needed for getting CLOSE_RANGE_* macros. Previously it was tested with newer glibc which didn't show the need to include the header which was a mistake. Fixes: ec54424923cf ("selftests: core: remove duplicate defines") Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com> Link: https://lore.kernel.org/all/7161219e-0223-d699-d6f3-81abd9abf13b@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> --- tools/testing/selftests/core/close_range_test.c | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/core/close_range_test.c b/tools/testing/selftests/core/close_range_test.c index 534576f06df1c..c59e4adb905df 100644 --- a/tools/testing/selftests/core/close_range_test.c +++ b/tools/testing/selftests/core/close_range_test.c @@ -12,6 +12,7 @@ #include <syscall.h> #include <unistd.h> #include <sys/resource.h> +#include <linux/close_range.h> #include "../kselftest_harness.h" #include "../clone3/clone3_selftests.h" -- 2.42.0

3 months, 3 weeks

1
4
0 0

[PATCH v10 0/4] RISC-V: mm: Make SV48 the default address space

by Charlie Jenkins

Make sv48 the default address space for mmap as some applications currently depend on this assumption. Users can now select a desired address space using a non-zero hint address to mmap. Previously, requesting the default address space from mmap by passing zero as the hint address would result in using the largest address space possible. Some applications depend on empty bits in the virtual address space, like Go and Java, so this patch provides more flexibility for application developers. -Charlie --- v10: - Move pgtable.h defintions into a no __ASSEMBLY__ region to resolve compilation conflicts (pointed out by Conor) - Will now compile with allmodconfig v9: - Raise the mmap_end default to STACK_TOP_MAX to allow the address space to grow beyond the default of sv48 on sv57 machines as suggested by Alexandre - Some of the mmap macros had unnecessary conditionals that I have removed v8: - Fix RV32 and the RV32 compat mode of RV64 (suggested by Conor) - Extract out addr and base from the mmap macros (suggested by Alexandre) v7: - Changing RLIMIT_STACK inside of an executing program does not trigger arch_pick_mmap_layout(), so rewrite tests to change RLIMIT_STACK from a script before executing tests. RLIMIT_STACK of infinity forces bottomup mmap allocation. - Make arch_get_mmap_base macro more readible by extracting out the rnd calculation. - Use MMAP_MIN_VA_BITS in TASK_UNMAPPED_BASE to support case when mmap attempts to allocate address smaller than DEFAULT_MAP_WINDOW. - Fix incorrect wording in documentation. v6: - Rebase onto the correct base v5: - Minor wording change in documentation - Change some parenthesis in arch_get_mmap_ macros - Added case for addr==0 in arch_get_mmap_ because without this, programs would crash if RLIMIT_STACK was modified before executing the program. This was tested using the libhugetlbfs tests. v4: - Split testcases/document patch into test cases, in-code documentation, and formal documentation patches - Modified the mmap_base macro to be more legible and better represent memory layout - Fixed documentation to better reflect the implmentation - Renamed DEFAULT_VA_BITS to MMAP_VA_BITS - Added additional test case for rlimit changes --- Charlie Jenkins (4): RISC-V: mm: Restrict address space for sv39,sv48,sv57 RISC-V: mm: Add tests for RISC-V mm RISC-V: mm: Update pgtable comment documentation RISC-V: mm: Document mmap changes Documentation/riscv/vm-layout.rst | 22 +++++++ arch/riscv/include/asm/elf.h | 2 +- arch/riscv/include/asm/pgtable.h | 33 ++++++++-- arch/riscv/include/asm/processor.h | 52 +++++++++++++-- tools/testing/selftests/riscv/Makefile | 2 +- tools/testing/selftests/riscv/mm/.gitignore | 2 + tools/testing/selftests/riscv/mm/Makefile | 15 +++++ .../riscv/mm/testcases/mmap_bottomup.c | 35 ++++++++++ .../riscv/mm/testcases/mmap_default.c | 35 ++++++++++ .../selftests/riscv/mm/testcases/mmap_test.h | 64 +++++++++++++++++++ .../selftests/riscv/mm/testcases/run_mmap.sh | 12 ++++ 11 files changed, 261 insertions(+), 13 deletions(-) create mode 100644 tools/testing/selftests/riscv/mm/.gitignore create mode 100644 tools/testing/selftests/riscv/mm/Makefile create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_bottomup.c create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_default.c create mode 100644 tools/testing/selftests/riscv/mm/testcases/mmap_test.h create mode 100755 tools/testing/selftests/riscv/mm/testcases/run_mmap.sh -- 2.34.1

3 months, 4 weeks

3
11
0 0

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror October 2023