From: Dmitry Vyukov <dvyukov(a)google.com>
POSIX timers using the CLOCK_PROCESS_CPUTIME_ID clock prefer the main
thread of a thread group for signal delivery. However, this has a
significant downside: it requires waking up a potentially idle thread.
Instead, prefer to deliver signals to the current thread (in the same
thread group) if SIGEV_THREAD_ID is not set by the user. This does not
change guaranteed semantics, since POSIX process CPU time timers have
never guaranteed that signal delivery is to a specific thread (without
SIGEV_THREAD_ID set).
The effect is that we no longer wake up potentially idle threads, and
the kernel is no longer biased towards delivering the timer signal to
any particular thread (which better distributes the timer signals,
especially when multiple timers fire concurrently).
Signed-off-by: Dmitry Vyukov <dvyukov(a)google.com>
Suggested-by: Oleg Nesterov <oleg(a)redhat.com>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Marco Elver <elver(a)google.com>
---
v6:
- Split test from this patch.
- Update wording on what this patch aims to improve.
v5:
- Rebased onto v6.2.
v4:
- Restructured checks in send_sigqueue() as suggested.
v3:
- Switched to a completely different (much simpler) implementation
based on Oleg's idea.
RFC v2:
- Added additional Cc as Thomas asked.
---
kernel/signal.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/kernel/signal.c b/kernel/signal.c
index 8cb28f1df294..605445fa27d4 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1003,8 +1003,7 @@ static void complete_signal(int sig, struct task_struct *p, enum pid_type type)
/*
* Now find a thread we can wake up to take the signal off the queue.
*
- * If the main thread wants the signal, it gets first crack.
- * Probably the least surprising to the average bear.
+ * Try the suggested task first (may or may not be the main thread).
*/
if (wants_signal(sig, p))
t = p;
@@ -1970,8 +1969,23 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
ret = -1;
rcu_read_lock();
+ /*
+ * This function is used by POSIX timers to deliver a timer signal.
+ * Where type is PIDTYPE_PID (such as for timers with SIGEV_THREAD_ID
+ * set), the signal must be delivered to the specific thread (queues
+ * into t->pending).
+ *
+ * Where type is not PIDTYPE_PID, signals must just be delivered to the
+ * current process. In this case, prefer to deliver to current if it is
+ * in the same thread group as the target, as it avoids unnecessarily
+ * waking up a potentially idle task.
+ */
t = pid_task(pid, type);
- if (!t || !likely(lock_task_sighand(t, &flags)))
+ if (!t)
+ goto ret;
+ if (type != PIDTYPE_PID && same_thread_group(t, current))
+ t = current;
+ if (!likely(lock_task_sighand(t, &flags)))
goto ret;
ret = 1; /* the signal is ignored */
@@ -1993,6 +2007,11 @@ int send_sigqueue(struct sigqueue *q, struct pid *pid, enum pid_type type)
q->info.si_overrun = 0;
signalfd_notify(t, sig);
+ /*
+ * If the type is not PIDTYPE_PID, we just use shared_pending, which
+ * won't guarantee that the specified task will receive the signal, but
+ * is sufficient if t==current in the common case.
+ */
pending = (type != PIDTYPE_PID) ? &t->signal->shared_pending : &t->pending;
list_add_tail(&q->list, &pending->list);
sigaddset(&pending->signal, sig);
--
2.40.0.rc1.284.g88254d51c5-goog
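The target selection that this patch adds to send_sigqueue() can be modeled in user space as a pure function. This is a hypothetical sketch: `enum pid_type_model`, `struct task_model`, `choose_delivery()` and the queue names are illustrative stand-ins, not kernel types. The model approximates same_thread_group() by comparing thread-group IDs and ignores locking, ignored signals and everything else send_sigqueue() does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types: only what the decision needs. */
enum pid_type_model { MODEL_PIDTYPE_PID, MODEL_PIDTYPE_TGID };
enum queue_model { QUEUE_PRIVATE, QUEUE_SHARED };

struct task_model {
	int tid;	/* thread id */
	int tgid;	/* thread group (process) id */
};

struct delivery {
	const struct task_model *task;	/* task whose sighand gets locked */
	enum queue_model queue;		/* t->pending vs. shared_pending */
};

/*
 * Model of the selection in send_sigqueue():
 * - PIDTYPE_PID (e.g. SIGEV_THREAD_ID set): keep the looked-up thread and
 *   queue on its private pending list.
 * - Otherwise: prefer the current thread if it is in the target's thread
 *   group, and queue on the process-wide shared pending list.
 */
static struct delivery choose_delivery(enum pid_type_model type,
				       const struct task_model *target,
				       const struct task_model *cur)
{
	struct delivery d = { target, QUEUE_PRIVATE };

	/* The real code bails out earlier if pid_task() returns NULL. */
	assert(target != NULL);

	if (type != MODEL_PIDTYPE_PID) {
		if (cur && cur->tgid == target->tgid)
			d.task = cur;
		d.queue = QUEUE_SHARED;
	}
	return d;
}
```

Because process-directed signals still go to the shared pending list, another thread of the group may end up dequeuing the signal; as the comment added in the patch notes, this is sufficient in the common t == current case rather than a guarantee.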
KUnit's try-catch infrastructure now uses vfork_done, which is always
set to a valid completion when a kthread is created, but which is set to
NULL once the thread terminates. This creates a race condition: the
kthread may exit before we wait on it.
Keep a copy of vfork_done, taken before wake_up_process() and therefore
still valid, and wait on that instead.
Fixes: 4de2a8e4cca4 ("kunit: Handle test faults")
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Closes: https://lore.kernel.org/lkml/20240410102710.35911-1-naresh.kamboju@linaro.o…
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Acked-by: Mickaël Salaün <mic(a)digikod.net>
Signed-off-by: David Gow <davidgow(a)google.com>
---
lib/kunit/try-catch.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/lib/kunit/try-catch.c b/lib/kunit/try-catch.c
index fa687278ccc9..6bbe0025b079 100644
--- a/lib/kunit/try-catch.c
+++ b/lib/kunit/try-catch.c
@@ -63,6 +63,7 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
{
struct kunit *test = try_catch->test;
struct task_struct *task_struct;
+ struct completion *task_done;
int exit_code, time_remaining;
try_catch->context = context;
@@ -75,13 +76,16 @@ void kunit_try_catch_run(struct kunit_try_catch *try_catch, void *context)
return;
}
get_task_struct(task_struct);
- wake_up_process(task_struct);
/*
* As for a vfork(2), task_struct->vfork_done (pointing to the
* underlying kthread->exited) can be used to wait for the end of a
- * kernel thread.
+ * kernel thread. It is set to NULL when the thread exits, so we
+ * keep a copy here.
*/
- time_remaining = wait_for_completion_timeout(task_struct->vfork_done,
+ task_done = task_struct->vfork_done;
+ wake_up_process(task_struct);
+
+ time_remaining = wait_for_completion_timeout(task_done,
kunit_test_timeout());
if (time_remaining == 0) {
try_catch->try_result = -ETIMEDOUT;
--
2.44.0.683.g7961c838ac-goog
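The fix follows a general pattern: when a pointer field may be cleared once another thread starts running, copy the pointer before starting that thread and use only the copy. The single-threaded sketch below models it; `struct task_model`, `task_exit()` and `run_and_wait()` are hypothetical, a `bool` stands in for `struct completion`, and calling `task_exit()` inline stands in for wake_up_process() followed by a thread that exits immediately. The snapshot is only safe because the object it points to outlives the cleared field; in the patch, the reference taken with get_task_struct() is what keeps it alive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a bool stands in for struct completion. */
struct completion_model {
	bool done;
};

struct task_model {
	struct completion_model exited;		/* like kthread->exited */
	struct completion_model *vfork_done;	/* NULL once the task exits */
};

static void task_init(struct task_model *t)
{
	t->exited.done = false;
	t->vfork_done = &t->exited;
}

/* Exit path as described in the commit message: signal completion,
 * then clear the pointer that waiters might still read. */
static void task_exit(struct task_model *t)
{
	t->vfork_done->done = true;
	t->vfork_done = NULL;
}

/* Snapshot vfork_done before the task can run, then only use the copy.
 * @exits_first simulates a task that finishes before we start waiting. */
static bool run_and_wait(struct task_model *t, bool exits_first)
{
	struct completion_model *task_done = t->vfork_done;

	assert(task_done != NULL);
	if (exits_first)
		task_exit(t);	/* stands in for wake_up_process() + fast exit */

	/* Reading t->vfork_done here could yield NULL; the copy cannot. */
	return task_done->done;
}
```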
Currently, the migration worker delays 1-10 us, assuming that one
KVM_RUN iteration only takes a few microseconds. But if C-state exit
latencies are large, for example hundreds or even thousands of
microseconds on server CPUs, the target CPU may not have left its
C-state before the migration worker moves the vCPU thread on to the
next CPU.
If the system workload is light, most CPUs could be in some C-state,
and the vCPU thread may waste milliseconds before it can actually
migrate to a new CPU.
Thus, the test may be inefficient on such systems, and in some cases
it may fail the migration/KVM_RUN ratio sanity check.
Since we are not able to turn off the cpuidle subsystem at run time,
this patch creates a lowest-priority, busy-looping "idle" thread on
every CPU to keep them from entering C-states.
Additionally, randomize the length of usleep() rather than delaying
in a fixed pattern.
Signed-off-by: Zide Chen <zide.chen(a)intel.com>
---
tools/testing/selftests/kvm/rseq_test.c | 76 ++++++++++++++++++++++---
1 file changed, 69 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/kvm/rseq_test.c b/tools/testing/selftests/kvm/rseq_test.c
index 28f97fb52044..d6e8b851d29e 100644
--- a/tools/testing/selftests/kvm/rseq_test.c
+++ b/tools/testing/selftests/kvm/rseq_test.c
@@ -11,6 +11,7 @@
#include <syscall.h>
#include <sys/ioctl.h>
#include <sys/sysinfo.h>
+#include <sys/resource.h>
#include <asm/barrier.h>
#include <linux/atomic.h>
#include <linux/rseq.h>
@@ -29,9 +30,10 @@
#define NR_TASK_MIGRATIONS 100000
static pthread_t migration_thread;
+static pthread_t *idle_threads;
static cpu_set_t possible_mask;
-static int min_cpu, max_cpu;
-static bool done;
+static int min_cpu, max_cpu, nproc;
+static volatile bool done;
static atomic_t seq_cnt;
@@ -150,7 +152,7 @@ static void *migration_worker(void *__rseq_tid)
* Use usleep() for simplicity and to avoid unnecessary kernel
* dependencies.
*/
- usleep((i % 10) + 1);
+ usleep((rand() % 10) + 1);
}
done = true;
return NULL;
@@ -158,7 +160,7 @@ static void *migration_worker(void *__rseq_tid)
static void calc_min_max_cpu(void)
{
- int i, cnt, nproc;
+ int i, cnt;
TEST_REQUIRE(CPU_COUNT(&possible_mask) >= 2);
@@ -186,6 +188,61 @@ static void calc_min_max_cpu(void)
"Only one usable CPU, task migration not possible");
}
+static void *idle_thread_fn(void *__idle_cpu)
+{
+ int r, cpu = (int)(unsigned long)__idle_cpu;
+ cpu_set_t allowed_mask;
+
+ CPU_ZERO(&allowed_mask);
+ CPU_SET(cpu, &allowed_mask);
+
+ r = sched_setaffinity(0, sizeof(allowed_mask), &allowed_mask);
+ TEST_ASSERT(!r, "sched_setaffinity failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ /* lowest priority, trying to prevent it from entering C-states */
+ r = setpriority(PRIO_PROCESS, 0, 19);
+ TEST_ASSERT(!r, "setpriority failed, errno = %d (%s)",
+ errno, strerror(errno));
+
+ while(!done);
+
+ return NULL;
+}
+
+static void spawn_threads(void)
+{
+ int cpu;
+
+ /* Run a dummy thread on every CPU */
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_create(&idle_threads[cpu], NULL, idle_thread_fn,
+ (void *)(unsigned long)cpu);
+ }
+
+ pthread_create(&migration_thread, NULL, migration_worker,
+ (void *)(unsigned long)syscall(SYS_gettid));
+}
+
+static void join_threads(void)
+{
+ int cpu;
+
+ pthread_join(migration_thread, NULL);
+
+ for (cpu = min_cpu; cpu <= max_cpu; cpu++) {
+ if (!CPU_ISSET(cpu, &possible_mask))
+ continue;
+
+ pthread_join(idle_threads[cpu], NULL);
+ }
+
+ free(idle_threads);
+}
+
int main(int argc, char *argv[])
{
int r, i, snapshot;
@@ -199,6 +256,12 @@ int main(int argc, char *argv[])
calc_min_max_cpu();
+ srand(time(NULL));
+
+ idle_threads = malloc(sizeof(pthread_t) * nproc);
+ TEST_ASSERT(idle_threads, "malloc failed, errno = %d (%s)", errno,
+ strerror(errno));
+
r = rseq_register_current_thread();
TEST_ASSERT(!r, "rseq_register_current_thread failed, errno = %d (%s)",
errno, strerror(errno));
@@ -210,8 +273,7 @@ int main(int argc, char *argv[])
*/
vm = vm_create_with_one_vcpu(&vcpu, guest_code);
- pthread_create(&migration_thread, NULL, migration_worker,
- (void *)(unsigned long)syscall(SYS_gettid));
+ spawn_threads();
for (i = 0; !done; i++) {
vcpu_run(vcpu);
@@ -258,7 +320,7 @@ int main(int argc, char *argv[])
TEST_ASSERT(i > (NR_TASK_MIGRATIONS / 2),
"Only performed %d KVM_RUNs, task stalled too much?", i);
- pthread_join(migration_thread, NULL);
+ join_threads();
kvm_vm_free(vm);
--
2.34.1
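Two pieces of the rseq_test change are easy to check in isolation: the randomized delay, which replaces the fixed `(i % 10) + 1` pattern with a pseudo-random value from 1 to 10 microseconds, and the per-CPU loop in spawn_threads(), which starts one busy-loop thread for every CPU between min_cpu and max_cpu that is set in possible_mask. The helper names below are hypothetical; `count_spin_threads()` only counts the threads that loop would create rather than starting them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical helper: the migration worker's delay in microseconds,
 * computed as in the patch, (rand() % 10) + 1, i.e. a value in [1, 10]. */
static unsigned int migration_delay_us(void)
{
	return (unsigned int)(rand() % 10) + 1;
}

/* Hypothetical helper: how many busy-loop threads spawn_threads() would
 * start, one per CPU in [min_cpu, max_cpu] that is set in @mask. */
static int count_spin_threads(const cpu_set_t *mask, int min_cpu, int max_cpu)
{
	int cpu, n = 0;

	assert(min_cpu >= 0 && min_cpu <= max_cpu);
	for (cpu = min_cpu; cpu <= max_cpu; cpu++)
		if (CPU_ISSET(cpu, mask))
			n++;
	return n;
}
```

POSIX does not require rand() to be thread-safe; in the patch it is only called from the migration worker, after main() has called srand(), so that is not a concern there.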
From: Geliang Tang <tanggeliang(a)kylinos.cn>
v2:
- Update patch 6 only, fixing errors reported by CI.
This patchset uses the public helpers start_server_* and connect_to_*
defined in network_helpers.c to remove duplicate code.
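The duplicated code being consolidated boils down to socket() plus bind()/listen() on the server side and socket() plus connect() on the client side. The sketch below shows that shape with plain POSIX sockets; `start_server_addr_sketch()`, `connect_to_addr_sketch()` and `make_loopback_v4()` are hypothetical and do not reproduce the actual signatures or any options taken by the helpers in network_helpers.c.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: fill @ss with 127.0.0.1:@port (0 = kernel picks). */
static socklen_t make_loopback_v4(struct sockaddr_storage *ss,
				  unsigned short port)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;

	memset(ss, 0, sizeof(*ss));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin->sin_port = htons(port);
	return sizeof(*sin);
}

/* Hypothetical, in the spirit of start_server_addr(): socket + bind,
 * plus listen for stream sockets. Returns an fd or -1. */
static int start_server_addr_sketch(int type,
				    const struct sockaddr_storage *addr,
				    socklen_t len)
{
	int fd;

	assert(addr != NULL);
	fd = socket(addr->ss_family, type, 0);
	if (fd < 0)
		return -1;
	if (bind(fd, (const struct sockaddr *)addr, len) < 0 ||
	    (type == SOCK_STREAM && listen(fd, 1) < 0)) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical, in the spirit of connect_to_addr(): socket + connect. */
static int connect_to_addr_sketch(int type,
				  const struct sockaddr_storage *addr,
				  socklen_t len)
{
	int fd;

	assert(addr != NULL);
	fd = socket(addr->ss_family, type, 0);
	if (fd < 0)
		return -1;
	if (connect(fd, (const struct sockaddr *)addr, len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Binding to port 0 and reading the chosen port back with getsockname() lets a test avoid hard-coded ports.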
Geliang Tang (14):
selftests/bpf: Add start_server_addr helper
selftests/bpf: Use start_server_addr in cls_redirect
selftests/bpf: Use connect_to_addr in cls_redirect
selftests/bpf: Use start_server_addr in sk_assign
selftests/bpf: Use connect_to_addr in sk_assign
selftests/bpf: Use log_err in network_helpers
selftests/bpf: Use start_server_addr in test_sock_addr
selftests/bpf: Use connect_to_addr in test_sock_addr
selftests/bpf: Add function pointer for __start_server
selftests/bpf: Add start_server_setsockopt helper
selftests/bpf: Use start_server_setsockopt in sockopt_inherit
selftests/bpf: Use connect_to_fd in sockopt_inherit
selftests/bpf: Use start_server_* in test_tcp_check_syncookie
selftests/bpf: Use connect_to_addr in test_tcp_check_syncookie
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/network_helpers.c | 50 ++++++++----
tools/testing/selftests/bpf/network_helpers.h | 4 +
.../selftests/bpf/prog_tests/cls_redirect.c | 38 +---------
.../selftests/bpf/prog_tests/sk_assign.c | 53 +------------
.../bpf/prog_tests/sockopt_inherit.c | 64 ++++------------
tools/testing/selftests/bpf/test_sock_addr.c | 74 ++----------------
.../bpf/test_tcp_check_syncookie_user.c | 76 +++----------------
8 files changed, 83 insertions(+), 280 deletions(-)
--
2.40.1
The series consists of two parts:
- pids.events rework (originally v2), patches 1-6,
- migration charging, patches 7-9.
The changes are independent in principle; I stacked them for (my)
convenience and because both deserve an RFC:
1) Changed semantics of v2 pids.events
- similar change was proposed for memory.swap.events:max [1]
2) Migration charging is an obsolete concept
How are the new events supposed to be useful?
- pids.events.local:max
- tells that cgroup's limit is hit (too tight?)
- pids.events.local:max.imposed
- tells that cgroup's workload was restricted (generalization of
'cgroup: fork rejected by pids controller in %s' message)
- pids.events:*
- "only" directs top-down search to cgroups of interest
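One way to read the event semantics listed above is as two sets of counters per cgroup: local ones, bumped only in the cgroup where the event happened, and hierarchical ones, bumped in that cgroup and every ancestor. The model below is hypothetical (`struct cg_model` and `record_fork_rejection()` are illustrative, not from kernel/cgroup/pids.c) and assumes that a rejected fork counts as max in the cgroup whose limit was hit and as max.imposed in the cgroup where the fork was attempted; the series may differ in details.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one reading of the proposed counters. */
struct cg_model {
	struct cg_model *parent;
	long local_max;		/* pids.events.local:max */
	long local_max_imposed;	/* pids.events.local:max.imposed */
	long hier_max;		/* pids.events:max (self + descendants) */
	long hier_max_imposed;	/* pids.events:max.imposed (self + descendants) */
};

/* A fork in @task_cg was rejected because @limit_cg (task_cg itself or
 * one of its ancestors) hit its pids.max. */
static void record_fork_rejection(struct cg_model *task_cg,
				  struct cg_model *limit_cg)
{
	struct cg_model *c;

	assert(task_cg && limit_cg);
	limit_cg->local_max++;
	task_cg->local_max_imposed++;

	/* Hierarchical counters include events from the whole subtree,
	 * so walk from each origin up to the root. */
	for (c = limit_cg; c; c = c->parent)
		c->hier_max++;
	for (c = task_cg; c; c = c->parent)
		c->hier_max_imposed++;
}
```

Descending from the root only into children with a nonzero hierarchical counter, then reading the local counters, finds both the cgroup whose limit was hit and the one whose workload was restricted.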
Migration charging is motivated by the apparently surprising
pids.current > pids.max
that arises because supervised processes are forked in the supervisor's
cgroup and only then migrated (more details in commit cgroup/pids:
Enforce pids.max on task migrations too)
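A migration check in this spirit can be modeled as: charge the destination cgroup and its ancestors, failing if any of them would exceed pids.max, then uncharge the source. The sketch is hypothetical (`struct pids_model`, `migrate()` and the other helpers are not from kernel/cgroup/pids.c). It stops at the lowest common ancestor, since a move between siblings does not change that ancestor's count; that is an assumption about how such a check should behave, not a description of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pids cgroup: a parent link, a count and a limit. */
struct pids_model {
	struct pids_model *parent;
	long current;	/* like pids.current */
	long max;	/* like pids.max; use a large value for "max" */
};

static bool is_ancestor_or_self(const struct pids_model *anc,
				const struct pids_model *cg)
{
	for (; cg; cg = cg->parent)
		if (cg == anc)
			return true;
	return false;
}

/* Lowest common ancestor of @a and @b, or NULL if they share none. */
static struct pids_model *lowest_common_ancestor(struct pids_model *a,
						 const struct pids_model *b)
{
	for (; a; a = a->parent)
		if (is_ancestor_or_self(a, b))
			return a;
	return NULL;
}

/* Charge @n to @cg and its ancestors below @stop; on a limit violation,
 * undo the levels already charged and fail. */
static bool try_charge_until(struct pids_model *cg, struct pids_model *stop,
			     long n)
{
	struct pids_model *p, *q;

	assert(n > 0);
	for (p = cg; p != stop; p = p->parent) {
		if (p->current + n > p->max) {
			for (q = cg; q != p; q = q->parent)
				q->current -= n;
			return false;
		}
		p->current += n;
	}
	return true;
}

static void uncharge_until(struct pids_model *cg, struct pids_model *stop,
			   long n)
{
	for (; cg != stop; cg = cg->parent)
		cg->current -= n;
}

/* Move @n tasks from @src to @dst, enforcing pids.max on the way in. */
static bool migrate(struct pids_model *src, struct pids_model *dst, long n)
{
	struct pids_model *common = lowest_common_ancestor(src, dst);

	if (!try_charge_until(dst, common, n))
		return false;
	uncharge_until(src, common, n);
	return true;
}
```

In the supervisor scenario, moving a freshly forked process from the supervisor's cgroup into a job cgroup that is already at pids.max is rejected, instead of leaving the job with pids.current > pids.max.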
Changes from v2 (https://lore.kernel.org/r/20200205134426.10570-1-mkoutny@suse.com)
- implemented pids.events.local (Tejun)
- added migration charging
[1] https://lore.kernel.org/r/20230202155626.1829121-1-hannes@cmpxchg.org/
Michal Koutný (9):
cgroup/pids: Remove superfluous zeroing
cgroup/pids: Separate semantics of pids.events related to pids.max
cgroup/pids: Make event counters hierarchical
cgroup/pids: Add pids.events.local
selftests: cgroup: Lexicographic order in Makefile
selftests: cgroup: Add basic tests for pids controller
cgroup/pids: Replace uncharge/charge pair with a single function
cgroup/pids: Enforce pids.max on task migrations
selftests: cgroup: Add tests pids controller
Documentation/admin-guide/cgroup-v1/pids.rst | 3 +-
Documentation/admin-guide/cgroup-v2.rst | 22 +-
include/linux/cgroup-defs.h | 7 +-
kernel/cgroup/cgroup.c | 16 +-
kernel/cgroup/pids.c | 206 +++++++++----
tools/testing/selftests/cgroup/Makefile | 25 +-
tools/testing/selftests/cgroup/test_pids.c | 302 +++++++++++++++++++
7 files changed, 514 insertions(+), 67 deletions(-)
create mode 100644 tools/testing/selftests/cgroup/test_pids.c
base-commit: 026e680b0a08a62b1d948e5a8ca78700bfac0e6e
--
2.44.0
After commit 6d029c25b71f ("selftests/timers/posix_timers: Reimplement
check_timer_distribution()"), clang warns:
tools/testing/selftests/timers/../kselftest.h:398:6: warning: variable 'major' is used uninitialized whenever '||' condition is true [-Wsometimes-uninitialized]
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:401:9: note: uninitialized use occurs here
401 | return major > min_major || (major == min_major && minor >= min_minor);
| ^~~~~
tools/testing/selftests/timers/../kselftest.h:398:6: note: remove the '||' if its condition is always false
398 | if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
| ^~~~~~~~~~~~~~~
tools/testing/selftests/timers/../kselftest.h:395:20: note: initialize the variable 'major' to silence this warning
395 | unsigned int major, minor;
| ^
| = 0
This is a false positive because if uname() fails, ksft_exit_fail_msg()
will be called, which unconditionally calls exit(), a noreturn function.
However, clang does not know that ksft_exit_fail_msg() will call exit()
at the point in the pipeline where the warning is emitted, because
inlining has not yet occurred, so it assumes control flow will resume
normally after ksft_exit_fail_msg() is called.
Make it clear to clang that all of the functions that call exit()
unconditionally in kselftest.h are noreturn transitively by marking them
explicitly with '__attribute__((__noreturn__))', which clears up the
warning above and any future warnings that may appear for the same
reason.
Fixes: 6d029c25b71f ("selftests/timers/posix_timers: Reimplement check_timer_distribution()")
Reported-by: John Stultz <jstultz(a)google.com>
Closes: https://lore.kernel.org/all/20240410232637.4135564-2-jstultz@google.com/
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
I have based this change on timers/urgent, as the commit that introduces
this particular warning is there and it is marked for stable, even
though this appears to be a generic kselftest issue. I think it makes
the most sense for this change to go via timers/urgent with Shuah's ack.
While __noreturn with a return type other than 'void' does not make much
sense semantically, these functions are used in many places as the
return value of other functions, such as main(), so I did not change
their return type from 'int' to 'void', to minimize the changes needed
for a backport (it is an existing issue anyway).
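The pattern the patch relies on can be reproduced in a small standalone file: a helper that always calls exit() is marked __noreturn (with the same #ifndef fallback the patch adds) while keeping its 'int' return type, so callers can still write `return fail_msg(...);`. `fail_msg()`, `version_at_least()` and `check_or_fail()` are hypothetical names; `version_at_least()` mirrors the sscanf() check from the clang warning above, without the uname() call.

```c
#include <stdio.h>
#include <stdlib.h>

/* Same fallback definition the patch adds to kselftest.h. */
#ifndef __noreturn
#define __noreturn __attribute__((__noreturn__))
#endif

/* Hypothetical stand-in for ksft_exit_fail_msg(): always exits, but keeps
 * an 'int' return type so "return fail_msg(...);" still compiles. */
static inline __noreturn int fail_msg(const char *msg)
{
	fputs(msg, stderr);
	exit(1);
}

/* Same shape as the check in the clang warning: major and minor are only
 * read when sscanf() assigned both, which the compiler can see because
 * fail_msg() is declared not to return. */
static int version_at_least(const char *release, unsigned int min_major,
			    unsigned int min_minor)
{
	unsigned int major, minor;

	if (sscanf(release, "%u.%u.", &major, &minor) != 2)
		fail_msg("cannot parse version\n");

	return major > min_major || (major == min_major && minor >= min_minor);
}

/* Existing call sites that use the return value keep working. */
static int check_or_fail(int ok)
{
	if (!ok)
		return fail_msg("check failed\n");
	return 0;
}
```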
I see there is another instance of this problem that will need to be
addressed in -next, introduced by commit f07041728422 ("selftests: add
ksft_exit_fail_perror()").
---
tools/testing/selftests/kselftest.h | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 973b18e156b2..0591974b57e0 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -80,6 +80,9 @@
#define KSFT_XPASS 3
#define KSFT_SKIP 4
+#ifndef __noreturn
+#define __noreturn __attribute__((__noreturn__))
+#endif
#define __printf(a, b) __attribute__((format(printf, a, b)))
/* counters */
@@ -300,13 +303,13 @@ void ksft_test_result_code(int exit_code, const char *test_name,
va_end(args);
}
-static inline int ksft_exit_pass(void)
+static inline __noreturn int ksft_exit_pass(void)
{
ksft_print_cnts();
exit(KSFT_PASS);
}
-static inline int ksft_exit_fail(void)
+static inline __noreturn int ksft_exit_fail(void)
{
ksft_print_cnts();
exit(KSFT_FAIL);
@@ -333,7 +336,7 @@ static inline int ksft_exit_fail(void)
ksft_cnt.ksft_xfail + \
ksft_cnt.ksft_xskip)
-static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
@@ -348,19 +351,19 @@ static inline __printf(1, 2) int ksft_exit_fail_msg(const char *msg, ...)
exit(KSFT_FAIL);
}
-static inline int ksft_exit_xfail(void)
+static inline __noreturn int ksft_exit_xfail(void)
{
ksft_print_cnts();
exit(KSFT_XFAIL);
}
-static inline int ksft_exit_xpass(void)
+static inline __noreturn int ksft_exit_xpass(void)
{
ksft_print_cnts();
exit(KSFT_XPASS);
}
-static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
+static inline __noreturn __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
{
int saved_errno = errno;
va_list args;
---
base-commit: 076361362122a6d8a4c45f172ced5576b2d4a50d
change-id: 20240411-mark-kselftest-exit-funcs-noreturn-17d8ff729a7a
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>