step_after_suspend_test fails with device busy error while
writing to /sys/power/state to start suspend. The test believes
it failed to enter suspend state with
$ sudo ./step_after_suspend_test
TAP version 13
Bail out! Failed to enter Suspend state
However, in the kernel message, I indeed see the system get
suspended and then wake up later.
[611172.033108] PM: suspend entry (s2idle)
[611172.044940] Filesystems sync: 0.006 seconds
[611172.052254] Freezing user space processes
[611172.059319] Freezing user space processes completed (elapsed 0.001 seconds)
[611172.067920] OOM killer disabled.
[611172.072465] Freezing remaining freezable tasks
[611172.080332] Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
[611172.089724] printk: Suspending console(s) (use no_console_suspend to debug)
[611172.117126] serial 00:03: disabled
some other hardware get reconnected
[611203.136277] OOM killer enabled.
[611203.140637] Restarting tasks ...
[611203.141135] usb 1-8.1: USB disconnect, device number 7
[611203.141755] done.
[611203.155268] random: crng reseeded on system resumption
[611203.162059] PM: suspend exit
After investigation, I noticed that for the code block
if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
ksft_exit_fail_msg("Failed to enter Suspend state\n");
The write will return -1 and errno is set to 16 (device busy).
It should be caused by the write function is not successfully returned
before the system suspend and the return value get messed when waking up.
As a result, It may be better to check the time passed of those few
instructions to determine whether the suspend is executed correctly for
it is pretty hard to execute those few lines for 5 seconds.
The timer to wake up the system is set to expire after 5 seconds and
no re-arm. If the timer remaining time is 0 second and 0 nano secomd,
it means the timer expired and wake the system up. Otherwise, the system
could be considered to enter the suspend state failed if there is any
remaining time.
After appling this patch, the test would not fail for it believes the
system does not go to suspend by mistake. It now could continue to the
rest part of the test after suspend.
Fixes: bfd092b8c272 ("selftests: breakpoint: add step_after_suspend_test")
Reported-by: Sinadin Shan <sinadin.shan(a)oracle.com>
Signed-off-by: Yifei Liu <yifei.l.liu(a)oracle.com>
---
v4->v5: Remove the above quotes in the first part.
remove the incorrect format which could confuse the git.
---
.../testing/selftests/breakpoints/step_after_suspend_test.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/step_after_suspend_test.c b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
index dfec31fb9b30d..8d275f03e977f 100644
--- a/tools/testing/selftests/breakpoints/step_after_suspend_test.c
+++ b/tools/testing/selftests/breakpoints/step_after_suspend_test.c
@@ -152,7 +152,10 @@ void suspend(void)
if (err < 0)
ksft_exit_fail_msg("timerfd_settime() failed\n");
- if (write(power_state_fd, "mem", strlen("mem")) != strlen("mem"))
+ system("(echo mem > /sys/power/state) 2> /dev/null");
+
+ timerfd_gettime(timerfd, &spec);
+ if (spec.it_value.tv_sec != 0 || spec.it_value.tv_nsec != 0)
ksft_exit_fail_msg("Failed to enter Suspend state\n");
close(timerfd);
--
2.46.0
Thanks to Miroslav, Petr and Marcos for the reviews!
V4:
Use variable for /sys/kernel/debug.
Be consistent with "" around variables.
Fix path in commit message to /sys/kernel/debug/kprobes/enabled.
V3:
Save and restore kprobe state also when test fails, by integrating it
into setup_config() and cleanup().
Rename SYSFS variables in a more logical way.
Sort test modules in alphabetical order.
Rename module description.
V2:
Save and restore kprobe state.
Michael Vetter (3):
selftests: livepatch: rename KLP_SYSFS_DIR to SYSFS_KLP_DIR
selftests: livepatch: save and restore kprobe state
selftests: livepatch: test livepatching a kprobed function
tools/testing/selftests/livepatch/Makefile | 3 +-
.../testing/selftests/livepatch/functions.sh | 19 ++++--
.../selftests/livepatch/test-kprobe.sh | 62 +++++++++++++++++++
.../selftests/livepatch/test_modules/Makefile | 3 +-
.../livepatch/test_modules/test_klp_kprobe.c | 38 ++++++++++++
5 files changed, 117 insertions(+), 8 deletions(-)
create mode 100755 tools/testing/selftests/livepatch/test-kprobe.sh
create mode 100644 tools/testing/selftests/livepatch/test_modules/test_klp_kprobe.c
--
2.46.1
Some applications rely on placing data in free bits addresses allocated
by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
address returned by mmap to be less than the 48-bit address space,
unless the hint address uses more than 47 bits (the 48th bit is reserved
for the kernel address space).
The riscv architecture needs a way to similarly restrict the virtual
address space. On the riscv port of OpenJDK an error is thrown if
attempted to run on the 57-bit address space, called sv57 [1]. golang
has a comment that sv57 support is not complete, but there are some
workarounds to get it to mostly work [2].
These applications work on x86 because x86 does an implicit 47-bit
restriction of mmap() address that contain a hint address that is less
than 48 bits.
Instead of implicitly restricting the address space on riscv (or any
current/future architecture), provide a flag to the personality syscall
that can be used to ensure an application works in any arbitrary VA
space. A similar feature has already been implemented by the personality
syscall in ADDR_LIMIT_32BIT.
This flag will also allow seemless compatibility between all
architectures, so applications like Go and OpenJDK that use bits in a
virtual address can request the exact number of bits they need in a
generic way. The flag can be checked inside of vm_unmapped_area() so
that this flag does not have to be handled individually by each
architecture.
Link:
https://github.com/openjdk/jdk/blob/f080b4bb8a75284db1b6037f8c00ef3b1ef1add…
[1]
Link:
https://github.com/golang/go/blob/9e8ea567c838574a0f14538c0bbbd83c3215aa55/…
[2]
To: Arnd Bergmann <arnd(a)arndb.de>
To: Richard Henderson <richard.henderson(a)linaro.org>
To: Ivan Kokshaysky <ink(a)jurassic.park.msu.ru>
To: Matt Turner <mattst88(a)gmail.com>
To: Vineet Gupta <vgupta(a)kernel.org>
To: Russell King <linux(a)armlinux.org.uk>
To: Guo Ren <guoren(a)kernel.org>
To: Huacai Chen <chenhuacai(a)kernel.org>
To: WANG Xuerui <kernel(a)xen0n.name>
To: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
To: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com>
To: Helge Deller <deller(a)gmx.de>
To: Michael Ellerman <mpe(a)ellerman.id.au>
To: Nicholas Piggin <npiggin(a)gmail.com>
To: Christophe Leroy <christophe.leroy(a)csgroup.eu>
To: Naveen N Rao <naveen(a)kernel.org>
To: Alexander Gordeev <agordeev(a)linux.ibm.com>
To: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
To: Heiko Carstens <hca(a)linux.ibm.com>
To: Vasily Gorbik <gor(a)linux.ibm.com>
To: Christian Borntraeger <borntraeger(a)linux.ibm.com>
To: Sven Schnelle <svens(a)linux.ibm.com>
To: Yoshinori Sato <ysato(a)users.sourceforge.jp>
To: Rich Felker <dalias(a)libc.org>
To: John Paul Adrian Glaubitz <glaubitz(a)physik.fu-berlin.de>
To: David S. Miller <davem(a)davemloft.net>
To: Andreas Larsson <andreas(a)gaisler.com>
To: Thomas Gleixner <tglx(a)linutronix.de>
To: Ingo Molnar <mingo(a)redhat.com>
To: Borislav Petkov <bp(a)alien8.de>
To: Dave Hansen <dave.hansen(a)linux.intel.com>
To: x86(a)kernel.org
To: H. Peter Anvin <hpa(a)zytor.com>
To: Andy Lutomirski <luto(a)kernel.org>
To: Peter Zijlstra <peterz(a)infradead.org>
To: Muchun Song <muchun.song(a)linux.dev>
To: Andrew Morton <akpm(a)linux-foundation.org>
To: Liam R. Howlett <Liam.Howlett(a)oracle.com>
To: Vlastimil Babka <vbabka(a)suse.cz>
To: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
To: Shuah Khan <shuah(a)kernel.org>
To: Christoph Hellwig <hch(a)infradead.org>
To: Michal Hocko <mhocko(a)suse.com>
To: "Kirill A. Shutemov" <kirill(a)shutemov.name>
To: Chris Torek <chris.torek(a)gmail.com>
Cc: linux-arch(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-alpha(a)vger.kernel.org
Cc: linux-snps-arc(a)lists.infradead.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-csky(a)vger.kernel.org
Cc: loongarch(a)lists.linux.dev
Cc: linux-mips(a)vger.kernel.org
Cc: linux-parisc(a)vger.kernel.org
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: linux-s390(a)vger.kernel.org
Cc: linux-sh(a)vger.kernel.org
Cc: sparclinux(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-abi-devel(a)lists.sourceforge.net
Signed-off-by: Charlie Jenkins <charlie(a)rivosinc.com>
Changes in v2:
- Added much greater detail to cover letter
- Removed all code that touched architecture specific code and was able
to factor this out into all generic functions, except for flags that
needed to be added to vm_unmapped_area_info
- Made this an RFC since I have only tested it on riscv and x86
- Link to v1: https://lore.kernel.org/r/20240827-patches-below_hint_mmap-v1-0-46ff2eb9022…
Changes in v3:
- Use a personality flag instead of an mmap flag
- Link to v2: https://lore.kernel.org/r/20240829-patches-below_hint_mmap-v2-0-638a28d9eae…
---
Charlie Jenkins (2):
mm: Add personality flag to limit address to 47 bits
selftests/mm: Create ADDR_LIMIT_47BIT test
include/uapi/linux/personality.h | 1 +
mm/mmap.c | 3 ++
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 1 +
tools/testing/selftests/mm/map_47bit_personality.c | 34 ++++++++++++++++++++++
5 files changed, 40 insertions(+)
---
base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
--
- Charlie
virtio-net have two usage of hashes: one is RSS and another is hash
reporting. Conventionally the hash calculation was done by the VMM.
However, computing the hash after the queue was chosen defeats the
purpose of RSS.
Another approach is to use eBPF steering program. This approach has
another downside: it cannot report the calculated hash due to the
restrictive nature of eBPF.
Introduce the code to compute hashes to the kernel in order to overcome
thse challenges.
An alternative solution is to extend the eBPF steering program so that it
will be able to report to the userspace, but it is based on context
rewrites, which is in feature freeze. We can adopt kfuncs, but they will
not be UAPIs. We opt to ioctl to align with other relevant UAPIs (KVM
and vhost_net).
The patches for QEMU to use this new feature was submitted as RFC and
is available at:
https://patchew.org/QEMU/20240915-hash-v3-0-79cb08d28647@daynix.com/
This work was presented at LPC 2024:
https://lpc.events/event/18/contributions/1963/
V1 -> V2:
Changed to introduce a new BPF program type.
Signed-off-by: Akihiko Odaki <akihiko.odaki(a)daynix.com>
---
Changes in v4:
- Moved tun_vnet_hash_ext to if_tun.h.
- Renamed virtio_net_toeplitz() to virtio_net_toeplitz_calc().
- Replaced htons() with cpu_to_be16().
- Changed virtio_net_hash_rss() to return void.
- Reordered variable declarations in virtio_net_hash_rss().
- Removed virtio_net_hdr_v1_hash_from_skb().
- Updated messages of "tap: Pad virtio header with zero" and
"tun: Pad virtio header with zero".
- Fixed vnet_hash allocation size.
- Ensured to free vnet_hash when destructing tun_struct.
- Link to v3: https://lore.kernel.org/r/20240915-rss-v3-0-c630015db082@daynix.com
Changes in v3:
- Reverted back to add ioctl.
- Split patch "tun: Introduce virtio-net hashing feature" into
"tun: Introduce virtio-net hash reporting feature" and
"tun: Introduce virtio-net RSS".
- Changed to reuse hash values computed for automq instead of performing
RSS hashing when hash reporting is requested but RSS is not.
- Extracted relevant data from struct tun_struct to keep it minimal.
- Added kernel-doc.
- Changed to allow calling TUNGETVNETHASHCAP before TUNSETIFF.
- Initialized num_buffers with 1.
- Added a test case for unclassified packets.
- Fixed error handling in tests.
- Changed tests to verify that the queue index will not overflow.
- Rebased.
- Link to v2: https://lore.kernel.org/r/20231015141644.260646-1-akihiko.odaki@daynix.com
---
Akihiko Odaki (9):
skbuff: Introduce SKB_EXT_TUN_VNET_HASH
virtio_net: Add functions for hashing
net: flow_dissector: Export flow_keys_dissector_symmetric
tap: Pad virtio header with zero
tun: Pad virtio header with zero
tun: Introduce virtio-net hash reporting feature
tun: Introduce virtio-net RSS
selftest: tun: Add tests for virtio-net hashing
vhost/net: Support VIRTIO_NET_F_HASH_REPORT
Documentation/networking/tuntap.rst | 7 +
drivers/net/Kconfig | 1 +
drivers/net/tap.c | 2 +-
drivers/net/tun.c | 255 ++++++++++++--
drivers/vhost/net.c | 16 +-
include/linux/if_tun.h | 5 +
include/linux/skbuff.h | 3 +
include/linux/virtio_net.h | 174 +++++++++
include/net/flow_dissector.h | 1 +
include/uapi/linux/if_tun.h | 71 ++++
net/core/flow_dissector.c | 3 +-
net/core/skbuff.c | 4 +
tools/testing/selftests/net/Makefile | 2 +-
tools/testing/selftests/net/tun.c | 666 ++++++++++++++++++++++++++++++++++-
14 files changed, 1170 insertions(+), 40 deletions(-)
---
base-commit: 752ebcbe87aceeb6334e846a466116197711a982
change-id: 20240403-rss-e737d89efa77
Best regards,
--
Akihiko Odaki <akihiko.odaki(a)daynix.com>
This patch allows progs to elide a null check on statically known map
lookup keys. In other words, if the verifier can statically prove that
the lookup will be in-bounds, allow the prog to drop the null check.
This is useful for two reasons:
1. Large numbers of nullness checks (especially when they cannot fail)
unnecessarily pushes prog towards BPF_COMPLEXITY_LIMIT_JMP_SEQ.
2. It forms a tighter contract between programmer and verifier.
For (1), bpftrace is starting to make heavier use of percpu scratch
maps. As a result, for user scripts with large number of unrolled loops,
we are starting to hit jump complexity verification errors. These
percpu lookups cannot fail anyways, as we only use static key values.
Eliding nullness probably results in less work for verifier as well.
For (2), percpu scratch maps are often used as a larger stack, as the
currrent stack is limited to 512 bytes. In these situations, it is
desirable for the programmer to express: "this lookup should never fail,
and if it does, it means I messed up the code". By omitting the null
check, the programmer can "ask" the verifier to double check the logic.
Changes in v3:
* Check if stack is (erroneously) growing upwards
* Mention in commit message why existing tests needed change
Changes in v2:
* Added a check for when R2 is not a ptr to stack
* Added a check for when stack is uninitialized (no stack slot yet)
* Updated existing tests to account for null elision
* Added test case for when R2 can be both const and non-const
Daniel Xu (2):
bpf: verifier: Support eliding map lookup nullness
bpf: selftests: verifier: Add nullness elision tests
kernel/bpf/verifier.c | 67 ++++++-
tools/testing/selftests/bpf/progs/iters.c | 14 +-
.../selftests/bpf/progs/map_kptr_fail.c | 2 +-
.../bpf/progs/verifier_array_access.c | 166 ++++++++++++++++++
.../selftests/bpf/progs/verifier_map_in_map.c | 2 +-
.../testing/selftests/bpf/verifier/map_kptr.c | 2 +-
6 files changed, 242 insertions(+), 11 deletions(-)
--
2.46.0
From: Joshua Hahn <joshua.hahn6(a)gmail.com>
v2 -> v3: Signed-off-by & renamed subject for clarity.
v1 -> v2: Edited commit messages for clarity.
Niced CPU usage is a metric reported in host-level /prot/stat, but is
not reported in cgroup-level statistics in cpu.stat. However, when a
host contains multiple tasks across different workloads, it becomes
difficult to gauge how much of the task is being spent on niced
processes based on /proc/stat alone, since host-level metrics do not
provide this cgroup-level granularity.
Exposing this metric will allow users to accurately probe the niced CPU
metric for each workload, and make more informed decisions when
directing higher priority tasks.
Joshua Hahn (2):
Tracking cgroup-level niced CPU time
Selftests for niced CPU statistics
include/linux/cgroup-defs.h | 1 +
kernel/cgroup/rstat.c | 16 ++++-
tools/testing/selftests/cgroup/test_cpu.c | 72 +++++++++++++++++++++++
3 files changed, 86 insertions(+), 3 deletions(-)
--
2.43.5
The recent addition of support for testing with the x86 specific quirk
KVM_X86_QUIRK_SLOT_ZAP_ALL disabled in the generic memslot tests broke the
build of the KVM selftests for all other architectures:
In file included from include/kvm_util.h:8,
from include/memstress.h:13,
from memslot_modification_stress_test.c:21:
memslot_modification_stress_test.c: In function ‘main’:
memslot_modification_stress_test.c:176:38: error: ‘KVM_X86_QUIRK_SLOT_ZAP_ALL’ undeclared (first use in this function)
176 | KVM_X86_QUIRK_SLOT_ZAP_ALL);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
Add __x86_64__ guard defines to avoid building the relevant code on other
architectures.
Fixes: 61de4c34b51c ("KVM: selftests: Test memslot move in memslot_perf_test with quirk disabled")
Fixes: 218f6415004a ("KVM: selftests: Allow slot modification stress test with quirk disabled")
Reported-by: Aishwarya TCV <aishwarya.tcv(a)arm.com>
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
This is obviously disruptive for testing of KVM changes on non-x86
architectures.
---
tools/testing/selftests/kvm/memslot_modification_stress_test.c | 2 ++
tools/testing/selftests/kvm/memslot_perf_test.c | 6 ++++++
2 files changed, 8 insertions(+)
diff --git a/tools/testing/selftests/kvm/memslot_modification_stress_test.c b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
index e3343f0df9e1..c81a84990eab 100644
--- a/tools/testing/selftests/kvm/memslot_modification_stress_test.c
+++ b/tools/testing/selftests/kvm/memslot_modification_stress_test.c
@@ -169,12 +169,14 @@ int main(int argc, char *argv[])
case 'i':
p.nr_iterations = atoi_positive("Number of iterations", optarg);
break;
+#ifdef __x86_64__
case 'q':
p.disable_slot_zap_quirk = true;
TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
KVM_X86_QUIRK_SLOT_ZAP_ALL);
break;
+#endif
case 'h':
default:
help(argv[0]);
diff --git a/tools/testing/selftests/kvm/memslot_perf_test.c b/tools/testing/selftests/kvm/memslot_perf_test.c
index 893366982f77..989ffe0d047f 100644
--- a/tools/testing/selftests/kvm/memslot_perf_test.c
+++ b/tools/testing/selftests/kvm/memslot_perf_test.c
@@ -113,7 +113,9 @@ static_assert(ATOMIC_BOOL_LOCK_FREE == 2, "atomic bool is not lockless");
static sem_t vcpu_ready;
static bool map_unmap_verify;
+#ifdef __x86_64__
static bool disable_slot_zap_quirk;
+#endif
static bool verbose;
#define pr_info_v(...) \
@@ -579,8 +581,10 @@ static bool test_memslot_move_prepare(struct vm_data *data,
uint32_t guest_page_size = data->vm->page_size;
uint64_t movesrcgpa, movetestgpa;
+#ifdef __x86_64__
if (disable_slot_zap_quirk)
vm_enable_cap(data->vm, KVM_CAP_DISABLE_QUIRKS2, KVM_X86_QUIRK_SLOT_ZAP_ALL);
+#endif
movesrcgpa = vm_slot2gpa(data, data->nslots - 1);
@@ -971,11 +975,13 @@ static bool parse_args(int argc, char *argv[],
case 'd':
map_unmap_verify = true;
break;
+#ifdef __x86_64__
case 'q':
disable_slot_zap_quirk = true;
TEST_REQUIRE(kvm_check_cap(KVM_CAP_DISABLE_QUIRKS2) &
KVM_X86_QUIRK_SLOT_ZAP_ALL);
break;
+#endif
case 's':
targs->nslots = atoi_paranoid(optarg);
if (targs->nslots <= 1 && targs->nslots != -1) {
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20240930-kvm-build-breakage-a542f46d78f9
Best regards,
--
Mark Brown <broonie(a)kernel.org>