So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
Use case 1:
The madvise call is not available in the programming language. An example of
this is a program with forked workloads written in a garbage-collected language
without pointers. In such a language madvise cannot be made available.
In addition, the addresses of objects move around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these types of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportunities may exist across multiple workloads or jobs (if they are part
of the same security domain). Only a higher level entity like a job scheduler
or container can know for certain if it's running one or more instances of a
job. That job scheduler however doesn't have the necessary internal workload
knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, it's the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system call
This patch series adds two new options to the prctl system call. The first
one enables KSM at the process level and the second one queries the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
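For illustration, a minimal userspace sketch of how the two options could be called. PR_SET_MEMORY_MERGE is the name used in this series; PR_GET_MEMORY_MERGE and the fallback values 67 and 68 are assumptions for headers that do not carry the definitions yet, and the ksm_process_*() wrappers are hypothetical helpers, not part of the series.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks for older headers; the values are an assumption. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Enable (non-zero) or disable (0) KSM for the calling process.
 * Returns 0 on success and -1 on failure, e.g. on a kernel without
 * support for the option.
 */
static int ksm_process_set(int enable)
{
	if (prctl(PR_SET_MEMORY_MERGE, enable ? 1UL : 0UL, 0UL, 0UL, 0UL) < 0) {
		fprintf(stderr, "PR_SET_MEMORY_MERGE: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}

/* Query the process setting.
 * Returns 1 if enabled, 0 if disabled and -1 if the query failed.
 */
static int ksm_process_get(void)
{
	int ret = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

	if (ret < 0) {
		fprintf(stderr, "PR_GET_MEMORY_MERGE: %s\n", strerror(errno));
		return -1;
	}
	return ret ? 1 : 0;
}
```

A launcher would call ksm_process_set(1) in the seed process before it forks or execs the workload, so that the children start with the setting already in place.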
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMAs and enable KSM for the eligible VMAs.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMAs where KSM has been enabled.
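The inheritance across fork can be checked from userspace along these lines. This is only a sketch: PR_GET_MEMORY_MERGE and the fallback values are assumptions, and ksm_fork_inherits() is a hypothetical helper, not the fork test added to ksm_tests.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks for older headers; the values are an assumption. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/* Returns 1 if a forked child sees the process KSM setting, 0 if it
 * does not, and -1 if the check could not be run (for example because
 * the kernel does not know the prctl option).
 */
static int ksm_fork_inherits(void)
{
	pid_t pid;
	int status;
	int result;

	if (prctl(PR_SET_MEMORY_MERGE, 1UL, 0UL, 0UL, 0UL) < 0)
		return -1;

	pid = fork();
	if (pid == 0) {
		/* Child: report the inherited setting via the exit status. */
		int merged = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

		_exit(merged == 1 ? 0 : 1);
	}

	if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		result = -1;
	else
		result = WEXITSTATUS(status) == 0 ? 1 : 0;

	/* Restore the parent to its previous (disabled) state. */
	prctl(PR_SET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);
	return result;
}
```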
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general_profit metric to /sys/kernel/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows
testing madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by child processes.
Changes:
- V5:
- When the prctl system call is invoked, mark all compatible VMAs
as mergeable
- Instead of checking during scan if VMA is mergeable, mark the VMA
mergeable when the VMA is created (in case the VMA is compatible)
- Remove earlier changes, they are no longer necessary
- Unset the flag MMF_VM_MERGE_ANY in gmap_mark_unmergeable().
- When unsetting the MMF_VM_MERGE_ANY flag with prctl, only unset the
flag
- Remove pages_volatile function (with the simpler general_profit calculation,
the function is no longer needed)
- Use simpler formula for calculation of general_profit
- V4:
- removing check in prctl for MMF_VM_MERGEABLE in PR_SET_MEMORY_MERGE
handling
- Checking for VM_MERGEABLE AND MMF_VM_MERGE_ANY to avoid changing vm_flags
- This requires also checking that the vma is compatible. The
compatibility check is provided by a new helper
- processes which have set MMF_VM_MERGE_ANY, only need to call the
helper and not madvise.
- removed unmerge_vmas function, this function is no longer necessary,
clearing the MMF_VM_MERGE_ANY bit is sufficient
- V3:
- folded patch 1 - 6
- folded patch 7 - 14
- folded patch 15 - 19
- Expanded on the use cases in the cover letter
- Added a section on security concerns to the cover letter
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (3):
mm: add new api to enable ksm per process
mm: add new KSM process and sysfs knobs
selftests/mm: add new selftests for KSM
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
arch/s390/mm/gmap.c | 1 +
fs/proc/base.c | 5 +
include/linux/ksm.h | 36 ++-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/fork.c | 1 +
kernel/sys.c | 24 ++
mm/ksm.c | 143 ++++++++--
mm/mmap.c | 7 +
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
14 files changed, 426 insertions(+), 68 deletions(-)
--
2.34.1
This patch set makes it possible to have synchronized dynamic ATU and FDB
entries on locked ports. As locked ports are not able to automatically
learn, they depend on userspace added entries, where userspace can add
static or dynamic entries. The lifetime of static entries is completely
dependent on userspace intervention, and thus not of interest here. We
are only concerned with dynamic entries, which can be added with a
command like:
bridge fdb replace ADDR dev <DEV> master dynamic
We choose only to support this feature on locked ports, as it involves
utilizing the CPU to handle ATU related switchcore events (typically
interrupts) and thus can result in significant performance loss if
exposed to heavy traffic.
On locked ports it is important for userspace to know when an authorized
station has become silent, without breaking the communication of a
station that has been authorized based on the MAC-Authentication Bypass
(MAB) scheme. Thus if the station stays active after authorization,
it will continue to have an open port as long as it is active. Only after
a silent period will it have to be reauthorized. As the ageing process in
the ATU is dependent on incoming traffic to the switchcore port, it is
necessary for the ATU to signal that an entry has aged out, so that the
FDB can be updated at the correct time.
This patch set includes a solution for the Marvell mv88e6xxx driver, where
for this driver we use the Hold-At-One feature so that an age-out
violation interrupt occurs when a station has been silent for the
system-set age time. The age out violation interrupt allows the switchcore
driver to remove both the ATU and the FDB entry at the same time.
It is up to the maintainers of other switchcore drivers to implement the
feature for their specific driver.
LOG:
V2: Ensure the port is locked when using the feature as we
must ensure that learning is enabled at all times for
the interrupts to occur. This was missed in the previous
version.
Instead of ignoring unsupported flags, ensure that
drivers are only called when supporting the feature.
As 'dynamic' flag is legacy, all drivers support it at
least by their previous handling.
Hans J. Schultz (6):
net: bridge: add dynamic flag to switchdev notifier
net: dsa: propagate flags down towards drivers
drivers: net: dsa: add fdb entry flags incoming to switchcore drivers
net: bridge: ensure FDB offloaded flag is handled as needed
net: dsa: mv88e6xxx: implementation of dynamic ATU entries
selftests: forwarding: add dynamic FDB test
drivers/net/dsa/b53/b53_common.c | 4 +-
drivers/net/dsa/b53/b53_priv.h | 4 +-
drivers/net/dsa/hirschmann/hellcreek.c | 4 +-
drivers/net/dsa/lan9303-core.c | 4 +-
drivers/net/dsa/lantiq_gswip.c | 4 +-
drivers/net/dsa/microchip/ksz_common.c | 6 +-
drivers/net/dsa/mt7530.c | 4 +-
drivers/net/dsa/mv88e6xxx/chip.c | 20 ++++--
drivers/net/dsa/mv88e6xxx/chip.h | 9 ++-
drivers/net/dsa/mv88e6xxx/global1_atu.c | 21 +++++++
drivers/net/dsa/mv88e6xxx/port.c | 6 +-
drivers/net/dsa/mv88e6xxx/switchdev.c | 61 +++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 5 ++
drivers/net/dsa/mv88e6xxx/trace.h | 5 ++
drivers/net/dsa/ocelot/felix.c | 4 +-
drivers/net/dsa/qca/qca8k-common.c | 4 +-
drivers/net/dsa/qca/qca8k.h | 4 +-
drivers/net/dsa/rzn1_a5psw.c | 4 +-
drivers/net/dsa/sja1105/sja1105_main.c | 11 ++--
include/net/dsa.h | 9 ++-
include/net/switchdev.h | 1 +
net/bridge/br_fdb.c | 5 +-
net/bridge/br_switchdev.c | 1 +
net/dsa/dsa.c | 6 ++
net/dsa/port.c | 28 +++++----
net/dsa/port.h | 8 +--
net/dsa/slave.c | 20 ++++--
net/dsa/switch.c | 26 +++++---
net/dsa/switch.h | 1 +
.../net/forwarding/bridge_locked_port.sh | 36 +++++++++++
30 files changed, 258 insertions(+), 67 deletions(-)
--
2.34.1
At present the kselftest header can't be used with nolibc since it makes
use of vprintf() which is not available in nolibc and seems like it would
be inappropriate to implement given the minimal system requirements and
environment intended for nolibc. This has resulted in some open coded
kselftests which use nolibc to test features that are supposed to be
controlled via libc and therefore better exercised in an environment with
no libc.
Rather than continue this let's factor out the I/O routines in kselftest.h
into a separate header file and provide a nolibc implementation which only
allows simple strings to be provided rather than full printf() support.
This is limiting but a great improvement on sharing no code at all.
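To give an idea of what string-only reporting can look like without vprintf(), here is a small sketch built on write(2). All sketch_* names are hypothetical and are not the interface of the new header; the only point is that a TAP result line can be produced from plain strings plus a tiny integer-to-text helper.

```c
#include <string.h>
#include <unistd.h>

static unsigned int sketch_test_num;

/* Write a plain string to stdout; no format parsing is involved. */
static ssize_t sketch_puts(const char *s)
{
	return write(STDOUT_FILENO, s, strlen(s));
}

/* Convert an unsigned value to decimal text. The digits are written
 * backwards from the end of buf; the returned pointer is the start of
 * the resulting string inside buf.
 */
static const char *sketch_utoa(unsigned int v, char buf[12])
{
	char *p = buf + 11;

	*p = '\0';
	do {
		*--p = (char)('0' + v % 10);
		v /= 10;
	} while (v);
	return p;
}

/* Report a passing test as a TAP line: "ok <number> <message>". */
static void sketch_test_result_pass(const char *msg)
{
	char num[12];

	sketch_test_num++;
	sketch_puts("ok ");
	sketch_puts(sketch_utoa(sketch_test_num, num));
	sketch_puts(" ");
	sketch_puts(msg);
	sketch_puts("\n");
}
```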
As an example of using this I've updated the arm64 za-fork test to use
the standard kselftest.h.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 +++--------------
tools/testing/selftests/kselftest-nolibc.h | 93 ++++++++++++++++++
tools/testing/selftests/kselftest-std.h | 151 +++++++++++++++++++++++++++++
tools/testing/selftests/kselftest.h | 149 +++-------------------------
5 files changed, 272 insertions(+), 211 deletions(-)
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230405-kselftest-nolibc-cb2ce0446d09
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 10.03.23 19:28, Stefan Roesch wrote:
> This adds the general_profit KSM sysfs knob and the process profit metric
> and process merge type knobs to ksm_stat.
>
> 1) split off pages_volatile function
>
> This splits off the pages_volatile function. The next patch will
> use this function.
>
> 2) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 3) document general_profit sysfs knob
>
> 4) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 5) add ksm_merge_type() function
>
> This adds the ksm_merge_type function. The function returns the
> merge type for the process. For madvise it returns "madvise", for
> prctl it returns "process" and otherwise it returns "none".
>
> 6) mm: expose ksm process profit metric and merge type in ksm_stat
>
> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
> The name of the value is ksm_merge_type. The documentation mentions
> the formula for the ksm process profit metric, however it does not
> calculate it. In addition the formula depends on the size of internal
> structures. So it makes sense to expose it.
>
> 7) document new procfs ksm knobs
>
Often, when you have to start making a list of things that a patch does,
it might make sense to split some of the items into separate patches
such that you can avoid lists and just explain in list-free text how the
pieces in the patch fit together.
I'd suggest splitting this patch into logical pieces. For example,
separating the general profit calculation/exposure from the per-mm
profit and the per-mm ksm type indication.
> Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
[...]
> KSM_ATTR_RO(pages_volatile);
>
> @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject *kobj,
> }
> KSM_ATTR_RO(zero_pages_sharing);
>
> +static ssize_t general_profit_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + long general_profit;
> + long all_rmap_items;
> +
> + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared +
> + ksm_pages_unshared + pages_volatile();
Are you sure you want to count a config knob (ksm_max_page_sharing) into
that formula? I yet have to digest what this calculation implies, but it
does feel odd.
Further, maybe just avoid pages_volatile(). Expanding the formula
(excluding ksm_max_page_sharing for now):
all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile();
-> expand pages_volatile() (ignoring the < 0 case)
all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items -
ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared;
-> simplify
all_rmap = ksm_rmap_items - ksm_pages_sharing;
Or is the < 0 case relevant here?
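A quick userspace check of the expansion, as a sketch only: the sketch_* helpers are illustrative stand-ins and not kernel code, and the clamp in sketch_pages_volatile() is assumed to be the "< 0 case". Carrying the signs through term by term, ksm_pages_shared and ksm_pages_unshared cancel and what remains is ksm_rmap_items - ksm_pages_sharing. The two forms agree only while the volatile count is not clamped.

```c
/* Volatile pages as expanded above, clamped at zero. */
static long sketch_pages_volatile(long rmap_items, long shared,
				  long sharing, long unshared)
{
	long v = rmap_items - shared - sharing - unshared;

	return v < 0 ? 0 : v;
}

/* all_rmap in its expanded form (without ksm_max_page_sharing). */
static long sketch_all_rmap_expanded(long rmap_items, long shared,
				     long sharing, long unshared)
{
	return shared + unshared +
	       sketch_pages_volatile(rmap_items, shared, sharing, unshared);
}

/* all_rmap after cancelling shared and unshared. */
static long sketch_all_rmap_simplified(long rmap_items, long sharing)
{
	return rmap_items - sharing;
}
```

With rmap_items = 100, shared = 10, sharing = 30 and unshared = 20 both forms give 70. With rmap_items = 10 the clamp kicks in and the expanded form gives 30 while the simplified one gives -20, so the answer to the question depends on whether that state can actually occur.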
> + general_profit = ksm_pages_sharing * PAGE_SIZE -
> + all_rmap_items * sizeof(struct ksm_rmap_item);
> +
> + return sysfs_emit(buf, "%ld\n", general_profit);
> +}
> +KSM_ATTR_RO(general_profit);
> +
> static ssize_t stable_node_dups_show(struct kobject *kobj,
> struct kobj_attribute *attr, char *buf)
> {
> @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = {
> &stable_node_dups_attr.attr,
> &stable_node_chains_prune_millisecs_attr.attr,
> &use_zero_pages_attr.attr,
> + &general_profit_attr.attr,
> NULL,
> };
>
The calculations (profit) don't include when KSM places the shared
zeropage I guess. Accounting that per MM (and eventually globally) is in
the works. [1]
[1]
https://lore.kernel.org/lkml/20230328153852.26c2577e4bd921c371c47a7e@linux-…
--
Thanks,
David / dhildenb
Commit 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
added an additional test, so the number passed to ksft_set_plan needs to
be bumped accordingly.
Also use ksft_finish to print results and exit. This will catch future
mismatches between ksft_set_plan and the number of tests being run.
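As a rough model of why a stale plan goes unnoticed unless the final step compares it with the number of results, consider the following self-contained sketch. The struct and the sketch_* functions are hypothetical and are not the kselftest.h API; the real helpers may differ in detail.

```c
#include <stdio.h>

struct sketch_counters {
	unsigned int plan;	/* number of tests announced up front */
	unsigned int run;	/* number of results reported so far */
	unsigned int failed;	/* number of failing results */
};

static void sketch_set_plan(struct sketch_counters *c, unsigned int plan)
{
	c->plan = plan;
}

static void sketch_result(struct sketch_counters *c, int ok)
{
	c->run++;
	if (!ok)
		c->failed++;
}

/* Returns the exit code: 0 only if every planned test ran and none
 * failed. A plan that does not match the number of results is
 * reported as a failure.
 */
static int sketch_finished(const struct sketch_counters *c)
{
	if (c->run != c->plan) {
		fprintf(stderr, "# planned %u tests but ran %u\n",
			c->plan, c->run);
		return 1;
	}
	return c->failed ? 1 : 0;
}
```

In this model, a plan of 17 with 18 passing results ends with a non-zero exit code, which is the kind of mismatch the final check is meant to surface.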
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 4fce46afe6db..7b45c9854202 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
uid_t uid = getuid();
ksft_print_header();
- ksft_set_plan(17);
+ ksft_set_plan(18);
test_clone3_supported();
/* Just a simple clone3() should return 0.*/
--
2.39.1
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that straddle the end of the UMEM but not a
page.
This test used to fail without the previous commit ("xsk: Add check for
unaligned descriptors that overrun UMEM").
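The arithmetic behind the odd frame size can be sketched as follows. The helper names are hypothetical, and the numbers in the note below (1024 frames, 4096-byte pages, 60-byte packets) are assumed values for illustration rather than values taken from the test.

```c
#include <stdint.h>

/* Offset of the end of the UMEM within its last page; 0 means the
 * UMEM ends exactly on a page boundary.
 */
static uint64_t sketch_umem_tail_offset(uint64_t num_frames,
					uint64_t frame_size,
					uint64_t page_size)
{
	return (num_frames * frame_size) % page_size;
}

/* True if a packet of pkt_size bytes can straddle the end of the UMEM
 * without coming within pkt_size bytes of a page boundary. This
 * mirrors the two assert() conditions in the test.
 */
static int sketch_tail_clear_of_page_edges(uint64_t tail,
					   uint64_t page_size,
					   uint64_t pkt_size)
{
	return tail > pkt_size && tail < page_size - pkt_size;
}
```

With the assumed numbers, 1024 * 4001 = 4097024 bytes, which leaves a tail offset of 1024 within a 4096-byte page. That is more than 60 bytes away from either page edge. A 4096-byte frame size would instead give a tail offset of 0, so the end of the UMEM would coincide with a page boundary.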
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 24 ++++++++++++++++++++++++
tools/testing/selftests/bpf/xskxceiver.h | 1 +
2 files changed, 25 insertions(+)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 1a4bdd5aa78c..5a9691e942de 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -69,6 +69,7 @@
*/
#define _GNU_SOURCE
+#include <assert.h>
#include <fcntl.h>
#include <errno.h>
#include <getopt.h>
@@ -1876,6 +1877,29 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
test->ifobj_rx->umem->unaligned_mode = true;
testapp_invalid_desc(test);
break;
+ case TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME: {
+ u64 page_size, umem_size;
+
+ if (!hugepages_present(test->ifobj_tx)) {
+ ksft_test_result_skip("No 2M huge pages present.\n");
+ return;
+ }
+ test_spec_set_name(test, "UNALIGNED_INV_DESC_4K1_FRAME_SIZE");
+ /* Odd frame size so the UMEM doesn't end near a page boundary. */
+ test->ifobj_tx->umem->frame_size = 4001;
+ test->ifobj_rx->umem->frame_size = 4001;
+ test->ifobj_tx->umem->unaligned_mode = true;
+ test->ifobj_rx->umem->unaligned_mode = true;
+ /* This test exists to test descriptors that staddle the end of
+ * the UMEM but not a page.
+ */
+ page_size = sysconf(_SC_PAGESIZE);
+ umem_size = test->ifobj_tx->umem->num_frames * test->ifobj_tx->umem->frame_size;
+ assert(umem_size % page_size > PKT_SIZE);
+ assert(umem_size % page_size < page_size - PKT_SIZE);
+ testapp_invalid_desc(test);
+ break;
+ }
case TEST_TYPE_UNALIGNED:
if (!testapp_unaligned(test))
return;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index cc24ab72f3ff..919327807a4e 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -78,6 +78,7 @@ enum test_type {
TEST_TYPE_ALIGNED_INV_DESC,
TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
TEST_TYPE_UNALIGNED_INV_DESC,
+ TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME,
TEST_TYPE_HEADROOM,
TEST_TYPE_TEARDOWN,
TEST_TYPE_BIDI,
--
2.39.2
Add new subtest to video_device_test to cover the VIDIOC_G_PRIORITY
and VIDIOC_S_PRIORITY ioctl calls. This test tries to set the priority
associated with the file descriptor via ioctl VIDIOC_S_PRIORITY
command from V4L2 API. After that, the test tries to get the new
priority via VIDIOC_G_PRIORITY ioctl command and compares the result
with the v4l2_priority it set before. At the end, the test restores the
old priority.
This test will increase the code coverage for video_device_test, so
I think it might be useful. Additionally, this patch will refactor the
video_device_test a little bit, according to the new functionality.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
V1 -> V2: Revert the test description change
.../selftests/media_tests/video_device_test.c | 111 +++++++++++++-----
1 file changed, 83 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/media_tests/video_device_test.c b/tools/testing/selftests/media_tests/video_device_test.c
index 0f6aef2e2593..2c44e115f2f0 100644
--- a/tools/testing/selftests/media_tests/video_device_test.c
+++ b/tools/testing/selftests/media_tests/video_device_test.c
@@ -37,45 +37,58 @@
#include <time.h>
#include <linux/videodev2.h>
-int main(int argc, char **argv)
+#define PRIORITY_MAX 4
+
+int priority_test(int fd)
{
- int opt;
- char video_dev[256];
- int count;
- struct v4l2_tuner vtuner;
- struct v4l2_capability vcap;
+ /* This test will try to update the priority associated with a file descriptor */
+
+ enum v4l2_priority old_priority, new_priority, priority_to_compare;
int ret;
- int fd;
+ int result = 0;
- if (argc < 2) {
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to get priority: %s\n", strerror(errno));
+ return -1;
+ }
+ new_priority = (old_priority + 1) % PRIORITY_MAX;
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &new_priority);
+ if (ret < 0) {
+ printf("Failed to set priority: %s\n", strerror(errno));
+ return -1;
+ }
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &priority_to_compare);
+ if (ret < 0) {
+ printf("Failed to get new priority: %s\n", strerror(errno));
+ result = -1;
+ goto cleanup;
+ }
+ if (priority_to_compare != new_priority) {
+ printf("Priority wasn't set - test failed\n");
+ result = -1;
}
- /* Process arguments */
- while ((opt = getopt(argc, argv, "d:")) != -1) {
- switch (opt) {
- case 'd':
- strncpy(video_dev, optarg, sizeof(video_dev) - 1);
- video_dev[sizeof(video_dev)-1] = '\0';
- break;
- default:
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
- }
+cleanup:
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to restore priority: %s\n", strerror(errno));
+ return -1;
}
+ return result;
+}
+
+int loop_test(int fd)
+{
+ int count;
+ struct v4l2_tuner vtuner;
+ struct v4l2_capability vcap;
+ int ret;
/* Generate random number of interations */
srand((unsigned int) time(NULL));
count = rand();
- /* Open Video device and keep it open */
- fd = open(video_dev, O_RDWR);
- if (fd == -1) {
- printf("Video Device open errno %s\n", strerror(errno));
- exit(-1);
- }
-
printf("\nNote:\n"
"While test is running, remove the device or unbind\n"
"driver and ensure there are no use after free errors\n"
@@ -98,4 +111,46 @@ int main(int argc, char **argv)
sleep(10);
count--;
}
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int opt;
+ char video_dev[256];
+ int fd;
+ int test_result;
+
+ if (argc < 2) {
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+
+ /* Process arguments */
+ while ((opt = getopt(argc, argv, "d:")) != -1) {
+ switch (opt) {
+ case 'd':
+ strncpy(video_dev, optarg, sizeof(video_dev) - 1);
+ video_dev[sizeof(video_dev)-1] = '\0';
+ break;
+ default:
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+ }
+
+ /* Open Video device and keep it open */
+ fd = open(video_dev, O_RDWR);
+ if (fd == -1) {
+ printf("Video Device open errno %s\n", strerror(errno));
+ exit(-1);
+ }
+
+ test_result = priority_test(fd);
+ if (!test_result)
+ printf("Priority test - PASSED\n");
+ else
+ printf("Priority test - FAILED\n");
+
+ loop_test(fd);
}
--
2.34.1
Add new subtest to video_device_test to cover the VIDIOC_G_PRIORITY
and VIDIOC_S_PRIORITY ioctl calls. This test tries to set the priority
associated with the file descriptor via ioctl VIDIOC_S_PRIORITY
command from V4L2 API. After that, the test tries to get the new
priority via VIDIOC_G_PRIORITY ioctl command and compares the result
with the v4l2_priority it set before. At the end, the test restores the
old priority.
This test will increase the code coverage for video_device_test, so
I think it might be useful. Additionally, this patch will refactor the
video_device_test a little bit, according to the new functionality.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
.../selftests/media_tests/video_device_test.c | 131 +++++++++++++-----
1 file changed, 93 insertions(+), 38 deletions(-)
diff --git a/tools/testing/selftests/media_tests/video_device_test.c b/tools/testing/selftests/media_tests/video_device_test.c
index 0f6aef2e2593..5e6f65ad2ca3 100644
--- a/tools/testing/selftests/media_tests/video_device_test.c
+++ b/tools/testing/selftests/media_tests/video_device_test.c
@@ -13,18 +13,9 @@
* in the Kselftest run. This test should be run when hardware and driver
* that makes use of V4L2 API is present.
*
- * This test opens user specified Video Device and calls video ioctls in a
- * loop once every 10 seconds.
- *
* Usage:
* sudo ./video_device_test -d /dev/videoX
- *
- * While test is running, remove the device or unbind the driver and
- * ensure there are no use after free errors and other Oops in the
- * dmesg.
- * When possible, enable KaSan kernel config option for use-after-free
- * error detection.
-*/
+ */
#include <stdio.h>
#include <unistd.h>
@@ -37,45 +28,67 @@
#include <time.h>
#include <linux/videodev2.h>
-int main(int argc, char **argv)
+#define PRIORITY_MAX 4
+
+int priority_test(int fd)
{
- int opt;
- char video_dev[256];
- int count;
- struct v4l2_tuner vtuner;
- struct v4l2_capability vcap;
+ /* This test will try to update the priority associated with a file descriptor */
+
+ enum v4l2_priority old_priority, new_priority, priority_to_compare;
int ret;
- int fd;
+ int result = 0;
- if (argc < 2) {
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to get priority: %s\n", strerror(errno));
+ return -1;
+ }
+ new_priority = (old_priority + 1) % PRIORITY_MAX;
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &new_priority);
+ if (ret < 0) {
+ printf("Failed to set priority: %s\n", strerror(errno));
+ return -1;
+ }
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &priority_to_compare);
+ if (ret < 0) {
+ printf("Failed to get new priority: %s\n", strerror(errno));
+ result = -1;
+ goto cleanup;
+ }
+ if (priority_to_compare != new_priority) {
+ printf("Priority wasn't set - test failed\n");
+ result = -1;
}
- /* Process arguments */
- while ((opt = getopt(argc, argv, "d:")) != -1) {
- switch (opt) {
- case 'd':
- strncpy(video_dev, optarg, sizeof(video_dev) - 1);
- video_dev[sizeof(video_dev)-1] = '\0';
- break;
- default:
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
- }
+cleanup:
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to restore priority: %s\n", strerror(errno));
+ return -1;
}
+ return result;
+}
+
+int loop_test(int fd)
+{
+ /*
+ * This test opens user specified Video Device and calls video ioctls in a
+ * loop once every 10 seconds.
+ * While test is running, remove the device or unbind the driver and
+ * ensure there are no use after free errors and other Oops in the
+ * dmesg.
+ * When possible, enable KaSan kernel config option for use-after-free
+ * error detection.
+ */
+ int count;
+ struct v4l2_tuner vtuner;
+ struct v4l2_capability vcap;
+ int ret;
/* Generate random number of interations */
srand((unsigned int) time(NULL));
count = rand();
- /* Open Video device and keep it open */
- fd = open(video_dev, O_RDWR);
- if (fd == -1) {
- printf("Video Device open errno %s\n", strerror(errno));
- exit(-1);
- }
-
printf("\nNote:\n"
"While test is running, remove the device or unbind\n"
"driver and ensure there are no use after free errors\n"
@@ -98,4 +111,46 @@ int main(int argc, char **argv)
sleep(10);
count--;
}
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int opt;
+ char video_dev[256];
+ int fd;
+ int test_result;
+
+ if (argc < 2) {
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+
+ /* Process arguments */
+ while ((opt = getopt(argc, argv, "d:")) != -1) {
+ switch (opt) {
+ case 'd':
+ strncpy(video_dev, optarg, sizeof(video_dev) - 1);
+ video_dev[sizeof(video_dev)-1] = '\0';
+ break;
+ default:
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+ }
+
+ /* Open Video device and keep it open */
+ fd = open(video_dev, O_RDWR);
+ if (fd == -1) {
+ printf("Video Device open errno %s\n", strerror(errno));
+ exit(-1);
+ }
+
+ test_result = priority_test(fd);
+ if (!test_result)
+ printf("Priority test - PASSED\n");
+ else
+ printf("Priority test - FAILED\n");
+
+ loop_test(fd);
}
--
2.34.1
This change fixes flakiness in the BIDIRECTIONAL test:
# [is_pkt_valid] expected length [60], got length [90]
not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.
For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.
Fixes: a89052572ebb ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/test_xsk.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index b077cf58f825..377fb157a57c 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -116,6 +116,7 @@ setup_vethPairs() {
ip link add ${VETH0} numtxqueues 4 numrxqueues 4 type veth peer name ${VETH1} numtxqueues 4 numrxqueues 4
if [ -f /proc/net/if_inet6 ]; then
echo 1 > /proc/sys/net/ipv6/conf/${VETH0}/disable_ipv6
+ echo 1 > /proc/sys/net/ipv6/conf/${VETH1}/disable_ipv6
fi
if [[ $verbose -eq 1 ]]; then
echo "setting up ${VETH1}"
--
2.39.2
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.
This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).
This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.
Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.
Fixes: 27e934bec35b ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson(a)intel.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index a17655107a94..30a364283542 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -631,7 +631,6 @@ static struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem, u32 nb
if (!pkt_stream)
exit_with_error(ENOMEM);
- pkt_stream->nb_pkts = nb_pkts;
for (i = 0; i < nb_pkts; i++) {
pkt_set(umem, &pkt_stream->pkts[i], (i % umem->num_frames) * umem->frame_size,
pkt_len);
@@ -1124,7 +1123,14 @@ static int validate_rx_dropped(struct ifobject *ifobject)
if (err)
return TEST_FAILURE;
- if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2)
+ /* The receiver calls getsockopt after receiving the last (valid)
+ * packet which is not the final packet sent in this test (valid and
+ * invalid packets are sent in alternating fashion with the final
+ * packet being invalid). Since the last packet may or may not have
+ * been dropped already, both outcomes must be allowed.
+ */
+ if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 ||
+ stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 - 1)
return TEST_PASS;
return TEST_FAILURE;
--
2.39.2
On 10.03.23 19:28, Stefan Roesch wrote:
> Patch series "mm: process/cgroup ksm support", v3.
>
> So far KSM can only be enabled by calling madvise for memory regions. To
> be able to use KSM for more workloads, KSM needs to have the ability to be
> enabled / disabled at the process / cgroup level.
>
> Use case 1:
>
> The madvise call is not available in the programming language. An
> example for this are programs with forked workloads using a garbage
> collected language without pointers. In such a language madvise cannot
> be made available.
>
> In addition the addresses of objects get moved around as they are
> garbage collected. KSM sharing needs to be enabled "from the outside"
> for these type of workloads.
I guess the interpreter could enable it (like a memory allocator could
enable it for the whole heap). But I get that it's much easier to enable
this per-process, and eventually only when a lot of the same processes
are running in that particular environment.
>
> Use case 2:
>
> The same interpreter can also be used for workloads where KSM brings
> no benefit or even has overhead. We'd like to be able to enable KSM on
> a workload by workload basis.
Agreed. A per-process control is also helpful to identify workloads
where KSM might be beneficial (and to what degree).
>
> Use case 3:
>
> With the madvise call sharing opportunities are only enabled for the
> current process: it is a workload-local decision. A considerable number
> of sharing opportunities may exist across multiple workloads or jobs.
> Only a higher level entity like a job scheduler or container can know
> for certain if it's running one or more instances of a job. That job
> scheduler however doesn't have the necessary internal workload knowledge
> to make targeted madvise calls.
>
> Security concerns:
>
> In previous discussions security concerns have been brought up. The
> problem is that an individual workload does not have the knowledge about
> what else is running on a machine. Therefore it has to be very
> conservative in what memory areas can be shared or not. However, if the
> system is dedicated to running multiple jobs within the same security
> domain, it's the job scheduler that has the knowledge that sharing can be
> safely enabled and is even desirable.
>
> Performance:
>
> Experiments with using UKSM have shown a capacity increase of around
> 20%.
>
As raised, it would be great to include more details about the workload
where this particularly helps (e.g., a lot of Django processes operating
in the same domain).
>
> 1. New options for prctl system command
>
> This patch series adds two new options to the prctl system call.
> The first one allows to enable KSM at the process level and the second
> one to query the setting.
>
> The setting will be inherited by child processes.
>
> With the above setting, KSM can be enabled for the seed process of a
> cgroup and all processes in the cgroup will inherit the setting.
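From user space, the two options described above might be used as in the
minimal sketch below. The numeric values are the ones proposed in this
series (include/uapi/linux/prctl.h hunk); ksm_process_set() and
ksm_process_get() are hypothetical wrapper names, and the return
convention of the query call is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>

/* Values proposed by this series; absent from older uapi headers. */
#ifndef PR_SET_MEMORY_MERGE
#define PR_SET_MEMORY_MERGE 67
#endif
#ifndef PR_GET_MEMORY_MERGE
#define PR_GET_MEMORY_MERGE 68
#endif

/*
 * Ask the kernel to enable (on != 0) or disable (on == 0) KSM for the
 * whole calling process. Returns 0 on success or -errno on failure,
 * e.g. -EINVAL on kernels without this series or -EPERM when the
 * caller lacks the capability the patch checks for.
 */
int ksm_process_set(int on)
{
	if (prctl(PR_SET_MEMORY_MERGE, on ? 1UL : 0UL, 0UL, 0UL, 0UL) == -1)
		return -errno;
	return 0;
}

/*
 * Query the per-process setting. Assumption: the call returns a
 * non-negative value reflecting the setting; failures map to -errno.
 */
int ksm_process_get(void)
{
	int ret = prctl(PR_GET_MEMORY_MERGE, 0UL, 0UL, 0UL, 0UL);

	return ret == -1 ? -errno : ret;
}
```

A job launcher would call ksm_process_set(1) in the seed process before
forking workers, relying on the inheritance described above.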
>
> 2. Changes to KSM processing
>
> When KSM is enabled at the process level, the KSM code will iterate
> over all the VMA's and enable KSM for the eligible VMA's.
>
> When forking a process that has KSM enabled, the setting will be
> inherited by the new child process.
>
> In addition when KSM is disabled for a process, KSM will be disabled
> for the VMA's where KSM has been enabled.
Do we want to make MADV_MERGEABLE/MADV_UNMERGEABLE fail while the new
prctl is enabled for a process?
>
> 3. Add general_profit metric
>
> The general_profit metric of KSM is specified in the documentation,
> but not calculated. This adds the general profit metric to
> /sys/kernel/debug/mm/ksm.
>
> 4. Add more metrics to ksm_stat
>
> This adds the process profit and ksm type metric to
> /proc/<pid>/ksm_stat.
>
> 5. Add more tests to ksm_tests
>
> This adds an option to specify the merge type to the ksm_tests.
> This allows to test madvise and prctl KSM. It also adds a new option
> to query if prctl KSM has been enabled. It adds a fork test to verify
> that the KSM process setting is inherited by client processes.
>
> An update to the prctl(2) manpage has been proposed at [1].
>
> This patch (of 3):
>
> This adds a new prctl API to enable and disable KSM on a per process
> basis instead of only at the VMA basis (with madvise).
>
> 1) Introduce new MMF_VM_MERGE_ANY flag
>
> This introduces the new MMF_VM_MERGE_ANY flag. When this flag
> is set, kernel samepage merging (ksm) gets enabled for all vma's of a
> process.
>
> 2) add flag to __ksm_enter
>
> This change adds the flag parameter to __ksm_enter. This allows to
> distinguish if ksm was called by prctl or madvise.
>
> 3) add flag to __ksm_exit call
>
> This adds the flag parameter to the __ksm_exit() call. This allows
> to distinguish if this call is for a prctl or madvise invocation.
>
> 4) invoke madvise for all vmas in scan_get_next_rmap_item
>
> If the new flag MMF_VM_MERGE_ANY has been set for a process, iterate
> over all the vmas and enable ksm if possible. For the vmas that can be
> ksm enabled this is only done once.
>
> 5) support disabling of ksm for a process
>
> This adds the ability to disable ksm for a process if ksm has been
> enabled for the process.
>
> 6) add new prctl option to get and set ksm for a process
>
> This adds two new options to the prctl system call
> - enable ksm for all vmas of a process (if the vmas support it).
> - query if ksm has been enabled for a process.
Did you consider, instead of handling MMF_VM_MERGE_ANY in a special way,
reusing the existing MMF_VM_MERGEABLE/VM_MERGEABLE infrastructure?
Especially:
1) During prctl(MMF_VM_MERGE_ANY), set VM_MERGEABLE on all applicable,
compatible VMAs. Further, set MMF_VM_MERGEABLE and enter KSM if not
already set.
2) When creating a new, compatible VMA and MMF_VM_MERGE_ANY is set, set
VM_MERGEABLE?
Then you can avoid all runtime checks for compatible VMAs and only look
at the VM_MERGEABLE flag. In fact, VM_MERGEABLE will then be completely
expressive for all VMAs. You don't need vma_ksm_mergeable() then.
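To make the two steps above concrete, here is a small user-space model of
the proposed flag handling. It is only a sketch: the struct layouts, the
model_* function names and the bit values are illustrative assumptions
and do not match the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; these are not the kernel's definitions. */
#define VM_MERGEABLE      (1u << 0)
#define VM_SHARED         (1u << 1)
#define VM_HUGETLB        (1u << 2)
#define MMF_VM_MERGEABLE  (1u << 0)
#define MMF_VM_MERGE_ANY  (1u << 1)

struct vma_model {
	unsigned int vm_flags;
};

struct mm_model {
	unsigned int flags;
	struct vma_model *vmas;
	size_t nr_vmas;
};

bool model_vma_compatible(const struct vma_model *vma)
{
	return !(vma->vm_flags & (VM_SHARED | VM_HUGETLB));
}

/* Step 1: at prctl time, flag every compatible VMA and mark the mm. */
void model_merge_any_enable(struct mm_model *mm)
{
	for (size_t i = 0; i < mm->nr_vmas; i++)
		if (model_vma_compatible(&mm->vmas[i]))
			mm->vmas[i].vm_flags |= VM_MERGEABLE;
	mm->flags |= MMF_VM_MERGEABLE | MMF_VM_MERGE_ANY;
}

/* Step 2: a VMA created later inherits VM_MERGEABLE if compatible. */
void model_vma_init(const struct mm_model *mm, struct vma_model *vma,
		    unsigned int vm_flags)
{
	vma->vm_flags = vm_flags;
	if ((mm->flags & MMF_VM_MERGE_ANY) && model_vma_compatible(vma))
		vma->vm_flags |= VM_MERGEABLE;
}

/* The scanner then needs to look only at the per-VMA flag. */
bool model_scan_wants(const struct vma_model *vma)
{
	return (vma->vm_flags & VM_MERGEABLE) != 0;
}
```

With this shape the scan path has a single source of truth per VMA, which
is the point of the suggestion.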
Another thing to consider is interaction with arch/s390/mm/gmap.c:
s390x/kvm does not support KSM and it has to disable it for all VMAs. We
have to find a way to fence the prctl (for example, fail setting the
prctl after gmap_mark_unmergeable() ran, and make
gmap_mark_unmergeable() fail if the prctl ran -- or handle it gracefully
in some other way).
>
> Link: https://lkml.kernel.org/r/20230227220206.436662-1-shr@devkernel.io [1]
> Link: https://lkml.kernel.org/r/20230224044000.3084046-1-shr@devkernel.io
> Link: https://lkml.kernel.org/r/20230224044000.3084046-2-shr@devkernel.io
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> include/linux/ksm.h | 14 ++++--
> include/linux/sched/coredump.h | 1 +
> include/uapi/linux/prctl.h | 2 +
> kernel/sys.c | 27 ++++++++++
> mm/ksm.c | 90 +++++++++++++++++++++++-----------
> 5 files changed, 101 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 7e232ba59b86..d38a05a36298 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -18,20 +18,24 @@
> #ifdef CONFIG_KSM
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, int advice, unsigned long *vm_flags);
> -int __ksm_enter(struct mm_struct *mm);
> -void __ksm_exit(struct mm_struct *mm);
> +int __ksm_enter(struct mm_struct *mm, int flag);
> +void __ksm_exit(struct mm_struct *mm, int flag);
>
> static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> {
> + if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
> + return __ksm_enter(mm, MMF_VM_MERGE_ANY);
> if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
> - return __ksm_enter(mm);
> + return __ksm_enter(mm, MMF_VM_MERGEABLE);
> return 0;
> }
>
> static inline void ksm_exit(struct mm_struct *mm)
> {
> - if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> - __ksm_exit(mm);
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGE_ANY);
> + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGEABLE);
> }
>
> /*
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 0e17ae7fbfd3..0ee96ea7a0e9 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -90,4 +90,5 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
>
> +#define MMF_VM_MERGE_ANY 29
> #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 1312a137f7fb..759b3f53e53f 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -290,4 +290,6 @@ struct prctl_mm_map {
> #define PR_SET_VMA 0x53564d41
> # define PR_SET_VMA_ANON_NAME 0
>
> +#define PR_SET_MEMORY_MERGE 67
> +#define PR_GET_MEMORY_MERGE 68
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 495cd87d9bf4..edc439b1cae9 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -15,6 +15,7 @@
> #include <linux/highuid.h>
> #include <linux/fs.h>
> #include <linux/kmod.h>
> +#include <linux/ksm.h>
> #include <linux/perf_event.h>
> #include <linux/resource.h>
> #include <linux/kernel.h>
> @@ -2661,6 +2662,32 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (!capable(CAP_SYS_RESOURCE))
> + return -EPERM;
> +
> + if (arg2) {
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
> + error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
Hm, I think this might be problematic if we already called __ksm_enter()
via madvise(). Maybe we should really consider making MMF_VM_MERGE_ANY
set MMF_VM_MERGEABLE instead. Like:
	error = 0;
	if (!test_bit(MMF_VM_MERGEABLE, &me->mm->flags))
		error = __ksm_enter(me->mm);
	if (!error)
		set_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + mmap_write_unlock(me->mm);
> + } else {
> + __ksm_exit(me->mm, MMF_VM_MERGE_ANY);
Hm, I'd prefer if we really only call __ksm_exit() when we really exit
the process. Is there a strong requirement to optimize disabling of KSM
or would it be sufficient to clear the MMF_VM_MERGE_ANY flag here?
Also, I wonder what happens if we have another VMA in that process that
has it enabled ..
Last but not least, wouldn't we want to do the same thing as
MADV_UNMERGEABLE and actually unmerge the KSM pages?
It smells like it could be simpler and more consistent to handle by
letting PR_SET_MEMORY_MERGE piggy-back on MMF_VM_MERGEABLE/VM_MERGEABLE
and simply mimic what ksm_madvise() does, for all VMAs.
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -534,16 +534,58 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
> return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
> }
>
> +static bool vma_ksm_compatible(struct vm_area_struct *vma)
> +{
> + /*
> + * Be somewhat over-protective for now!
> + */
> + if (vma->vm_flags & (VM_MERGEABLE | VM_SHARED | VM_MAYSHARE |
> + VM_PFNMAP | VM_IO | VM_DONTEXPAND |
> + VM_HUGETLB | VM_MIXEDMAP))
> + return false; /* just ignore the advice */
That comment is kind-of stale and ksm_madvise() specific.
> +
The VM_MERGEABLE check is really only used for ksm_madvise() to return
immediately. I suggest keeping it in ksm_madvise() -- "Already enabled".
Returning "false" in that case looks wrong (it's not broken because you
do an early check in vma_ksm_mergeable(), it's just semantically weird).
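A sketch of the split I have in mind, as a user-space model. The bit
values and the model_* names are illustrative assumptions, not kernel
definitions: the compatibility helper answers only "could this VMA ever
be merged?", while the "already enabled" shortcut stays in the
madvise-style caller.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; these are not the kernel's definitions. */
#define VM_MERGEABLE (1u << 0)
#define VM_SHARED    (1u << 1)
#define VM_PFNMAP    (1u << 2)
#define VM_HUGETLB   (1u << 3)

/* Compatibility does not depend on whether merging is already enabled. */
bool model_vma_ksm_compatible(unsigned int vm_flags)
{
	return !(vm_flags & (VM_SHARED | VM_PFNMAP | VM_HUGETLB));
}

/*
 * madvise-style entry point. Returns true if the flag word changed.
 * "Already enabled" is an early exit here, not an incompatibility.
 */
bool model_ksm_madvise_mergeable(unsigned int *vm_flags)
{
	if (*vm_flags & VM_MERGEABLE)
		return false;		/* already enabled */
	if (!model_vma_ksm_compatible(*vm_flags))
		return false;		/* ignore the advice */
	*vm_flags |= VM_MERGEABLE;
	return true;
}
```

Note that a VMA that already has VM_MERGEABLE set still counts as
compatible in this model.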
--
Thanks,
David / dhildenb
The va_128TBswitch selftest is designed and implemented for PowerPC and
x86 architectures which support a 128TB switch, up to 256TB of virtual
address space and hugepage sizes of 16MB and 2MB respectively. Arm64
platforms on the other hand support a 256TB switch, up to 4PB of virtual
address space and a default hugepage size of 512MB when 64k pagesize is
enabled.
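The figures above can be written down as a small table in code. This is
only a sketch of the arithmetic; the enum and function names are
hypothetical and are not taken from the selftest itself.

```c
#include <assert.h>
#include <stdint.h>

enum va_arch { VA_ARCH_X86_64, VA_ARCH_PPC64, VA_ARCH_ARM64 };

#define TB (UINT64_C(1) << 40)
#define MB (UINT64_C(1) << 20)

/* Address at which mappings switch to the high virtual address range. */
uint64_t va_switch_addr(enum va_arch arch)
{
	return arch == VA_ARCH_ARM64 ? 256 * TB : 128 * TB;
}

/* Hugepage size used by the test, per the description above. */
uint64_t va_hugepage_size(enum va_arch arch)
{
	switch (arch) {
	case VA_ARCH_PPC64:
		return 16 * MB;
	case VA_ARCH_ARM64:
		return 512 * MB;	/* with 64K base pages */
	default:
		return 2 * MB;
	}
}
```

In other words, the switch sits at bit 47 on PowerPC and x86 and at bit
48 on arm64.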
These architectural differences require introducing support for arm64
platforms, after which a more generic naming convention is suggested.
The in-code comments are amended to provide a more platform independent
explanation of the working of the code and nr_hugepages are configured
as required. Finally, the file running the testcase is modified in order
to prevent skipping of hugetlb testcases of va_high_addr_switch.
This series has been tested on 6.3.0-rc3 kernel, both on arm64 and x86
platforms.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Chaitanya S Prakash (5):
selftests/mm: Add support for arm64 platform on va switch
selftests/mm: Rename va_128TBswitch to va_high_addr_switch
selftests/mm: Add platform independent in code comments
selftests/mm: Configure nr_hugepages for arm64
selftests/mm: Run hugetlb testcases of va switch
tools/testing/selftests/mm/Makefile | 4 +-
tools/testing/selftests/mm/run_vmtests.sh | 12 +++++-
...va_128TBswitch.c => va_high_addr_switch.c} | 41 +++++++++++++++----
..._128TBswitch.sh => va_high_addr_switch.sh} | 6 ++-
4 files changed, 49 insertions(+), 14 deletions(-)
rename tools/testing/selftests/mm/{va_128TBswitch.c => va_high_addr_switch.c} (86%)
rename tools/testing/selftests/mm/{va_128TBswitch.sh => va_high_addr_switch.sh} (89%)
--
2.30.2
Hi,
There has been no progress on this bug report and it is still unpatched in
6.3-rc5, so I am submitting it again.
Please see the relevant data at the bottom:
On 27. 01. 2023. 19:36, Mirsad Goran Todorovac wrote:
> Hi all,
>
> I came across a memory leak with the vanilla mainline Torvalds tree kernel
> with MGLRU and CONFIG_KMEMLEAK enabled:
>
> unreferenced object 0xffff8d7c92ad5180 (size 192):
> comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
> hex dump (first 32 bytes):
> c0 59 ad 92 7c 8d ff ff 60 dd d7 31 7c 8d ff ff .Y..|...`..1|...
> 60 55 df 97 ff ff ff ff 09 00 02 00 00 00 00 00 `U..............
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
> [<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> unreferenced object 0xffff8d7b076be000 (size 32):
> comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
> hex dump (first 32 bytes):
> 0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 61 0a 00 00 . Command: a...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96557a8d>] __kmalloc+0x4d/0xd0
> [<ffffffff96438314>] tracing_log_err+0x184/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> unreferenced object 0xffff8d7c92ad59c0 (size 192):
> comm "ftracetest", pid 2738512, jiffies 4335176280 (age 4843.088s)
> hex dump (first 32 bytes):
> c0 5c ad 92 7c 8d ff ff 80 51 ad 92 7c 8d ff ff .\..|....Q..|...
> 60 55 df 97 ff ff ff ff 01 00 0b 00 00 00 00 00 `U..............
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
> [<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> The bug was noticed on Lenovo desktop 10TX000VCR (LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB)
> running AlmaLinux 8.7 (Stone Smilodon), a CentOS clone, with the compiler:
>
> mtodorov@domac:~/linux/kernel/linux_torvalds$ gcc --version
> gcc (Debian 8.3.0-6) 8.3.0
> Copyright (C) 2018 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> mtodorov@domac:~/linux/kernel/linux_torvalds$
>
> Bisecting gave the following culprit commit:
>
> git bisect good a92ce570c81dc0feaeb12a429b4bc65686d17967
> # good: [c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a] ipmi/watchdog: use strscpy() to instead of strncpy()
> git bisect good c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a
> # good: [90b12f423d3c8a89424c7bdde18e1923dfd0941e] Merge tag 'for-linus-6.2-1' of https://github.com/cminyard/linux-ipmi
> git bisect good 90b12f423d3c8a89424c7bdde18e1923dfd0941e
> # first bad commit: [71946a25f357a51dcce849367501d7fb04c0465b] Merge tag 'mmc-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
>
> The commit was merged on December 13th 2022.
>
> It is a huge commit.
>
> The selftests/ftrace/ftracetest triggers this leak, sometimes several times in a run.
> ftracetest requires root permission to run, but I haven't yet realised whether a non-superuser
> could devise an automated script to abuse this leak exhausting all kernel's memory.
>
> A non-root user gets an EPERM error when trying to access /proc/sys/kernel internals:
>
> [marvin@pc-mtodorov linux_torvalds]$ tools/testing/selftests/ftrace/ftracetest
> Error: this must be run by root user
> tools/testing/selftests/ftrace/ftracetest: line 46: /proc/sys/kernel/sched_rt_runtime_us: Permission denied
> [marvin@pc-mtodorov linux_torvalds]$
>
> Hope this helps.
>
> According to the Code of Conduct, I have Cc:-ed maintainers from get_maintainers.pl and
> I will add Thorsten because this is sort of a regression :-)
The debug output is as follows:
unreferenced object 0xffff93a3dc2d1e18 (size 192):
comm "ftracetest", pid 12451, jiffies 4295087353 (age 463.476s)
hex dump (first 32 bytes):
20 08 2d dc a3 93 ff ff c0 bd 5d cd a3 93 ff ff .-.......].....
c0 bf 85 b6 ff ff ff ff 09 00 02 00 00 00 00 00 ................
backtrace:
[<ffffffffb4afb23c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffffb4b02b19>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffffb4a7693e>] kmalloc_trace+0x2e/0xc0
[<ffffffffb493a8fb>] tracing_log_err+0x18b/0x1d0
[<ffffffffb4959049>] append_filter_err.isra.13+0x119/0x190
[<ffffffffb495a89f>] create_filter+0xbf/0xe0
[<ffffffffb495ab10>] create_event_filter+0x10/0x20
[<ffffffffb495c040>] set_trigger_filter+0xa0/0x180
[<ffffffffb495d745>] event_trigger_parse+0xf5/0x160
[<ffffffffb495c889>] trigger_process_regex+0xc9/0x120
[<ffffffffb495c976>] event_trigger_write+0x86/0xf0
[<ffffffffb4b52dc2>] vfs_write+0xf2/0x520
[<ffffffffb4b533d8>] ksys_write+0x68/0xe0
[<ffffffffb4b5347e>] __x64_sys_write+0x1e/0x30
[<ffffffffb586619c>] do_syscall_64+0x5c/0x90
[<ffffffffb5a000ae>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff93a3873dda20 (size 32):
comm "ftracetest", pid 12451, jiffies 4295087353 (age 463.476s)
hex dump (first 32 bytes):
0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 61 0a 00 00 . Command: a...
00 00 cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
backtrace:
[<ffffffffb4afb23c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffffb4b02b19>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffffb4a77785>] __kmalloc+0x55/0x160
[<ffffffffb493a913>] tracing_log_err+0x1a3/0x1d0
[<ffffffffb4959049>] append_filter_err.isra.13+0x119/0x190
[<ffffffffb495a89f>] create_filter+0xbf/0xe0
[<ffffffffb495ab10>] create_event_filter+0x10/0x20
[<ffffffffb495c040>] set_trigger_filter+0xa0/0x180
[<ffffffffb495d745>] event_trigger_parse+0xf5/0x160
[<ffffffffb495c889>] trigger_process_regex+0xc9/0x120
[<ffffffffb495c976>] event_trigger_write+0x86/0xf0
[<ffffffffb4b52dc2>] vfs_write+0xf2/0x520
[<ffffffffb4b533d8>] ksys_write+0x68/0xe0
[<ffffffffb4b5347e>] __x64_sys_write+0x1e/0x30
[<ffffffffb586619c>] do_syscall_64+0x5c/0x90
[<ffffffffb5a000ae>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Please find the complete debug info at the URL:
https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/ftracetest/
Bisect log is [edited]:
> git bisect good a92ce570c81dc0feaeb12a429b4bc65686d17967
> # good: [c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a] ipmi/watchdog: use strscpy() to instead of strncpy()
> git bisect good c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a
> # good: [90b12f423d3c8a89424c7bdde18e1923dfd0941e] Merge tag 'for-linus-6.2-1' of https://github.com/cminyard/linux-ipmi
> git bisect good 90b12f423d3c8a89424c7bdde18e1923dfd0941e
> # first bad commit: [71946a25f357a51dcce849367501d7fb04c0465b] Merge tag 'mmc-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
>
> The commit was merged on December 13th 2022.
The number of changes in the culprit commit 71946a25f357a51dcce849367501d7fb04c0465b
prevents me from bisecting further: I do not know which changes depend on which, and which
can be tested independently.
Hopefully I might come up with a reproducer, but I need some feedback first. Maybe there
are ways to narrow down the lines of code that could have caused the leaks, yet I am
completely new to the kernel/trace subtree.
Apologies for not Cc:ing Ulf nine weeks ago; it was an omission, not a deliberate act.
Best regards,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
"I see something approaching fast ... Will it be friends with me?"
o6irnndpcv7 writes via Kernel.org Bugzilla:
Hello and good day!
I think I found a missing dependency.
When CONFIG_FIPS_SIGNATURE_SELFTEST is set, CONFIG_CRYPTO_SHA256 also needs to be set, and it must be built in rather than built as a module.
Failing to do so results in an early kernel panic during boot.
Tested on linux-6.1.12-gentoo and linux-6.1.19-gentoo.
Thanks,
sephora
View: https://bugzilla.kernel.org/show_bug.cgi?id=217293#c0
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)
Hi All,
In a TDX guest, the attestation process is used to verify the TDX guest's
trustworthiness to other entities before provisioning secrets to the
guest.
The TDX guest attestation process consists of two steps:
1. TDREPORT generation
2. Quote generation.
The first step (TDREPORT generation) involves getting the TDX guest
measurement data in the format of TDREPORT which is further used to
validate the authenticity of the TDX guest. The second step involves
sending the TDREPORT to a Quoting Enclave (QE) server to generate a
remotely verifiable Quote. TDREPORT by design can only be verified on
the local platform. To support remote verification of the TDREPORT,
TDX leverages Intel SGX Quoting Enclave to verify the TDREPORT
locally and convert it to a remotely verifiable Quote. Although
attestation software can use communication methods like TCP/IP or
vsock to send the TDREPORT to QE, not all platforms support these
communication models. So TDX GHCI specification [1] defines a method
for Quote generation via hypercalls. Please check the discussion from
Google [2] and Alibaba [3] which clarifies the need for hypercall based
Quote generation support. This patch set adds this support.
Support for TDREPORT generation already exists in the TDX guest driver.
This patchset extends the same driver to add the Quote generation
support.
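As a reminder of what the existing first step looks like from user space,
here is a minimal sketch. The struct and ioctl definitions are a local
mirror of the existing TDREPORT uapi (linux/tdx-guest.h) so that the
example builds without that header; treat them as an assumption and
prefer the real header where available. tdx_get_report() is a
hypothetical helper name. The Quote step added by this series is
deliberately not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local mirror of the TDREPORT uapi; an assumption, see above. */
#define TDX_REPORTDATA_LEN 64
#define TDX_REPORT_LEN     1024

struct tdx_report_req {
	unsigned char reportdata[TDX_REPORTDATA_LEN];
	unsigned char tdreport[TDX_REPORT_LEN];
};

#define TDX_CMD_GET_REPORT0 _IOWR('T', 1, struct tdx_report_req)

/*
 * Step 1 of attestation: bind caller-supplied data (e.g. a nonce) into
 * a TDREPORT. Returns 0 on success and -1 on any failure, including
 * when the program is not running inside a TDX guest.
 */
int tdx_get_report(const unsigned char *nonce, size_t nonce_len,
		   struct tdx_report_req *req)
{
	int fd, ret;

	if (nonce_len > TDX_REPORTDATA_LEN)
		return -1;

	memset(req, 0, sizeof(*req));
	memcpy(req->reportdata, nonce, nonce_len);

	fd = open("/dev/tdx_guest", O_RDWR | O_SYNC);
	if (fd < 0)
		return -1;

	ret = ioctl(fd, TDX_CMD_GET_REPORT0, req);
	close(fd);
	return ret ? -1 : 0;
}
```

The resulting tdreport buffer is what the second step would hand to the
Quoting Enclave.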
Following are the details of the patch set:
Patch 1/3 -> Adds event notification IRQ support.
Patch 2/3 -> Adds Quote generation support.
Patch 3/3 -> Adds selftest support for Quote generation feature.
[1] https://cdrdv2.intel.com/v1/dl/getContent/726790, section titled "TDG.VP.VMCALL<GetQuote>".
[2] https://lore.kernel.org/lkml/CAAYXXYxxs2zy_978GJDwKfX5Hud503gPc8=1kQ-+JwG_k…
[3] https://lore.kernel.org/lkml/a69faebb-11e8-b386-d591-dbd08330b008@linux.ali…
Kuppuswamy Sathyanarayanan (3):
x86/tdx: Add TDX Guest event notify interrupt support
virt: tdx-guest: Add Quote generation support
selftests/tdx: Test GetQuote TDX attestation feature
Documentation/virt/coco/tdx-guest.rst | 11 +
arch/x86/coco/tdx/tdx.c | 203 +++++++++++++++
arch/x86/include/asm/tdx.h | 8 +
drivers/virt/coco/tdx-guest/tdx-guest.c | 249 ++++++++++++++++++-
include/uapi/linux/tdx-guest.h | 44 ++++
tools/testing/selftests/tdx/tdx_guest_test.c | 68 ++++-
6 files changed, 575 insertions(+), 8 deletions(-)
--
2.34.1
This change fixes flakiness in the BIDIRECTIONAL test:
# [is_pkt_valid] expected length [60], got length [90]
not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.
For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.
Fixes: a89052572ebb ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/test_xsk.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index b077cf58f825..377fb157a57c 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -116,6 +116,7 @@ setup_vethPairs() {
ip link add ${VETH0} numtxqueues 4 numrxqueues 4 type veth peer name ${VETH1} numtxqueues 4 numrxqueues 4
if [ -f /proc/net/if_inet6 ]; then
echo 1 > /proc/sys/net/ipv6/conf/${VETH0}/disable_ipv6
+ echo 1 > /proc/sys/net/ipv6/conf/${VETH1}/disable_ipv6
fi
if [[ $verbose -eq 1 ]]; then
echo "setting up ${VETH1}"
--
2.39.2
All related to the pages code, and the latter are reproducible with a
simple test.
Jason Gunthorpe (4):
iommufd: Check for uptr overflow
iommufd: Fix unpinning of pages when an access is present
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd/selftest: Cover domain unmap with huge pages and access
drivers/iommu/iommufd/pages.c | 16 ++++++++++--
tools/testing/selftests/iommu/iommufd.c | 34 +++++++++++++++++++++++++
2 files changed, 48 insertions(+), 2 deletions(-)
base-commit: 9c7d518b9b71f4d5ca3d12952cda3417ac6126c4
--
2.40.0
Good morning,
we would like to offer you comprehensive solutions for GPS monitoring systems.
Precise vehicle tracking on digital maps, real-time monitoring of operating parameters, and fuel control are the key features of our system.
This makes organizing employees' work simpler and more efficient, and the resulting savings and cost optimization matter greatly to every business owner.
We will tailor our offer to your expectations and your organization's needs. Could we talk about our proposal?
Best regards,
Krystian Wieczorek
Hi all,
This patch series adds support to run tests via kunit_tool on the
SuperH-based virtualized r2d platform. As r2d uses the second serial
port as the console, this needs a small modification of the core
infrastructure.
Thanks for your comments!
Geert Uytterhoeven (2):
kunit: tool: Add support for overriding the QEMU serial port
kunit: tool: Add support for SH under QEMU
tools/testing/kunit/kunit_kernel.py | 3 ++-
tools/testing/kunit/qemu_config.py | 1 +
tools/testing/kunit/qemu_configs/sh.py | 17 +++++++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/kunit/qemu_configs/sh.py
--
2.34.1
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
The default timeout for kselftests is 45 seconds, but that isn't enough
time to run pcm-test when there are many PCMs on the device, nor for
mixer-test when slower control buses and fancier CODECs are present.
As data points, running pcm-test on mt8192-asurada-spherion takes about
1m15s, and mixer-test on rk3399-gru-kevin takes about 2m.
Set the timeout to 4 minutes to allow both pcm-test and mixer-test to
run to completion with some slack.
Reviewed-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Changes in v2:
- Reduced timeout from 10 to 4 minutes
- Tweaked commit message to also mention mixer-test and run time for
mixer-test on rk3399-gru-kevin
tools/testing/selftests/alsa/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/alsa/settings
diff --git a/tools/testing/selftests/alsa/settings b/tools/testing/selftests/alsa/settings
new file mode 100644
index 000000000000..b478e684846a
--- /dev/null
+++ b/tools/testing/selftests/alsa/settings
@@ -0,0 +1 @@
+timeout=240
--
2.39.0
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that straddle the end of the UMEM but not a
page.
This test used to fail without the previous commit ("xsk: Add check for
unaligned descriptors that overrun UMEM").
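The choice of 4001 can be checked with a little arithmetic. The sketch
below is not part of the patch; the helper name is hypothetical, and the
values used in the note after it (1024 frames, 64-byte packets, 4096-byte
pages) are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the end of the UMEM is far enough from both neighbouring
 * page boundaries that a pkt_size-byte descriptor can straddle the end
 * of the UMEM without also touching a page boundary.
 */
bool umem_end_clear_of_page_boundary(uint64_t num_frames,
				     uint64_t frame_size,
				     uint64_t page_size,
				     uint64_t pkt_size)
{
	uint64_t tail = (num_frames * frame_size) % page_size;

	return tail > pkt_size && tail < page_size - pkt_size;
}
```

With the assumed values, 1024 * 4001 leaves a remainder of 1024 modulo
4096, which lies strictly between 64 and 4032, whereas a frame size of
4096 leaves a remainder of 0 and would put the UMEM end on a page
boundary.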
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 25 ++++++++++++++++++++++++
tools/testing/selftests/bpf/xskxceiver.h | 1 +
2 files changed, 26 insertions(+)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 1a4bdd5aa78c..9b9efd0e0a4c 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -69,6 +69,7 @@
*/
#define _GNU_SOURCE
+#include <assert.h>
#include <fcntl.h>
#include <errno.h>
#include <getopt.h>
@@ -1876,6 +1877,30 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
test->ifobj_rx->umem->unaligned_mode = true;
testapp_invalid_desc(test);
break;
+ case TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME:
+ if (!hugepages_present(test->ifobj_tx)) {
+ ksft_test_result_skip("No 2M huge pages present.\n");
+ return;
+ }
+ test_spec_set_name(test, "UNALIGNED_INV_DESC_4K1_FRAME_SIZE");
+ /* Odd frame size so the UMEM doesn't end near a page boundary. */
+ test->ifobj_tx->umem->frame_size = 4001;
+ test->ifobj_rx->umem->frame_size = 4001;
+ test->ifobj_tx->umem->unaligned_mode = true;
+ test->ifobj_rx->umem->unaligned_mode = true;
+ /* This test exists to test descriptors that straddle the end of
+ * the UMEM but not a page.
+ */
+ {
+ u64 umem_size = test->ifobj_tx->umem->num_frames *
+ test->ifobj_tx->umem->frame_size;
+ u64 page_size = sysconf(_SC_PAGESIZE);
+
+ assert(umem_size % page_size > PKT_SIZE);
+ assert(umem_size % page_size < page_size - PKT_SIZE);
+ }
+ testapp_invalid_desc(test);
+ break;
case TEST_TYPE_UNALIGNED:
if (!testapp_unaligned(test))
return;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index cc24ab72f3ff..919327807a4e 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -78,6 +78,7 @@ enum test_type {
TEST_TYPE_ALIGNED_INV_DESC,
TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
TEST_TYPE_UNALIGNED_INV_DESC,
+ TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME,
TEST_TYPE_HEADROOM,
TEST_TYPE_TEARDOWN,
TEST_TYPE_BIDI,
--
2.39.2
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.
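Stated as a standalone predicate, the relaxed check looks like the sketch
below. The function name is hypothetical; the logic mirrors the condition
this patch adds to validate_rx_dropped().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Half of the packets are invalid. The final invalid packet may or may
 * not have been counted when the statistics are read, so accept either
 * nb_pkts / 2 or one less.
 */
bool rx_dropped_count_ok(uint64_t rx_dropped, uint32_t nb_pkts)
{
	uint64_t invalid = nb_pkts / 2;

	return rx_dropped == invalid || rx_dropped + 1 == invalid;
}
```

For a stream of 100 packets this accepts 50 and 49 dropped packets and
rejects everything else.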
This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).
This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.
Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.
Fixes: 27e934bec35b ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 34a1f32fe752..1a4bdd5aa78c 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -633,7 +633,6 @@ static struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem, u32 nb
if (!pkt_stream)
exit_with_error(ENOMEM);
- pkt_stream->nb_pkts = nb_pkts;
for (i = 0; i < nb_pkts; i++) {
pkt_set(umem, &pkt_stream->pkts[i], (i % umem->num_frames) * umem->frame_size,
pkt_len);
@@ -1141,7 +1140,14 @@ static int validate_rx_dropped(struct ifobject *ifobject)
if (err)
return TEST_FAILURE;
- if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2)
+ /* The receiver calls getsockopt after receiving the last (valid)
+ * packet which is not the final packet sent in this test (valid and
+ * invalid packets are sent in alternating fashion with the final
+ * packet being invalid). Since the last packet may or may not have
+ * been dropped already, both outcomes must be allowed.
+ */
+ if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 ||
+ stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 - 1)
return TEST_PASS;
return TEST_FAILURE;
--
2.39.2
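The relaxed check in validate_rx_dropped() can be sketched in isolation as follows. The function name and the guard for an empty stream are assumptions added for the example; the patch itself only compares against nb_pkts / 2 and nb_pkts / 2 - 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the acceptance check. Valid and
 * invalid packets alternate and the final packet is invalid, so half of
 * the packets are expected to be dropped. The final drop may not have
 * been counted yet when the statistics are read, so one less than half
 * is accepted as well.
 */
static bool rx_dropped_ok(uint64_t rx_dropped, uint32_t nb_pkts)
{
	uint64_t expected = nb_pkts / 2;

	if (rx_dropped == expected)
		return true;

	/* Guard against underflow for an empty stream (assumption). */
	return expected > 0 && rx_dropped == expected - 1;
}
```

For a stream of 10 packets this accepts 5 or 4 dropped packets and rejects anything else.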
We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.
Compared to copying packets via userspace, this improves throughput by
121% in basic testing.
Tested as follows.
Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
"socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs
*Note: these numbers are from RFC v1
Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.
This series requires the skb patch.
Changes in v4:
- af_vsock: fix parameter alignment in vsock_dgram_recvmsg()
- af_vsock: add TCP_ESTABLISHED comment in vsock_dgram_connect()
- vsock/bpf: change ret type to bool
Changes in v3:
- vsock/bpf: Refactor wait logic in vsock_bpf_recvmsg() to avoid
backwards goto
- vsock/bpf: Check psock before acquiring slock
- vsock/bpf: Return bool instead of int of 0 or 1
- vsock/bpf: Wrap macro args __sk/__psock in parens
- vsock/bpf: Place comment trailer */ on separate line
Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order
to minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one of dgram, seqpacket, or
stream. Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks
out of fast path
Signed-off-by: Bobby Eshleman <bobby.eshleman(a)bytedance.com>
---
Bobby Eshleman (3):
vsock: support sockmap
selftests/bpf: add vsock to vmtest.sh
selftests/bpf: add a test case for vsock sockmap
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 17 ++
net/vmw_vsock/Makefile | 1 +
net/vmw_vsock/af_vsock.c | 64 +++++++-
net/vmw_vsock/virtio_transport.c | 2 +
net/vmw_vsock/virtio_transport_common.c | 25 +++
net/vmw_vsock/vsock_bpf.c | 174 +++++++++++++++++++++
net/vmw_vsock/vsock_loopback.c | 2 +
tools/testing/selftests/bpf/config.aarch64 | 2 +
tools/testing/selftests/bpf/config.s390x | 3 +
tools/testing/selftests/bpf/config.x86_64 | 3 +
.../selftests/bpf/prog_tests/sockmap_listen.c | 163 +++++++++++++++++++
13 files changed, 452 insertions(+), 6 deletions(-)
---
base-commit: e5b42483ccce50d5b957f474fd332afd4ef0c27b
change-id: 20230327-vsock-sockmap-30b090c70cd1
Best regards,
--
Bobby Eshleman <bobby.eshleman(a)bytedance.com>
Hi all,
This is a cleanup series to consolidate common signal setup code.
Right now there is quite a bit of duplicated code, spread around in an
unorganized way. Here is a rework of that signal-related code:
(1) Consolidate the signal handler helpers
They have been copied verbatim into many tests. Place them in the
shared code, then remove the duplicates.
(2) Simplify altstack code
Most cases just require a usable alternate stack, so they can all be
simplified. Abstract the entire setup code into one setup call, which
reduces the amount of code in each test.
For testing sigaltstack() specifically, another helper is provided
that excludes the syscall part.
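As a rough sketch of what the two points above could look like in shared code: the handler helpers below are modeled on the kind of helper that is duplicated across the tests, while setup_altstack(), cleanup_altstack(), the fixed stack size and the demo function are hypothetical and not taken from this series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ALTSTACK_SIZE (64 * 1024)	/* assumed size for the example */

static char *altstack_mem;
static volatile sig_atomic_t ran_on_altstack;

/* Install a SA_SIGINFO handler with extra flags; exit on failure. */
static void sethandler(int sig, void (*handler)(int, siginfo_t *, void *),
		       int flags)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = handler;
	sa.sa_flags = SA_SIGINFO | flags;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, NULL)) {
		perror("sigaction");
		exit(1);
	}
}

/* Restore the default disposition for a signal. */
static void clearhandler(int sig)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_DFL;
	sigemptyset(&sa.sa_mask);
	if (sigaction(sig, &sa, NULL)) {
		perror("sigaction");
		exit(1);
	}
}

/* Hypothetical one-call setup: allocate and register an alternate stack. */
static void setup_altstack(void)
{
	stack_t ss;

	memset(&ss, 0, sizeof(ss));
	altstack_mem = malloc(ALTSTACK_SIZE);
	assert(altstack_mem);
	ss.ss_sp = altstack_mem;
	ss.ss_size = ALTSTACK_SIZE;
	if (sigaltstack(&ss, NULL)) {
		perror("sigaltstack");
		exit(1);
	}
}

/* Disable the alternate stack first, then free its memory. */
static void cleanup_altstack(void)
{
	stack_t ss;

	memset(&ss, 0, sizeof(ss));
	ss.ss_flags = SS_DISABLE;
	sigaltstack(&ss, NULL);
	free(altstack_mem);
	altstack_mem = NULL;
}

/* Record whether a local variable lives inside the alternate stack. */
static void demo_handler(int sig, siginfo_t *info, void *ctx)
{
	char marker;
	uintptr_t p = (uintptr_t)&marker;
	uintptr_t lo = (uintptr_t)altstack_mem;

	(void)sig; (void)info; (void)ctx;
	ran_on_altstack = (p >= lo && p < lo + ALTSTACK_SIZE);
}

/* Returns 1 if the handler ran on the alternate stack. */
static int altstack_demo(void)
{
	ran_on_altstack = 0;
	setup_altstack();
	sethandler(SIGUSR1, demo_handler, SA_ONSTACK);
	raise(SIGUSR1);		/* delivered before raise() returns */
	clearhandler(SIGUSR1);
	cleanup_altstack();	/* only after the signal was handled */
	return ran_on_altstack;
}
```

Note that the cleanup runs only after the signal has been delivered; freeing the stack earlier is the kind of ordering problem a shared helper makes easier to avoid.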
The series also includes some preparatory changes for them:
* The rework uncovered an existing problem: a couple of tests appear
to free the altstack memory before the signal is delivered. Adjust
the memory cleanup to resolve this issue.
* Also resolve a define conflict separately before including the
refactored header.
Then, there is another selftest fix that I posted:
https://lore.kernel.org/lkml/20230330233520.21937-1-chang.seok.bae@intel.co…
which has a conflict with this. As the fix should go first, this
cleanup series is based on it.
FWIW, at the moment, the new x86 selftest cases (lam and
test_shadow_stack) do not conflict with this.
Here is the repository where this series can be found:
git://github.com/intel/amx-linux.git selftest-signal
Thanks,
Chang
Chang S. Bae (4):
selftests/x86: Fix the altstack free
selftests/x86/mov_ss_trap: Include processor-flags.h
selftests/x86: Consolidate signal handler helpers
selftests/x86: Refactor altstack setup code
tools/testing/selftests/x86/Makefile | 16 ++-
tools/testing/selftests/x86/amx.c | 67 +++--------
.../selftests/x86/corrupt_xstate_header.c | 15 +--
tools/testing/selftests/x86/entry_from_vm86.c | 25 +---
tools/testing/selftests/x86/fsgsbase.c | 25 +---
tools/testing/selftests/x86/helpers.c | 110 ++++++++++++++++++
tools/testing/selftests/x86/helpers.h | 10 ++
tools/testing/selftests/x86/ioperm.c | 26 +----
tools/testing/selftests/x86/iopl.c | 26 +----
tools/testing/selftests/x86/ldt_gdt.c | 19 +--
tools/testing/selftests/x86/mov_ss_trap.c | 26 +----
tools/testing/selftests/x86/ptrace_syscall.c | 24 +---
tools/testing/selftests/x86/sigaltstack.c | 67 +++--------
tools/testing/selftests/x86/sigreturn.c | 35 +-----
.../selftests/x86/single_step_syscall.c | 36 +-----
.../testing/selftests/x86/syscall_arg_fault.c | 24 +---
tools/testing/selftests/x86/syscall_nt.c | 13 ---
tools/testing/selftests/x86/sysret_rip.c | 24 +---
tools/testing/selftests/x86/test_vsyscall.c | 13 ---
tools/testing/selftests/x86/unwind_vdso.c | 13 ---
20 files changed, 205 insertions(+), 409 deletions(-)
create mode 100644 tools/testing/selftests/x86/helpers.c
--
2.17.1
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
---
Thomas Weißschuh (3):
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 60 +++++++++++----------
tools/include/nolibc/sys.h | 23 ++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 78 ++++++++++++++++++++++++++++
3 files changed, 134 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
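For readers wondering how memfd_create() and fd-based FILE streams fit together in a vfprintf() test, here is one possible shape. It is a sketch against a regular libc rather than nolibc, and expect_vfprintf() is a hypothetical helper, not the function added by the series.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: format into an anonymous in-memory file through
 * vfprintf(), read the bytes back and compare them with the expected
 * string. Returns 0 on a match and -1 on mismatch or error.
 */
static int expect_vfprintf(const char *expected, const char *fmt, ...)
{
	char buf[128];
	va_list args;
	FILE *memfile;
	ssize_t r;
	int fd, w;

	assert(expected && fmt);

	fd = memfd_create("vfprintf", 0);
	if (fd == -1)
		return -1;

	memfile = fdopen(fd, "w+");
	if (!memfile) {
		close(fd);
		return -1;
	}

	va_start(args, fmt);
	w = vfprintf(memfile, fmt, args);
	va_end(args);

	/* Push buffered output to the fd before reading it back. */
	fflush(memfile);
	r = pread(fd, buf, sizeof(buf) - 1, 0);
	fclose(memfile);	/* also closes fd */

	if (w < 0 || r < 0)
		return -1;
	buf[r] = '\0';

	/* Both the return value and the written bytes must match. */
	if ((size_t)w != strlen(expected) || r != w)
		return -1;
	return strcmp(buf, expected) == 0 ? 0 : -1;
}
```

A call such as expect_vfprintf("-42", "%d", -42) checks both the return value of vfprintf() and the bytes that actually reached the file descriptor.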
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: add wrapper for memfd_create
tools/nolibc: let FILE streams contain an fd
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 36 +++----------
tools/include/nolibc/sys.h | 23 +++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 77 ++++++++++++++++++++++++++++
3 files changed, 107 insertions(+), 29 deletions(-)
---
base-commit: a5333c037de823912dd20e933785c63de7679e64
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>