Dzień dobry,
czy rozważali Państwo rozwój kwalifikacji językowych swoich pracowników?
Opracowaliśmy kursy językowe dla różnych branż, w których koncentrujemy się na podniesieniu poziomu słownictwa i jakości komunikacji wykorzystując autorską metodę, stworzoną specjalnie dla wymagającego biznesu.
Niestandardowy kurs on-line, dopasowany do profilu firmy i obszarów świadczonych usług, w szybkim czasie przyniesie efekty, które zwiększą komfort i jakość pracy, rozwijając możliwości biznesowe.
Zdalne szkolenie językowe to m.in. zajęcia z native speakerami, które w szybkim czasie nauczą pracowników rozmawiać za pomocą jasnego i zwięzłego języka Business English.
Czy mógłbym przedstawić więcej szczegółów i opowiedzieć jak działamy?
Pozdrawiam
Krzysztof Maj
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which facilitate this fix by
factoring out the tc helper's logic which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
---
v2: Fixed uninitialized var in test patch (4).
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is now in mm-stable. Ideally we'd get at least patch #1 into the
same kernel release as [1], so the semantics of setting
PR_SET_MEMORY_MERGE=0 are unchanged between kernel versions.
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
v1 -> v2:
- "mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0"
-> Cleanup one if/else
-> Add doc for ksm_disable_merge_any()
- Added ACKs
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +-----
include/linux/ksm.h | 7 ++
kernel/sys.c | 12 +---
mm/ksm.c | 70 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 ++++++++++--
5 files changed, 121 insertions(+), 34 deletions(-)
--
2.40.0
Hi Linus,
Please pull the following KUnit next update for Linux 6.4-rc1.
linux-kselftest-kunit-6.4-rc1
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-kunit-6.4-rc1
for you to fetch changes up to a42077b787680cbc365a96446b30f32399fa3f6f:
kunit: add tests for using current KUnit test field (2023-04-05 12:51:30 -0600)
----------------------------------------------------------------
linux-kselftest-kunit-6.4-rc1
This KUnit update Linux 6.4-rc1 consists of:
- several fixes to kunit tool
- new klist structure test
- support for m68k under QEMU
- support for overriding the QEMU serial port
- support for SH under QEMU
----------------------------------------------------------------
Andy Shevchenko (1):
.gitignore: Unignore .kunitconfig
Daniel Latypov (3):
kunit: tool: add subscripts for type annotations where appropriate
kunit: tool: remove unused imports and variables
kunit: tool: fix pre-existing `mypy --strict` errors and update run_checks.py
Geert Uytterhoeven (3):
kunit: tool: Add support for m68k under QEMU
kunit: tool: Add support for overriding the QEMU serial port
kunit: tool: Add support for SH under QEMU
Heiko Carstens (1):
kunit: increase KUNIT_LOG_SIZE to 2048 bytes
Rae Moar (4):
kunit: fix bug in debugfs logs of parameterized tests
kunit: fix bug in the order of lines in debugfs logs
kunit: fix bug of extra newline characters in debugfs logs
kunit: add tests for using current KUnit test field
Sadiya Kazi (1):
list: test: Test the klist structure
Stephen Boyd (1):
kunit: Use gfp in kunit_alloc_resource() kernel-doc
.gitignore | 1 +
include/kunit/resource.h | 2 +-
include/kunit/test.h | 4 +-
lib/kunit/debugfs.c | 14 +-
lib/kunit/kunit-test.c | 77 ++++++--
lib/kunit/test.c | 57 ++++--
lib/list-test.c | 300 ++++++++++++++++++++++++++++++-
tools/testing/kunit/kunit.py | 26 +--
tools/testing/kunit/kunit_config.py | 4 +-
tools/testing/kunit/kunit_kernel.py | 39 ++--
tools/testing/kunit/kunit_parser.py | 1 -
tools/testing/kunit/kunit_printer.py | 2 +-
tools/testing/kunit/kunit_tool_test.py | 2 +-
tools/testing/kunit/qemu_config.py | 1 +
tools/testing/kunit/qemu_configs/m68k.py | 10 ++
tools/testing/kunit/qemu_configs/sh.py | 17 ++
tools/testing/kunit/run_checks.py | 6 +-
17 files changed, 491 insertions(+), 72 deletions(-)
create mode 100644 tools/testing/kunit/qemu_configs/m68k.py
create mode 100644 tools/testing/kunit/qemu_configs/sh.py
----------------------------------------------------------------
Hi Linus,
Please pull the following Kselftest update for Linux 6.4-rc1.
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit fe15c26ee26efa11741a7b632e9f23b01aca4cc6:
Linux 6.3-rc1 (2023-03-05 14:52:03 -0800)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-next-6.4-rc1
for you to fetch changes up to 50ad2fb7ec2b18186b8a4fa1c0e00f78b3de5119:
selftests/resctrl: Fix incorrect error return on test complete (2023-04-14 11:13:18 -0600)
----------------------------------------------------------------
linux-kselftest-next-6.4-rc1
linux-kselftest-next-6.4-rc1
This Kselftest update for Linux 6.4-rc1 consists of:
- several patches to enhance and fix resctrl test
- nolibc support for kselftest with an addition to vprintf() to
tools/nolibc/stdio and related test changes
- Refactor 'peeksiginfo' ptrace test part
- add 'malloc' failures checks in cgroup test_memcontrol
- a new prctl test
- enhancements sched test with additional ore schedule prctl calls
----------------------------------------------------------------
Fenghua Yu (1):
selftests/resctrl: Change name from CBM_MASK_PATH to INFO_PATH
Ilpo Järvinen (8):
selftests/resctrl: Return NULL if malloc_and_init_memory() did not alloc mem
selftests/resctrl: Move ->setup() call outside of test specific branches
selftests/resctrl: Allow ->setup() to return errors
selftests/resctrl: Check for return value after write_schemata()
selftests/resctrl: Replace obsolete memalign() with posix_memalign()
selftests/resctrl: Change initialize_llc_perf() return type to void
selftests/resctrl: Use remount_resctrlfs() consistently with boolean
selftests/resctrl: Correct get_llc_perf() param in function comment
Ivan Orlov (4):
selftests: Refactor 'peeksiginfo' ptrace test part
selftests: cgroup: Add 'malloc' failures checks in test_memcontrol
selftests: sched: Add more core schedule prctl calls
selftests: prctl: Add new prctl test for PR_SET_VMA action
Mark Brown (3):
tools/nolibc/stdio: Implement vprintf()
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
Peter Newman (1):
selftests/resctrl: Use correct exit code when tests fail
Reinette Chatre (1):
selftests/resctrl: Fix incorrect error return on test complete
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test
selftests/resctrl: Return MBA check result and make it to output message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister for all tests
selftests/resctrl: Remove duplicate codes that clear each test result file
Sukrut Bellary (1):
kselftest: amd-pstate: Fix spelling mistakes
tools/include/nolibc/stdio.h | 6 ++
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/amd-pstate/gitsource.sh | 4 +-
tools/testing/selftests/amd-pstate/run.sh | 4 +-
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 ++++-------------
tools/testing/selftests/cgroup/test_memcontrol.c | 15 +++
tools/testing/selftests/kselftest.h | 2 +
tools/testing/selftests/prctl/.gitignore | 1 +
tools/testing/selftests/prctl/Makefile | 2 +-
tools/testing/selftests/prctl/config | 1 +
.../selftests/prctl/set-anon-vma-name-test.c | 104 +++++++++++++++++++++
tools/testing/selftests/ptrace/peeksiginfo.c | 14 +--
tools/testing/selftests/resctrl/cache.c | 17 ++--
tools/testing/selftests/resctrl/cat_test.c | 33 ++++---
tools/testing/selftests/resctrl/cmt_test.c | 16 ++--
tools/testing/selftests/resctrl/fill_buf.c | 21 +----
tools/testing/selftests/resctrl/mba_test.c | 34 ++++---
tools/testing/selftests/resctrl/mbm_test.c | 22 ++---
tools/testing/selftests/resctrl/resctrl.h | 8 +-
tools/testing/selftests/resctrl/resctrl_tests.c | 14 +--
tools/testing/selftests/resctrl/resctrl_val.c | 88 +++++++++++------
tools/testing/selftests/resctrl/resctrlfs.c | 7 +-
tools/testing/selftests/sched/cs_prctl_test.c | 6 ++
24 files changed, 306 insertions(+), 204 deletions(-)
create mode 100644 tools/testing/selftests/prctl/config
create mode 100644 tools/testing/selftests/prctl/set-anon-vma-name-test.c
----------------------------------------------------------------
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packet is captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz | 1...
.. |
abcd...xyz |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
.. |
The issue is that the test on receive for expected payload pattern
{AB..Z}+ fail for GRO packets if segment payload does not end on a Z.
The issue still exists when using the GRO with -G, but not using the -S
to obtain gsosize. Therefore, a print has been added to remind users.
Changes in v3:
- Simplify description and adjust judgment order.
Changes in v2:
- Fix confusing descriptions.
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 34 +++++++++++++++++++++++----
1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..3ad18cbc570d 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,45 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize to guide "
+ "splitting and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int segment_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > segment_size)
+ do_verify_udp(data, start, segment_size);
+ else
+ do_verify_udp(data, start, len - start);
+
+ start += segment_size;
}
}
@@ -268,7 +287,12 @@ static void do_flush_udp(int fd)
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
error(1, 0, "recv: bad gso size, got %d, expected %d "
--
2.15.2
This is a follow-up to [1]:
[PATCH v9 0/3] mm: process/cgroup ksm support
which is not in mm-unstable yet (but soon? :) ). I'll be on vacation for
~2 weeks, so sending it out now as reply to [1].
(1) Make PR_SET_MEMORY_MERGE=0 unmerge pages like setting MADV_UNMERGEABLE
does, (2) add a selftest for it and (3) factor out disabling of KSM from
s390/gmap code.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Stefan Roesch <shr(a)devkernel.io>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Janosch Frank <frankja(a)linux.ibm.com>
Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Shuah Khan <shuah(a)kernel.org>
[1] https://lkml.kernel.org/r/20230418051342.1919757-1-shr@devkernel.io
David Hildenbrand (3):
mm/ksm: unmerge and clear VM_MERGEABLE when setting
PR_SET_MEMORY_MERGE=0
selftests/ksm: ksm_functional_tests: add prctl unmerge test
mm/ksm: move disabling KSM from s390/gmap code to KSM code
arch/s390/mm/gmap.c | 20 +------
include/linux/ksm.h | 7 +++
kernel/sys.c | 7 +--
mm/ksm.c | 58 +++++++++++++++++++
.../selftests/mm/ksm_functional_tests.c | 46 +++++++++++++--
5 files changed, 107 insertions(+), 31 deletions(-)
--
2.39.2
This series consolidates the behavior of the 2 drivers that implement
the ethtool MAC Merge layer by making NXP ENETC commit its preemptible
traffic classes to hardware only when MM TX is active (same as Ocelot).
Then, after resolving an issue with the ENETC driver, it restricts user
space from entering 2 states which don't make sense:
- pmac-enabled off tx-enabled on verify-enabled *
- pmac-enabled * tx-enabled off verify-enabled on
Then, it introduces a selftest (ethtool_mm.sh) which puts everything
together and tests all valid configurations known to me.
This is simultaneously the v2 of "[PATCH net-next 0/2] ethtool mm API
improvements":
https://lore.kernel.org/netdev/20230415173454.3970647-1-vladimir.oltean@nxp…
which had caused some problems to openlldp. Those were solved in the
meantime, see:
https://github.com/intel/openlldp/commit/11171b474f6f3cbccac5d608b7f26b32ff…
and of "[RFC PATCH net-next] selftests: forwarding: add a test for MAC
Merge layer":
https://lore.kernel.org/netdev/20230210221243.228932-1-vladimir.oltean@nxp.…
Petr Machata (2):
selftests: forwarding: sch_tbf_*: Add a pre-run hook
selftests: forwarding: generalize bail_on_lldpad from mlxsw
Vladimir Oltean (7):
net: enetc: fix MAC Merge layer remaining enabled until a link down
event
net: enetc: report mm tx-active based on tx-enabled and verify-status
net: enetc: only commit preemptible TCs to hardware when MM TX is
active
net: enetc: include MAC Merge / FP registers in register dump
net: ethtool: mm: sanitize some UAPI configurations
selftests: forwarding: introduce helper for standard ethtool counters
selftests: forwarding: add a test for MAC Merge layer
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 5 +-
.../ethernet/freescale/enetc/enetc_ethtool.c | 94 +++++-
.../net/ethernet/freescale/enetc/enetc_hw.h | 3 +
net/ethtool/mm.c | 10 +
.../drivers/net/mlxsw/qos_headroom.sh | 3 +-
.../selftests/drivers/net/mlxsw/qos_lib.sh | 28 --
.../selftests/drivers/net/mlxsw/qos_pfc.sh | 3 +-
.../selftests/drivers/net/mlxsw/sch_ets.sh | 3 +-
.../drivers/net/mlxsw/sch_red_core.sh | 1 -
.../drivers/net/mlxsw/sch_red_ets.sh | 2 +-
.../drivers/net/mlxsw/sch_red_root.sh | 2 +-
.../drivers/net/mlxsw/sch_tbf_ets.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_prio.sh | 6 +-
.../drivers/net/mlxsw/sch_tbf_root.sh | 6 +-
.../testing/selftests/net/forwarding/Makefile | 1 +
.../selftests/net/forwarding/ethtool_mm.sh | 288 ++++++++++++++++++
tools/testing/selftests/net/forwarding/lib.sh | 60 ++++
.../net/forwarding/sch_tbf_etsprio.sh | 4 +
.../selftests/net/forwarding/sch_tbf_root.sh | 4 +
20 files changed, 486 insertions(+), 66 deletions(-)
create mode 100755 tools/testing/selftests/net/forwarding/ethtool_mm.sh
--
2.34.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v6:
- Go back to the v4 locking arragnment with now both the attach/detach
igroup->locks inside the functions, Kevin says he needs this for a
followup series. This still fixes the syzkaller bug
- Fix two more error unwind locking bugs where
iommufd_object_abort_and_destroy(hwpt) would deadlock or be mislocked.
Make sure fail_nth will catch these mistakes
- Add a patch allowing objects to have different abort than destroy
function, it allows hwpt abort to require the caller to continue
to hold the lock and enforces this with lockdep.
v5: https://lore.kernel.org/r/0-v5-6716da355392+c5-iommufd_alloc_jgg@nvidia.com
- Go back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Compared to v3 the diff for the whole series looks like:
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 024ed8ee9939cd..2770087059ba73 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -341,14 +341,15 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt = idev->igroup->hwpt;
- lockdep_assert_held(&idev->igroup->lock);
-
+ mutex_lock(&idev->igroup->lock);
list_del(&idev->group_item);
if (list_empty(&idev->igroup->device_list)) {
iommu_detach_group(hwpt->domain, idev->igroup->group);
idev->igroup->hwpt = NULL;
}
iopt_remove_reserved_iova(&hwpt->ioas->iopt, idev->dev);
+ mutex_unlock(&idev->igroup->lock);
+
/* Caller must destroy hwpt */
return hwpt;
}
@@ -515,8 +516,8 @@ iommufd_device_auto_get_domain(struct iommufd_device *idev,
hwpt->auto_domain = true;
*pt_id = hwpt->obj.id;
- mutex_unlock(&ioas->mutex);
iommufd_object_finalize(idev->ictx, &hwpt->obj);
+ mutex_unlock(&ioas->mutex);
return destroy_hwpt;
out_abort:
@@ -610,7 +611,6 @@ EXPORT_SYMBOL_NS_GPL(iommufd_device_attach, IOMMUFD);
* This is the same as
* iommufd_device_detach();
* iommufd_device_attach();
- *
* If it fails then no change is made to the attachment. The iommu driver may
* implement this so there is no disruption in translation. This can only be
* called if iommufd_device_attach() has already succeeded.
@@ -633,10 +633,7 @@ void iommufd_device_detach(struct iommufd_device *idev)
{
struct iommufd_hw_pagetable *hwpt;
- mutex_lock(&idev->igroup->lock);
hwpt = iommufd_hw_pagetable_detach(idev);
- mutex_unlock(&idev->igroup->lock);
-
iommufd_hw_pagetable_put(idev->ictx, hwpt);
refcount_dec(&idev->obj.users);
}
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 8aa9ac130b5960..655ed32144f62e 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -26,6 +26,21 @@ void iommufd_hw_pagetable_destroy(struct iommufd_object *obj)
refcount_dec(&hwpt->ioas->obj.users);
}
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj)
+{
+ struct iommufd_hw_pagetable *hwpt =
+ container_of(obj, struct iommufd_hw_pagetable, obj);
+
+ /* The ioas->mutex must be held until finalize is called. */
+ lockdep_assert_held(&hwpt->ioas->mutex);
+
+ if (!list_empty(&hwpt->hwpt_item)) {
+ list_del_init(&hwpt->hwpt_item);
+ iopt_table_remove_domain(&hwpt->ioas->iopt, hwpt->domain);
+ }
+ iommufd_hw_pagetable_destroy(obj);
+}
+
int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
{
if (hwpt->enforce_cache_coherency)
@@ -50,6 +65,10 @@ int iommufd_hw_pagetable_enforce_cc(struct iommufd_hw_pagetable *hwpt)
* Allocate a new iommu_domain and return it as a hw_pagetable. The HWPT
* will be linked to the given ioas and upon return the underlying iommu_domain
* is fully popoulated.
+ *
+ * The caller must hold the ioas->mutex until after
+ * iommufd_object_abort_and_destroy() or iommufd_object_finalize() is called on
+ * the returned hwpt.
*/
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
@@ -93,9 +112,6 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
* directly allocate a domain. These drivers do not finish creating the
* domain until attach is completed. Thus we must have this call
* sequence. Once those drivers are fixed this should be removed.
- *
- * Note we hold the igroup->lock here which prevents any other thread
- * from observing igroup->hwpt until we finish setting it up.
*/
if (immediate_attach) {
rc = iommufd_hw_pagetable_attach(hwpt, idev);
@@ -140,10 +156,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
mutex_lock(&ioas->mutex);
hwpt = iommufd_hw_pagetable_alloc(ucmd->ictx, ioas, idev, false);
- mutex_unlock(&ioas->mutex);
if (IS_ERR(hwpt)) {
rc = PTR_ERR(hwpt);
- goto out_put_ioas;
+ goto out_unlock;
}
cmd->out_hwpt_id = hwpt->obj.id;
@@ -151,11 +166,12 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
if (rc)
goto out_hwpt;
iommufd_object_finalize(ucmd->ictx, &hwpt->obj);
- goto out_put_ioas;
+ goto out_unlock;
out_hwpt:
iommufd_object_abort_and_destroy(ucmd->ictx, &hwpt->obj);
-out_put_ioas:
+out_unlock:
+ mutex_unlock(&ioas->mutex);
iommufd_put_object(&ioas->obj);
out_put_idev:
iommufd_put_object(&idev->obj);
diff --git a/drivers/iommu/iommufd/io_pagetable.c b/drivers/iommu/iommufd/io_pagetable.c
index f842768b2e250b..21052f64f95649 100644
--- a/drivers/iommu/iommufd/io_pagetable.c
+++ b/drivers/iommu/iommufd/io_pagetable.c
@@ -1172,6 +1172,9 @@ int iopt_table_enforce_dev_resv_regions(struct io_pagetable *iopt,
unsigned int num_sw_msi = 0;
int rc;
+ if (iommufd_should_fail())
+ return -EINVAL;
+
down_write(&iopt->iova_rwsem);
/* FIXME: drivers allocate memory but there is no failure propogated */
iommu_get_resv_regions(dev, &resv_regions);
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index cb693190bf51c5..ba50eb4661e217 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -261,6 +261,7 @@ int iommufd_hw_pagetable_attach(struct iommufd_hw_pagetable *hwpt,
struct iommufd_hw_pagetable *
iommufd_hw_pagetable_detach(struct iommufd_device *idev);
void iommufd_hw_pagetable_destroy(struct iommufd_object *obj);
+void iommufd_hw_pagetable_abort(struct iommufd_object *obj);
int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd);
static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 694da191e4b155..73a91e96896252 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -24,6 +24,7 @@
struct iommufd_object_ops {
void (*destroy)(struct iommufd_object *obj);
+ void (*abort)(struct iommufd_object *obj);
};
static const struct iommufd_object_ops iommufd_object_ops[];
static struct miscdevice vfio_misc_dev;
@@ -104,7 +105,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
struct iommufd_object *obj)
{
- iommufd_object_ops[obj->type].destroy(obj);
+ if (iommufd_object_ops[obj->type].abort)
+ iommufd_object_ops[obj->type].abort(obj);
+ else
+ iommufd_object_ops[obj->type].destroy(obj);
iommufd_object_abort(ictx, obj);
}
@@ -413,6 +417,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
},
[IOMMUFD_OBJ_HW_PAGETABLE] = {
.destroy = iommufd_hw_pagetable_destroy,
+ .abort = iommufd_hw_pagetable_abort,
},
#ifdef CONFIG_IOMMUFD_TEST
[IOMMUFD_OBJ_SELFTEST] = {
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index c07252dbf62d72..8b2c18ac6a2864 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -9,9 +9,6 @@
#include "iommufd_utils.h"
-static void *buffer;
-
-static unsigned long PAGE_SIZE;
static unsigned long HUGEPAGE_SIZE;
#define MOCK_PAGE_SIZE (PAGE_SIZE / 2)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 7e1afb6ff9bd8d..d4c552e5694812 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -41,6 +41,8 @@ static int writeat(int dfd, const char *fn, const char *val)
static __attribute__((constructor)) void setup_buffer(void)
{
+ PAGE_SIZE = sysconf(_SC_PAGE_SIZE);
+
BUFFER_SIZE = 2*1024*1024;
buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE,
@@ -579,6 +581,7 @@ TEST_FAIL_NTH(basic_fail_nth, device)
uint32_t stdev_id;
uint32_t idev_id;
uint32_t hwpt_id;
+ __u64 iova;
self->fd = open("/dev/iommu", O_RDWR);
if (self->fd == -1)
@@ -590,6 +593,18 @@ TEST_FAIL_NTH(basic_fail_nth, device)
if (_test_ioctl_ioas_alloc(self->fd, &ioas_id2))
return -1;
+ iova = MOCK_APERTURE_START;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+ if (_test_ioctl_ioas_map(self->fd, ioas_id2, buffer, PAGE_SIZE, &iova,
+ IOMMU_IOAS_MAP_FIXED_IOVA |
+ IOMMU_IOAS_MAP_WRITEABLE |
+ IOMMU_IOAS_MAP_READABLE))
+ return -1;
+
fail_nth_enable();
if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, NULL,
diff --git a/tools/testing/selftests/iommu/iommufd_utils.h b/tools/testing/selftests/iommu/iommufd_utils.h
index 9b6dcb921750b6..53b4d3f2d9fc6c 100644
--- a/tools/testing/selftests/iommu/iommufd_utils.h
+++ b/tools/testing/selftests/iommu/iommufd_utils.h
@@ -19,6 +19,8 @@
static void *buffer;
static unsigned long BUFFER_SIZE;
+static unsigned long PAGE_SIZE;
+
/*
* Have the kernel check the refcount on pages. I don't know why a freshly
* mmap'd anon non-compound page starts out with a ref of 3
Jason Gunthorpe (17):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Allow a hwpt to be aborted after allocation
iommufd: Fix locking around hwpt allocation
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 525 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 112 +++-
drivers/iommu/iommufd/io_pagetable.c | 30 +-
drivers/iommu/iommufd/iommufd_private.h | 52 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 24 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 853 insertions(+), 211 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Trace sched related functions, such as enqueue_task_fair, it is necessary to
specify a task instead of the current task which within a given cgroup to a map.
Feng Zhou (2):
bpf: Add bpf_task_under_cgroup helper
selftests/bpf: Add testcase for bpf_task_under_cgroup
include/uapi/linux/bpf.h | 13 +++++
kernel/bpf/verifier.c | 4 +-
kernel/trace/bpf_trace.c | 31 ++++++++++++
tools/include/uapi/linux/bpf.h | 13 +++++
.../bpf/prog_tests/task_under_cgroup.c | 49 +++++++++++++++++++
.../bpf/progs/test_task_under_cgroup.c | 31 ++++++++++++
6 files changed, 140 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/task_under_cgroup.c
create mode 100644 tools/testing/selftests/bpf/progs/test_task_under_cgroup.c
--
2.20.1
There's been a bunch of off-list discussions about this, including at
Plumbers. The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string). That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.
Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system. The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example). The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.
The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.
An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].
I was asked about the performance delta between this and something like
sysfs. I created a small test program [2] and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
- open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
- access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
- riscv_hwprobe() vDSO and syscall: .0094us
- riscv_hwprobe() vDSO with no syscall: 0.0091us
These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.
[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050
[2] https://pastebin.com/x84NEKaS
Changes in v6:
- Remove spurious blank line (Conorbot)
- Update copyrights (Paul)
- Update copyrights (Paul)
- Wrap init_hwprobe_vdso_data() in CONFIG_MMU to fix nommu build break
(Conorbot)
- Update copyrights (Paul)
Changes in v5:
- Added tags
- Fixed misuse of ISA_EXT_c as bitmap, changed to use
riscv_isa_extension_available() (Heiko, Conor)
- Document the alternatives approach in the commit message (Conor and
Heiko).
- Fix __init call warnings by making probe_vendor_features() and
thead_feature_probe_func() __init_or_module.
- Fixed compat vdso compilation failure (lkp).
Changes in v4:
- Used real types in syscall prototypes (Arnd)
- Fixed static line break in do_riscv_hwprobe() (Conor)
- Added newlines between documentation lists (Conor)
- Crispen up size types to size_t, and cpu indices to int (Joe)
- Fix copy_from_user() return logic bug (found via kselftests!)
- Add __user to SYSCALL_DEFINE() to fix warning
- More newlines in BASE_BEHAVIOR_IMA documentation (Conor)
- Add newlines to CPUPERF_0 documentation (Conor)
- Add UNSUPPORTED value (Conor)
- Switched from DT to alternatives-based probing (Rob)
- Crispen up cpu index type to always be int (Conor)
- Fixed selftests commit description, no more tiny libc (Mark Brown)
- Fixed selftest syscall prototype types to match v4.
- Added a prototype to fix -Wmissing-prototype warning (lkp(a)intel.com)
- Fixed rv32 build failure (lkp(a)intel.com)
- Make vdso prototype match syscall types update
Changes in v3:
- Updated copyright date in cpufeature.h
- Fixed typo in cpufeature.h comment (Conor)
- Refactored functions so that kernel mode can query too, in
preparation for the vDSO data population.
- Changed the vendor/arch/imp IDs to return a value of -1 on mismatch
rather than failing the whole call.
- Const cpumask pointer in hwprobe_mid()
- Embellished documentation WRT cpu_set and the returned values.
- Renamed hwprobe_mid() to hwprobe_arch_id() (Conor)
- Fixed machine ID doc warnings, changed elements to c:macro:.
- Completed dangling unistd.h comment (Conor)
- Fixed line breaks and minor logic optimization (Conor).
- Use riscv_cached_mxxxid() (Conor)
- Refactored base ISA behavior probe to allow kernel probing as well,
in prep for vDSO data initialization.
- Fixed doc warnings in IMA text list, use :c:macro:.
- Have hwprobe_misaligned return int instead of long.
- Constify cpumask pointer in hwprobe_misaligned()
- Fix warnings in _PERF_O list documentation, use :c:macro:.
- Move include cpufeature.h to misaligned patch.
- Fix documentation mismatch for RISCV_HWPROBE_KEY_CPUPERF_0 (Conor)
- Use for_each_possible_cpu() instead of NR_CPUS (Conor)
- Break early in misaligned access iteration (Conor)
- Increase MISALIGNED_MASK from 2 bits to 3 for possible UNSUPPORTED future
value (Conor)
- Introduced vDSO function
Changes in v2:
- Factored the move of struct riscv_cpuinfo to its own header
- Changed the interface to look more like poll(). Rather than supplying
key_offset and getting back an array of values with numerically
contiguous keys, have the user pre-fill the key members of the array,
and the kernel will fill in the corresponding values. For any key it
doesn't recognize, it will set the key of that element to -1. This
allows usermode to quickly ask for exactly the elements it cares
about, and not get bogged down in a back and forth about newer keys
that older kernels might not recognize. In other words, the kernel
can communicate that it doesn't recognize some of the keys while
still providing the data for the keys it does know.
- Added a shortcut to the cpuset parameters that if a size of 0 and
NULL is provided for the CPU set, the kernel will use a cpu mask of
all online CPUs. This is convenient because I suspect most callers
will only want to act on a feature if it's supported on all CPUs, and
it's a headache to dynamically allocate an array of all 1s, not to
mention a waste to have the kernel loop over all of the offline bits.
- Fixed logic error in if(of_property_read_string...) that caused crash
- Include cpufeature.h in cpufeature.h to avoid undeclared variable
warning.
- Added a _MASK define
- Fix random checkpatch complaints
- Updated the selftests to the new API and added some more.
- Fixed indentation, comments in .S, and general checkpatch complaints.
Evan Green (6):
RISC-V: Move struct riscv_cpuinfo to new header
RISC-V: Add a syscall for HW probing
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
RISC-V: hwprobe: Support probing of misaligned access performance
selftests: Test the new RISC-V hwprobe interface
RISC-V: Add hwprobe vDSO function and data
Documentation/riscv/hwprobe.rst | 86 +++++++
Documentation/riscv/index.rst | 1 +
arch/riscv/Kconfig | 1 +
arch/riscv/errata/thead/errata.c | 10 +
arch/riscv/include/asm/alternative.h | 5 +
arch/riscv/include/asm/cpufeature.h | 23 ++
arch/riscv/include/asm/hwprobe.h | 13 +
arch/riscv/include/asm/syscall.h | 4 +
arch/riscv/include/asm/vdso/data.h | 17 ++
arch/riscv/include/asm/vdso/gettimeofday.h | 8 +
arch/riscv/include/uapi/asm/hwprobe.h | 37 +++
arch/riscv/include/uapi/asm/unistd.h | 9 +
arch/riscv/kernel/alternative.c | 19 ++
arch/riscv/kernel/compat_vdso/Makefile | 2 +-
arch/riscv/kernel/cpu.c | 8 +-
arch/riscv/kernel/cpufeature.c | 3 +
arch/riscv/kernel/smpboot.c | 1 +
arch/riscv/kernel/sys_riscv.c | 228 +++++++++++++++++-
arch/riscv/kernel/vdso.c | 6 -
arch/riscv/kernel/vdso/Makefile | 4 +
arch/riscv/kernel/vdso/hwprobe.c | 52 ++++
arch/riscv/kernel/vdso/sys_hwprobe.S | 15 ++
arch/riscv/kernel/vdso/vdso.lds.S | 3 +
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/riscv/Makefile | 58 +++++
.../testing/selftests/riscv/hwprobe/Makefile | 10 +
.../testing/selftests/riscv/hwprobe/hwprobe.c | 90 +++++++
.../selftests/riscv/hwprobe/sys_hwprobe.S | 12 +
28 files changed, 712 insertions(+), 14 deletions(-)
create mode 100644 Documentation/riscv/hwprobe.rst
create mode 100644 arch/riscv/include/asm/cpufeature.h
create mode 100644 arch/riscv/include/asm/hwprobe.h
create mode 100644 arch/riscv/include/asm/vdso/data.h
create mode 100644 arch/riscv/include/uapi/asm/hwprobe.h
create mode 100644 arch/riscv/kernel/vdso/hwprobe.c
create mode 100644 arch/riscv/kernel/vdso/sys_hwprobe.S
create mode 100644 tools/testing/selftests/riscv/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/Makefile
create mode 100644 tools/testing/selftests/riscv/hwprobe/hwprobe.c
create mode 100644 tools/testing/selftests/riscv/hwprobe/sys_hwprobe.S
--
2.25.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
Once review of this is complete I will keep it on a side branch and
accumulate the following series when they are ready so we can have a
stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
I have this on github:
https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v3:
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 512 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 64 ++-
.../selftests/iommu/iommufd_fail_nth.c | 52 +-
tools/testing/selftests/iommu/iommufd_utils.h | 61 ++-
14 files changed, 804 insertions(+), 200 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
Resending... It was not clear to me if this file has a specific
maintainer. I chose to send it to the most recent committer.
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
When calling socket lookup from L2 (tc, xdp), VRF boundaries aren't
respected. This patchset fixes this by regarding the incoming device's
VRF attachment when performing the socket lookups from tc/xdp.
The first two patches are coding changes which facilitate this fix by
factoring out the tc helper's logic which was shared with cg/sk_skb
(which operate correctly).
The third patch contains the actual bugfix.
The fourth patch adds bpf tests for these lookup functions.
Gilad Sever (4):
bpf: factor out socket lookup functions for the TC hookpoint.
bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC
hookpoint
bpf: fix bpf socket lookup from tc/xdp to respect socket VRF bindings
selftests/bpf: Add tc_socket_lookup tests
net/core/filter.c | 132 +++++--
.../bpf/prog_tests/tc_socket_lookup.c | 341 ++++++++++++++++++
.../selftests/bpf/progs/tc_socket_lookup.c | 73 ++++
3 files changed, 525 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_socket_lookup.c
create mode 100644 tools/testing/selftests/bpf/progs/tc_socket_lookup.c
--
2.34.1
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Instead, in the event test terminates early, run the test exit and
cleanup from a new 'cleanup' kthread, which sets current->kunit_test,
and better isolates the rest of KUnit from issues which arise in test
cleanup.
If a test cleanup function itself aborts (e.g., due to an assertion
failing), there will be no further attempts to clean up: an error will
be logged and the test failed.
This should also make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This is an updated version of / replacement of "kunit: Set the current
KUnit context when cleaning up", which instead creates a new kthread
for cleanup tasks if the original test kthread is aborted. This protects
us from failed assertions during cleanup, if the test exited early.
Changes since v1:
https://lore.kernel.org/linux-kselftest/20230415091401.681395-1-davidgow@go…
- Move cleanup execution to another kthread
- (Thanks, Benjamin, for pointing out the assertion issues)
---
lib/kunit/test.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 52 insertions(+), 2 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..caeae0dfd82b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -423,8 +423,51 @@ static void kunit_try_run_case(void *data)
kunit_run_case_cleanup(test, suite);
}
+static void kunit_try_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ struct kunit_suite *suite = ctx->suite;
+
+ current->kunit_test = test;
+
+ kunit_run_case_cleanup(test, suite);
+}
+
+static void kunit_catch_run_case_cleanup(void *data)
+{
+ struct kunit_try_catch_context *ctx = data;
+ struct kunit *test = ctx->test;
+ int try_exit_code = kunit_try_catch_get_result(&test->try_catch);
+
+ /* It is always a failure if cleanup aborts. */
+ kunit_set_failure(test);
+
+ if (try_exit_code) {
+ /*
+ * Test case could not finish, we have no idea what state it is
+ * in, so don't do clean up.
+ */
+ if (try_exit_code == -ETIMEDOUT) {
+ kunit_err(test, "test case cleanup timed out\n");
+ /*
+ * Unknown internal error occurred preventing test case from
+ * running, so there is nothing to clean up.
+ */
+ } else {
+ kunit_err(test, "internal error occurred during test case cleanup: %d\n",
+ try_exit_code);
+ }
+ return;
+ }
+
+ kunit_err(test, "test aborted during cleanup. continuing without cleaning up\n");
+}
+
+
static void kunit_catch_run_case(void *data)
{
+ struct kunit_try_catch cleanup;
struct kunit_try_catch_context *ctx = data;
struct kunit *test = ctx->test;
struct kunit_suite *suite = ctx->suite;
@@ -451,9 +494,16 @@ static void kunit_catch_run_case(void *data)
/*
* Test case was run, but aborted. It is the test case's business as to
- * whether it failed or not, we just need to clean up.
+ * whether it failed or not, we just need to clean up. Do this in a new
+ * try / catch context, in case it asserts, too.
*/
- kunit_run_case_cleanup(test, suite);
+ kunit_try_catch_init(&cleanup,
+ test,
+ kunit_try_run_case_cleanup,
+ kunit_catch_run_case_cleanup);
+ ctx->test = test;
+ ctx->suite = suite;
+ kunit_try_catch_run(&cleanup, ctx);
}
/*
--
2.40.0.634.g4ca3ef3211-goog
*Changes in v15*
- Build fix
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 481 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2105 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
The verification function of this test case is likely to encounter the
following error, which may confuse users. The problem is easily
reproducible in the latest kernel.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$IP_B"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 17664, a(97) != q(113)
If the packet is captured, you will see:
Environment A, the sender:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_A.41025 > $IP_B.8000: UDP, length 1472
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
Environment B, the receiver:
bash# tcpdump -i eth0 host "$IP_B" &
IP $IP_A.41025 > $IP_B.8000: UDP, length 7360
IP $IP_A.41025 > $IP_B.8000: UDP, length 14720
IP $IP_B > $IP_A: ICMP $IP_B udp port 8000 unreachable, length 556
In one test, the verification data is printed as follows:
abcd...xyz | 1...
.. |
abcd...xyz |
abcd...opabcd...xyz | ...1472... Not xyzabcd, messages are merged
.. |
This is because the sending buffer is buf[64K], and its content is a
loop of A-Z. But maybe only 1472 bytes per send, or more if UDP GSO is
used. The message content does not necessarily end with XYZ, but GRO
will merge these packets, and the -v parameter directly verifies the
entire GRO receive buffer. So we do the validation after the data is split
at the receiving end, just as the application actually uses this feature.
If the sender does not use GSO, each individual segment starts at A,
end at somewhere. Using GSO also has the same problem, and. The data
between each segment during transmission is continuous, but GRO is merged
in the order received, which is not necessarily the order of transmission.
Execution in the same environment does not cause problems, because the
lo device is not NAPI, and does not perform GRO processing. Perhaps it
could be worth supporting to reduce system calls.
bash# tcpdump -i lo host "$IP_self" &
bash# echo udp_gro_receive > /sys/kernel/debug/tracing/set_ftrace_filter
bash# echo function > /sys/kernel/debug/tracing/current_tracer
bash# udpgso_bench_rx -4 -G -S 1472 -v &
bash# udpgso_bench_tx -l 4 -4 -D "$IP_self"
The issue still exists when using the GRO with -G, but not using the -S
to obtain gsosize. Therefore, a print has been added to remind users.
After this issue is resolved, another issue will be encountered and will
be resolved in the next patch.
Environment A, the sender:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
udpgso_bench_tx: write: Connection refused
Environment B, the receiver:
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
v2:
- Fix confusing descriptions
Signed-off-by: Zhang Yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
Reviewed-by: Xu Xin (CGEL ZTE) <xu.xin16(a)zte.com.cn>
Reviewed-by: Yang Yang (CGEL ZTE) <yang.yang29(a)zte.com.cn>
Cc: Xuexin Jiang (CGEL ZTE) <jiang.xuexin(a)zte.com.cn>
---
tools/testing/selftests/net/udpgso_bench_rx.c | 40 +++++++++++++++++++++------
1 file changed, 31 insertions(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso_bench_rx.c b/tools/testing/selftests/net/udpgso_bench_rx.c
index f35a924d4a30..6a2026494cdb 100644
--- a/tools/testing/selftests/net/udpgso_bench_rx.c
+++ b/tools/testing/selftests/net/udpgso_bench_rx.c
@@ -189,26 +189,44 @@ static char sanitized_char(char val)
return (val >= 'a' && val <= 'z') ? val : '.';
}
-static void do_verify_udp(const char *data, int len)
+static void do_verify_udp(const char *data, int start, int len)
{
- char cur = data[0];
+ char cur = data[start];
int i;
/* verify contents */
if (cur < 'a' || cur > 'z')
error(1, 0, "data initial byte out of range");
- for (i = 1; i < len; i++) {
+ for (i = start + 1; i < start + len; i++) {
if (cur == 'z')
cur = 'a';
else
cur++;
- if (data[i] != cur)
+ if (data[i] != cur) {
+ if (cfg_gro_segment && !cfg_expected_gso_size)
+ error(0, 0, "Use -S to obtain gsosize, to %s"
+ , "help guide split and verification.");
+
error(1, 0, "data[%d]: len %d, %c(%hhu) != %c(%hhu)\n",
i, len,
sanitized_char(data[i]), data[i],
sanitized_char(cur), cur);
+ }
+ }
+}
+
+static void do_verify_udp_gro(const char *data, int len, int gso_size)
+{
+ int start = 0;
+
+ while (len - start > 0) {
+ if (len - start > gso_size)
+ do_verify_udp(data, start, gso_size);
+ else
+ do_verify_udp(data, start, len - start);
+ start += gso_size;
}
}
@@ -264,16 +282,20 @@ static void do_flush_udp(int fd)
if (cfg_expected_pkt_len && ret != cfg_expected_pkt_len)
error(1, 0, "recv: bad packet len, got %d,"
" expected %d\n", ret, cfg_expected_pkt_len);
+ if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
+ error(1, 0, "recv: bad gso size, got %d, expected %d %s",
+ gso_size, cfg_expected_gso_size, "(-1 == no gso cmsg))\n");
if (len && cfg_verify) {
if (ret == 0)
error(1, errno, "recv: 0 byte datagram\n");
- do_verify_udp(rbuf, ret);
+ if (!cfg_gro_segment)
+ do_verify_udp(rbuf, 0, ret);
+ else if (gso_size > 0)
+ do_verify_udp_gro(rbuf, ret, gso_size);
+ else
+ do_verify_udp_gro(rbuf, ret, ret);
}
- if (cfg_expected_gso_size && cfg_expected_gso_size != gso_size)
- error(1, 0, "recv: bad gso size, got %d, expected %d "
- "(-1 == no gso cmsg))\n", gso_size,
- cfg_expected_gso_size);
packets++;
bytes += ret;
--
2.15.2
*Changes in v14*
- Fix build error caused by #ifdef added at last minute in some configs
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 478 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2102 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
From: zhang yunkai (CGEL ZTE) <zhang.yunkai(a)zte.com.cn>
1.Fix verifty exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472 -v
udpgso_bench_rx: data[1472]: len 2944, a(97) != q(113)
This is because the sending buffers are not aligned by 26 bytes, and the
GRO is not merged sequentially, and the receiver does not judge this
situation. We think of the receiver to split the data and then validate
it, just as the application actually uses this feature.
2.Fix gsosize exception
Executing the following command fails:
bash# udpgso_bench_tx -l 4 -4 -D "$DST"
bash# udpgso_bench_tx -l 4 -4 -D "$DST" -S 0
bash# udpgso_bench_rx -4 -G -S 1472
udp rx: 15 MB/s 256 calls/s
udp rx: 30 MB/s 512 calls/s
udpgso_bench_rx: recv: bad gso size, got -1, expected 1472
(-1 == no gso cmsg))
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 7360
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 1472
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 4416
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 11776
IP 192.168.2.199.55238 > 192.168.2.203.8000: UDP, length 20608
recv: got one message len:1472, probably not an error.
recv: got one message len:1472, probably not an error.
This is due to network, NAPI, timer, etc., only one message being received.
We believe that this situation should be normal.
3.Fix packet number exception
bash# udpgso_bench_rx -4 -n 100
bash# udpgso_bench_tx -l 1 -4 -D "$DST"
udpgso_bench_rx: wrong packet number! got 0, expected 100
This is because the packets is cleared after print.
Zhang Yunkai (3):
selftests: net: udpgso_bench_rx: Fix verifty exceptions
selftests: net: udpgso_bench_rx: Fix gsosize exceptions
selftests: net: udpgso_bench_rx: Fix packet number exceptions
tools/testing/selftests/net/udpgso_bench_rx.c | 45 +++++++++++++++++++++------
1 file changed, 35 insertions(+), 10 deletions(-)
--
2.15.2
KUnit tests run in a kthread, with the current->kunit_test pointer set
to the test's context. This allows the kunit_get_current_test() and
kunit_fail_current_test() macros to work. Normally, this pointer is
still valid during test shutdown (i.e., the suite->exit function, and
any resource cleanup). However, if the test has exited early (e.g., due
to a failed assertion), the cleanup is done in the parent KUnit thread,
which does not have an active context.
Fix this by setting the active KUnit context for the duration of the
test shutdown procedure. When the test exits normally, this does
nothing. When run from the KUits previous value (probably NULL)
afterwards.
This should make it easier to get access to the KUnit context,
particularly from within resource cleanup functions, which may, for
example, need access to data in test->priv.
Signed-off-by: David Gow <davidgow(a)google.com>
---
This becomes useful with the current kunit_add_action() implementation,
as actions do not get the KUnit context passed in by default:
https://lore.kernel.org/linux-kselftest/CABVgOSmjs0wLUa4=ErkB9tH8p6A1P6N33b…
I think it's probably correct anyway, though, so we should either do
this, or totally rule out using kunit_get_current_test() here at all, by
resetting current->kunit_test to NULL before running cleanup even in
the normal case.
I've only given this the most cursory testing so far (I'm not sure how
much of the executor innards I want to expose to be able to actually
write a proper test for it), so more eyes and/or suggestions are
welcome.
Cheers,
-- David
---
lib/kunit/test.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index e2910b261112..2d7cad249863 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -392,10 +392,21 @@ static void kunit_case_internal_cleanup(struct kunit *test)
static void kunit_run_case_cleanup(struct kunit *test,
struct kunit_suite *suite)
{
+ /*
+ * If we're no-longer running from within the test kthread() because it failed
+ * or timed out, we still need the context to be okay when running exit and
+ * cleanup functions.
+ */
+ struct kunit *old_current = current->kunit_test;
+
+ current->kunit_test = test;
if (suite->exit)
suite->exit(test);
kunit_case_internal_cleanup(test);
+
+ /* Restore the thread's previous test context (probably NULL or test). */
+ current->kunit_test = old_current;
}
struct kunit_try_catch_context {
--
2.40.0.634.g4ca3ef3211-goog
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
Add support for integer type of accessing variable length array.
Add a selftest to check it.
Feng Zhou (2):
bpf: support access variable length array of integer type
selftests/bpf: Add test to access integer type of variable array
kernel/bpf/btf.c | 8 +++++---
.../selftests/bpf/bpf_testmod/bpf_testmod.c | 20 +++++++++++++++++++
.../selftests/bpf/prog_tests/tracing_struct.c | 2 ++
.../selftests/bpf/progs/tracing_struct.c | 13 ++++++++++++
4 files changed, 40 insertions(+), 3 deletions(-)
--
2.20.1
*Changes in v13*
- Rebase on top of next-20230414
- Give-up on using uffd_wp_range() and write new helpers, flush tlb only
once
*Changes in v12*
- Update and other memory types to UFFD_FEATURE_WP_ASYNC
- Rebaase on top of next-20230406
- Review updates
*Changes in v11*
- Rebase on top of next-20230307
- Base patches on UFFD_FEATURE_WP_UNPOPULATED
- Do a lot of cosmetic changes and review updates
- Remove ENGAGE_WP + !GET operation as it can be performed with
UFFDIO_WRITEPROTECT
*Changes in v10*
- Add specific condition to return error if hugetlb is used with wp
async
- Move changes in tools/include/uapi/linux/fs.h to separate patch
- Add documentation
*Changes in v9:*
- Correct fault resolution for userfaultfd wp async
- Fix build warnings and errors which were happening on some configs
- Simplify pagemap ioctl's code
*Changes in v8:*
- Update uffd async wp implementation
- Improve PAGEMAP_IOCTL implementation
*Changes in v7:*
- Add uffd wp async
- Update the IOCTL to use uffd under the hood instead of soft-dirty
flags
*Motivation*
The real motivation for adding PAGEMAP_SCAN IOCTL is to emulate Windows
GetWriteWatch() syscall [1]. The GetWriteWatch{} retrieves the addresses of
the pages that are written to in a region of virtual memory.
This syscall is used in Windows applications and games etc. This syscall is
being emulated in pretty slow manner in userspace. Our purpose is to
enhance the kernel such that we translate it efficiently in a better way.
Currently some out of tree hack patches are being used to efficiently
emulate it in some kernels. We intend to replace those with these patches.
So the whole gaming on Linux can effectively get benefit from this. It
means there would be tons of users of this code.
CRIU use case [2] was mentioned by Andrei and Danylo:
> Use cases for migrating sparse VMAs are binaries sanitized with ASAN,
> MSAN or TSAN [3]. All of these sanitizers produce sparse mappings of
> shadow memory [4]. Being able to migrate such binaries allows to highly
> reduce the amount of work needed to identify and fix post-migration
> crashes, which happen constantly.
Andrei's defines the following uses of this code:
* it is more granular and allows us to track changed pages more
effectively. The current interface can clear dirty bits for the entire
process only. In addition, reading info about pages is a separate
operation. It means we must freeze the process to read information
about all its pages, reset dirty bits, only then we can start dumping
pages. The information about pages becomes more and more outdated,
while we are processing pages. The new interface solves both these
downsides. First, it allows us to read pte bits and clear the
soft-dirty bit atomically. It means that CRIU will not need to freeze
processes to pre-dump their memory. Second, it clears soft-dirty bits
for a specified region of memory. It means CRIU will have actual info
about pages to the moment of dumping them.
* The new interface has to be much faster because basic page filtering
is happening in the kernel. With the old interface, we have to read
pagemap for each page.
*Implementation Evolution (Short Summary)*
From the definition of GetWriteWatch(), we feel like kernel's soft-dirty
feature can be used under the hood with some additions like:
* reset soft-dirty flag for only a specific region of memory instead of
clearing the flag for the entire process
* get and clear soft-dirty flag for a specific region atomically
So we decided to use ioctl on pagemap file to read or/and reset soft-dirty
flag. But using soft-dirty flag, sometimes we get extra pages which weren't
even written. They had become soft-dirty because of VMA merging and
VM_SOFTDIRTY flag. This breaks the definition of GetWriteWatch(). We were
able to by-pass this short coming by ignoring VM_SOFTDIRTY until David
reported that mprotect etc messes up the soft-dirty flag while ignoring
VM_SOFTDIRTY [5]. This wasn't happening until [6] got introduced. We
discussed if we can revert these patches. But we could not reach to any
conclusion. So at this point, I made couple of tries to solve this whole
VM_SOFTDIRTY issue by correcting the soft-dirty implementation:
* [7] Correct the bug fixed wrongly back in 2014. It had potential to cause
regression. We left it behind.
* [8] Keep a list of soft-dirty part of a VMA across splits and merges. I
got the reply don't increase the size of the VMA by 8 bytes.
At this point, we left soft-dirty considering it is too much delicate and
userfaultfd [9] seemed like the only way forward. From there onward, we
have been basing soft-dirty emulation on userfaultfd wp feature where
kernel resolves the faults itself when WP_ASYNC feature is used. It was
straight forward to add WP_ASYNC feature in userfautlfd. Now we get only
those pages dirty or written-to which are really written in reality. (PS
There is another WP_UNPOPULATED userfautfd feature is required which is
needed to avoid pre-faulting memory before write-protecting [9].)
All the different masks were added on the request of CRIU devs to create
interface more generic and better.
[1] https://learn.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-…
[2] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com
[3] https://github.com/google/sanitizers
[4] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
[5] https://lore.kernel.org/all/bfcae708-db21-04b4-0bbe-712badd03071@redhat.com
[6] https://lore.kernel.org/all/20220725142048.30450-1-peterx@redhat.com/
[7] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[8] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[9] https://lore.kernel.org/all/20230306213925.617814-1-peterx@redhat.com
[10] https://lore.kernel.org/all/20230125144529.1630917-1-mdanylo@google.com
* Original Cover letter from v8*
Hello,
Note:
Soft-dirty pages and pages which have been written-to are synonyms. As
kernel already has soft-dirty feature inside which we have given up to
use, we are using written-to terminology while using UFFD async WP under
the hood.
This IOCTL, PAGEMAP_SCAN on pagemap file can be used to get and/or clear
the info about page table entries. The following operations are
supported in this ioctl:
- Get the information if the pages have been written-to (PAGE_IS_WRITTEN),
file mapped (PAGE_IS_FILE), present (PAGE_IS_PRESENT) or swapped
(PAGE_IS_SWAPPED).
- Write-protect the pages (PAGEMAP_WP_ENGAGE) to start finding which
pages have been written-to.
- Find pages which have been written-to and write protect the pages
(atomic PAGE_IS_WRITTEN + PAGEMAP_WP_ENGAGE)
It is possible to find and clear soft-dirty pages entirely in userspace.
But it isn't efficient:
- The mprotect and SIGSEGV handler for bookkeeping
- The userfaultfd wp (synchronous) with the handler for bookkeeping
Some benchmarks can be seen here[1]. This series adds features that weren't
present earlier:
- There is no atomic get soft-dirty/Written-to status and clear present in
the kernel.
- The pages which have been written-to can not be found in accurate way.
(Kernel's soft-dirty PTE bit + sof_dirty VMA bit shows more soft-dirty
pages than there actually are.)
Historically, soft-dirty PTE bit tracking has been used in the CRIU
project. The procfs interface is enough for finding the soft-dirty bit
status and clearing the soft-dirty bit of all the pages of a process.
We have the use case where we need to track the soft-dirty PTE bit for
only specific pages on-demand. We need this tracking and clear mechanism
of a region of memory while the process is running to emulate the
getWriteWatch() syscall of Windows.
*(Moved to using UFFD instead of soft-dirtyi feature to find pages which
have been written-to from v7 patch series)*:
Stop using the soft-dirty flags for finding which pages have been
written to. It is too delicate and wrong as it shows more soft-dirty
pages than the actual soft-dirty pages. There is no interest in
correcting it [2][3] as this is how the feature was written years ago.
It shouldn't be updated to changed behaviour. Peter Xu has suggested
using the async version of the UFFD WP [4] as it is based inherently
on the PTEs.
So in this patch series, I've added a new mode to the UFFD which is
asynchronous version of the write protect. When this variant of the
UFFD WP is used, the page faults are resolved automatically by the
kernel. The pages which have been written-to can be found by reading
pagemap file (!PM_UFFD_WP). This feature can be used successfully to
find which pages have been written to from the time the pages were
write protected. This works just like the soft-dirty flag without
showing any extra pages which aren't soft-dirty in reality.
The information related to pages if the page is file mapped, present and
swapped is required for the CRIU project [5][6]. The addition of the
required mask, any mask, excluded mask and return masks are also required
for the CRIU project [5].
The IOCTL returns the addresses of the pages which match the specific
masks. The page addresses are returned in struct page_region in a compact
form. The max_pages is needed to support a use case where user only wants
to get a specific number of pages. So there is no need to find all the
pages of interest in the range when max_pages is specified. The IOCTL
returns when the maximum number of the pages are found. The max_pages is
optional. If max_pages is specified, it must be equal or greater than the
vec_size. This restriction is needed to handle worse case when one
page_region only contains info of one page and it cannot be compacted.
This is needed to emulate the Windows getWriteWatch() syscall.
The patch series include the detailed selftest which can be used as an
example for the uffd async wp test and PAGEMAP_IOCTL. It shows the
interface usages as well.
[1] https://lore.kernel.org/lkml/54d4c322-cd6e-eefd-b161-2af2b56aae24@collabora…
[2] https://lore.kernel.org/all/20221220162606.1595355-1-usama.anjum@collabora.…
[3] https://lore.kernel.org/all/20221122115007.2787017-1-usama.anjum@collabora.…
[4] https://lore.kernel.org/all/Y6Hc2d+7eTKs7AiH@x1n
[5] https://lore.kernel.org/all/YyiDg79flhWoMDZB@gmail.com/
[6] https://lore.kernel.org/all/20221014134802.1361436-1-mdanylo@google.com/
Regards,
Muhammad Usama Anjum
Muhammad Usama Anjum (4):
fs/proc/task_mmu: Implement IOCTL to get and optionally clear info
about PTEs
tools headers UAPI: Update linux/fs.h with the kernel sources
mm/pagemap: add documentation of PAGEMAP_SCAN IOCTL
selftests: mm: add pagemap ioctl tests
Peter Xu (1):
userfaultfd: UFFD_FEATURE_WP_ASYNC
Documentation/admin-guide/mm/pagemap.rst | 56 +
Documentation/admin-guide/mm/userfaultfd.rst | 35 +
fs/proc/task_mmu.c | 478 +++++++
fs/userfaultfd.c | 26 +-
include/linux/userfaultfd_k.h | 21 +-
include/uapi/linux/fs.h | 53 +
include/uapi/linux/userfaultfd.h | 9 +-
mm/hugetlb.c | 32 +-
mm/memory.c | 27 +-
tools/include/uapi/linux/fs.h | 53 +
tools/testing/selftests/mm/.gitignore | 1 +
tools/testing/selftests/mm/Makefile | 3 +-
tools/testing/selftests/mm/config | 1 +
tools/testing/selftests/mm/pagemap_ioctl.c | 1326 ++++++++++++++++++
tools/testing/selftests/mm/run_vmtests.sh | 4 +
15 files changed, 2102 insertions(+), 23 deletions(-)
create mode 100644 tools/testing/selftests/mm/pagemap_ioctl.c
mode change 100644 => 100755 tools/testing/selftests/mm/run_vmtests.sh
--
2.39.2
Add stackprotector support for all remaining architectures, except s390.
On s390 the stackprotectors are not supported in "global" mode; only
"sysreg" mode which is not suppored in nolibc.
The series also contains a small optimization to strace output during
execution of nolibc-test.
This series is based on the 20230415-nolibc-updates-4a branch of the
nolibc tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (6):
selftests/nolibc: reduce syscalls during space padding
tools/nolibc: riscv: add stackprotector support
tools/nolibc: aarch64: add stackprotector support
tools/nolibc: arm: add stackprotector support
tools/nolibc: loongarch: add stackprotector support
tools/nolibc: mips: add stackprotector support
tools/include/nolibc/arch-aarch64.h | 7 ++++++-
tools/include/nolibc/arch-arm.h | 7 ++++++-
tools/include/nolibc/arch-loongarch.h | 7 ++++++-
tools/include/nolibc/arch-mips.h | 8 +++++++-
tools/include/nolibc/arch-riscv.h | 7 ++++++-
tools/testing/selftests/nolibc/Makefile | 5 +++++
tools/testing/selftests/nolibc/nolibc-test.c | 15 +++++++++++----
7 files changed, 47 insertions(+), 9 deletions(-)
---
base-commit: e35214ea9db7477a02e67a8b412e8046534bb97c
change-id: 20230408-nolibc-stackprotector-archs-42244674616e
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
There was a report that the hardware breakpoints and watch points weren't
reporting the debug architecture version as expected, they were reporting
a version of 0 which is not defined in the architecture. This happens
when running in a KVM guest if the host has a debug architecture version
not supported by KVM, it in turn confuses GDB which rejects any debug
architecture version it does not know about.
Add a test that covers that situation and while we're at it reports the
debug architecture version and number of slots available to aid with
figuring out problems that may arise.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/arm64/abi/ptrace.c | 32 +++++++++++++++++++++++++++++-
1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/arm64/abi/ptrace.c b/tools/testing/selftests/arm64/abi/ptrace.c
index be952511af22..abe4d58d731d 100644
--- a/tools/testing/selftests/arm64/abi/ptrace.c
+++ b/tools/testing/selftests/arm64/abi/ptrace.c
@@ -20,7 +20,7 @@
#include "../../kselftest.h"
-#define EXPECTED_TESTS 7
+#define EXPECTED_TESTS 11
#define MAX_TPIDRS 2
@@ -132,6 +132,34 @@ static void test_tpidr(pid_t child)
}
}
+static void test_hw_debug(pid_t child, int type, const char *type_name)
+{
+ struct user_hwdebug_state state;
+ struct iovec iov;
+ int slots, arch, ret;
+
+ iov.iov_len = sizeof(state);
+ iov.iov_base = &state;
+
+ /* Should be able to read the values */
+ ret = ptrace(PTRACE_GETREGSET, child, type, &iov);
+ ksft_test_result(ret == 0, "read_%s\n", type_name);
+
+ if (ret == 0) {
+ /* Low 8 bits is the number of slots, next 4 bits the arch */
+ slots = state.dbg_info & 0xff;
+ arch = (state.dbg_info >> 8) & 0xf;
+
+ ksft_print_msg("%s version %d with %d slots\n", type_name,
+ arch, slots);
+
+ /* Zero is not currently architecturally valid */
+ ksft_test_result(arch, "%s_arch_set\n", type_name);
+ } else {
+ ksft_test_result_skip("%s_arch_set\n");
+ }
+}
+
static int do_child(void)
{
if (ptrace(PTRACE_TRACEME, -1, NULL, NULL))
@@ -207,6 +235,8 @@ static int do_parent(pid_t child)
ksft_print_msg("Parent is %d, child is %d\n", getpid(), child);
test_tpidr(child);
+ test_hw_debug(child, NT_ARM_HW_WATCH, "NT_ARM_HW_WATCH");
+ test_hw_debug(child, NT_ARM_HW_BREAK, "NT_ARM_HW_BREAK");
ret = EXIT_SUCCESS;
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230414-arm64-test-hw-breakpoint-83fe02f607fc
Best regards,
--
Mark Brown <broonie(a)kernel.org>
This is a follow-up to the kunit_defer() parts of 'KUnit device API
proposal'[1], with a number of changes suggested by Matti Vaittinen,
Maxime Ripard and Benjamin Berg.
Most notably, kunit_defer() has been renamed to kunit_add_action(), in
order to match the equivalent devres API[2]. Likewise:
kunit_defer_cancel() has become kunit_remove_action(), and
kunit_defer_trigger() has become kunit_release_action().
The _token() versions of these APIs remain, for the moment, even though
they're a bit more awkward and less useful, as they have two advantages:
1. They're faster, as the action doesn't need to be looked up.
2. They provide more flexibility in the ordering of actions in cases
where several identical actions are interleaved with other, different
actions.
Similarly, the internal_gfp argument remains for now, as this is useful
in implementing kunit_kalloc() and similar.
The implementation now uses a single allocation for both the
kunit_resource and the kunit_action_ctx (previously kunit_defer_ctx).
The 'cancellation token' is now of type 'struct kunit_action_ctx',
instead of void*.
Tests have been added to the kunit-resource-test suite which exercise
this functionality. Similarly, the kunit executor tests and
kunit allocation functions have been updated to make use of this API.
I'd love to hear any further thoughts!
Cheers,
-- David
[1]: https://lore.kernel.org/linux-kselftest/20230325043104.3761770-1-davidgow@g…
[2]: https://docs.kernel.org/driver-api/basics.html#c.devm_add_action
David Gow (3):
kunit: Add kunit_add_action() to defer a call until test exit
kunit: executor_test: Use kunit_add_action()
kunit: kmalloc_array: Use kunit_add_action()
include/kunit/resource.h | 89 +++++++++++++++++++++++++++
lib/kunit/executor_test.c | 12 ++--
lib/kunit/kunit-test.c | 123 +++++++++++++++++++++++++++++++++++++-
lib/kunit/resource.c | 99 ++++++++++++++++++++++++++++++
lib/kunit/test.c | 48 +++------------
5 files changed, 323 insertions(+), 48 deletions(-)
--
2.40.0.348.gf938b09366-goog
On 16.04.23 00:59, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prct.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test to ksm_tests
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add two functions for debugging merge outcome for ksm_tests
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm. The debug output is enabled with the
> -d option.
>
> 4) add KSM prctl test to ksm_functional_tests
>
> This adds a test to the ksm_functional_test that verifies that the
> prctl system call to enable / disable KSM works.
>
> 5) add KSM fork test to ksm_functional_test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
Thanks!
Acked-by: David Hildenbrand <david(a)redhat.com>
--
Thanks,
David / dhildenb
Thanks for moving the functional tests. Some more feedback forksm_functional_tests change. Writing tests in the
ksft testing framework can be a bit "special".
I'm seeing some weird test failures due to
prctl(PR_GET_MEMORY_MERGE, 0)
Apparently, these go away when using
prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)
to explicitly force the other values to 0. Most probably, we should do that
for PR_SET_MEMORY_MERGE as well (especially if we check for the arguments as
well).
[...]
> @@ -15,8 +15,10 @@
> #include <errno.h>
> #include <fcntl.h>
> #include <sys/mman.h>
> +#include <sys/prctl.h>
> #include <sys/syscall.h>
> #include <sys/ioctl.h>
> +#include <sys/wait.h>
> #include <linux/userfaultfd.h>
>
> #include "../kselftest.h"
> @@ -326,9 +328,80 @@ static void test_unmerge_uffd_wp(void)
> }
> #endif
>
> +/* Verify that KSM can be enabled / queried with prctl. */
> +static void test_ksm_prctl(void)
Maybe call this "test_prctl", because after all, these are all KSM tests.
> +{
> + bool ret = false;
> + int is_on;
> + int is_off;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + goto out;
> + }
> +
> + is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> + if (is_on && is_off)
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "prctl get / set\n");
The test fails if the kernel does not support PR_SET_MEMORY_MERGE.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL.
(2) distinguish for PR_GET_MEMORY_MERGE whether it returned an error or
whether it returned a wrong value. Feel free to keep that as is, whatever
you prefer.
(3) exit early for all failures, you get exactly one expected skip/pass/fail for the
test and use specific test failure messages.
(4) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl(void)
{
int ret;
ksft_print_msg("[RUN] %s\n", __func__);
ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
if (ret < 0 && errno == EINVAL){
ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
return;
} else if (ret) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
return;
}
ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret < 0) {
ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
return;
} else if (ret != 1) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 not effective\n");
return;
}
ret = prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret){
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
return;
}
ret = prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0);
if (ret < 0) {
ksft_test_result_fail("PR_GET_MEMORY_MERGE failed\n");
return;
} else if (ret != 0) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 not effective\n");
return;
}
ksft_test_result_pass("Setting/clearing PR_SET_MEMORY_MERGE works\n");
}
> +}
> +
> +/* Verify that prctl ksm flag is inherited. */
> +static void test_ksm_fork(void)
Maybe call it "test_prctl_fork"
> +{
> + int status;
> + bool ret = false;
> + pid_t child_pid;
> +
> + ksft_print_msg("[RUN] %s\n", __func__);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + ksft_test_result_fail("prctl failed\n");
> + goto out;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on =
> +
> + if (!is_on)
> + exit(-1);
> +
> + exit(0);
> + }
> +
> + if (child_pid < 0) {
> + ksft_test_result_fail("child pid < 0\n");
> + goto out;> +
> + if (waitpid(child_pid, &status, 0) < 0 || WEXITSTATUS(status) != 0) {
> + ksft_test_result_fail("wait pid < 0\n");
> + goto out;
> + }
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0))
> + ksft_test_result_fail("prctl 2 failed\n");
> + else
> + ret = true;
> +
> +out:
> + ksft_test_result(ret, "ksm_flag is inherited\n");
> +}
Again, test fails if kernel support is not around.
I'd modify this test to:
(1) skip if the first PR_SET_MEMORY_MERGE=1 failed with EINVAL just as in the other test.
(2) Use a simple exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0)); in the child.
(3) exit early for all failures, you get exactly one expected skip/pass/fail for the
test and use specific test failure messages.
(4) Split up the waitpid() check to test what failed.
(5) Pass "0" for all other arguments of prctl.
Something like:
static void test_prctl_fork(void)
{
int ret, status;
pid_t child_pid;
ksft_print_msg("[RUN] %s\n", __func__);
ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
if (ret < 0 && errno == EINVAL){
ksft_test_result_skip("PR_SET_MEMORY_MERGE not supported\n");
return;
} else if (ret) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=1 failed\n");
return;
}
child_pid = fork();
if (!child_pid) {
exit(prctl(PR_GET_MEMORY_MERGE, 0, 0, 0, 0));
} else if (child_pid < 0) {
ksft_test_result_fail("fork() failed\n");
return;
}
if (waitpid(child_pid, &status, 0) < 0) {
ksft_test_result_fail("waitpid() failed\n");
return;
} else if (WEXITSTATUS(status) != 1) {
ksft_test_result_fail("unexpected PR_GET_MEMORY_MERGE result in child\n");
return;
}
if (prctl(PR_SET_MEMORY_MERGE, 0, 0, 0, 0)) {
ksft_test_result_fail("PR_SET_MEMORY_MERGE=0 failed\n");
return;
}
ksft_test_result_pass("PR_SET_MEMORY_MERGE value is inherited\n");
}
> +
> int main(int argc, char **argv)
> {
> - unsigned int tests = 2;
> + unsigned int tests = 6;
Assuming you execute exactly one ksft_test_result_skip/fail/pass on every path of your two
test, this would become "4".
> int err;
>
> #ifdef __NR_userfaultfd
> @@ -358,6 +431,8 @@ int main(int argc, char **argv)
> #ifdef __NR_userfaultfd
> test_unmerge_uffd_wp();
> #endif
> + test_ksm_prctl();
> + test_ksm_fork();
>
With above outlined changes (feel free to integrate what you consider valuable),
on an older kernel I get:
$ sudo ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 # SKIP PR_SET_MEMORY_MERGE not supported
# [RUN] test_prctl_fork
ok 5 # SKIP PR_SET_MEMORY_MERGE not supported
# Totals: pass:3 fail:0 xfail:0 xpass:0 skip:2 error:0
On a kernel with your patch #1:
# ./ksm_functional_tests
TAP version 13
1..5
# [RUN] test_unmerge
ok 1 Pages were unmerged
# [RUN] test_unmerge_discarded
ok 2 Pages were unmerged
# [RUN] test_unmerge_uffd_wp
ok 3 Pages were unmerged
# [RUN] test_prctl
ok 4 Setting/clearing PR_SET_MEMORY_MERGE works
# [RUN] test_prctl_fork
ok 5 PR_SET_MEMORY_MERGE value is inherited
# Totals: pass:5 fail:0 xfail:0 xpass:0 skip:0 error:0
> err = ksft_get_fail_cnt();
> if (err)
> diff --git a/tools/testing/selftests/mm/ksm_tests.c b/tools/testing/selftests/mm/ksm_tests.c
> index f9eb4d67e0dd..35b3828d44b4 100644
> --- a/tools/testing/selftests/mm/ksm_tests.c
> +++ b/tools/testing/selftests/mm/ksm_tests.c
> @@ -1,6 +1,8 @@
> // SPDX-License-Identifier: GPL-2.0
[...]
Changes to ksm_tests mostly look good. Two comments:
> - if (ksm_merge_pages(map_ptr, page_size * page_count, start_time, timeout))
> + if (ksm_merge_pages(merge_type, map_ptr, page_size * page_count, start_time, timeout))
> goto err_out;
>
> /* verify that the right number of pages are merged */
> if (assert_ksm_pages_count(page_count)) {
> printf("OK\n");
> - munmap(map_ptr, page_size * page_count);
> + if (merge_type == KSM_MERGE_MADVISE)
> + munmap(map_ptr, page_size * page_count);
> + else if (merge_type == KSM_MERGE_PRCTL)
> + prctl(PR_SET_MEMORY_MERGE, 0);
Are you sure that we don't want to unmap here? I'd assume we want to unmap in either way.
[...]
> + case 'd':
> + debug = 1;
> + break;
> case 's':
> size_MB = atoi(optarg);
> if (size_MB <= 0) {
> printf("Size must be greater than 0\n");
> return KSFT_FAIL;
> }
> + case 't':
> + {
> + int tmp = atoi(optarg);
> +
> + if (tmp < 0 || tmp > KSM_MERGE_LAST) {
> + printf("Invalid merge type\n");
> + return KSFT_FAIL;
> + }
> + merge_type = atoi(optarg);
You can simply reuse tmp
merge_type = tmp;
--
Thanks,
David / dhildenb
Patch 1 makes a function static because it is only used in one file.
Patch 2 adds info about the git trees we use to help occasional devs.
Patch 3 removes an unused variable.
Patch 4 removes duplicated entries from the help menu of a tool used in
MPTCP selftests.
Patch 5 removes some ShellCheck warnings in mptcp_join.sh selftest.
Only very minor improvements then.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Geliang Tang (1):
mptcp: make userspace_pm_append_new_local_addr static
Matthieu Baerts (4):
MAINTAINERS: add git trees for MPTCP
mptcp: remove unused 'remaining' variable
selftests: mptcp: remove duplicated entries in usage
selftests: mptcp: join: fix ShellCheck warnings
MAINTAINERS | 2 ++
net/mptcp/options.c | 7 ++-----
net/mptcp/pm_userspace.c | 4 ++--
net/mptcp/protocol.h | 2 --
tools/testing/selftests/net/mptcp/mptcp_connect.c | 8 ++++----
tools/testing/selftests/net/mptcp/mptcp_join.sh | 10 ++++++++--
6 files changed, 18 insertions(+), 15 deletions(-)
---
base-commit: c11d2e718c792468e67389b506451eddf26c2dac
change-id: 20230414-upstream-net-next-20230414-mptcp-small-cleanups-1cba986990b1
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
The existing selftest suite for openvswitch will work for regression
testing the datapath feature bits, but won't test things like adding
interfaces, or the upcall interface. Here, we add some additional
test facilities.
First, extend the ovs-dpctl.py python module to support the OVS_FLOW
and OVS_PACKET netlink families, with some associated messages. These
can be extended over time, but the initial support is for more well
known cases (output, userspace, and CT).
Next, extend the test suite to test upcalls by adding a datapath,
monitoring the upcall socket associated with the datapath, and then
dumping any upcalls that are received. Compare with expected ARP
upcall via arping.
Aaron Conole (3):
selftests: openvswitch: add interface support
selftests: openvswitch: add flow dump support
selftests: openvswitch: add support for upcall testing
.../selftests/net/openvswitch/openvswitch.sh | 89 +-
.../selftests/net/openvswitch/ovs-dpctl.py | 1276 ++++++++++++++++-
2 files changed, 1349 insertions(+), 16 deletions(-)
--
2.39.2
From: Chuck Lever <chuck.lever(a)oracle.com>
Circumvent the .gitignore wildcard to avoid warnings about ignored
.kunitconfig files. As far as I can tell, the warnings are harmless
and these files are not actually ignored.
Reported-by: kernel test robot <lkp(a)intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304142337.jc4oUrov-lkp@intel.com/
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
.gitignore | 1 +
1 file changed, 1 insertion(+)
diff --git a/.gitignore b/.gitignore
index 70ec6037fa7a..51117ba29c88 100644
--- a/.gitignore
+++ b/.gitignore
@@ -105,6 +105,7 @@ modules.order
!.gitignore
!.mailmap
!.rustfmt.toml
+!.kunitconfig
#
# Generated include files
From: Jinrong Liang <cloudliang(a)tencent.com>
Hi,
This patch set adds some tests to ensure consistent PMU performance event
filter behavior. Specifically, the patches aim to improve KVM's PMU event
filter by strengthening the test coverage, adding documentation, and making
other small changes.
The first patch replaces int with uint32_t for nevents to ensure consistency
and readability in the code. The second patch adds fixed_counter_bitmap to
create_pmu_event_filter() to support the use of the same creator to control
the use of guest fixed counters. The third patch adds test cases for
unsupported input values in PMU filter, including unsupported "action"
values, unsupported "flags" values, and unsupported "nevents" values. Also,
it tests setting non-existent fixed counters in the fixed bitmap doesn't
fail.
The fourth patch updates the documentation for KVM_SET_PMU_EVENT_FILTER ioctl
to include a detailed description of how fixed performance events are handled
in the pmu filter. The fifth patch adds tests to cover that pmu_event_filter
works as expected when applied to fixed performance counters, even if there
is no fixed counter exists. The sixth patch adds a test to ensure that setting
both generic and fixed performance event filters does not affect the consistency
of the fixed performance filter behavior in KVM. The seventh patch adds a test
to verify the behavior of the pmu event filter when an incomplete
kvm_pmu_event_filter structure is used.
These changes help to ensure that KVM's PMU event filter functions as expected
in all supported use cases. These patches have been tested and verified to
function properly.
Thanks for your review and feedback.
Sincerely,
Jinrong Liang
Jinrong Liang (7):
KVM: selftests: Replace int with uint32_t for nevents
KVM: selftests: Apply create_pmu_event_filter() to fixed ctrs
KVM: selftests: Test unavailable event filters are rejected
KVM: x86/pmu: Add documentation for fixed ctr on PMU filter
KVM: selftests: Check if pmu_event_filter meets expectations on fixed
ctrs
KVM: selftests: Check gp event filters without affecting fixed event
filters
KVM: selftests: Test pmu event filter with incompatible
kvm_pmu_event_filter
Documentation/virt/kvm/api.rst | 21 ++
.../kvm/x86_64/pmu_event_filter_test.c | 239 ++++++++++++++++--
2 files changed, 243 insertions(+), 17 deletions(-)
base-commit: a25497a280bbd7bbcc08c87ddb2b3909affc8402
--
2.31.1
This series replaces the C99 compatibility patch. (See v1 link below).
After the discussion about support C99 and/or GNU89 I came to the
conclusion supporting straight C89 is not very hard.
Instead of validating both C99 and GNU89 in some awkward way only for
somebody requesting true C89 support let's just do it this way.
Feel free to squash all the comment syntax patches together if you
prefer.
All changes in this series are cosmetic only.
To: Willy Tarreau <w(a)1wt.eu>
To: Shuah Khan <shuah(a)kernel.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-kselftest(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
This series is based on the "dev" branch of the RCU tree.
---
Changes in v2:
- Target C89 instead of C99
- Link to v1: https://lore.kernel.org/r/20230328-nolibc-c99-v1-1-a8302fb19f19@weissschuh.…
---
Thomas Weißschuh (11):
tools/nolibc: use standard __asm__ statements
tools/nolibc: use __inline__ syntax
tools/nolibc: i386: use C89 comment syntax
tools/nolibc: x86_64: use C89 comment syntax
tools/nolibc: riscv: use C89 comment syntax
tools/nolibc: aarch64: use C89 comment syntax
tools/nolibc: arm: use C89 comment syntax
tools/nolibc: mips: use C89 comment syntax
tools/nolibc: loongarch: use C89 comment syntax
tools/nolibc: use C89 comment syntax
tools/nolibc: validate C89 compatibility
tools/include/nolibc/arch-aarch64.h | 32 ++++++++--------
tools/include/nolibc/arch-arm.h | 42 ++++++++++-----------
tools/include/nolibc/arch-i386.h | 40 ++++++++++----------
tools/include/nolibc/arch-loongarch.h | 38 +++++++++----------
tools/include/nolibc/arch-mips.h | 56 ++++++++++++++--------------
tools/include/nolibc/arch-riscv.h | 40 ++++++++++----------
tools/include/nolibc/arch-x86_64.h | 34 ++++++++---------
tools/include/nolibc/stackprotector.h | 4 +-
tools/include/nolibc/stdlib.h | 18 ++++-----
tools/include/nolibc/string.h | 4 +-
tools/include/nolibc/sys.h | 8 ++--
tools/testing/selftests/nolibc/Makefile | 2 +-
tools/testing/selftests/nolibc/nolibc-test.c | 14 +++----
13 files changed, 166 insertions(+), 166 deletions(-)
---
base-commit: bd5b341f0f69eb4c958ffd48699213c5b9af8145
change-id: 20230328-nolibc-c99-59f44ea45636
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Due to the lack of the SKIP directive in the output, if any of the
parameterized test was skipped, the parser could not recognize that
correctly and was marking the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
v2: better align with future support for arbitrary levels of testing
Cc: David Gow <davidgow(a)google.com>
Cc: Rae Moar <rmoar(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update kunit_print_ok_not_ok function
include/kunit/test.h | 1 +
lib/kunit/kunit-example-test.c | 34 +++++++++++++++++++++++++++
lib/kunit/test.c | 43 ++++++++++++++++++++++------------
3 files changed, 63 insertions(+), 15 deletions(-)
--
2.25.1
Building bpf selftests with clang can trigger errors like the following:
CLNG-BPF [test_maps] bpf_iter_netlink.bpf.o
progs/bpf_iter_netlink.c:32:4: error: incompatible pointer types assigning to 'struct sock *' from 'struct sock___17 *' [-Werror,-Wincompatible-pointer-types]
s = &nlk->sk;
^ ~~~~~~~~
1 error generated.
This is due to the fact that bpftool emits duplicate data types with
different names in vmlinux.h (i.e., `struct sock` in this case) and
these types, despite having a different name, represent in fact the same
object.
Add -Wno-incompatible-pointer-types to CLANG_CLAGS to prevent these
errors.
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
tools/testing/selftests/bpf/Makefile | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index b677dcd0b77a..0d9ef819a065 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -356,7 +356,8 @@ BPF_CFLAGS = -g -Werror -D__TARGET_ARCH_$(SRCARCH) $(MENDIAN) \
-I$(abspath $(OUTPUT)/../usr/include)
CLANG_CFLAGS = $(CLANG_SYS_INCLUDES) \
- -Wno-compare-distinct-pointer-types
+ -Wno-compare-distinct-pointer-types \
+ -Wno-incompatible-pointer-types
$(OUTPUT)/test_l4lb_noinline.o: BPF_CFLAGS += -fno-inline
$(OUTPUT)/test_xdp_noinline.o: BPF_CFLAGS += -fno-inline
--
2.39.2
An error snuck in between two recent conflicting changes:
Until recently ->setup() used negative values to indicate
normal test termination. This was changed in
commit fa10366cc6f4 ("selftests/resctrl: Allow ->setup() to return
errors") that transitioned ->setup() to use negative values
to indicate errors and a new END_OF_TESTS to indicate normal
termination.
commit 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100%
allocation on first run in MBM test") continued to use
negative return to indicate normal test termination.
Fix mbm_setup() to use the new END_OF_TESTS to indicate
error-free test termination.
Fixes: 42e3b093eb7c ("selftests/resctrl: Fix set up schemata with 100% allocation on first run in MBM test")
Reported-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Link: https://lore.kernel.org/lkml/bb65cce8-54d7-68c5-ef19-3364ec95392a@linux.int…
Signed-off-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
Hi Shuah,
Apologies, this error snuck in between the two series
merged into kselftest's next this week.
tools/testing/selftests/resctrl/mbm_test.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 146132fa986d..538d35a6485a 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
--
2.34.1
Hello Reinette,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in test themselves.
[patch 1] Make write_schemata() run to set up shemata with 100% allocation
on first run in MBM test.
[patch 2] The MBA test result message is always output as "ok",
make output message to be "not ok" if MBA check result is failed.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] An error occurs whether in parents process or child process,
the parents process always kills child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] If a signal received, to cleanup properly before exiting the
parent process, commonize the signal handler registered for
CMT/MBM/MBA tests and reuse it in CAT, also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each test CMT/CAT/MBM/MBA, clear test result
files function cat/cmt/mbm/mba_test_cleanup() are called
twice. Delete once.
This patch series is based on based on the "next" branch of
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git
Pervious versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
[v8] https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
There is a spelling mistake in an error message string. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/mm/uffd-common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/mm/uffd-common.c b/tools/testing/selftests/mm/uffd-common.c
index 61c6250adf93..54dfd92d86cf 100644
--- a/tools/testing/selftests/mm/uffd-common.c
+++ b/tools/testing/selftests/mm/uffd-common.c
@@ -311,7 +311,7 @@ int uffd_test_ctx_init(uint64_t features, const char **errmsg)
ret = userfaultfd_open(&features);
if (ret) {
if (errmsg)
- *errmsg = "possible lack of priviledge";
+ *errmsg = "possible lack of privilege";
return ret;
}
--
2.30.2
Due to the lack of the SKIP directive in the output, if any of the
parameterized test was skipped, the parser could not recognize that
correctly and was marking the test as PASSED.
This can easily be seen by running the new subtest from patch 1:
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params*
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [PASSED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 3
$ ./tools/testing/kunit/kunit.py run \
--kunitconfig ./lib/kunit/.kunitconfig *.example_params* \
--raw_output
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
After adding the SKIP directive, the report looks as expected:
[ ] Starting KUnit Kernel (1/1)...
[ ] ============================================================
[ ] =================== example (1 subtest) ====================
[ ] =================== example_params_test ===================
[ ] [PASSED] example value 2
[ ] [PASSED] example value 1
[ ] [SKIPPED] example value 0
[ ] =============== [PASSED] example_params_test ===============
[ ] ===================== [PASSED] example =====================
[ ] ============================================================
[ ] Testing complete. Ran 3 tests: passed: 2, skipped: 1
[ ] Starting KUnit Kernel (1/1)...
KTAP version 1
1..1
# example: initializing suite
KTAP version 1
# Subtest: example
1..1
KTAP version 1
# Subtest: example_params_test
# example_params_test: initializing
ok 1 example value 2
# example_params_test: initializing
ok 2 example value 1
# example_params_test: initializing
ok 3 example value 0 # SKIP unsupported param value
# example_params_test: pass:2 fail:0 skip:1 total:3
ok 1 example_params_test
# Totals: pass:2 fail:0 skip:1 total:3
ok 1 example
Cc: David Gow <davidgow(a)google.com>
Michal Wajdeczko (3):
kunit/test: Add example test showing parameterized testing
kunit: Fix reporting of the skipped parameterized tests
kunit: Update reporting function to support results from subtests
lib/kunit/kunit-example-test.c | 34 ++++++++++++++++++++++++++++++++++
lib/kunit/test.c | 26 +++++++++++++++++---------
2 files changed, 51 insertions(+), 9 deletions(-)
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(),initialize *map to
NULL,to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before map is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret < 0)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
When we added fd based file streams we created references to STx_FILENO in
stdio.h but these constants are declared in unistd.h which is the last file
included by the top level nolibc.h meaning those constants are not defined
when we try to build stdio.h. This causes programs using nolibc.h to fail
to build.
Reorder the headers to avoid this issue.
Fixes: d449546c957f ("tools/nolibc: implement fd-based FILE streams")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/include/nolibc/nolibc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/include/nolibc/nolibc.h b/tools/include/nolibc/nolibc.h
index 04739a6293c4..05a228a6ee78 100644
--- a/tools/include/nolibc/nolibc.h
+++ b/tools/include/nolibc/nolibc.h
@@ -99,11 +99,11 @@
#include "sys.h"
#include "ctype.h"
#include "signal.h"
+#include "unistd.h"
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
#include "time.h"
-#include "unistd.h"
#include "stackprotector.h"
/* Used by programs to avoid std includes */
---
base-commit: 7d8214bba44c1aa6a75921a09a691945d26a8d43
change-id: 20230413-nolibc-stdio-fix-fb42de39d099
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 12.04.23 05:16, Stefan Roesch wrote:
> This adds three new tests to the selftests for KSM. These tests use the
> new prctl API's to enable and disable KSM.
>
> 1) add new prctl flags to prctl header file in tools dir
>
> This adds the new prctl flags to the include file prct.h in the
> tools directory. This makes sure they are available for testing.
>
> 2) add KSM prctl merge test
>
> This adds the -t option to the ksm_tests program. The -t flag
> allows to specify if it should use madvise or prctl ksm merging.
>
> 3) add KSM get merge type test
>
> This adds the -G flag to the ksm_tests program to query the KSM
> status with prctl after KSM has been enabled with prctl.
>
> 4) add KSM fork test
>
> Add fork test to verify that the MMF_VM_MERGE_ANY flag is inherited
> by the child process.
>
> 5) add two functions for debugging merge outcome
>
> This adds two functions to report the metrics in /proc/self/ksm_stat
> and /sys/kernel/debug/mm/ksm.
>
> The debugging can be enabled with the following command line:
> make -C tools/testing/selftests TARGETS="mm" --keep-going \
> EXTRA_CFLAGS=-DDEBUG=1
Would it make sense to instead have a "-D" (if still unused) runtime
options to print this data? Dead code that's not compiled is a bit
unfortunate as it can easily bit-rot.
This patch essentially does two things
1) Add the option to run all tests/benchmarks with the PRCTL instead of
MADVISE
2) Add some functional KSM tests for the new PRCTL (fork, enabling
works, disabling works).
The latter should rather go into ksm_functional_tests().
[...]
>
> -static int check_ksm_unmerge(int mapping, int prot, int timeout, size_t page_size)
> +/* Verify that prctl ksm flag is inherited. */
> +static int check_ksm_fork(void)
> +{
> + int rc = KSFT_FAIL;
> + pid_t child_pid;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl");
> + return KSFT_FAIL;
> + }
> +
> + child_pid = fork();
> + if (child_pid == 0) {
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (!is_on)
> + exit(KSFT_FAIL);
> +
> + exit(KSFT_PASS);
> + }
> +
> + if (child_pid < 0)
> + goto out;
> +
> + if (waitpid(child_pid, &rc, 0) < 0)
> + rc = KSFT_FAIL;
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl");
> + rc = KSFT_FAIL;
> + }
> +
> +out:
> + if (rc == KSFT_PASS)
> + printf("OK\n");
> + else
> + printf("Not OK\n");
> +
> + return rc;
> +}
> +
> +static int check_ksm_get_merge_type(void)
> +{
> + if (prctl(PR_SET_MEMORY_MERGE, 1)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_on = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (prctl(PR_SET_MEMORY_MERGE, 0)) {
> + perror("prctl set");
> + return 1;
> + }
> +
> + int is_off = prctl(PR_GET_MEMORY_MERGE, 0);
> +
> + if (is_on && is_off) {
> + printf("OK\n");
> + return KSFT_PASS;
> + }
> +
> + printf("Not OK\n");
> + return KSFT_FAIL;
> +}
Yes, these two are better located in ksm_functional_tests() to just run
them both automatically when the test is executed.
--
Thanks,
David / dhildenb
Hi Shuah and kselftest team,
There are a couple of resctrl selftest patches that are ready for inclusion. They have been percolating on the list for a while without expecting more feedback. All have "Reviewed-by" tags from at least one reviewer. Could you please consider including them into the kselftest repo? There is one minor merge conflict between two of the series for which the snippet below shows resolution.
[PATCH v8 0/6] Some improvements of resctrl selftest
https://lore.kernel.org/lkml/20230215083230.3155897-1-tan.shaopeng@jp.fujit…
[PATCH v2 0/9] selftests/resctrl: Fixes to error handling logic and cleanups
https://lore.kernel.org/lkml/20230215130605.31583-1-ilpo.jarvinen@linux.int…
[PATCH] selftests/resctrl: Use correct exit code when tests fail
https://lore.kernel.org/lkml/20230309145757.2280518-1-peternewman@google.co…
The snippet below shows resolution of the merge conflict between the
first and second series:
diff --git a/tools/testing/selftests/resctrl/mbm_test.c b/tools/testing/selftests/resctrl/mbm_test.c
index 040ca1f9c173..775f9e542ff6 100644
--- a/tools/testing/selftests/resctrl/mbm_test.c
+++ b/tools/testing/selftests/resctrl/mbm_test.c
@@ -98,7 +98,7 @@ static int mbm_setup(int num, ...)
/* Run NUM_OF_RUNS times */
if (p->num_of_runs >= NUM_OF_RUNS)
- return -1;
+ return END_OF_TESTS;
/* Set up shemata with 100% allocation on the first run. */
if (p->num_of_runs == 0)
Thank you very much.
Reinette
Patch 1 avoids scheduling the MPTCP worker on a closed socket on some
edge cases. It fixes issues that can be visible from v5.11.
Patch 2 makes sure the MPTCP worker doesn't try to manipulate
disconnected sockets. This is also a fix for an issue that can be
visible from v5.11.
Patch 3 fixes a NULL pointer dereference when MPTCP FastOpen is used
and an early fallback is done. A fix for v6.2.
Patch 4 improves the stability of the userspace PM selftest for a
subtest added in v6.2.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (1):
selftests: mptcp: userspace pm: uniform verify events
Paolo Abeni (3):
mptcp: use mptcp_schedule_work instead of open-coding it
mptcp: stricter state check in mptcp_worker
mptcp: fix NULL pointer dereference on fastopen early fallback
net/mptcp/fastopen.c | 11 +++++++++--
net/mptcp/options.c | 5 ++---
net/mptcp/protocol.c | 2 +-
net/mptcp/subflow.c | 18 ++++++------------
tools/testing/selftests/net/mptcp/userspace_pm.sh | 2 ++
5 files changed, 20 insertions(+), 18 deletions(-)
---
base-commit: a4506722dc39ca840593f14e3faa4c9ba9408211
change-id: 20230411-upstream-net-20230411-mptcp-fixes-db47f50c2688
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Nested translation is a hardware feature that is supported by many modern
IOMMU hardwares. It has two stages (stage-1, stage-2) address translation
to get access to the physical address. stage-1 translation table is owned
by userspace (e.g. by a guest OS), while stage-2 is owned by kernel. Changes
to stage-1 translation table should be followed by an IOTLB invalidation.
Take Intel VT-d as an example, the stage-1 translation table is I/O page
table. As the below diagram shows, guest I/O page table pointer in GPA
(guest physical address) is passed to host and be used to perform the stage-1
address translation. Along with it, modifications to present mappings in the
guest I/O page table should be followed with an IOTLB invalidation.
.-------------. .---------------------------.
| vIOMMU | | Guest I/O page table |
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush --+
'-------------' |
| | V
| | I/O page table pointer in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .------------------------.
| pIOMMU | | FS for GIOVA->GPA |
| | '------------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.----------------------------------.
| | | SS for GPA->HPA, unmanaged domain|
| | '----------------------------------'
'-------------'
Where:
- FS = First stage page tables
- SS = Second stage page tables
<Intel VT-d Nested translation>
In IOMMUFD, all the translation tables are tracked by hw_pagetable (hwpt)
and each has an iommu_domain allocated from iommu driver. So in this series
hw_pagetable and iommu_domain means the same thing if no special note.
IOMMUFD has already supported allocating hw_pagetable that is linked with
an IOAS. However, nesting requires IOMMUFD to allow allocating hw_pagetable
with driver specific parameters and interface to sync stage-1 IOTLB as user
owns the stage-1 translation table.
This series is based on the iommu hw info reporting series [1]. It first
introduces new iommu op for allocating domains with user data and the op
for syncing stage-1 IOTLB, and then extend the IOMMUFD internal infrastructure
to accept user_data and parent hwpt, then relay the data to iommu core to
allocate iommu_domain. After it, extend the ioctl IOMMU_HWPT_ALLOC to accept
user data and stage-2 hwpt ID to allocate hwpt. Along with it, ioctl
IOMMU_HWPT_INVALIDATE is added to invalidate stage-1 IOTLB. This is needed
for user-managed hwpts. ioctl IOMMU_DEVICE_GET_HW_INFO is extended to report
the supported hwpt types bitmap to user. Selftest is added as well to cover
the new ioctls.
Complete code can be found in [2], QEMU could can be found in [3].
At last, this is a team work together with Nicolin Chen, Lu Baolu. Thanks
them for the help. ^_^. Look forward to your feedbacks.
base-commit: 3dfe670c94c7fc4af42e5c08cdd8a110b594e18e
[1] https://lore.kernel.org/linux-iommu/20230309075358.571567-1-yi.l.liu@intel.…
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting
Thanks,
Yi Liu
Lu Baolu (2):
iommu: Add new iommu op to create domains owned by userspace
iommu: Add nested domain support
Nicolin Chen (5):
iommufd/hw_pagetable: Do not populate user-managed hw_pagetables
iommufd/selftest: Add domain_alloc_user() support in iommu mock
iommufd/selftest: Add coverage for IOMMU_HWPT_ALLOC with user data
iommufd/selftest: Add IOMMU_TEST_OP_MD_CHECK_IOTLB test op
iommufd/selftest: Add coverage for IOMMU_HWPT_INVALIDATE ioctl
Yi Liu (5):
iommufd/hw_pagetable: Use domain_alloc_user op for domain allocation
iommufd: Pass parent hwpt and user_data to
iommufd_hw_pagetable_alloc()
iommufd: IOMMU_HWPT_ALLOC allocation with user data
iommufd: Add IOMMU_HWPT_INVALIDATE
iommufd/device: Report supported hwpt_types
drivers/iommu/iommufd/device.c | 9 +-
drivers/iommu/iommufd/hw_pagetable.c | 242 +++++++++++++++++-
drivers/iommu/iommufd/iommufd_private.h | 16 +-
drivers/iommu/iommufd/iommufd_test.h | 30 +++
drivers/iommu/iommufd/main.c | 7 +-
drivers/iommu/iommufd/selftest.c | 104 +++++++-
include/linux/iommu.h | 11 +
include/uapi/linux/iommufd.h | 65 +++++
tools/testing/selftests/iommu/iommufd.c | 126 ++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 71 +++++
10 files changed, 654 insertions(+), 27 deletions(-)
--
2.34.1
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v3:
- also provide and use fflush/fclose.
- reject fileno(NULL).
- provide compatability with buffered streams from glibc.
- Link to v2: https://lore.kernel.org/r/20230328-nolibc-printf-test-v2-0-f72bdf210190@wei…
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
---
Thomas Weißschuh (4):
tools/nolibc: add libc-test binary
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 95 ++++++++++++++++++++--------
tools/include/nolibc/sys.h | 23 +++++++
tools/testing/selftests/nolibc/.gitignore | 1 +
tools/testing/selftests/nolibc/Makefile | 5 ++
tools/testing/selftests/nolibc/nolibc-test.c | 86 +++++++++++++++++++++++++
5 files changed, 183 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign()
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before one_page is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
Here is a series with some fixes and cleanups to resctrl selftests and
rewrite of CAT test into something that really tests CAT working or not
condition.
I know that this series will conflict with some of patches from
Shaopeng Tan that so far have not made it into the kselftest tree. Due
to CAT test rewrite done in this series, some of those patches would no
longer be relevant anyway but some of them are still very valid (I've
not tried to reinvent the fixes in Shaopeng's series in this series).
Ilpo Järvinen (22):
selftests/resctrl: Add resctrl.h into build deps
selftests/resctrl: Check also too low values for CBM bits
selftests/resctrl: Make span unsigned long everywhere
selftests/resctrl: Express span in bytes
selftests/resctrl: Remove duplicated preparation for span arg
selftests/resctrl: Don't use variable argument list for ->setup()
selftests/resctrl: Remove "malloc_and_init_memory" param from
run_fill_buf()
selftests/resctrl: Split run_fill_buf() to alloc, work, and dealloc
helpers
selftests/resctrl: Remove start_buf local variable from buffer alloc
func
selftests/resctrl: Don't pass test name to fill_buf
selftests/resctrl: Add flush_buffer() to fill_buf
selftests/resctrl: Remove test type checks from cat_val()
selftests/resctrl: Refactor get_cbm_mask()
selftests/resctrl: Create cache_alloc_size() helper
selftests/resctrl: Replace count_bits with count_consecutive_bits()
selftests/resctrl: Exclude shareable bits from schemata in CAT test
selftests/resctrl: Pass the real number of tests to show_cache_info()
selftests/resctrl: Move CAT/CMT test global vars to func they are used
selftests/resctrl: Read in less obvious order to defeat prefetch
optimizations
selftests/resctrl: Split measure_cache_vals() function
selftests/resctrl: Split show_cache_info() to test specific and
generic parts
selftests/resctrl: Rewrite Cache Allocation Technology (CAT) test
tools/testing/selftests/resctrl/Makefile | 2 +-
tools/testing/selftests/resctrl/cache.c | 154 ++++++------
tools/testing/selftests/resctrl/cat_test.c | 221 +++++++++---------
tools/testing/selftests/resctrl/cmt_test.c | 60 +++--
tools/testing/selftests/resctrl/fill_buf.c | 107 +++++----
tools/testing/selftests/resctrl/mba_test.c | 8 +-
tools/testing/selftests/resctrl/mbm_test.c | 16 +-
tools/testing/selftests/resctrl/resctrl.h | 28 ++-
.../testing/selftests/resctrl/resctrl_tests.c | 34 ++-
tools/testing/selftests/resctrl/resctrl_val.c | 4 +-
tools/testing/selftests/resctrl/resctrlfs.c | 160 ++++++++++---
11 files changed, 447 insertions(+), 347 deletions(-)
--
2.30.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
From: Anh Tuan Phan <tuananhlfc(a)gmail.com>
[ Upstream commit f1594bc676579133a3cd906d7d27733289edfb86 ]
When compiling selftests with target mount_setattr I encountered some errors with the below messages:
mount_setattr_test.c: In function ‘mount_setattr_thread’:
mount_setattr_test.c:343:16: error: variable ‘attr’ has initializer but incomplete type
343 | struct mount_attr attr = {
| ^~~~~~~~~~
These errors might be because of linux/mount.h is not included. This patch resolves that issue.
Signed-off-by: Anh Tuan Phan <tuananhlfc(a)gmail.com>
Acked-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/mount_setattr/mount_setattr_test.c b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
index 8c5fea68ae677..969647228817b 100644
--- a/tools/testing/selftests/mount_setattr/mount_setattr_test.c
+++ b/tools/testing/selftests/mount_setattr/mount_setattr_test.c
@@ -18,6 +18,7 @@
#include <grp.h>
#include <stdbool.h>
#include <stdarg.h>
+#include <linux/mount.h>
#include "../kselftest_harness.h"
--
2.39.2
This patch series adds unit tests for the clk fixed rate basic type and
the clk registration functions that use struct clk_parent_data. To get
there, we add support for loading device tree overlays onto the live DTB
along with probing platform drivers to bind to device nodes in the
overlays. With this series, we're able to exercise some of the code in
the common clk framework that uses devicetree lookups to find parents
and the fixed rate clk code that scans device tree directly and creates
clks. Please review.
I Cced everyone to all the patches so they get the full context. I'm
hoping I can take the whole pile through the clk tree as they almost all
depend on each other.
Changes from v2 (https://lore.kernel.org/r/20230315183729.2376178-1-sboyd@kernel.org):
* Overlays don't depend on __symbols__ node
* Depend on Frank's always create root node if CONFIG_OF series[1]
* Added kernel-doc to KUnit API doc
* Fixed some kernel-doc on functions
* More test cases for fixed rate clk
Changes from v1 (https://lore.kernel.org/r/20230302013822.1808711-1-sboyd@kernel.org):
* Don't depend on UML, use unittest data approach to attach nodes
* Introduce overlay loading API for KUnit
* Move platform_device KUnit code to drivers/base/test
* Use #define macros for constants shared between unit tests and
overlays
* Settle on "test" as a vendor prefix
* Make KUnit wrappers have "_kunit" postfix
Stephen Boyd (11):
of: Add KUnit test to confirm DTB is loaded
of: Add test managed wrappers for of_overlay_apply()/of_node_put()
dt-bindings: vendor-prefixes: Add "test" vendor for KUnit and friends
dt-bindings: test: Add KUnit empty node binding
of: Add a KUnit test for overlays and test managed APIs
platform: Add test managed platform_device/driver APIs
dt-bindings: kunit: Add fixed rate clk consumer test
clk: Add test managed clk provider/consumer APIs
clk: Add KUnit tests for clk fixed rate basic type
dt-bindings: clk: Add KUnit clk_parent_data test
clk: Add KUnit tests for clks registered with struct clk_parent_data
Documentation/dev-tools/kunit/api/clk.rst | 10 +
Documentation/dev-tools/kunit/api/index.rst | 22 +
Documentation/dev-tools/kunit/api/of.rst | 13 +
.../dev-tools/kunit/api/platformdevice.rst | 10 +
.../bindings/clock/test,clk-parent-data.yaml | 47 ++
.../bindings/test/test,clk-fixed-rate.yaml | 35 ++
.../devicetree/bindings/test/test,empty.yaml | 30 ++
.../devicetree/bindings/vendor-prefixes.yaml | 2 +
drivers/base/test/Makefile | 3 +
drivers/base/test/platform_kunit-test.c | 140 ++++++
drivers/base/test/platform_kunit.c | 215 ++++++++
drivers/clk/.kunitconfig | 3 +
drivers/clk/Kconfig | 7 +
drivers/clk/Makefile | 9 +-
drivers/clk/clk-fixed-rate_test.c | 374 ++++++++++++++
drivers/clk/clk-fixed-rate_test.h | 8 +
drivers/clk/clk_kunit.c | 224 +++++++++
drivers/clk/clk_parent_data_test.h | 10 +
drivers/clk/clk_test.c | 459 +++++++++++++++++-
drivers/clk/kunit_clk_fixed_rate_test.dtso | 19 +
drivers/clk/kunit_clk_parent_data_test.dtso | 28 ++
drivers/of/.kunitconfig | 5 +
drivers/of/Kconfig | 19 +
drivers/of/Makefile | 4 +
drivers/of/kunit_overlay_test.dtso | 9 +
drivers/of/of_kunit.c | 125 +++++
drivers/of/of_test.c | 34 ++
drivers/of/overlay_test.c | 110 +++++
include/kunit/clk.h | 28 ++
include/kunit/of.h | 94 ++++
include/kunit/platform_device.h | 15 +
31 files changed, 2109 insertions(+), 2 deletions(-)
create mode 100644 Documentation/dev-tools/kunit/api/clk.rst
create mode 100644 Documentation/dev-tools/kunit/api/of.rst
create mode 100644 Documentation/dev-tools/kunit/api/platformdevice.rst
create mode 100644 Documentation/devicetree/bindings/clock/test,clk-parent-data.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,clk-fixed-rate.yaml
create mode 100644 Documentation/devicetree/bindings/test/test,empty.yaml
create mode 100644 drivers/base/test/platform_kunit-test.c
create mode 100644 drivers/base/test/platform_kunit.c
create mode 100644 drivers/clk/clk-fixed-rate_test.c
create mode 100644 drivers/clk/clk-fixed-rate_test.h
create mode 100644 drivers/clk/clk_kunit.c
create mode 100644 drivers/clk/clk_parent_data_test.h
create mode 100644 drivers/clk/kunit_clk_fixed_rate_test.dtso
create mode 100644 drivers/clk/kunit_clk_parent_data_test.dtso
create mode 100644 drivers/of/.kunitconfig
create mode 100644 drivers/of/kunit_overlay_test.dtso
create mode 100644 drivers/of/of_kunit.c
create mode 100644 drivers/of/of_test.c
create mode 100644 drivers/of/overlay_test.c
create mode 100644 include/kunit/clk.h
create mode 100644 include/kunit/of.h
create mode 100644 include/kunit/platform_device.h
[1] https://lore.kernel.org/r/20230317053415.2254616-1-frowand.list@gmail.com
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
prerequisite-patch-id: 33517b96dd0768ab9c265f5721629786354ee320
prerequisite-patch-id: 909221815eeca0a2b0cdd385c76f57e185fb9e33
--
https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git/https://git.kernel.org/pub/scm/linux/kernel/git/sboyd/spmi.git
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v3:
- Integrate selftest into test_progs (Alexei)
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 ++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 ++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/prog_tests/test_tunnel.c | 153 +++++++++++++++++-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 ++++++++++++++
10 files changed, 432 insertions(+), 19 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
This is the follow-up on [1], adding selftests (testing for known issues
we added workarounds for and other issues that haven't been fixed yet),
fixing sparc64, reverting the workarounds, and perform one cleanup.
The patch from [1] was modified slightly (updated/extended patch
description, dropped one unnecessary NOP instruction from the ASM in
__pte_mkhwwrite()).
Retested on x86_64 and sparc64 (sun4u in QEMU).
I scanned most architectures to make sure their (pte|pmd)_mkdirty()
handling is correct. To be sure, we can run the selftests and find out if
other architectures are still affectes (loongarch was fixed recently as
well).
Based on master for now. I don't expect surprises regarding mm-tress, but
I can rebase if there are any problems.
[1] https://lkml.kernel.org/r/20221212130213.136267-1-david@redhat.com
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
David Hildenbrand (6):
selftests/mm: reuse read_pmd_pagesize() in COW selftest
selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs
without write permissions
sparc/mm: don't unconditionally set HW writable bit when setting PTE
dirty on 64bit
mm/migrate: revert "mm/migrate: fix wrongly apply write bit after
mkdirty on sparc64"
mm/huge_memory: revert "Partly revert "mm/thp: carry over dirty bit
when thp splits on pmd""
mm/huge_memory: conditionally call maybe_mkwrite() and drop
pte_wrprotect() in __split_huge_pmd_locked()
arch/sparc/include/asm/pgtable_64.h | 116 +++---
mm/huge_memory.c | 16 +-
mm/migrate.c | 2 -
tools/testing/selftests/mm/Makefile | 2 +
tools/testing/selftests/mm/cow.c | 33 +-
tools/testing/selftests/mm/khugepaged.c | 4 +
tools/testing/selftests/mm/mkdirty.c | 379 ++++++++++++++++++
tools/testing/selftests/mm/soft-dirty.c | 3 +
.../selftests/mm/split_huge_page_test.c | 4 +
tools/testing/selftests/mm/vm_util.c | 4 +-
10 files changed, 468 insertions(+), 95 deletions(-)
create mode 100644 tools/testing/selftests/mm/mkdirty.c
--
2.39.2
This patch series introduces a new "isolcpus" partition type to the
existing list of {member, root, isolated} types. The primary reason
of adding this new "isolcpus" partition is to facilitate the
distribution of isolated CPUs down the cgroup v2 hierarchy.
The other non-member partition types have the limitation that their
parents have to be valid partitions too. It will be hard to create a
partition a few layers down the hierarchy.
It is relatively rare to have applications that require creation of
a separate scheduling domain (root). However, it is more common to
have applications that require the use of isolated CPUs (isolated),
e.g. DPDK. One can use the "isolcpus" or "nohz_full" boot command options
to get that statically. Of course, the "isolated" partition is another
way to achieve that dynamically.
Modern container orchestration tools like Kubernetes use the cgroup
hierarchy to manage different containers. If a container needs to use
isolated CPUs, it is hard to get those with existing set of cpuset
partition types. With this patch series, a new "isolcpus" partition
can be created to hold a set of isolated CPUs that can be pull into
other "isolated" partitions.
The "isolcpus" partition is special that there can have at most one
instance of this in a system. It serves as a pool for isolated CPUs
and cannot hold tasks or sub-cpusets underneath it. It is also not
cpu-exclusive so that the isolated CPUs can be distributed down the
sibling hierarchies, though those isolated CPUs will not be useable
until the partition type becomes "isolated".
Once isolated CPUs are needed in a cgroup, the administrator can write
a list of isolated CPUs into its "cpuset.cpus" and change its partition
type to "isolated" to pull in those isolated CPUs from the "isolcpus"
partition and use them in that cgroup. That will make the distribution
of isolated CPUs to cgroups that need them much easier.
In the future, we may be able to extend this special "isolcpus" partition
type to support other isolation attributes like those that can be
specified with the "isolcpus" boot command line and related options.
Waiman Long (5):
cgroup/cpuset: Extract out CS_CPU_EXCLUSIVE & CS_SCHED_LOAD_BALANCE
handling
cgroup/cpuset: Add a new "isolcpus" paritition root state
cgroup/cpuset: Make isolated partition pull CPUs from isolcpus
partition
cgroup/cpuset: Documentation update for the new "isolcpus" partition
cgroup/cpuset: Extend test_cpuset_prs.sh to test isolcpus partition
Documentation/admin-guide/cgroup-v2.rst | 89 ++-
kernel/cgroup/cpuset.c | 548 +++++++++++++++---
.../selftests/cgroup/test_cpuset_prs.sh | 376 ++++++++----
3 files changed, 789 insertions(+), 224 deletions(-)
--
2.31.1
Hello,
The aim of this patch series is to improve the resctrl selftest.
Without these fixes, some unnecessary processing will be executed
and test results will be confusing.
There is no behavior change in test themselves.
[patch 1] Make write_schemata() run to set up shemata with 100% allocation
on first run in MBM test.
[patch 2] The MBA test result message is always output as "ok",
make output message to be "not ok" if MBA check result is failed.
[patch 3] When a child process is created by fork(), the buffer of the
parent process is also copied. Flush the buffer before
executing fork().
[patch 4] An error occurs whether in parents process or child process,
the parents process always kills child process and runs
umount_resctrlfs(), and the child process always waits to be
killed by the parent process.
[patch 5] If a signal received, to cleanup properly before exiting the
parent process, commonize the signal handler registered for
CMT/MBM/MBA tests and reuse it in CAT, also unregister the
signal handler at the end of each test.
[patch 6] Before exiting each test CMT/CAT/MBM/MBA, clear test result
files function cat/cmt/mbm/mba_test_cleanup() are called
twice. Delete once.
This patch series is based on Linux v6.2-rc7.
Difference from v7:
[patch 4]
- Fix commitlog.
[patch 5]
- Fix commitlog.
Pervious versions of this series:
[v1] https://lore.kernel.org/lkml/20220914015147.3071025-1-tan.shaopeng@jp.fujit…
[v2] https://lore.kernel.org/lkml/20221005013933.1486054-1-tan.shaopeng@jp.fujit…
[v3] https://lore.kernel.org/lkml/20221101094341.3383073-1-tan.shaopeng@jp.fujit…
[v4] https://lore.kernel.org/lkml/20221117010541.1014481-1-tan.shaopeng@jp.fujit…
[v5] https://lore.kernel.org/lkml/20230111075802.3556803-1-tan.shaopeng@jp.fujit…
[v6] https://lore.kernel.org/lkml/20230131054655.396270-1-tan.shaopeng@jp.fujits…
[v7] https://lore.kernel.org/lkml/20230213062428.1721572-1-tan.shaopeng@jp.fujit…
Shaopeng Tan (6):
selftests/resctrl: Fix set up schemata with 100% allocation on first
run in MBM test
selftests/resctrl: Return MBA check result and make it to output
message
selftests/resctrl: Flush stdout file buffer before executing fork()
selftests/resctrl: Cleanup properly when an error occurs in CAT test
selftests/resctrl: Commonize the signal handler register/unregister
for all tests
selftests/resctrl: Remove duplicate codes that clear each test result
file
tools/testing/selftests/resctrl/cat_test.c | 29 ++++----
tools/testing/selftests/resctrl/cmt_test.c | 7 +-
tools/testing/selftests/resctrl/fill_buf.c | 14 ----
tools/testing/selftests/resctrl/mba_test.c | 23 +++----
tools/testing/selftests/resctrl/mbm_test.c | 20 +++---
tools/testing/selftests/resctrl/resctrl.h | 2 +
.../testing/selftests/resctrl/resctrl_tests.c | 4 --
tools/testing/selftests/resctrl/resctrl_val.c | 67 ++++++++++++++-----
tools/testing/selftests/resctrl/resctrlfs.c | 5 +-
9 files changed, 96 insertions(+), 75 deletions(-)
--
2.27.0
On Tue, Apr 11, 2023 at 08:16:46PM -0700, Stefan Roesch wrote:
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> +
> + if (!err)
> + ksm_add_vmas(me->mm);
in the last version of this patch, you reported the error. Now you
swallow the error. I have no idea which is correct, but you've
changed the behaviour without explaining it, so I assume it's wrong.
> + } else {
> + clear_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + }
> + mmap_write_unlock(me->mm);
> + break;
> + case PR_GET_MEMORY_MERGE:
> + if (arg2 || arg3 || arg4 || arg5)
> + return -EINVAL;
> +
> + error = !!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + break;
Why do we need a GET? Just for symmetry, or is there an actual need for
it?
This patch updates the cgroup-v2.rst file to include information about
the new "isolcpus" partition type.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
Documentation/admin-guide/cgroup-v2.rst | 89 +++++++++++++++++++------
1 file changed, 70 insertions(+), 19 deletions(-)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index f67c0829350b..352a02849fa7 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2225,7 +2225,8 @@ Cpuset Interface Files
========== =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
========== =====================================
The root cgroup is always a partition root and its state
@@ -2237,24 +2238,41 @@ Cpuset Interface Files
its descendants except those that are separate partition roots
themselves and their descendants.
+ When set to "isolcpus", the CPUs in that partition root will
+ be in an isolated state without any load balancing from the
+ scheduler. This partition root is special as there can be at
+ most one instance of it in a system and no task or child cpuset
+ is allowed in this cgroup. It acts as a pool of isolated CPUs to
+ be pulled into other "isolated" partitions. The "cpuset.cpus"
+ of an "isolcpus" partition root contains the list of isolated
+ CPUs it holds, where "cpuset.cpus.effective" contains the list
+ of freely available isolated CPUs that are ready to be pull
+ into other "isolated" partition.
+
When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks placed in such a partition with multiple
CPUs should be carefully distributed and bound to each of the
- individual CPUs for optimal performance.
-
- The value shown in "cpuset.cpus.effective" of a partition root
- is the CPUs that the partition root can dedicate to a potential
- new child partition root. The new child subtracts available
- CPUs from its parent "cpuset.cpus.effective".
-
- A partition root ("root" or "isolated") can be in one of the
- two possible states - valid or invalid. An invalid partition
- root is in a degraded state where some state information may
- be retained, but behaves more like a "member".
-
- All possible state transitions among "member", "root" and
- "isolated" are allowed.
+ individual CPUs for optimal performance. The isolated CPUs can
+ come from either the parent partition root or from an "isolcpus"
+ partition if the parent cannot satisfy its request.
+
+ The value shown in "cpuset.cpus.effective" of a partition root is
+ the CPUs that the partition root can dedicate to a potential new
+ child partition root. The new child partition subtracts available
+ CPUs from its parent "cpuset.cpus.effective". An exception is
+ an "isolated" partition that pulls its isolated CPUs from the
+ "isolcpus" partition root that is not its direct parent.
+
+ A partition root can be in one of the two possible states -
+ valid or invalid. An invalid partition root is in a degraded
+ state where some state information may be retained, but behaves
+ more like a "member".
+
+ All possible state transitions among "member", "root", "isolcpus"
+ and "isolated" are allowed. However, the partition root may
+ not be valid if the corresponding prerequisite conditions are
+ not met.
On read, the "cpuset.cpus.partition" file can show the following
values.
@@ -2262,16 +2280,18 @@ Cpuset Interface Files
============================= =====================================
"member" Non-root member of a partition
"root" Partition root
- "isolated" Partition root without load balancing
+ "isolcpus" Partition root for isolated CPUs pool
+ "isolated" Partition root for isolated CPUs
"root invalid (<reason>)" Invalid partition root
+ "isolcpus invalid (<reason>)" Invalid isolcpus partition root
"isolated invalid (<reason>)" Invalid isolated partition root
============================= =====================================
In the case of an invalid partition root, a descriptive string on
- why the partition is invalid is included within parentheses.
+ why the partition is invalid may be included within parentheses.
- For a partition root to become valid, the following conditions
- must be met.
+ For a "root" partition root to become valid, the following
+ conditions must be met.
1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
are not shared by any of its siblings (exclusivity rule).
@@ -2281,6 +2301,37 @@ Cpuset Interface Files
4) The "cpuset.cpus.effective" cannot be empty unless there is
no task associated with this partition.
+ A valid "isolcpus" partition root requires the following
+ conditions.
+
+ 1) The parent cgroup is a valid partition root.
+ 2) The "cpuset.cpus" must be a subset of parent's "cpuset.cpus"
+ including an empty cpu list.
+ 3) There can be no more than one valid "isolcpus" partition.
+ 4) No task or child cpuset is allowed.
+
+ Note that an "isolcpus" partition is not exclusive and its
+ isolated CPUs can be distributed down sibling cgroups even
+ though they may not appear in their "cpuset.cpus.effective".
+
+ A valid "isolated" partition root can pull isolated CPUs from
+ either its parent partition or from the "isolcpus" partition.
+ It also requires the following conditions to be met.
+
+ 1) The "cpuset.cpus" is exclusive with its siblings , i.e. they
+ are not shared by any of its siblings (exclusivity rule).
+ 2) The "cpuset.cpus" is not empty and must be a subset of
+ parent's "cpuset.cpus".
+ 3) The "cpuset.cpus.effective" cannot be empty unless there is
+ no task associated with this partition.
+
+ If pulling isolated CPUS from "isolcpus" partition,
+ the "cpuset.cpus" must also be a subset of "isolcpus"
+ partition's "cpuset.cpus" and all the requested CPUs must
+ be available for pulling, i.e. in "isolcpus" partition's
+ "cpuset.cpus.effective". In this case, its hierarchical parent
+ does not need to be a valid partition root.
+
External events like hotplug or changes to "cpuset.cpus" can
cause a valid partition root to become invalid and vice versa.
Note that a task cannot be moved to a cgroup with empty
--
2.31.1
With the addition of a new "isolcpus" partition in a previous patch,
this patch adds the capability for a privileged user to pull isolated
CPUs from the "isolcpus" partition to an "isolated" partition if its
parent cannot satisfy its request directly.
The following conditions must be true for the pulling of isolated CPUs
from "isolcpus" partition to be successful.
(1) The value of "cpuset.cpus" must still be a subset of its parent's
"cpuset.cpus" to ensure proper inheritance even though these CPUs
cannot be used until the cpuset becomes an "isolated" partition.
(2) All the CPUs in "cpuset.cpus" are freely available in the
"isolcpus" partition, i.e. in its "cpuset.cpus.effective" and not
yet claimed by other isolated partitions.
With this change, the CPUs in an "isolated" partition can either
come from the "isolcpus" partition or from its direct parent, but not
both. Now the parent of an isolated partition does not need to be a
partition root anymore.
Because of the cpu exclusive nature of an "isolated" partition, these
isolated CPUs cannot be distributed to other siblings of that isolated
partition.
Changes to "cpuset.cpus" of such an isolated partition is allowed as
long as all the newly requested CPUs can be granted from the "isolcpus"
partition. Otherwise, the partition will become invalid.
This makes the management and distribution of isolated CPUs to those
applications that require them much easier.
An "isolated" partition that pulls CPUs from the special "isolcpus"
partition can now have 2 parents - the "isolcpus" partition where
it gets its isolated CPUs and its hierarchical parent where it gets
all the other resources. However, such an "isolated" partition cannot
have subpartitions as all the CPUs from "isolcpus" must be in the same
isolated state.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 282 ++++++++++++++++++++++++++++++++++++++---
1 file changed, 264 insertions(+), 18 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 444eae3a9a6b..a5bbd43ed46e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -101,6 +101,7 @@ enum prs_errcode {
PERR_ISOLCPUS,
PERR_ISOLTASK,
PERR_ISOLCHILD,
+ PERR_ISOPARENT,
};
static const char * const perr_strings[] = {
@@ -114,6 +115,7 @@ static const char * const perr_strings[] = {
[PERR_ISOLCPUS] = "An isolcpus partition is already present",
[PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
[PERR_ISOLCHILD] = "Isolcpus partition can't have children",
+ [PERR_ISOPARENT] = "Isolated/isolcpus parent can't have subpartition",
};
struct cpuset {
@@ -1333,6 +1335,195 @@ static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
rebuild_sched_domains_locked();
}
+/*
+ * isolcpus_pull - Enable or disable pulling of isolated cpus from isolcpus
+ * @cs: the cpuset to update
+ * @cmd: the command code (only partcmd_enable or partcmd_disable)
+ * Return: 1 if successful, 0 if error
+ *
+ * Note that pulling isolated cpus from isolcpus or cpus from parent does
+ * not require rebuilding sched domains. So we can change the flags directly.
+ */
+static int isolcpus_pull(struct cpuset *cs, enum subparts_cmd cmd)
+{
+ struct cpuset *parent = parent_cs(cs);
+
+ if (!isolcpus_cs)
+ return 0;
+
+ /*
+ * To enable pulling of isolated CPUs from isolcpus, cpus_allowed
+ * must be a subset of both its parent's cpus_allowed and isolcpus_cs's
+ * effective_cpus and the user has sysadmin privilege.
+ */
+ if ((cmd == partcmd_enable) && capable(CAP_SYS_ADMIN) &&
+ cpumask_subset(cs->cpus_allowed, isolcpus_cs->effective_cpus) &&
+ cpumask_subset(cs->cpus_allowed, parent->cpus_allowed)) {
+ /*
+ * Move cpus from effective_cpus to subparts_cpus & make
+ * cs a child of isolcpus partition.
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_copy(cs->effective_cpus, cs->cpus_allowed);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (cs->use_parent_ecpus) {
+ cs->use_parent_ecpus = false;
+ parent->child_ecpus_count--;
+ }
+ list_add(&cs->isol_sibling, &isol_children);
+ clear_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+
+ if ((cmd == partcmd_disable) && !list_empty(&cs->isol_sibling)) {
+ /*
+ * This can be called after isolcpus shrinks its cpu list.
+ * So not all the cpus should be returned back to isolcpus.
+ */
+ WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED);
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, cs->cpus_allowed);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cs->effective_cpus);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus,
+ isolcpus_cs->cpus_allowed);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+
+ if (!cpumask_and(cs->effective_cpus, parent->effective_cpus,
+ cs->cpus_allowed)) {
+ cs->use_parent_ecpus = true;
+ parent->child_ecpus_count++;
+ cpumask_copy(cs->effective_cpus,
+ parent->effective_cpus);
+ }
+ list_del_init(&cs->isol_sibling);
+ cs->partition_root_state = PRS_INVALID_ISOLATED;
+ cs->prs_err = PERR_INVCPUS;
+
+ set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
+ clear_bit(CS_CPU_EXCLUSIVE, &cs->flags);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+ }
+ return 0;
+}
+
+static void isolcpus_disable(void)
+{
+ struct cpuset *child, *next;
+
+ list_for_each_entry_safe(child, next, &isol_children, isol_sibling)
+ WARN_ON_ONCE(isolcpus_pull(child, partcmd_disable));
+
+ isolcpus_cs = NULL;
+}
+
+/*
+ * isolcpus_cpus_update - cpuset.cpus change in isolcpus partition
+ */
+static void isolcpus_cpus_update(struct cpuset *cs)
+{
+ struct cpuset *child, *next;
+
+ if (WARN_ON_ONCE(isolcpus_cs != cs))
+ return;
+
+ if (list_empty(&isol_children))
+ return;
+
+ /*
+ * Remove child isolated partitions that are not fully covered by
+ * subparts_cpus.
+ */
+ list_for_each_entry_safe(child, next, &isol_children,
+ isol_sibling) {
+ if (cpumask_subset(child->cpus_allowed,
+ cs->subparts_cpus))
+ continue;
+
+ isolcpus_pull(child, partcmd_disable);
+ }
+}
+
+/*
+ * isolated_cpus_update - cpuset.cpus change in isolated partition
+ *
+ * Return: 1 if no further action needs, 0 otherwise
+ */
+static int isolated_cpus_update(struct cpuset *cs, struct cpumask *newmask,
+ struct tmpmasks *tmp)
+{
+ struct cpumask *addmask = tmp->addmask;
+ struct cpumask *delmask = tmp->delmask;
+
+ if (WARN_ON_ONCE(cs->partition_root_state != PRS_ISOLATED) ||
+ list_empty(&cs->isol_sibling))
+ return 0;
+
+ if (WARN_ON_ONCE(!isolcpus_cs) || cpumask_empty(newmask)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ if (cpumask_andnot(addmask, newmask, cs->cpus_allowed)) {
+ /*
+ * Check if isolcpus partition can provide the new CPUs
+ */
+ if (!cpumask_subset(addmask, isolcpus_cs->cpus_allowed) ||
+ cpumask_intersects(addmask, isolcpus_cs->subparts_cpus)) {
+ isolcpus_pull(cs, partcmd_disable);
+ return 0;
+ }
+
+ /*
+ * Pull addmask isolated CPUs from isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_andnot(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, addmask);
+ cpumask_andnot(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, addmask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ if (cpumask_andnot(tmp->delmask, cs->cpus_allowed, newmask)) {
+ /*
+ * Return isolated CPUs back to isolcpus partition
+ */
+ spin_lock_irq(&callback_lock);
+ cpumask_or(isolcpus_cs->subparts_cpus,
+ isolcpus_cs->subparts_cpus, delmask);
+ cpumask_or(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, delmask);
+ cpumask_and(isolcpus_cs->effective_cpus,
+ isolcpus_cs->effective_cpus, cpu_active_mask);
+ isolcpus_cs->nr_subparts_cpus
+ = cpumask_weight(isolcpus_cs->subparts_cpus);
+ spin_unlock_irq(&callback_lock);
+ }
+
+ spin_lock_irq(&callback_lock);
+ cpumask_copy(cs->cpus_allowed, newmask);
+ cpumask_andnot(cs->effective_cpus, newmask, cs->subparts_cpus);
+ cpumask_and(cs->effective_cpus, cs->effective_cpus, cpu_active_mask);
+ spin_unlock_irq(&callback_lock);
+ return 1;
+}
+
/**
* update_parent_subparts_cpumask - update subparts_cpus mask of parent cpuset
* @cs: The cpuset that requests change in partition root state
@@ -1579,7 +1770,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
- isolcpus_cs = NULL;
+ isolcpus_disable();
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1625,6 +1816,12 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
struct cpuset *parent = parent_cs(cp);
bool update_parent = false;
+ /*
+ * Skip isolated cpuset that pull isolated CPUs from isolcpus
+ */
+ if (!list_empty(&cp->isol_sibling))
+ continue;
+
compute_effective_cpumask(tmp->new_cpus, cp, parent);
/*
@@ -1742,7 +1939,7 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
WARN_ON(!is_in_v2_mode() &&
!cpumask_equal(cp->cpus_allowed, cp->effective_cpus));
- update_tasks_cpumask(cp, tmp->new_cpus);
+ update_tasks_cpumask(cp, cp->effective_cpus);
/*
* On legacy hierarchy, if the effective cpumask of any non-
@@ -1888,6 +2085,10 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
return retval;
if (cs->partition_root_state) {
+ if (!list_empty(&cs->isol_sibling) &&
+ isolated_cpus_update(cs, trialcs->cpus_allowed, &tmp))
+ goto update_hier; /* CPUs update done */
+
if (invalidate)
update_parent_subparts_cpumask(cs, partcmd_invalidate,
NULL, &tmp);
@@ -1920,6 +2121,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
}
spin_unlock_irq(&callback_lock);
+update_hier:
#ifdef CONFIG_CPUMASK_OFFSTACK
/* Now trialcs->cpus_allowed is available */
tmp.new_cpus = trialcs->cpus_allowed;
@@ -1928,8 +2130,7 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/* effective_cpus will be updated here */
update_cpumasks_hier(cs, &tmp, false);
- if (cs->partition_root_state) {
- bool force = (cs->partition_root_state == PRS_ISOLCPUS);
+ if (cs->partition_root_state && list_empty(&cs->isol_sibling)) {
struct cpuset *parent = parent_cs(cs);
/*
@@ -1937,8 +2138,12 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
* cpusets if they use parent's effective_cpus or when
* the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count || force)
- update_sibling_cpumasks(parent, cs, &tmp, force);
+ if (cs->partition_root_state == PRS_ISOLCPUS) {
+ update_sibling_cpumasks(parent, cs, &tmp, true);
+ isolcpus_cpus_update(cs);
+ } else if (parent->child_ecpus_count) {
+ update_sibling_cpumasks(parent, cs, &tmp, false);
+ }
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2307,7 +2512,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
return err;
}
-/**
+/*
* update_prstate - update partition_root_state
* @cs: the cpuset to update
* @new_prs: new partition root state
@@ -2325,13 +2530,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
return 0;
/*
- * For a previously invalid partition root, leave it at being
- * invalid if new_prs is not "member".
+ * For a previously invalid partition root, treat it like a "member".
*/
- if (new_prs && is_prs_invalid(old_prs)) {
- cs->partition_root_state = -new_prs;
- return 0;
- }
+ if (new_prs && is_prs_invalid(old_prs))
+ old_prs = PRS_MEMBER;
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
@@ -2371,6 +2573,21 @@ static int update_prstate(struct cpuset *cs, int new_prs)
}
}
+ /*
+ * A parent isolated partition that gets its isolated CPUs from
+ * isolcpus cannot have subpartition.
+ */
+ if (new_prs && !list_empty(&parent->isol_sibling)) {
+ err = PERR_ISOPARENT;
+ goto out;
+ }
+
+ if ((old_prs == PRS_ISOLATED) && !list_empty(&cs->isol_sibling)) {
+ isolcpus_pull(cs, partcmd_disable);
+ old_prs = 0;
+ }
+ WARN_ON_ONCE(!list_empty(&cs->isol_sibling));
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2386,6 +2603,10 @@ static int update_prstate(struct cpuset *cs, int new_prs)
err = update_parent_subparts_cpumask(cs, partcmd_enable,
NULL, &tmpmask);
+ if (err && (new_prs == PRS_ISOLATED) &&
+ isolcpus_pull(cs, partcmd_enable))
+ err = 0; /* Successful isolcpus pull */
+
if (err)
goto out;
} else if (old_prs && new_prs) {
@@ -2445,7 +2666,7 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (new_prs == PRS_ISOLCPUS)
isolcpus_cs = cs;
else if (cs == isolcpus_cs)
- isolcpus_cs = NULL;
+ isolcpus_disable();
/*
* Update child cpusets, if present.
@@ -3674,8 +3895,31 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
}
parent = parent_cs(cs);
- compute_effective_cpumask(&new_cpus, cs, parent);
nodes_and(new_mems, cs->mems_allowed, parent->effective_mems);
+ /*
+ * In the special case of a valid isolated cpuset pulling isolated
+ * cpus from isolcpus. We just need to mask offline cpus from
+ * cpus_allowed unless all the isolated cpus are gone.
+ */
+ if (!list_empty(&cs->isol_sibling)) {
+ if (!cpumask_and(&new_cpus, cs->cpus_allowed, cpu_active_mask))
+ isolcpus_pull(cs, partcmd_disable);
+ } else if ((cs->partition_root_state == PRS_ISOLCPUS) &&
+ cpumask_empty(cs->cpus_allowed)) {
+ /*
+ * For isolcpus with empty cpus_allowed, just update
+ * effective_mems and be done with it.
+ */
+ spin_lock_irq(&callback_lock);
+ if (nodes_empty(new_mems))
+ cs->effective_mems = parent->effective_mems;
+ else
+ cs->effective_mems = new_mems;
+ spin_unlock_irq(&callback_lock);
+ goto unlock;
+ } else {
+ compute_effective_cpumask(&new_cpus, cs, parent);
+ }
if (cs->nr_subparts_cpus)
/*
@@ -3707,10 +3951,12 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
* the following conditions hold:
* 1) empty effective cpus but not valid empty partition.
* 2) parent is invalid or doesn't grant any cpus to child
- * partitions.
+ * partitions and not an isolated cpuset pulling cpus from
+ * isolcpus.
*/
- if (is_partition_valid(cs) && (!parent->nr_subparts_cpus ||
- (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
+ if (is_partition_valid(cs) &&
+ ((!parent->nr_subparts_cpus && list_empty(&cs->isol_sibling)) ||
+ (cpumask_empty(&new_cpus) && partition_is_populated(cs, NULL)))) {
int old_prs, parent_prs;
update_parent_subparts_cpumask(cs, partcmd_disable, NULL, tmp);
--
2.31.1
One can use "cpuset.cpus.partition" to create multiple scheduling domains
or to produce a set of isolated CPUs where load balancing is disabled.
The former use case is less common but the latter one can be frequently
used especially for the Telco use cases like DPDK.
The existing "isolated" partition can be used to produce isolated
CPUs if the applications have full control of a system. However, in a
containerized environment where all the apps are run in a container,
it is hard to distribute out isolated CPUs from the root down given
the unified hierarchy nature of cgroup v2.
The container running on isolated CPUs can be several layers down from
the root. The current partition feature requires that all the ancestors
of a leaf partition root must be parititon roots themselves. This can
be hard to manage.
This patch introduces a new special partition root state called
"isolcpus" that serves as a pool of isolated CPUs to be pulled into other
"isolated" partitions. At most one instance of the "isolcpus" partition
is allowed in a system preferrably as a child of the top cpuset.
In a valid "isolcpus" partition, "cpuset.cpus" contains the set of
isolated CPUs and "cpuset.cpus.effective" contains the set of freely
available isolated CPUs that have not yet been pulled into other
"isolated" cpusets.
The special "isolcpus" partition cannot have normal cpuset children. So
we are not allowed to enable child cpuset in its "cgroup.subtree_control"
file if it has children. Tasks are also not allowed in the "cgroup.procs"
of the "isolcpus" partition. Unlike other partition roots, empty
"cpuset.cpus" is allowed in the "isolcpus" partition as this special
cpuset is not designed to hold tasks.
The CPUs in the "isolcpus" partition are not exclusive so that those
isolated CPUs can be distributed down sibling hierarchies as usual even
though they will not show up in their "cpuset.cpus.effective".
Right now, an "isolcpus" partition only disable load balancing of
the isolated CPUs. In the near future, it may be extended to support
additional isolation attributes like those currently supported by the
"isolcpus" or related kernel boot command line options.
In a subsequent patch, a privileged user can change a "member" cpuset
to an "isolated" partition root by pulling isolated CPUs from the
"isolcpus" partition if its parent is not a partition root that can
directly satisfy the request.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/cgroup/cpuset.c | 158 ++++++++++++++++++++++++++++++++++-------
1 file changed, 133 insertions(+), 25 deletions(-)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 83a7193e0f2c..444eae3a9a6b 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -98,6 +98,9 @@ enum prs_errcode {
PERR_NOCPUS,
PERR_HOTPLUG,
PERR_CPUSEMPTY,
+ PERR_ISOLCPUS,
+ PERR_ISOLTASK,
+ PERR_ISOLCHILD,
};
static const char * const perr_strings[] = {
@@ -108,6 +111,9 @@ static const char * const perr_strings[] = {
[PERR_NOCPUS] = "Parent unable to distribute cpu downstream",
[PERR_HOTPLUG] = "No cpu available due to hotplug",
[PERR_CPUSEMPTY] = "cpuset.cpus is empty",
+ [PERR_ISOLCPUS] = "An isolcpus partition is already present",
+ [PERR_ISOLTASK] = "Isolcpus partition can't have tasks",
+ [PERR_ISOLCHILD] = "Isolcpus partition can't have children",
};
struct cpuset {
@@ -198,6 +204,9 @@ struct cpuset {
/* Handle for cpuset.cpus.partition */
struct cgroup_file partition_file;
+
+ /* siblings list anchored at isol_children */
+ struct list_head isol_sibling;
};
/*
@@ -206,14 +215,26 @@ struct cpuset {
* 0 - member (not a partition root)
* 1 - partition root
* 2 - partition root without load balancing (isolated)
+ * 3 - isolated cpu pool (isolcpus)
* -1 - invalid partition root
* -2 - invalid isolated partition root
+ * -3 - invalid isolated cpu pool
+ *
+ * An isolated cpu pool is a special isolated partition root. At most one
+ * instance of it is allowed in a system. It provides a pool of isolated
+ * cpus that a normal isolated partition root can pull from, if privileged,
+ * in case its parent cannot fulfill its request.
*/
#define PRS_MEMBER 0
#define PRS_ROOT 1
#define PRS_ISOLATED 2
+#define PRS_ISOLCPUS 3
#define PRS_INVALID_ROOT -1
#define PRS_INVALID_ISOLATED -2
+#define PRS_INVALID_ISOLCPUS -3
+
+static struct cpuset *isolcpus_cs; /* System isolcpus partition root */
+static struct list_head isol_children; /* Children that pull isolated cpus */
static inline bool is_prs_invalid(int prs_state)
{
@@ -335,6 +356,7 @@ static struct cpuset top_cpuset = {
.flags = ((1 << CS_ONLINE) | (1 << CS_CPU_EXCLUSIVE) |
(1 << CS_MEM_EXCLUSIVE)),
.partition_root_state = PRS_ROOT,
+ .isol_sibling = LIST_HEAD_INIT(top_cpuset.isol_sibling),
};
/**
@@ -1282,7 +1304,7 @@ static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs,
*/
static int update_partition_exclusive(struct cpuset *cs, int new_prs)
{
- bool exclusive = (new_prs > 0);
+ bool exclusive = (new_prs == PRS_ROOT) || (new_prs == PRS_ISOLATED);
if (exclusive && !is_cpu_exclusive(cs)) {
if (update_flag(CS_CPU_EXCLUSIVE, cs, 1))
@@ -1303,7 +1325,7 @@ static int update_partition_exclusive(struct cpuset *cs, int new_prs)
static void update_partition_sd_lb(struct cpuset *cs, int old_prs)
{
int new_prs = cs->partition_root_state;
- bool new_lb = (new_prs != PRS_ISOLATED);
+ bool new_lb = (new_prs != PRS_ISOLATED) && (new_prs != PRS_ISOLCPUS);
if (new_lb != !!is_sched_load_balance(cs))
update_flag(CS_SCHED_LOAD_BALANCE, cs, new_lb);
@@ -1360,18 +1382,20 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
int part_error = PERR_NONE; /* Partition error? */
percpu_rwsem_assert_held(&cpuset_rwsem);
+ old_prs = new_prs = cs->partition_root_state;
/*
* The parent must be a partition root.
* The new cpumask, if present, or the current cpus_allowed must
- * not be empty.
+ * not be empty except for isolcpus partition.
*/
if (!is_partition_valid(parent)) {
return is_partition_invalid(parent)
? PERR_INVPARENT : PERR_NOTPART;
}
- if ((newmask && cpumask_empty(newmask)) ||
- (!newmask && cpumask_empty(cs->cpus_allowed)))
+ if ((new_prs != PRS_ISOLCPUS) &&
+ ((newmask && cpumask_empty(newmask)) ||
+ (!newmask && cpumask_empty(cs->cpus_allowed))))
return PERR_CPUSEMPTY;
/*
@@ -1379,7 +1403,6 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
* partcmd_invalidate commands.
*/
adding = deleting = false;
- old_prs = new_prs = cs->partition_root_state;
if (cmd == partcmd_enable) {
/*
* Enabling partition root is not allowed if cpus_allowed
@@ -1498,11 +1521,13 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
switch (cs->partition_root_state) {
case PRS_ROOT:
case PRS_ISOLATED:
+ case PRS_ISOLCPUS:
if (part_error)
new_prs = -old_prs;
break;
case PRS_INVALID_ROOT:
case PRS_INVALID_ISOLATED:
+ case PRS_INVALID_ISOLCPUS:
if (!part_error)
new_prs = -old_prs;
break;
@@ -1553,6 +1578,9 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);
+ if ((isolcpus_cs == cs) && (cs->partition_root_state != PRS_ISOLCPUS))
+ isolcpus_cs = NULL;
+
if (adding || deleting)
update_tasks_cpumask(parent, tmp->addmask);
@@ -1640,7 +1668,14 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
*/
old_prs = new_prs = cp->partition_root_state;
if ((cp != cs) && old_prs) {
- switch (parent->partition_root_state) {
+ int parent_prs = parent->partition_root_state;
+
+ /*
+ * isolcpus partition parent can't have children
+ */
+ WARN_ON_ONCE(parent_prs == PRS_ISOLCPUS);
+
+ switch (parent_prs) {
case PRS_ROOT:
case PRS_ISOLATED:
update_parent = true;
@@ -1735,9 +1770,10 @@ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp,
* @parent: Parent cpuset
* @cs: Current cpuset
* @tmp: Temp variables
+ * @force: Force update if set
*/
static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
- struct tmpmasks *tmp)
+ struct tmpmasks *tmp, bool force)
{
struct cpuset *sibling;
struct cgroup_subsys_state *pos_css;
@@ -1756,7 +1792,7 @@ static void update_sibling_cpumasks(struct cpuset *parent, struct cpuset *cs,
cpuset_for_each_child(sibling, pos_css, parent) {
if (sibling == cs)
continue;
- if (!sibling->use_parent_ecpus)
+ if (!sibling->use_parent_ecpus && !force)
continue;
if (!css_tryget_online(&sibling->css))
continue;
@@ -1893,14 +1929,16 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
update_cpumasks_hier(cs, &tmp, false);
if (cs->partition_root_state) {
+ bool force = (cs->partition_root_state == PRS_ISOLCPUS);
struct cpuset *parent = parent_cs(cs);
/*
* For partition root, update the cpumasks of sibling
- * cpusets if they use parent's effective_cpus.
+ * cpusets if they use parent's effective_cpus or when
+ * the current cpuset is an isolcpus partition.
*/
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmp);
+ if (parent->child_ecpus_count || force)
+ update_sibling_cpumasks(parent, cs, &tmp, force);
/* Update CS_SCHED_LOAD_BALANCE and/or sched_domains */
update_partition_sd_lb(cs, old_prs);
@@ -2298,6 +2336,41 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (alloc_cpumasks(NULL, &tmpmask))
return -ENOMEM;
+ /*
+ * Only one isolcpus partition is allowed and it can't have children
+ * or tasks in it. The isolcpus partition is also not exclusive so
+ * that the isolated but unused cpus can be distributed down the
+ * hierarchy.
+ */
+ if (new_prs == PRS_ISOLCPUS) {
+ if (isolcpus_cs)
+ err = PERR_ISOLCPUS;
+ else if (!list_empty(&cs->css.children))
+ err = PERR_ISOLCHILD;
+ else if (cs->css.cgroup->nr_populated_csets)
+ err = PERR_ISOLTASK;
+
+ if (err && old_prs) {
+ /*
+ * A previous valid partition root is now invalid
+ */
+ goto disable_partition;
+ } else if (err) {
+ goto out;
+ }
+
+ /*
+ * Unlike other partition types, an isolated cpu pool can
+ * be empty as it is essentially a place holder for isolated
+ * CPUs.
+ */
+ if (!old_prs && cpumask_empty(cs->cpus_allowed)) {
+ /* Force effective_cpus to be empty too */
+ cpumask_clear(cs->effective_cpus);
+ goto out;
+ }
+ }
+
err = update_partition_exclusive(cs, new_prs);
if (err)
goto out;
@@ -2316,11 +2389,9 @@ static int update_prstate(struct cpuset *cs, int new_prs)
if (err)
goto out;
} else if (old_prs && new_prs) {
- /*
- * A change in load balance state only, no change in cpumasks.
- */
- goto out;
+ goto out; /* Skip cpuset and sibling task update */
} else {
+disable_partition:
/*
* Switching back to member is always allowed even if it
* disables child partitions.
@@ -2342,8 +2413,13 @@ static int update_prstate(struct cpuset *cs, int new_prs)
update_tasks_cpumask(parent, tmpmask.new_cpus);
- if (parent->child_ecpus_count)
- update_sibling_cpumasks(parent, cs, &tmpmask);
+ /*
+ * Since isolcpus partition is not exclusive, we have to update
+ * sibling hierarchies as well.
+ */
+ if ((new_prs == PRS_ISOLCPUS) || parent->child_ecpus_count)
+ update_sibling_cpumasks(parent, cs, &tmpmask,
+ new_prs == PRS_ISOLCPUS);
out:
/*
@@ -2363,6 +2439,14 @@ static int update_prstate(struct cpuset *cs, int new_prs)
/* Update sched domains and load balance flag */
update_partition_sd_lb(cs, old_prs);
+ /*
+ * Check isolcpus_cs state
+ */
+ if (new_prs == PRS_ISOLCPUS)
+ isolcpus_cs = cs;
+ else if (cs == isolcpus_cs)
+ isolcpus_cs = NULL;
+
/*
* Update child cpusets, if present.
* Force update if switching back to member.
@@ -2486,7 +2570,12 @@ static struct cpuset *cpuset_attach_old_cs;
*/
static int cpuset_can_attach_check(struct cpuset *cs)
{
+ /*
+ * Task cannot be moved to a cpuset with empty effective cpus or
+ * is an isolcpus partition.
+ */
if (cpumask_empty(cs->effective_cpus) ||
+ (cs->partition_root_state == PRS_ISOLCPUS) ||
(!is_in_v2_mode() && nodes_empty(cs->mems_allowed)))
return -ENOSPC;
return 0;
@@ -2902,24 +2991,30 @@ static s64 cpuset_read_s64(struct cgroup_subsys_state *css, struct cftype *cft)
static int sched_partition_show(struct seq_file *seq, void *v)
{
struct cpuset *cs = css_cs(seq_css(seq));
+ int prs = cs->partition_root_state;
const char *err, *type = NULL;
- switch (cs->partition_root_state) {
+ switch (prs) {
case PRS_ROOT:
seq_puts(seq, "root\n");
break;
case PRS_ISOLATED:
seq_puts(seq, "isolated\n");
break;
+ case PRS_ISOLCPUS:
+ seq_puts(seq, "isolcpus\n");
+ break;
case PRS_MEMBER:
seq_puts(seq, "member\n");
break;
- case PRS_INVALID_ROOT:
- type = "root";
- fallthrough;
- case PRS_INVALID_ISOLATED:
- if (!type)
+ default:
+ if (prs == PRS_INVALID_ROOT)
+ type = "root";
+ else if (prs == PRS_INVALID_ISOLATED)
type = "isolated";
+ else
+ type = "isolcpus";
+
err = perr_strings[READ_ONCE(cs->prs_err)];
if (err)
seq_printf(seq, "%s invalid (%s)\n", type, err);
@@ -2948,6 +3043,8 @@ static ssize_t sched_partition_write(struct kernfs_open_file *of, char *buf,
val = PRS_MEMBER;
else if (!strcmp(buf, "isolated"))
val = PRS_ISOLATED;
+ else if (!strcmp(buf, "isolcpus"))
+ val = PRS_ISOLCPUS;
else
return -EINVAL;
@@ -3157,6 +3254,7 @@ cpuset_css_alloc(struct cgroup_subsys_state *parent_css)
nodes_clear(cs->effective_mems);
fmeter_init(&cs->fmeter);
cs->relax_domain_level = -1;
+ INIT_LIST_HEAD(&cs->isol_sibling);
/* Set CS_MEMORY_MIGRATE for default hierarchy */
if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
@@ -3171,6 +3269,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
struct cpuset *parent = parent_cs(cs);
struct cpuset *tmp_cs;
struct cgroup_subsys_state *pos_css;
+ int err = 0;
if (!parent)
return 0;
@@ -3178,6 +3277,14 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
cpus_read_lock();
percpu_down_write(&cpuset_rwsem);
+ /*
+ * An isolcpus partition cannot have direct children.
+ */
+ if (parent->partition_root_state == PRS_ISOLCPUS) {
+ err = -EINVAL;
+ goto out_unlock;
+ }
+
set_bit(CS_ONLINE, &cs->flags);
if (is_spread_page(parent))
set_bit(CS_SPREAD_PAGE, &cs->flags);
@@ -3229,7 +3336,7 @@ static int cpuset_css_online(struct cgroup_subsys_state *css)
out_unlock:
percpu_up_write(&cpuset_rwsem);
cpus_read_unlock();
- return 0;
+ return err;
}
/*
@@ -3434,6 +3541,7 @@ int __init cpuset_init(void)
fmeter_init(&top_cpuset.fmeter);
set_bit(CS_SCHED_LOAD_BALANCE, &top_cpuset.flags);
top_cpuset.relax_domain_level = -1;
+ INIT_LIST_HEAD(&isol_children);
BUG_ON(!alloc_cpumask_var(&cpus_attach, GFP_KERNEL));
--
2.31.1
I have been running hid-tools for a while, but it was in its own
separate repository for multiple reasons. And the past few weeks
I finally managed to make the kernel tests in that repo in a
state where we can merge them in the kernel tree directly:
- the tests run in ~2 to 3 minutes
- the tests are way more reliable than previously
- the tests are mostly self-contained now (to the exception
of the Sony ones)
To be able to run the tests we need to use the latest release
of hid-tools, as this project still keeps the HID parsing logic
and is capable of generating the HID events.
The series also ensures we can run the tests with vmtest.sh,
allowing for a quick development and test in the tree itself.
This should allow us to require tests to be added to a series
when we see fit and keep them alive properly instead of having
to deal with 2 repositories.
In Cc are all of the people who participated in the elaboration
of those tests, so please send back a signed-off-by for each
commit you are part of.
This series applies on top of the for-6.3/hid-bpf branch, which
is the one that added the tools/testing/selftests/hid directory.
Given that this is unlikely this series will make the cut for
6.3, we might just consider this series to be based on top of
the future 6.3-rc1.
Cheers,
Benjamin
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
Benjamin Tissoires (11):
selftests: hid: make vmtest rely on make
selftests: hid: import hid-tools hid-core tests
selftests: hid: import hid-tools hid-gamepad tests
selftests: hid: import hid-tools hid-keyboards tests
selftests: hid: import hid-tools hid-mouse tests
selftests: hid: import hid-tools hid-multitouch and hid-tablets tests
selftests: hid: import hid-tools wacom tests
selftests: hid: import hid-tools hid-apple tests
selftests: hid: import hid-tools hid-ite tests
selftests: hid: import hid-tools hid-sony and hid-playstation tests
selftests: hid: import hid-tools usb-crash tests
tools/testing/selftests/hid/Makefile | 12 +
tools/testing/selftests/hid/config | 11 +
tools/testing/selftests/hid/hid-apple.sh | 7 +
tools/testing/selftests/hid/hid-core.sh | 7 +
tools/testing/selftests/hid/hid-gamepad.sh | 7 +
tools/testing/selftests/hid/hid-ite.sh | 7 +
tools/testing/selftests/hid/hid-keyboard.sh | 7 +
tools/testing/selftests/hid/hid-mouse.sh | 7 +
tools/testing/selftests/hid/hid-multitouch.sh | 7 +
tools/testing/selftests/hid/hid-sony.sh | 7 +
tools/testing/selftests/hid/hid-tablet.sh | 7 +
tools/testing/selftests/hid/hid-usb_crash.sh | 7 +
tools/testing/selftests/hid/hid-wacom.sh | 7 +
tools/testing/selftests/hid/run-hid-tools-tests.sh | 28 +
tools/testing/selftests/hid/settings | 3 +
tools/testing/selftests/hid/tests/__init__.py | 2 +
tools/testing/selftests/hid/tests/base.py | 345 ++++
tools/testing/selftests/hid/tests/conftest.py | 81 +
.../selftests/hid/tests/descriptors_wacom.py | 1360 +++++++++++++
.../selftests/hid/tests/test_apple_keyboard.py | 440 +++++
tools/testing/selftests/hid/tests/test_gamepad.py | 209 ++
tools/testing/selftests/hid/tests/test_hid_core.py | 154 ++
.../selftests/hid/tests/test_ite_keyboard.py | 166 ++
tools/testing/selftests/hid/tests/test_keyboard.py | 485 +++++
tools/testing/selftests/hid/tests/test_mouse.py | 977 +++++++++
.../testing/selftests/hid/tests/test_multitouch.py | 2088 ++++++++++++++++++++
tools/testing/selftests/hid/tests/test_sony.py | 282 +++
tools/testing/selftests/hid/tests/test_tablet.py | 872 ++++++++
.../testing/selftests/hid/tests/test_usb_crash.py | 103 +
.../selftests/hid/tests/test_wacom_generic.py | 844 ++++++++
tools/testing/selftests/hid/vmtest.sh | 25 +-
31 files changed, 8554 insertions(+), 10 deletions(-)
---
base-commit: 2f7f4efb9411770b4ad99eb314d6418e980248b4
change-id: 20230217-import-hid-tools-tests-dc0cd4f3c8a8
Best regards,
--
Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign()
As a pointer is passed into posix_memalign(), initialize *one_page
to NULL to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the error
is properly checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..94c7dffc4d7d 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)&one_page, pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *s to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/powerpc/stringloops/strlen.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/powerpc/stringloops/strlen.c b/tools/testing/selftests/powerpc/stringloops/strlen.c
index 9055ebc484d0..f9c1f9cc2d32 100644
--- a/tools/testing/selftests/powerpc/stringloops/strlen.c
+++ b/tools/testing/selftests/powerpc/stringloops/strlen.c
@@ -1,5 +1,4 @@
// SPDX-License-Identifier: GPL-2.0
-#include <malloc.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
@@ -51,10 +50,11 @@ static void bench_test(char *s)
static int testcase(void)
{
char *s;
+ int ret;
unsigned long i;
- s = memalign(128, SIZE);
- if (!s) {
+ ret = posix_memalign((void **)&s, 128, SIZE);
+ if (ret < 0) {
perror("memalign");
exit(1);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign().
As a pointer is passed into posix_memalign(),initialize *map to
NULL,to silence a warning about the function's return value being
used as uninitialized (which is not valid anyway because the
error is properly checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..c99350e110ec 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,9 +80,9 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
- ksft_exit_fail_msg("memalign failed\n");
+ ret = posix_memalign((void **)(&map), hpage_len, hpage_len);
+ if (ret < 0)
+ ksft_exit_fail_msg("posix_memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
if (ret)
--
2.27.0
The "test_encl.elf" file used by test_sgx is not installed in
INSTALL_PATH. Attempting to execute test_sgx causes false negative:
"
enclave executable open(): No such file or directory
main.c:188:unclobbered_vdso:Failed to load the test enclave.
"
Add "test_encl.elf" to TEST_FILES so that it will be installed.
Fixes: 2adcba79e69d ("selftests/x86: Add a selftest for SGX")
Signed-off-by: Yi Lai <yi1.lai(a)intel.com>
---
tools/testing/selftests/sgx/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/sgx/Makefile b/tools/testing/selftests/sgx/Makefile
index 75af864e07b6..50aab6b57da3 100644
--- a/tools/testing/selftests/sgx/Makefile
+++ b/tools/testing/selftests/sgx/Makefile
@@ -17,6 +17,7 @@ ENCL_CFLAGS := -Wall -Werror -static -nostdlib -nostartfiles -fPIC \
-fno-stack-protector -mrdrnd $(INCLUDES)
TEST_CUSTOM_PROGS := $(OUTPUT)/test_sgx
+TEST_FILES := $(OUTPUT)/test_encl.elf
ifeq ($(CAN_BUILD_X86_64), 1)
all: $(TEST_CUSTOM_PROGS) $(OUTPUT)/test_encl.elf
--
2.25.1
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/split_huge_page_test.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index cbb5e6893cbf..8f48f07bc821 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -96,10 +96,10 @@ void split_pmd_thp(void)
char *one_page;
size_t len = 4 * pmd_pagesize;
size_t i;
+ int ret;
- one_page = memalign(pmd_pagesize, len);
-
- if (!one_page) {
+ ret = posix_memalign((void **)(&one_page), pmd_pagesize, len);
+ if (ret < 0) {
printf("Fail to allocate memory\n");
exit(EXIT_FAILURE);
}
--
2.27.0
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void *)(&map), hpage_len, hpage_len);
+ if (ret < 0)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
Dzień dobry,
zapoznałem się z Państwa ofertą i z przyjemnością przyznaję, że przyciąga uwagę i zachęca do dalszych rozmów.
Pomyślałem, że może mógłbym mieć swój wkład w Państwa rozwój i pomóc dotrzeć z tą ofertą do większego grona odbiorców. Pozycjonuję strony www, dzięki czemu generują świetny ruch w sieci.
Możemy porozmawiać w najbliższym czasie?
Pozdrawiam
Adam Charachuta
memalign() is obsolete according to its manpage.
Replace memalign() with posix_memalign() and remove malloc.h include
that was there for memalign().
As a pointer is passed into posix_memalign(), initialize *p to NULL
to silence a warning about the function's return value being used as
uninitialized (which is not valid anyway because the error is properly
checked before p is returned).
Signed-off-by: Deming Wang <wangdeming(a)inspur.com>
---
tools/testing/selftests/mm/soft-dirty.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c
index 21d8830c5f24..4bb7421141a2 100644
--- a/tools/testing/selftests/mm/soft-dirty.c
+++ b/tools/testing/selftests/mm/soft-dirty.c
@@ -80,8 +80,8 @@ static void test_hugepage(int pagemap_fd, int pagesize)
int i, ret;
size_t hpage_len = read_pmd_pagesize();
- map = memalign(hpage_len, hpage_len);
- if (!map)
+ ret = posix_memalign((void *)(&map), hpage_len, hpage_len)
+ if (ret < 0)
ksft_exit_fail_msg("memalign failed\n");
ret = madvise(map, hpage_len, MADV_HUGEPAGE);
--
2.27.0
On Thu, Apr 06, 2023 at 09:53:37AM -0700, Stefan Roesch wrote:
> + case PR_SET_MEMORY_MERGE:
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (arg2) {
> + int err = ksm_add_mm(me->mm);
> + if (err)
> + return err;
You'll return to userspace with the mutex held, no?
On 06.04.23 18:53, Stefan Roesch wrote:
> This adds the general_profit KSM sysfs knob and the process profit metric
> and process merge type knobs to ksm_stat.
>
> 1) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 2) document general_profit sysfs knob
>
> 3) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 4) add ksm_merge_type() function
>
> This adds the ksm_merge_type function. The function returns the
> merge type for the process. For madvise it returns "madvise", for
> prctl it returns "process" and otherwise it returns "none".
I'm curious, why exactly is this change required in this context? It
might be sufficient to observe if the prctl is set for a process. If
not, the ksm stats can reveal whether KSM is still active for that
process -> madvise.
For your use case, I'd assume it's pretty unnecessary to expose that.
If there is no compelling reason, I'd suggest to drop this and limit
this patch to exposing the general/per-mm profit, which I can understand
why it's desirable when fine-tuning a workload.
[...]
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +++++
> Documentation/admin-guide/mm/ksm.rst | 8 ++++-
> fs/proc/base.c | 5 +++
> include/linux/ksm.h | 5 +++
> mm/ksm.c | 32 +++++++++++++++++++
> 5 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-ksm b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> index d244674a9480..7768e90f7a8f 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> +++ b/Documentation/ABI/testing/sysfs-kernel-mm-ksm
> @@ -51,3 +51,11 @@ Description: Control merging pages across different NUMA nodes.
>
> When it is set to 0 only pages from the same node are merged,
> otherwise pages from all nodes can be merged together (default).
> +
> +What: /sys/kernel/mm/ksm/general_profit
> +Date: January 2023
^ No
> +KernelVersion: 6.1
^ Outdated
(kind of weird having to come up with the right numbers before getting
it merged)
[...]
>
> diff --git a/fs/proc/base.c b/fs/proc/base.c
> index 07463ad4a70a..c74450318e05 100644
> --- a/fs/proc/base.c
> +++ b/fs/proc/base.c
> @@ -96,6 +96,7 @@
> #include <linux/time_namespace.h>
> #include <linux/resctrl.h>
> #include <linux/cn_proc.h>
> +#include <linux/ksm.h>
> #include <trace/events/oom.h>
> #include "internal.h"
> #include "fd.h"
> @@ -3199,6 +3200,7 @@ static int proc_pid_ksm_merging_pages(struct seq_file *m, struct pid_namespace *
>
> return 0;
> }
> +
^ unrelated change
> static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
> struct pid *pid, struct task_struct *task)
> {
> @@ -3208,6 +3210,9 @@ static int proc_pid_ksm_stat(struct seq_file *m, struct pid_namespace *ns,
> if (mm) {
> seq_printf(m, "ksm_rmap_items %lu\n", mm->ksm_rmap_items);
> seq_printf(m, "zero_pages_sharing %lu\n", mm->ksm_zero_pages_sharing);
> + seq_printf(m, "ksm_merging_pages %lu\n", mm->ksm_merging_pages);
> + seq_printf(m, "ksm_merge_type %s\n", ksm_merge_type(mm));
> + seq_printf(m, "ksm_process_profit %ld\n", ksm_process_profit(mm));
> mmput(mm);
> }
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index c65455bf124c..4c32f9bca723 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -60,6 +60,11 @@ struct page *ksm_might_need_to_copy(struct page *page,
> void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc);
> void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
>
> +#ifdef CONFIG_PROC_FS
> +long ksm_process_profit(struct mm_struct *);
> +const char *ksm_merge_type(struct mm_struct *mm);
> +#endif /* CONFIG_PROC_FS */
> +
> #else /* !CONFIG_KSM */
>
> static inline int ksm_add_mm(struct mm_struct *mm)
> diff --git a/mm/ksm.c b/mm/ksm.c
> index ab95ae0f9def..76b10ff840ac 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -3042,6 +3042,25 @@ static void wait_while_offlining(void)
> }
> #endif /* CONFIG_MEMORY_HOTREMOVE */
>
> +#ifdef CONFIG_PROC_FS
> +long ksm_process_profit(struct mm_struct *mm)
> +{
> + return (long)mm->ksm_merging_pages * PAGE_SIZE -
Do we really need the cast to long? mm->ksm_merging_pages is defined as
"unsigned long". Just like "ksm_pages_sharing" below.
> + mm->ksm_rmap_items * sizeof(struct ksm_rmap_item);
> +}
> +
> +/* Return merge type name as string. */
> +const char *ksm_merge_type(struct mm_struct *mm)
> +{
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + return "process";
> + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + return "madvise";
> + else
> + return "none";
> +}
> +#endif /* CONFIG_PROC_FS */
> +
Apart from these nits, LGTM (again, I don't see why the merge type
should belong into this patch, and why there is a real need to expose it
like that).
Acked-by: David Hildenbrand <david(a)redhat.com>
--
Thanks,
David / dhildenb
At present the kselftest header can't be used with nolibc since it makes
use of vprintf() which is not available in nolibc. Fortunately nolibc
has a vfprintf() so we can just wrap that in order to allow kselftests
to be built with nolibc and still use the standard kselftest headers
with a small change to prevent the inclusion of the standard libc
headers. This allows us to avoid open coding of KTAP output for
selftests that need to use nolibc in order to test interfaces that are
controlled by libc, we've got several open coded examples of this in the
tree already.
As an example of using this the existing za-fork test is converted to
use kselftest.h. The changes to kselftest and nolibc don't have any
interaction until they are used by a test so could be merged separately
if desired.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Turns out nolibc has a vfprintf() already which we can use so do that.
- Link to v1: https://lore.kernel.org/r/20230405-kselftest-nolibc-v1-0-63fbcd70b202@kerne…
---
Mark Brown (3):
tools/nolibc/stdio: Implement vprintf()
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
tools/include/nolibc/stdio.h | 6 ++
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 ++++++------------------------
tools/testing/selftests/kselftest.h | 2 +
4 files changed, 25 insertions(+), 73 deletions(-)
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230405-kselftest-nolibc-cb2ce0446d09
Best regards,
--
Mark Brown <broonie(a)kernel.org>
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When access traced function arguments with type is u32*, bpf verifier failed.
Because u32 have typedef, needs to skip modifier. Add btf_type_is_modifier in
is_int_ptr. Add a selftest to check it.
Feng Zhou (2):
bpf/btf: Fix is_int_ptr()
selftests/bpf: Add test to access u32 ptr argument in tracing program
Changelog:
v2->v3: Addressed comments from jirka
- Fix an issue that caused other test items to fail
Details in here:
https://lore.kernel.org/all/20230407084608.62296-1-zhoufeng.zf@bytedance.co…
v1->v2: Addressed comments from Martin KaFai Lau
- Add a selftest.
- use btf_type_skip_modifiers.
Some details in here:
https://lore.kernel.org/all/20221012125815.76120-1-zhouchengming@bytedance.…
kernel/bpf/btf.c | 8 ++------
net/bpf/test_run.c | 8 +++++++-
.../testing/selftests/bpf/verifier/btf_ctx_access.c | 13 +++++++++++++
3 files changed, 22 insertions(+), 7 deletions(-)
--
2.20.1
This is the basic functionality for iommufd to support
iommufd_device_replace() and IOMMU_HWPT_ALLOC for physical devices.
iommufd_device_replace() allows changing the HWPT associated with the
device to a new IOAS or HWPT. Replace does this in way that failure leaves
things unchanged, and utilizes the iommu iommu_group_replace_domain() API
to allow the iommu driver to perform an optional non-disruptive change.
IOMMU_HWPT_ALLOC allows HWPTs to be explicitly allocated by the user and
used by attach or replace. At this point it isn't very useful since the
HWPT is the same as the automatically managed HWPT from the IOAS. However
a following series will allow userspace to customize the created HWPT.
The implementation is complicated because we have to introduce some
per-iommu_group memory in iommufd and redo how we think about multi-device
groups to be more explicit. This solves all the locking problems in the
prior attempts.
This series is infrastructure work for the following series which:
- Add replace for attach
- Expose replace through VFIO APIs
- Implement driver parameters for HWPT creation (nesting)
I'll update the linux-next branch with this version and keep it on a side
branch and accumulate the following series when they are ready so we can have
a stable base and make more incremental progress. When we have all the parts
together to get a full implementation it can go to Linus.
This is on github: https://github.com/jgunthorpe/linux/commits/iommufd_hwpt
v5:
- Got back to the v3 version of the code, keep the comment changes from
v4. Syzkaller says the group lock change in v4 didn't work.
- Adjust the fail_nth test to cover the path syzkaller found. We need to
have an ioas with a mapped page installed to inject a failure during
domain attachment.
v4: https://lore.kernel.org/r/0-v4-9cd79ad52ee8+13f5-iommufd_alloc_jgg@nvidia.c…
- Refine comments and commit messages
- Move the group lock into iommufd_hw_pagetable_attach()
- Fix error unwind in iommufd_device_do_replace()
v3: https://lore.kernel.org/r/0-v3-61d41fd9e13e+1f5-iommufd_alloc_jgg@nvidia.com
- Refine comments and commit messages
- Adjust the flow in iommufd_device_auto_get_domain() so pt_id is only
set on success
- Reject replace on non-attached devices
- Add missing __reserved check for IOMMU_HWPT_ALLOC
v2: https://lore.kernel.org/r/0-v2-51b9896e7862+8a8c-iommufd_alloc_jgg@nvidia.c…
- Use WARN_ON for the igroup->group test and move that logic to a
function iommufd_group_try_get()
- Change igroup->devices to igroup->device list
Replace will need to iterate over all attached idevs
- Rename to iommufd_group_setup_msi()
- New patch to export iommu_get_resv_regions()
- New patch to use per-device reserved regions instead of per-group
regions
- Split out the reorganizing of iommufd_device_change_pt() from the
replace patch
- Replace uses the per-dev reserved regions
- Use stdev_id in a few more places in the selftest
- Fix error handling in IOMMU_HWPT_ALLOC
- Clarify comments
- Rebase on v6.3-rc1
v1: https://lore.kernel.org/all/0-v1-7612f88c19f5+2f21-iommufd_alloc_jgg@nvidia…
Jason Gunthorpe (15):
iommufd: Move isolated msi enforcement to iommufd_device_bind()
iommufd: Add iommufd_group
iommufd: Replace the hwpt->devices list with iommufd_group
iommu: Export iommu_get_resv_regions()
iommufd: Keep track of each device's reserved regions instead of
groups
iommufd: Use the iommufd_group to avoid duplicate MSI setup
iommufd: Make sw_msi_start a group global
iommufd: Move putting a hwpt to a helper function
iommufd: Add enforced_cache_coherency to iommufd_hw_pagetable_alloc()
iommufd: Reorganize iommufd_device_attach into
iommufd_device_change_pt
iommufd: Add iommufd_device_replace()
iommufd: Make destroy_rwsem use a lock class per object type
iommufd: Add IOMMU_HWPT_ALLOC
iommufd/selftest: Return the real idev id from selftest mock_domain
iommufd/selftest: Add a selftest for IOMMU_HWPT_ALLOC
Nicolin Chen (2):
iommu: Introduce a new iommu_group_replace_domain() API
iommufd/selftest: Test iommufd_device_replace()
drivers/iommu/iommu-priv.h | 10 +
drivers/iommu/iommu.c | 41 +-
drivers/iommu/iommufd/device.c | 516 +++++++++++++-----
drivers/iommu/iommufd/hw_pagetable.c | 96 +++-
drivers/iommu/iommufd/io_pagetable.c | 27 +-
drivers/iommu/iommufd/iommufd_private.h | 51 +-
drivers/iommu/iommufd/iommufd_test.h | 6 +
drivers/iommu/iommufd/main.c | 17 +-
drivers/iommu/iommufd/selftest.c | 40 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/iommufd.h | 26 +
tools/testing/selftests/iommu/iommufd.c | 67 ++-
.../selftests/iommu/iommufd_fail_nth.c | 67 ++-
tools/testing/selftests/iommu/iommufd_utils.h | 63 ++-
14 files changed, 825 insertions(+), 203 deletions(-)
create mode 100644 drivers/iommu/iommu-priv.h
base-commit: fd8c1a4aee973e87d890a5861e106625a33b2c4e
--
2.40.0
From: Benjamin Berg <benjamin.berg(a)intel.com>
The existing KUNIT_ARRAY_PARAM macro requires a separate function to
get the description. However, in a lot of cases the description can
just be copied directly from the array. Add a second macro that
avoids having to write a static function just for a single strscpy.
Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
---
include/kunit/test.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/include/kunit/test.h b/include/kunit/test.h
index 08d3559dd703..519b90261c72 100644
--- a/include/kunit/test.h
+++ b/include/kunit/test.h
@@ -1414,6 +1414,25 @@ do { \
return NULL; \
}
+/**
+ * KUNIT_ARRAY_PARAM_DESC() - Define test parameter generator from an array.
+ * @name: prefix for the test parameter generator function.
+ * @array: array of test parameters.
+ * @desc_member: structure member from array element to use as description
+ *
+ * Define function @name_gen_params which uses @array to generate parameters.
+ */
+#define KUNIT_ARRAY_PARAM_DESC(name, array, desc_member) \
+ static const void *name##_gen_params(const void *prev, char *desc) \
+ { \
+ typeof((array)[0]) *__next = prev ? ((typeof(__next)) prev) + 1 : (array); \
+ if (__next - (array) < ARRAY_SIZE((array))) { \
+ strscpy(desc, __next->desc_member, KUNIT_PARAM_DESC_SIZE); \
+ return __next; \
+ } \
+ return NULL; \
+ }
+
// TODO(dlatypov(a)google.com): consider eventually migrating users to explicitly
// include resource.h themselves if they need it.
#include <kunit/resource.h>
--
2.39.2
No need to call maybe_mkwrite() to then wrprotect if the source PMD was not
writable.
It's worth nothing that this now allows for PTEs to be writable even if
the source PMD was not writable: if vma->vm_page_prot includes write
permissions.
As documented in commit 931298e103c2 ("mm/userfaultfd: rely on
vma->vm_page_prot in uffd_wp_range()"), any mechanism that intends to
have pages wrprotected (COW, writenotify, mprotect, uffd-wp, softdirty,
...) has to properly adjust vma->vm_page_prot upfront, to not include
write permissions. If vma->vm_page_prot includes write permissions, the
PTE/PMD can be writable as default.
This now mimics the handling in mm/migrate.c:remove_migration_pte() and in
mm/huge_memory.c:remove_migration_pmd(), which has been in place for a
long time (except that 96a9c287e25d ("mm/migrate: fix wrongly apply write
bit after mkdirty on sparc64") temporarily changed it).
Signed-off-by: David Hildenbrand <david(a)redhat.com>
---
mm/huge_memory.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6f3af65435c8..8332e16ac97b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2235,11 +2235,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_swp_mkuffd_wp(entry);
} else {
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
- entry = maybe_mkwrite(entry, vma);
+ if (write)
+ entry = maybe_mkwrite(entry, vma);
if (anon_exclusive)
SetPageAnonExclusive(page + i);
- if (!write)
- entry = pte_wrprotect(entry);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
--
2.39.2
[ This series depends on the VFIO device cdev series ]
Changelog
v6:
* Rebased on top of cdev v8 series
https://lore.kernel.org/kvm/20230327094047.47215-1-yi.l.liu@intel.com/
* Added "Reviewed-by" from Kevin to PATCH-4
* Squashed access->ioas updating lines into iommufd_access_change_pt(),
and changed function return type accordingly for simplification.
v5:
https://lore.kernel.org/linux-iommu/cover.1679559476.git.nicolinc@nvidia.co…
* Kept the cmd->id in the iommufd_test_create_access() so the access can
be created with an ioas by default. Then, renamed the previous ioctl
IOMMU_TEST_OP_ACCESS_SET_IOAS to IOMMU_TEST_OP_ACCESS_REPLACE_IOAS, so
it would be used to replace an access->ioas pointer.
* Added iommufd_access_replace() API after the introductions of the other
two APIs iommufd_access_attach() and iommufd_access_detach().
* Since vdev->iommufd_attached is also set in emulated pathway too, call
iommufd_access_update(), similar to the physical pathway.
v4:
https://lore.kernel.org/linux-iommu/cover.1678284812.git.nicolinc@nvidia.co…
* Rebased on top of Jason's series adding replace() and hwpt_alloc()
https://lore.kernel.org/linux-iommu/0-v2-51b9896e7862+8a8c-iommufd_alloc_jg…
* Rebased on top of cdev series v6
https://lore.kernel.org/kvm/20230308132903.465159-1-yi.l.liu@intel.com/
* Dropped the patch that's moved to cdev series.
* Added unmap function pointer sanity before calling it.
* Added "Reviewed-by" from Kevin and Yi.
* Added back the VFIO change updating the ATTACH uAPI.
v3:
https://lore.kernel.org/linux-iommu/cover.1677288789.git.nicolinc@nvidia.co…
* Rebased on top of Jason's iommufd_hwpt branch:
https://lore.kernel.org/linux-iommu/0-v2-406f7ac07936+6a-iommufd_hwpt_jgg@n…
* Dropped patches from this series accordingly. There were a couple of
VFIO patches that will be submitted after the VFIO cdev series. Also,
renamed the series to be "emulated".
* Moved dma_unmap sanity patch to the first in the series.
* Moved dma_unmap sanity to cover both VFIO and IOMMUFD pathways.
* Added Kevin's "Reviewed-by" to two of the patches.
* Fixed a NULL pointer bug in vfio_iommufd_emulated_bind().
* Moved unmap() call to the common place in iommufd_access_set_ioas().
v2:
https://lore.kernel.org/linux-iommu/cover.1675802050.git.nicolinc@nvidia.co…
* Rebased on top of vfio_device cdev v2 series.
* Update the kdoc and commit message of iommu_group_replace_domain().
* Dropped revert-to-core-domain part in iommu_group_replace_domain().
* Dropped !ops->dma_unmap check in vfio_iommufd_emulated_attach_ioas().
* Added missing rc value in vfio_iommufd_emulated_attach_ioas() from the
iommufd_access_set_ioas() call.
* Added a new patch in vfio_main to deny vfio_pin/unpin_pages() calls if
vdev->ops->dma_unmap is not implemented.
* Added a __iommmufd_device_detach helper and let the replace routine do
a partial detach().
* Added restriction on auto_domains to use the replace feature.
* Added the patch "iommufd/device: Make hwpt_list list_add/del symmetric"
from the has_group removal series.
v1:
https://lore.kernel.org/linux-iommu/cover.1675320212.git.nicolinc@nvidia.co…
Hi all,
The existing IOMMU APIs provide a pair of functions: iommu_attach_group()
for callers to attach a device from the default_domain (NULL if not being
supported) to a given iommu domain, and iommu_detach_group() for callers
to detach a device from a given domain to the default_domain. Internally,
the detach_dev op is deprecated for the newer drivers with default_domain.
This means that those drivers likely can switch an attaching domain to
another one, without stagging the device at a blocking or default domain,
for use cases such as:
1) vPASID mode, when a guest wants to replace a single pasid (PASID=0)
table with a larger table (PASID=N)
2) Nesting mode, when switching the attaching device from an S2 domain
to an S1 domain, or when switching between relevant S1 domains.
This series is rebased on top of Jason Gunthorpe's series that introduces
iommu_group_replace_domain API and IOMMUFD infrastructure for the IOMMUFD
"physical" devices. The IOMMUFD "emulated" deivces will need some extra
steps to replace the access->ioas object and its iopt pointer.
You can also find this series on Github:
https://github.com/nicolinc/iommufd/commits/iommu_group_replace_domain-v6
Thank you
Nicolin Chen
Nicolin Chen (4):
vfio: Do not allow !ops->dma_unmap in vfio_pin/unpin_pages()
iommufd: Add iommufd_access_replace() API
iommufd/selftest: Add IOMMU_TEST_OP_ACCESS_REPLACE_IOAS coverage
vfio: Support IO page table replacement
drivers/iommu/iommufd/device.c | 53 ++++++++++++++-----
drivers/iommu/iommufd/iommufd_test.h | 4 ++
drivers/iommu/iommufd/selftest.c | 19 +++++++
drivers/vfio/iommufd.c | 11 ++--
drivers/vfio/vfio_main.c | 4 ++
include/linux/iommufd.h | 1 +
include/uapi/linux/vfio.h | 6 +++
tools/testing/selftests/iommu/iommfd*.c | 0
tools/testing/selftests/iommu/iommufd.c | 29 +++++++++-
tools/testing/selftests/iommu/iommufd_utils.h | 19 +++++++
10 files changed, 127 insertions(+), 19 deletions(-)
create mode 100644 tools/testing/selftests/iommu/iommfd*.c
--
2.40.0
From: Rong Tao <rongtao(a)cestc.cn>
When the number of symbols is greater than MAX_SYMS (300000), the access
array struct ksym syms[MAX_SYMS] goes out of bounds, which will result in
a segfault.
Resolve this issue by judging the maximum number and exiting the loop, and
increasing the default size appropriately. (6.2.9 = 329839 below)
$ cat /proc/kallsyms | wc -l
329839
GDB debugging:
$ cd linux/samples/bpf
$ sudo gdb ./sampleip
...
(gdb) r
...
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7e2debf in malloc () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install
elfutils-libelf-0.189-1.fc37.x86_64 glibc-2.36-9.fc37.x86_64
libzstd-1.5.4-1.fc37.x86_64 zlib-1.2.12-5.fc37.x86_64
(gdb) bt
#0 0x00007ffff7e2debf in malloc () from /lib64/libc.so.6
#1 0x00007ffff7e33f8e in strdup () from /lib64/libc.so.6
#2 0x0000000000403fb0 in load_kallsyms_refresh() from trace_helpers.c
#3 0x00000000004038b2 in main ()
Signed-off-by: Rong Tao <rongtao(a)cestc.cn>
---
tools/testing/selftests/bpf/trace_helpers.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/trace_helpers.c b/tools/testing/selftests/bpf/trace_helpers.c
index 09a16a77bae4..a9d589c560d2 100644
--- a/tools/testing/selftests/bpf/trace_helpers.c
+++ b/tools/testing/selftests/bpf/trace_helpers.c
@@ -14,7 +14,7 @@
#define DEBUGFS "/sys/kernel/debug/tracing/"
-#define MAX_SYMS 300000
+#define MAX_SYMS 400000
static struct ksym syms[MAX_SYMS];
static int sym_cnt;
@@ -44,7 +44,8 @@ int load_kallsyms_refresh(void)
continue;
syms[i].addr = (long) addr;
syms[i].name = strdup(func);
- i++;
+ if (++i >= MAX_SYMS)
+ break;
}
fclose(f);
sym_cnt = i;
--
2.39.2
There is a spelling mistake in a test assert message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/kvm/aarch64/smccc_filter.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/aarch64/smccc_filter.c b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
index 0f9db0641847..82650313451a 100644
--- a/tools/testing/selftests/kvm/aarch64/smccc_filter.c
+++ b/tools/testing/selftests/kvm/aarch64/smccc_filter.c
@@ -211,7 +211,7 @@ static void expect_call_fwd_to_user(struct kvm_vcpu *vcpu, uint32_t func_id,
"KVM_HYPERCALL_EXIT_SMC is not set");
else
TEST_ASSERT(!(run->hypercall.flags & KVM_HYPERCALL_EXIT_SMC),
- "KVM_HYPERCAL_EXIT_SMC is set");
+ "KVM_HYPERCALL_EXIT_SMC is set");
}
/* SMCCC calls forwarded to userspace cause KVM_EXIT_HYPERCALL exits */
--
2.30.2
From: Feng Zhou <zhoufeng.zf(a)bytedance.com>
When access traced function arguments with type is u32*, bpf verifier failed.
Because u32 have typedef, needs to skip modifier. Add btf_type_is_modifier in
is_int_ptr. Add a selftest to check it.
Feng Zhou (2):
bpf/btf: Fix is_int_ptr()
selftests/bpf: Add test to access u32 ptr argument in tracing program
Changelog:
v1->v2: Addressed comments from Martin KaFai Lau
- Add a selftest.
- use btf_type_skip_modifiers.
Some details in here:
https://lore.kernel.org/all/20221012125815.76120-1-zhouchengming@bytedance.…
kernel/bpf/btf.c | 5 ++---
net/bpf/test_run.c | 8 +++++++-
.../testing/selftests/bpf/verifier/btf_ctx_access.c | 13 +++++++++++++
3 files changed, 22 insertions(+), 4 deletions(-)
--
2.20.1
On Thu, 6 Apr 2023 09:53:37 -0700 Stefan Roesch <shr(a)devkernel.io> wrote:
> So far KSM can only be enabled by calling madvise for memory regions. To
> be able to use KSM for more workloads, KSM needs to have the ability to be
> enabled / disabled at the process / cgroup level.
>
> ...
>
> @@ -53,6 +62,18 @@ void folio_migrate_ksm(struct folio *newfolio, struct folio *folio);
>
> #else /* !CONFIG_KSM */
>
> +static inline int ksm_add_mm(struct mm_struct *mm)
> +{
> +}
The compiler doesn't like the lack of a return value.
I queued up a patch to simply delete the above function - seems that
ksm_add_mm() has no callers if CONFIG_KSM=n.
The same might be true of the ksm_add_vma()...ksm_exit() stubs also,
Perhaps some kind soul could take a look at whether we can simply clean
those out.
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
---
v2:
- Fixes for checkpatch.pl
- Fixes for kernel test robot
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 28 +++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 119 ++++++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 +++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 +++++++++++++++++
tools/testing/selftests/bpf/test_tunnel.sh | 81 ++++++++++++
10 files changed, 362 insertions(+), 17 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
Hi Christophe Leroy and gpio experts,
Greeting!
Platform: Tigerlake-H and so on x86 platforms
All detailed info is in link: https://github.com/xupengfe/syzkaller_logs/tree/main/issue_bisect/230406_gp…
Bisect info: https://github.com/xupengfe/syzkaller_logs/blob/main/issue_bisect/230406_gp…
gpio-mockup.sh kslef-test failed in v6.3-rc5 kernel.
gpio-mockup.sh(gpio overflow test) in kself-test could reproduce this issue:
cd linux/tools/testing/selftests
1. ./kselftest_install.sh
2. cd linux/tools/testing/selftests/kselftest_install/gpio
# ./gpio-mockup.sh
1. Module load tests
1.1. dynamic allocation of gpio
2. Module load error tests
2.1 gpio overflow
test failed: unexpected chip - gpiochip1
GPIO gpio-mockup test FAIL
And the simplified steps to reproduce this issue are as follow:
"
# Load gpio_mockup with overflow ranges -1,1024:
modprobe -q gpio_mockup gpio_mockup_ranges="-1,1024"
# Check is there some Call Trace generated in dmesg
dmesg | grep -C 5 Call
# Should not generate any gpiochip folder like /sys/kernel/debug/gpio-mockup/gpiochip1
# Because load gpio_mockup with overflow ranges -1,1024
find "/sys/kernel/debug/gpio-mockup/" -name gpiochip* -type d | sort
# Unload the gpio_mockup module
modprobe -r gpio_mockup
# Check is there "Call Trace" generated in dmesg
dmesg | grep -C 5 Call
"
Actually the judgement "gpio-mockup.sh" test/bisect judgement point is that:
Should not generate any gpiochip folder like
/sys/kernel/debug/gpio-mockup/gpiochip1 after load gpio_mockup with overflow
ranges -1,1024.
I met gpio-mockup.sh test failed but there is no any "Call Trace" dmesg info
sometimes.
So the shortest check steps are as follow:
"
1. modprobe -q gpio_mockup gpio_mockup_ranges="-1,1024"
After above gpio_mockup module loaded with overflow range "-1,1024":
Correct behavior as previous v6.1 or older kernel:"gpio should not load "gpiochip1" due to overflow range -1,1024";
Wrong behavior in v6.3-rc5 kernel: "gpio *load* "gpiochip1" with overflow range -1,1024 and "gpiochip1" should not be loaded".
The underlying problem was already buried here.
2. Could use below command to check if "gpiochip1" generated:
As before v6.1, there was no "/sys/kernel/debug/gpio-mockup/gpiochip1" sysfs folder due to overflow range -1,1024";
Wrong behavior in v6.3-rc5 kernel: "/sys/kernel/debug/gpio-mockup/gpiochip1" sysfs folder generated as follow command check:
# find "/sys/kernel/debug/gpio-mockup/" -name gpiochip* -type d | sort
/sys/kernel/debug/gpio-mockup/gpiochip1
If there is gpiochip* generated, gpio-mockup.sh kself-test would be failed also.
"
Bisected and found the bad commit was:
"
7b61212f2a07a5afd213c8876e52b5c9946441e2
gpiolib: Get rid of ARCH_NR_GPIOS
"
And after reverted the above commit on top of v6.3-rc5 kernel, above
gpio-mockup.sh kself-test could pass and this issue was gone.
Now gpio-mockup.sh kself-test is failed on almost all x86 platform from
v6.2 cycle mainline kernel.
I hope above info is helpful to solve the "gpio-mockup.sh kself-test failed"
problem.
Thanks!
BR.
Hi Linus,
Please pull the following Kselftest fixes update for Linux 6.3-rc6.
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
fix to mount_setattr_test build failure.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 05107edc910135d27fe557267dc45be9630bf3dd:
selftests: sigaltstack: fix -Wuninitialized (2023-03-20 17:28:31 -0600)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest tags/linux-kselftest-fixes-6.3-rc6
for you to fetch changes up to f1594bc676579133a3cd906d7d27733289edfb86:
selftests mount: Fix mount_setattr_test builds failed (2023-03-31 09:18:45 -0600)
----------------------------------------------------------------
linux-kselftest-fixes-6.3-rc6
This Kselftest fixes update for Linux 6.3-rc6 consists of one single
fix to mount_setattr_test build failure.
----------------------------------------------------------------
Anh Tuan Phan (1):
selftests mount: Fix mount_setattr_test builds failed
tools/testing/selftests/mount_setattr/mount_setattr_test.c | 1 +
1 file changed, 1 insertion(+)
----------------------------------------------------------------
So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
Use case 1:
The madvise call is not available in the programming language. An example for
this are programs with forked workloads using a garbage collected language without
pointers. In such a language madvise cannot be made available.
In addition the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these type of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportuniites may exist across multiple workloads or jobs. Only a higler level
entity like a job scheduler or container can know for certain if its running
one or more instances of a job. That job scheduler however doesn't have
the necessary internal worklaod knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, its the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system command
This patch series adds two new options to the prctl system call. The first
one allows to enable KSM at the process level and the second one to query the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMA's and enable KSM for the eligible VMA's.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMA's where KSM has been enabled.
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows to
test madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by client processes.
Changes:
- V4:
- removing check in prctl for MMF_VM_MERGEABLE in PR_SET_MEMORY_MERGE
handling
- Checking for VM_MERGEABLE AND MMF_VM_MERGE_ANY to avoid chaning vm_flags
- This requires also checking that the vma is compatible. The
compatibility check is provided by a new helper
- processes which have set MMF_VM_MERGE_ANY, only need to call the
helper and not madvise.
- removed unmerge_vmas function, this function is no longer necessary,
clearing the MMF_VM_MERGE_ANY bit is sufficient
- V3:
- folded patch 1 - 6
- folded patch 7 - 14
- folded patch 15 - 19
- Expanded on the use cases in the cover letter
- Added a section on security concerns to the cover letter
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (3):
mm: add new api to enable ksm per process
mm: add new KSM process and sysfs knobs
selftests/mm: add new selftests for KSM
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
fs/proc/base.c | 5 +
include/linux/ksm.h | 19 +-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/sys.c | 27 ++
mm/ksm.c | 137 +++++++---
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
11 files changed, 388 insertions(+), 77 deletions(-)
--
2.34.1
So far KSM can only be enabled by calling madvise for memory regions. To
be able to use KSM for more workloads, KSM needs to have the ability to be
enabled / disabled at the process / cgroup level.
Use case 1:
The madvise call is not available in the programming language. An example for
this are programs with forked workloads using a garbage collected language without
pointers. In such a language madvise cannot be made available.
In addition the addresses of objects get moved around as they are garbage
collected. KSM sharing needs to be enabled "from the outside" for these type of
workloads.
Use case 2:
The same interpreter can also be used for workloads where KSM brings no
benefit or even has overhead. We'd like to be able to enable KSM on a workload
by workload basis.
Use case 3:
With the madvise call sharing opportunities are only enabled for the current
process: it is a workload-local decision. A considerable number of sharing
opportunities may exist across multiple workloads or jobs (if they are part
of the same security domain). Only a higler level entity like a job scheduler
or container can know for certain if its running one or more instances of a
job. That job scheduler however doesn't have the necessary internal workload
knowledge to make targeted madvise calls.
Security concerns:
In previous discussions security concerns have been brought up. The problem is
that an individual workload does not have the knowledge about what else is
running on a machine. Therefore it has to be very conservative in what memory
areas can be shared or not. However, if the system is dedicated to running
multiple jobs within the same security domain, its the job scheduler that has
the knowledge that sharing can be safely enabled and is even desirable.
Performance:
Experiments with using UKSM have shown a capacity increase of around 20%.
1. New options for prctl system command
This patch series adds two new options to the prctl system call. The first
one allows to enable KSM at the process level and the second one to query the
setting.
The setting will be inherited by child processes.
With the above setting, KSM can be enabled for the seed process of a cgroup
and all processes in the cgroup will inherit the setting.
2. Changes to KSM processing
When KSM is enabled at the process level, the KSM code will iterate over all
the VMA's and enable KSM for the eligible VMA's.
When forking a process that has KSM enabled, the setting will be inherited by
the new child process.
In addition when KSM is disabled for a process, KSM will be disabled for the
VMA's where KSM has been enabled.
3. Add general_profit metric
The general_profit metric of KSM is specified in the documentation, but not
calculated. This adds the general profit metric to /sys/kernel/debug/mm/ksm.
4. Add more metrics to ksm_stat
This adds the process profit and ksm type metric to /proc/<pid>/ksm_stat.
5. Add more tests to ksm_tests
This adds an option to specify the merge type to the ksm_tests. This allows to
test madvise and prctl KSM. It also adds a new option to query if prctl KSM has
been enabled. It adds a fork test to verify that the KSM process setting is
inherited by client processes.
Changes:
- V5:
- When the prctl system call is invoked, mark all compatible VMA
as mergeable
- Instead of checcking during scan if VMA is mergeable, mark the VMA
mergeable when the VMA is created (in case the VMA is compatible)
- Remove earlier changes, they are no longer necessary
- Unset the flag MMF_VM_MERGE_ANY in gmap_mark_unmergeable().
- When unsetting the MMF_VM_MERGE_ANY flag with prctl, only unset the
flag
- Remove pages_volatile function (with the simplar general_profit calculation,
the function is no longer needed)
- Use simpler formula for calculation of general_profit
- V4:
- removing check in prctl for MMF_VM_MERGEABLE in PR_SET_MEMORY_MERGE
handling
- Checking for VM_MERGEABLE AND MMF_VM_MERGE_ANY to avoid chaning vm_flags
- This requires also checking that the vma is compatible. The
compatibility check is provided by a new helper
- processes which have set MMF_VM_MERGE_ANY, only need to call the
helper and not madvise.
- removed unmerge_vmas function, this function is no longer necessary,
clearing the MMF_VM_MERGE_ANY bit is sufficient
- V3:
- folded patch 1 - 6
- folded patch 7 - 14
- folded patch 15 - 19
- Expanded on the use cases in the cover letter
- Added a section on security concerns to the cover letter
- V2:
- Added use cases to the cover letter
- Removed the tracing patch from the patch series and posted it as an
individual patch
- Refreshed repo
Stefan Roesch (3):
mm: add new api to enable ksm per process
mm: add new KSM process and sysfs knobs
selftests/mm: add new selftests for KSM
Documentation/ABI/testing/sysfs-kernel-mm-ksm | 8 +
Documentation/admin-guide/mm/ksm.rst | 8 +-
arch/s390/mm/gmap.c | 1 +
fs/proc/base.c | 5 +
include/linux/ksm.h | 36 ++-
include/linux/sched/coredump.h | 1 +
include/uapi/linux/prctl.h | 2 +
kernel/fork.c | 1 +
kernel/sys.c | 24 ++
mm/ksm.c | 143 ++++++++--
mm/mmap.c | 7 +
tools/include/uapi/linux/prctl.h | 2 +
tools/testing/selftests/mm/Makefile | 2 +-
tools/testing/selftests/mm/ksm_tests.c | 254 +++++++++++++++---
14 files changed, 426 insertions(+), 68 deletions(-)
--
2.34.1
This patch set makes it possible to have synchronized dynamic ATU and FDB
entries on locked ports. As locked ports are not able to automatically
learn, they depend on userspace added entries, where userspace can add
static or dynamic entries. The lifetime of static entries are completely
dependent on userspace intervention, and thus not of interest here. We
are only concerned with dynamic entries, which can be added with a
command like:
bridge fdb replace ADDR dev <DEV> master dynamic
We choose only to support this feature on locked ports, as it involves
utilizing the CPU to handle ATU related switchcore events (typically
interrupts) and thus can result in significant performance loss if
exposed to heavy traffic.
On locked ports it is important for userspace to know when an authorized
station has become silent, hence not breaking the communication of a
station that has been authorized based on the MAC-Authentication Bypass
(MAB) scheme. Thus if the station keeps being active after authorization,
it will continue to have an open port as long as it is active. Only after
a silent period will it have to be reauthorized. As the ageing process in
the ATU is dependent on incoming traffic to the switchcore port, it is
necessary for the ATU to signal that an entry has aged out, so that the
FDB can be updated at the correct time.
This patch set includes a solution for the Marvell mv88e6xxx driver, where
for this driver we use the Hold-At-One feature so that an age-out
violation interrupt occurs when a station has been silent for the
system-set age time. The age out violation interrupt allows the switchcore
driver to remove both the ATU and the FDB entry at the same time.
It is up to the maintainers of other switchcore drivers to implement the
feature for their specific driver.
LOG:
V2: Ensure the port is locked when using the feature as we
must ensure that learning is enabled at all times for
the interrupts to occur. This was missed in the previous
version.
Instead of ignoring unsupported flags, ensure that
drivers are only called when supporting the feature.
As 'dynamic' flag is legacy, all drivers support it at
least by their previous handling.
Hans J. Schultz (6):
net: bridge: add dynamic flag to switchdev notifier
net: dsa: propagate flags down towards drivers
drivers: net: dsa: add fdb entry flags incoming to switchcore drivers
net: bridge: ensure FDB offloaded flag is handled as needed
net: dsa: mv88e6xxx: implementation of dynamic ATU entries
selftests: forwarding: add dynamic FDB test
drivers/net/dsa/b53/b53_common.c | 4 +-
drivers/net/dsa/b53/b53_priv.h | 4 +-
drivers/net/dsa/hirschmann/hellcreek.c | 4 +-
drivers/net/dsa/lan9303-core.c | 4 +-
drivers/net/dsa/lantiq_gswip.c | 4 +-
drivers/net/dsa/microchip/ksz_common.c | 6 +-
drivers/net/dsa/mt7530.c | 4 +-
drivers/net/dsa/mv88e6xxx/chip.c | 20 ++++--
drivers/net/dsa/mv88e6xxx/chip.h | 9 ++-
drivers/net/dsa/mv88e6xxx/global1_atu.c | 21 +++++++
drivers/net/dsa/mv88e6xxx/port.c | 6 +-
drivers/net/dsa/mv88e6xxx/switchdev.c | 61 +++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 5 ++
drivers/net/dsa/mv88e6xxx/trace.h | 5 ++
drivers/net/dsa/ocelot/felix.c | 4 +-
drivers/net/dsa/qca/qca8k-common.c | 4 +-
drivers/net/dsa/qca/qca8k.h | 4 +-
drivers/net/dsa/rzn1_a5psw.c | 4 +-
drivers/net/dsa/sja1105/sja1105_main.c | 11 ++--
include/net/dsa.h | 9 ++-
include/net/switchdev.h | 1 +
net/bridge/br_fdb.c | 5 +-
net/bridge/br_switchdev.c | 1 +
net/dsa/dsa.c | 6 ++
net/dsa/port.c | 28 +++++----
net/dsa/port.h | 8 +--
net/dsa/slave.c | 20 ++++--
net/dsa/switch.c | 26 +++++---
net/dsa/switch.h | 1 +
.../net/forwarding/bridge_locked_port.sh | 36 +++++++++++
30 files changed, 258 insertions(+), 67 deletions(-)
--
2.34.1
At present the kselftest header can't be used with nolibc since it makes
use of vprintf() which is not available in nolibc and seems like it would
be inappropriate to implement given the minimal system requirements and
environment intended for nolibc. This has resulted in some open coded
kselftests which use nolibc to test features that are supposed to be
controlled via libc and therefore better exercised in an environment with
no libc.
Rather than continue this let's factor out the I/O routines in kselftest.h
into a separate header file and provide a nolibc implementation which only
allows simple strings to be provided rather than full printf() support.
This is limiting but a great improvement on sharing no code at all.
As an example of using this I've updated the arm64 za-fork test to use
the standard kselftest.h.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Mark Brown (2):
kselftest: Support nolibc
kselftest/arm64: Convert za-fork to use kselftest.h
tools/testing/selftests/arm64/fp/Makefile | 2 +-
tools/testing/selftests/arm64/fp/za-fork.c | 88 +++--------------
tools/testing/selftests/kselftest-nolibc.h | 93 ++++++++++++++++++
tools/testing/selftests/kselftest-std.h | 151 +++++++++++++++++++++++++++++
tools/testing/selftests/kselftest.h | 149 +++-------------------------
5 files changed, 272 insertions(+), 211 deletions(-)
---
base-commit: e8d018dd0257f744ca50a729e3d042cf2ec9da65
change-id: 20230405-kselftest-nolibc-cb2ce0446d09
Best regards,
--
Mark Brown <broonie(a)kernel.org>
On 10.03.23 19:28, Stefan Roesch wrote:
> This adds the general_profit KSM sysfs knob and the process profit metric
> and process merge type knobs to ksm_stat.
>
> 1) split off pages_volatile function
>
> This splits off the pages_volatile function. The next patch will
> use this function.
>
> 2) expose general_profit metric
>
> The documentation mentions a general profit metric, however this
> metric is not calculated. In addition the formula depends on the size
> of internal structures, which makes it more difficult for an
> administrator to make the calculation. Adding the metric for a better
> user experience.
>
> 3) document general_profit sysfs knob
>
> 4) calculate ksm process profit metric
>
> The ksm documentation mentions the process profit metric and how to
> calculate it. This adds the calculation of the metric.
>
> 5) add ksm_merge_type() function
>
> This adds the ksm_merge_type function. The function returns the
> merge type for the process. For madvise it returns "madvise", for
> prctl it returns "process" and otherwise it returns "none".
>
> 6) mm: expose ksm process profit metric and merge type in ksm_stat
>
> This exposes the ksm process profit metric in /proc/<pid>/ksm_stat.
> The name of the value is ksm_merge_type. The documentation mentions
> the formula for the ksm process profit metric, however it does not
> calculate it. In addition the formula depends on the size of internal
> structures. So it makes sense to expose it.
>
> 7) document new procfs ksm knobs
>
Often, when you have to start making a list of things that a patch does,
it might make sense to split some of the items into separate patches
such that you can avoid lists and just explain in list-free text how the
pieces in the patch fit together.
I'd suggest splitting this patch into logical pieces. For example,
separating the general profit calculation/exposure from the per-mm
profit and the per-mm ksm type indication.
> Link: https://lkml.kernel.org/r/20230224044000.3084046-3-shr@devkernel.io
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Reviewed-by: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
[...]
> KSM_ATTR_RO(pages_volatile);
>
> @@ -3280,6 +3305,21 @@ static ssize_t zero_pages_sharing_show(struct kobject *kobj,
> }
> KSM_ATTR_RO(zero_pages_sharing);
>
> +static ssize_t general_profit_show(struct kobject *kobj,
> + struct kobj_attribute *attr, char *buf)
> +{
> + long general_profit;
> + long all_rmap_items;
> +
> + all_rmap_items = ksm_max_page_sharing + ksm_pages_shared +
> + ksm_pages_unshared + pages_volatile();
Are you sure you want to count a config knob (ksm_max_page_sharing) into
that formula? I yet have to digest what this calculation implies, but it
does feel odd.
Further, maybe just avoid pages_volatile(). Expanding the formula
(excluding ksm_max_page_sharing for now):
all_rmap = ksm_pages_shared + ksm_pages_unshared + pages_volatile();
-> expand pages_volatile() (ignoring the < 0 case)
all_rmap = ksm_pages_shared + ksm_pages_unshared + ksm_rmap_items -
ksm_pages_shared - ksm_pages_sharing - ksm_pages_unshared;
-> simplify
all_rmap = ksm_rmap_items + ksm_pages_sharing;
Or is the < 0 case relevant here?
> + general_profit = ksm_pages_sharing * PAGE_SIZE -
> + all_rmap_items * sizeof(struct ksm_rmap_item);
> +
> + return sysfs_emit(buf, "%ld\n", general_profit);
> +}
> +KSM_ATTR_RO(general_profit);
> +
> static ssize_t stable_node_dups_show(struct kobject *kobj,
> struct kobj_attribute *attr, char *buf)
> {
> @@ -3345,6 +3385,7 @@ static struct attribute *ksm_attrs[] = {
> &stable_node_dups_attr.attr,
> &stable_node_chains_prune_millisecs_attr.attr,
> &use_zero_pages_attr.attr,
> + &general_profit_attr.attr,
> NULL,
> };
>
The calculations (profit) don't include when KSM places the shared
zeropage I guess. Accounting that per MM (and eventually globally) is in
the works. [1]
[1]
https://lore.kernel.org/lkml/20230328153852.26c2577e4bd921c371c47a7e@linux-…
--
Thanks,
David / dhildenb
Commit 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
added an additional test, so the number passed to ksft_set_plan needs to
be bumped accordingly.
Also use ksft_finish to print results and exit. This will catch future
mismatches between ksft_set_plan and the number of tests being run.
Fixes: 515bddf0ec41 ("selftests/clone3: test clone3 with CLONE_NEWTIME")
Cc: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Tobias Klauser <tklauser(a)distanz.ch>
---
tools/testing/selftests/clone3/clone3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/clone3/clone3.c b/tools/testing/selftests/clone3/clone3.c
index 4fce46afe6db..7b45c9854202 100644
--- a/tools/testing/selftests/clone3/clone3.c
+++ b/tools/testing/selftests/clone3/clone3.c
@@ -129,7 +129,7 @@ int main(int argc, char *argv[])
uid_t uid = getuid();
ksft_print_header();
- ksft_set_plan(17);
+ ksft_set_plan(18);
test_clone3_supported();
/* Just a simple clone3() should return 0.*/
--
2.39.1
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that staddle the end of the UMEM but not a
page.
This test used to fail without the previous commit ("xsk: Add check for
unaligned descriptors that overrun UMEM").
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 24 ++++++++++++++++++++++++
tools/testing/selftests/bpf/xskxceiver.h | 1 +
2 files changed, 25 insertions(+)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 1a4bdd5aa78c..5a9691e942de 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -69,6 +69,7 @@
*/
#define _GNU_SOURCE
+#include <assert.h>
#include <fcntl.h>
#include <errno.h>
#include <getopt.h>
@@ -1876,6 +1877,29 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
test->ifobj_rx->umem->unaligned_mode = true;
testapp_invalid_desc(test);
break;
+ case TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME: {
+ u64 page_size, umem_size;
+
+ if (!hugepages_present(test->ifobj_tx)) {
+ ksft_test_result_skip("No 2M huge pages present.\n");
+ return;
+ }
+ test_spec_set_name(test, "UNALIGNED_INV_DESC_4K1_FRAME_SIZE");
+ /* Odd frame size so the UMEM doesn't end near a page boundary. */
+ test->ifobj_tx->umem->frame_size = 4001;
+ test->ifobj_rx->umem->frame_size = 4001;
+ test->ifobj_tx->umem->unaligned_mode = true;
+ test->ifobj_rx->umem->unaligned_mode = true;
+ /* This test exists to test descriptors that staddle the end of
+ * the UMEM but not a page.
+ */
+ page_size = sysconf(_SC_PAGESIZE);
+ umem_size = test->ifobj_tx->umem->num_frames * test->ifobj_tx->umem->frame_size;
+ assert(umem_size % page_size > PKT_SIZE);
+ assert(umem_size % page_size < page_size - PKT_SIZE);
+ testapp_invalid_desc(test);
+ break;
+ }
case TEST_TYPE_UNALIGNED:
if (!testapp_unaligned(test))
return;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index cc24ab72f3ff..919327807a4e 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -78,6 +78,7 @@ enum test_type {
TEST_TYPE_ALIGNED_INV_DESC,
TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
TEST_TYPE_UNALIGNED_INV_DESC,
+ TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME,
TEST_TYPE_HEADROOM,
TEST_TYPE_TEARDOWN,
TEST_TYPE_BIDI,
--
2.39.2
Add new subtest to video_device_test to cover the VIDIOC_G_PRIORITY
and VIDIOC_S_PRIORITY ioctl calls. This test tries to set the priority
associated with the file descriptior via ioctl VIDIOC_S_PRIORITY
command from V4L2 API. After that, the test tries to get the new
priority via VIDIOC_G_PRIORITY ioctl command and compares the result
with the v4l2_priority it set before. At the end, the test restores the
old priority.
This test will increase the code coverage for video_device_test, so
I think it might be useful. Additionally, this patch will refactor the
video_device_test a little bit, according to the new functionality.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
V1 -> V2: Revert the test description change
.../selftests/media_tests/video_device_test.c | 111 +++++++++++++-----
1 file changed, 83 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/media_tests/video_device_test.c b/tools/testing/selftests/media_tests/video_device_test.c
index 0f6aef2e2593..2c44e115f2f0 100644
--- a/tools/testing/selftests/media_tests/video_device_test.c
+++ b/tools/testing/selftests/media_tests/video_device_test.c
@@ -37,45 +37,58 @@
#include <time.h>
#include <linux/videodev2.h>
-int main(int argc, char **argv)
+#define PRIORITY_MAX 4
+
+int priority_test(int fd)
{
- int opt;
- char video_dev[256];
- int count;
- struct v4l2_tuner vtuner;
- struct v4l2_capability vcap;
+ /* This test will try to update the priority associated with a file descriptor */
+
+ enum v4l2_priority old_priority, new_priority, priority_to_compare;
int ret;
- int fd;
+ int result = 0;
- if (argc < 2) {
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to get priority: %s\n", strerror(errno));
+ return -1;
+ }
+ new_priority = (old_priority + 1) % PRIORITY_MAX;
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &new_priority);
+ if (ret < 0) {
+ printf("Failed to set priority: %s\n", strerror(errno));
+ return -1;
+ }
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &priority_to_compare);
+ if (ret < 0) {
+ printf("Failed to get new priority: %s\n", strerror(errno));
+ result = -1;
+ goto cleanup;
+ }
+ if (priority_to_compare != new_priority) {
+ printf("Priority wasn't set - test failed\n");
+ result = -1;
}
- /* Process arguments */
- while ((opt = getopt(argc, argv, "d:")) != -1) {
- switch (opt) {
- case 'd':
- strncpy(video_dev, optarg, sizeof(video_dev) - 1);
- video_dev[sizeof(video_dev)-1] = '\0';
- break;
- default:
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
- }
+cleanup:
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to restore priority: %s\n", strerror(errno));
+ return -1;
}
+ return result;
+}
+
+int loop_test(int fd)
+{
+ int count;
+ struct v4l2_tuner vtuner;
+ struct v4l2_capability vcap;
+ int ret;
/* Generate random number of interations */
srand((unsigned int) time(NULL));
count = rand();
- /* Open Video device and keep it open */
- fd = open(video_dev, O_RDWR);
- if (fd == -1) {
- printf("Video Device open errno %s\n", strerror(errno));
- exit(-1);
- }
-
printf("\nNote:\n"
"While test is running, remove the device or unbind\n"
"driver and ensure there are no use after free errors\n"
@@ -98,4 +111,46 @@ int main(int argc, char **argv)
sleep(10);
count--;
}
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int opt;
+ char video_dev[256];
+ int fd;
+ int test_result;
+
+ if (argc < 2) {
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+
+ /* Process arguments */
+ while ((opt = getopt(argc, argv, "d:")) != -1) {
+ switch (opt) {
+ case 'd':
+ strncpy(video_dev, optarg, sizeof(video_dev) - 1);
+ video_dev[sizeof(video_dev)-1] = '\0';
+ break;
+ default:
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+ }
+
+ /* Open Video device and keep it open */
+ fd = open(video_dev, O_RDWR);
+ if (fd == -1) {
+ printf("Video Device open errno %s\n", strerror(errno));
+ exit(-1);
+ }
+
+ test_result = priority_test(fd);
+ if (!test_result)
+ printf("Priority test - PASSED\n");
+ else
+ printf("Priority test - FAILED\n");
+
+ loop_test(fd);
}
--
2.34.1
Add new subtest to video_device_test to cover the VIDIOC_G_PRIORITY
and VIDIOC_S_PRIORITY ioctl calls. This test tries to set the priority
associated with the file descriptior via ioctl VIDIOC_S_PRIORITY
command from V4L2 API. After that, the test tries to get the new
priority via VIDIOC_G_PRIORITY ioctl command and compares the result
with the v4l2_priority it set before. At the end, the test restores the
old priority.
This test will increase the code coverage for video_device_test, so
I think it might be useful. Additionally, this patch will refactor the
video_device_test a little bit, according to the new functionality.
Signed-off-by: Ivan Orlov <ivan.orlov0322(a)gmail.com>
---
.../selftests/media_tests/video_device_test.c | 131 +++++++++++++-----
1 file changed, 93 insertions(+), 38 deletions(-)
diff --git a/tools/testing/selftests/media_tests/video_device_test.c b/tools/testing/selftests/media_tests/video_device_test.c
index 0f6aef2e2593..5e6f65ad2ca3 100644
--- a/tools/testing/selftests/media_tests/video_device_test.c
+++ b/tools/testing/selftests/media_tests/video_device_test.c
@@ -13,18 +13,9 @@
* in the Kselftest run. This test should be run when hardware and driver
* that makes use of V4L2 API is present.
*
- * This test opens user specified Video Device and calls video ioctls in a
- * loop once every 10 seconds.
- *
* Usage:
* sudo ./video_device_test -d /dev/videoX
- *
- * While test is running, remove the device or unbind the driver and
- * ensure there are no use after free errors and other Oops in the
- * dmesg.
- * When possible, enable KaSan kernel config option for use-after-free
- * error detection.
-*/
+ */
#include <stdio.h>
#include <unistd.h>
@@ -37,45 +28,67 @@
#include <time.h>
#include <linux/videodev2.h>
-int main(int argc, char **argv)
+#define PRIORITY_MAX 4
+
+int priority_test(int fd)
{
- int opt;
- char video_dev[256];
- int count;
- struct v4l2_tuner vtuner;
- struct v4l2_capability vcap;
+ /* This test will try to update the priority associated with a file descriptor */
+
+ enum v4l2_priority old_priority, new_priority, priority_to_compare;
int ret;
- int fd;
+ int result = 0;
- if (argc < 2) {
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to get priority: %s\n", strerror(errno));
+ return -1;
+ }
+ new_priority = (old_priority + 1) % PRIORITY_MAX;
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &new_priority);
+ if (ret < 0) {
+ printf("Failed to set priority: %s\n", strerror(errno));
+ return -1;
+ }
+ ret = ioctl(fd, VIDIOC_G_PRIORITY, &priority_to_compare);
+ if (ret < 0) {
+ printf("Failed to get new priority: %s\n", strerror(errno));
+ result = -1;
+ goto cleanup;
+ }
+ if (priority_to_compare != new_priority) {
+ printf("Priority wasn't set - test failed\n");
+ result = -1;
}
- /* Process arguments */
- while ((opt = getopt(argc, argv, "d:")) != -1) {
- switch (opt) {
- case 'd':
- strncpy(video_dev, optarg, sizeof(video_dev) - 1);
- video_dev[sizeof(video_dev)-1] = '\0';
- break;
- default:
- printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
- exit(-1);
- }
+cleanup:
+ ret = ioctl(fd, VIDIOC_S_PRIORITY, &old_priority);
+ if (ret < 0) {
+ printf("Failed to restore priority: %s\n", strerror(errno));
+ return -1;
}
+ return result;
+}
+
+int loop_test(int fd)
+{
+ /*
+ * This test opens user specified Video Device and calls video ioctls in a
+ * loop once every 10 seconds.
+ * While test is running, remove the device or unbind the driver and
+ * ensure there are no use after free errors and other Oops in the
+ * dmesg.
+ * When possible, enable KaSan kernel config option for use-after-free
+ * error detection.
+ */
+ int count;
+ struct v4l2_tuner vtuner;
+ struct v4l2_capability vcap;
+ int ret;
/* Generate random number of interations */
srand((unsigned int) time(NULL));
count = rand();
- /* Open Video device and keep it open */
- fd = open(video_dev, O_RDWR);
- if (fd == -1) {
- printf("Video Device open errno %s\n", strerror(errno));
- exit(-1);
- }
-
printf("\nNote:\n"
"While test is running, remove the device or unbind\n"
"driver and ensure there are no use after free errors\n"
@@ -98,4 +111,46 @@ int main(int argc, char **argv)
sleep(10);
count--;
}
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ int opt;
+ char video_dev[256];
+ int fd;
+ int test_result;
+
+ if (argc < 2) {
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+
+ /* Process arguments */
+ while ((opt = getopt(argc, argv, "d:")) != -1) {
+ switch (opt) {
+ case 'd':
+ strncpy(video_dev, optarg, sizeof(video_dev) - 1);
+ video_dev[sizeof(video_dev)-1] = '\0';
+ break;
+ default:
+ printf("Usage: %s [-d </dev/videoX>]\n", argv[0]);
+ exit(-1);
+ }
+ }
+
+ /* Open Video device and keep it open */
+ fd = open(video_dev, O_RDWR);
+ if (fd == -1) {
+ printf("Video Device open errno %s\n", strerror(errno));
+ exit(-1);
+ }
+
+ test_result = priority_test(fd);
+ if (!test_result)
+ printf("Priority test - PASSED\n");
+ else
+ printf("Priority test - FAILED\n");
+
+ loop_test(fd);
}
--
2.34.1
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.
This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).
This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.
Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.
Fixes: 27e934bec35b ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
Acked-by: Magnus Karlsson <magnus.karlsson(a)intel.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index a17655107a94..30a364283542 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -631,7 +631,6 @@ static struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem, u32 nb
if (!pkt_stream)
exit_with_error(ENOMEM);
- pkt_stream->nb_pkts = nb_pkts;
for (i = 0; i < nb_pkts; i++) {
pkt_set(umem, &pkt_stream->pkts[i], (i % umem->num_frames) * umem->frame_size,
pkt_len);
@@ -1124,7 +1123,14 @@ static int validate_rx_dropped(struct ifobject *ifobject)
if (err)
return TEST_FAILURE;
- if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2)
+ /* The receiver calls getsockopt after receiving the last (valid)
+ * packet which is not the final packet sent in this test (valid and
+ * invalid packets are sent in alternating fashion with the final
+ * packet being invalid). Since the last packet may or may not have
+ * been dropped already, both outcomes must be allowed.
+ */
+ if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 ||
+ stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 - 1)
return TEST_PASS;
return TEST_FAILURE;
--
2.39.2
This change fixes flakiness in the BIDIRECTIONAL test:
# [is_pkt_valid] expected length [60], got length [90]
not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.
For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.
Fixes: a89052572ebb ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/test_xsk.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index b077cf58f825..377fb157a57c 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -116,6 +116,7 @@ setup_vethPairs() {
ip link add ${VETH0} numtxqueues 4 numrxqueues 4 type veth peer name ${VETH1} numtxqueues 4 numrxqueues 4
if [ -f /proc/net/if_inet6 ]; then
echo 1 > /proc/sys/net/ipv6/conf/${VETH0}/disable_ipv6
+ echo 1 > /proc/sys/net/ipv6/conf/${VETH1}/disable_ipv6
fi
if [[ $verbose -eq 1 ]]; then
echo "setting up ${VETH1}"
--
2.39.2
On 10.03.23 19:28, Stefan Roesch wrote:
> Patch series "mm: process/cgroup ksm support", v3.
>
> So far KSM can only be enabled by calling madvise for memory regions. To
> be able to use KSM for more workloads, KSM needs to have the ability to be
> enabled / disabled at the process / cgroup level.
>
> Use case 1:
>
> The madvise call is not available in the programming language. An
> example for this are programs with forked workloads using a garbage
> collected language without pointers. In such a language madvise cannot
> be made available.
>
> In addition the addresses of objects get moved around as they are
> garbage collected. KSM sharing needs to be enabled "from the outside"
> for these type of workloads.
I guess the interpreter could enable it (like a memory allocator could
enable it for the whole heap). But I get that it's much easier to enable
this per-process, and eventually only when a lot of the same processes
are running in that particular environment.
>
> Use case 2:
>
> The same interpreter can also be used for workloads where KSM brings
> no benefit or even has overhead. We'd like to be able to enable KSM on
> a workload by workload basis.
Agreed. A per-process control is also helpful to identidy workloads
where KSM might be beneficial (and to which degree).
>
> Use case 3:
>
> With the madvise call sharing opportunities are only enabled for the
> current process: it is a workload-local decision. A considerable number
> of sharing opportuniites may exist across multiple workloads or jobs.
> Only a higler level entity like a job scheduler or container can know
> for certain if its running one or more instances of a job. That job
> scheduler however doesn't have the necessary internal worklaod knowledge
> to make targeted madvise calls.
>
> Security concerns:
>
> In previous discussions security concerns have been brought up. The
> problem is that an individual workload does not have the knowledge about
> what else is running on a machine. Therefore it has to be very
> conservative in what memory areas can be shared or not. However, if the
> system is dedicated to running multiple jobs within the same security
> domain, its the job scheduler that has the knowledge that sharing can be
> safely enabled and is even desirable.
>
> Performance:
>
> Experiments with using UKSM have shown a capacity increase of around
> 20%.
>
As raised, it would be great to include more details about the workload
where this particulalry helps (e.g., a lot of Django processes operating
in the same domain).
>
> 1. New options for prctl system command
>
> This patch series adds two new options to the prctl system call.
> The first one allows to enable KSM at the process level and the second
> one to query the setting.
>
> The setting will be inherited by child processes.
>
> With the above setting, KSM can be enabled for the seed process of a
> cgroup and all processes in the cgroup will inherit the setting.
>
> 2. Changes to KSM processing
>
> When KSM is enabled at the process level, the KSM code will iterate
> over all the VMA's and enable KSM for the eligible VMA's.
>
> When forking a process that has KSM enabled, the setting will be
> inherited by the new child process.
>
> In addition when KSM is disabled for a process, KSM will be disabled
> for the VMA's where KSM has been enabled.
Do we want to make MADV_MERGEABLE/MADV_UNMERGEABLE fail while the new
prctl is enabled for a process?
>
> 3. Add general_profit metric
>
> The general_profit metric of KSM is specified in the documentation,
> but not calculated. This adds the general profit metric to
> /sys/kernel/debug/mm/ksm.
>
> 4. Add more metrics to ksm_stat
>
> This adds the process profit and ksm type metric to
> /proc/<pid>/ksm_stat.
>
> 5. Add more tests to ksm_tests
>
> This adds an option to specify the merge type to the ksm_tests.
> This allows to test madvise and prctl KSM. It also adds a new option
> to query if prctl KSM has been enabled. It adds a fork test to verify
> that the KSM process setting is inherited by client processes.
>
> An update to the prctl(2) manpage has been proposed at [1].
>
> This patch (of 3):
>
> This adds a new prctl to API to enable and disable KSM on a per process
> basis instead of only at the VMA basis (with madvise).
>
> 1) Introduce new MMF_VM_MERGE_ANY flag
>
> This introduces the new flag MMF_VM_MERGE_ANY flag. When this flag
> is set, kernel samepage merging (ksm) gets enabled for all vma's of a
> process.
>
> 2) add flag to __ksm_enter
>
> This change adds the flag parameter to __ksm_enter. This allows to
> distinguish if ksm was called by prctl or madvise.
>
> 3) add flag to __ksm_exit call
>
> This adds the flag parameter to the __ksm_exit() call. This allows
> to distinguish if this call is for an prctl or madvise invocation.
>
> 4) invoke madvise for all vmas in scan_get_next_rmap_item
>
> If the new flag MMF_VM_MERGE_ANY has been set for a process, iterate
> over all the vmas and enable ksm if possible. For the vmas that can be
> ksm enabled this is only done once.
>
> 5) support disabling of ksm for a process
>
> This adds the ability to disable ksm for a process if ksm has been
> enabled for the process.
>
> 6) add new prctl option to get and set ksm for a process
>
> This adds two new options to the prctl system call
> - enable ksm for all vmas of a process (if the vmas support it).
> - query if ksm has been enabled for a process.
Did you consider, instead of handling MMF_VM_MERGE_ANY in a special way,
to instead make it reuse the existing MMF_VM_MERGEABLE/VM_MERGEABLE
infrastructure. Especially:
1) During prctl(MMF_VM_MERGE_ANY), set VM_MERGABLE on all applicable
compatible. Further, set MMF_VM_MERGEABLE and enter KSM if not
already set.
2) When creating a new, compatible VMA and MMF_VM_MERGE_ANY is set, set
VM_MERGABLE?
The you can avoid all runtime checks for compatible VMAs and only look
at the VM_MERGEABLE flag. In fact, the VM_MERGEABLE will be completely
expressive then for all VMAs. You don't need vma_ksm_mergeable() then.
Another thing to consider is interaction with arch/s390/mm/gmap.c:
s390x/kvm does not support KSM and it has to disable it for all VMAs. We
have to find a way to fence the prctl (for example, fail setting the
prctl after gmap_mark_unmergeable() ran, and make
gmap_mark_unmergeable() fail if the prctl ran -- or handle it gracefully
in some other way).
>
> Link: https://lkml.kernel.org/r/20230227220206.436662-1-shr@devkernel.io [1]
> Link: https://lkml.kernel.org/r/20230224044000.3084046-1-shr@devkernel.io
> Link: https://lkml.kernel.org/r/20230224044000.3084046-2-shr@devkernel.io
> Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
> Cc: David Hildenbrand <david(a)redhat.com>
> Cc: Johannes Weiner <hannes(a)cmpxchg.org>
> Cc: Michal Hocko <mhocko(a)suse.com>
> Cc: Rik van Riel <riel(a)surriel.com>
> Cc: Bagas Sanjaya <bagasdotme(a)gmail.com>
> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
> ---
> include/linux/ksm.h | 14 ++++--
> include/linux/sched/coredump.h | 1 +
> include/uapi/linux/prctl.h | 2 +
> kernel/sys.c | 27 ++++++++++
> mm/ksm.c | 90 +++++++++++++++++++++++-----------
> 5 files changed, 101 insertions(+), 33 deletions(-)
>
> diff --git a/include/linux/ksm.h b/include/linux/ksm.h
> index 7e232ba59b86..d38a05a36298 100644
> --- a/include/linux/ksm.h
> +++ b/include/linux/ksm.h
> @@ -18,20 +18,24 @@
> #ifdef CONFIG_KSM
> int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
> unsigned long end, int advice, unsigned long *vm_flags);
> -int __ksm_enter(struct mm_struct *mm);
> -void __ksm_exit(struct mm_struct *mm);
> +int __ksm_enter(struct mm_struct *mm, int flag);
> +void __ksm_exit(struct mm_struct *mm, int flag);
>
> static inline int ksm_fork(struct mm_struct *mm, struct mm_struct *oldmm)
> {
> + if (test_bit(MMF_VM_MERGE_ANY, &oldmm->flags))
> + return __ksm_enter(mm, MMF_VM_MERGE_ANY);
> if (test_bit(MMF_VM_MERGEABLE, &oldmm->flags))
> - return __ksm_enter(mm);
> + return __ksm_enter(mm, MMF_VM_MERGEABLE);
> return 0;
> }
>
> static inline void ksm_exit(struct mm_struct *mm)
> {
> - if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> - __ksm_exit(mm);
> + if (test_bit(MMF_VM_MERGE_ANY, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGE_ANY);
> + else if (test_bit(MMF_VM_MERGEABLE, &mm->flags))
> + __ksm_exit(mm, MMF_VM_MERGEABLE);
> }
>
> /*
> diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h
> index 0e17ae7fbfd3..0ee96ea7a0e9 100644
> --- a/include/linux/sched/coredump.h
> +++ b/include/linux/sched/coredump.h
> @@ -90,4 +90,5 @@ static inline int get_dumpable(struct mm_struct *mm)
> #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\
> MMF_DISABLE_THP_MASK | MMF_HAS_MDWE_MASK)
>
> +#define MMF_VM_MERGE_ANY 29
> #endif /* _LINUX_SCHED_COREDUMP_H */
> diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
> index 1312a137f7fb..759b3f53e53f 100644
> --- a/include/uapi/linux/prctl.h
> +++ b/include/uapi/linux/prctl.h
> @@ -290,4 +290,6 @@ struct prctl_mm_map {
> #define PR_SET_VMA 0x53564d41
> # define PR_SET_VMA_ANON_NAME 0
>
> +#define PR_SET_MEMORY_MERGE 67
> +#define PR_GET_MEMORY_MERGE 68
> #endif /* _LINUX_PRCTL_H */
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 495cd87d9bf4..edc439b1cae9 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -15,6 +15,7 @@
> #include <linux/highuid.h>
> #include <linux/fs.h>
> #include <linux/kmod.h>
> +#include <linux/ksm.h>
> #include <linux/perf_event.h>
> #include <linux/resource.h>
> #include <linux/kernel.h>
> @@ -2661,6 +2662,32 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
> case PR_SET_VMA:
> error = prctl_set_vma(arg2, arg3, arg4, arg5);
> break;
> +#ifdef CONFIG_KSM
> + case PR_SET_MEMORY_MERGE:
> + if (!capable(CAP_SYS_RESOURCE))
> + return -EPERM;
> +
> + if (arg2) {
> + if (mmap_write_lock_killable(me->mm))
> + return -EINTR;
> +
> + if (!test_bit(MMF_VM_MERGE_ANY, &me->mm->flags))
> + error = __ksm_enter(me->mm, MMF_VM_MERGE_ANY);
Hm, I think this might be problematic if we alread called __ksm_enter()
via madvise(). Maybe we should really consider making MMF_VM_MERGE_ANY
set MMF_VM_MERGABLE instead. Like:
error = 0;
if(test_bit(MMF_VM_MERGEABLE, &me->mm->flags))
error = __ksm_enter(me->mm);
if (!error)
set_bit(MMF_VM_MERGE_ANY, &me->mm->flags);
> + mmap_write_unlock(me->mm);
> + } else {
> + __ksm_exit(me->mm, MMF_VM_MERGE_ANY);
Hm, I'd prefer if we really only call __ksm_exit() when we really exit
the process. Is there a strong requirement to optimize disabling of KSM
or would it be sufficient to clear the MMF_VM_MERGE_ANY flag here?
Also, I wonder what happens if we have another VMA in that process that
has it enabled ..
Last but not least, wouldn't we want to do the same thing as
MADV_UNMERGEABLE and actually unmerge the KSM pages?
It smells like it could be simpler and more consistent to handle by
letting PR_SET_MEMORY_MERGE piggy-back on MMF_VM_MERGABLE/VM_MERGABLE
and mimic what ksm_madvise() does simply for all VMAs.
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -534,16 +534,58 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr,
> return (ret & VM_FAULT_OOM) ? -ENOMEM : 0;
> }
>
> +static bool vma_ksm_compatible(struct vm_area_struct *vma)
> +{
> + /*
> + * Be somewhat over-protective for now!
> + */
> + if (vma->vm_flags & (VM_MERGEABLE | VM_SHARED | VM_MAYSHARE |
> + VM_PFNMAP | VM_IO | VM_DONTEXPAND |
> + VM_HUGETLB | VM_MIXEDMAP))
> + return false; /* just ignore the advice */
That comment is kind-of stale and ksm_madvise() specific.
> +
The VM_MERGEABLE check is really only used for ksm_madvise() to return
immediately. I suggest keeping it in ksm_madvise() -- "Already enabled".
Returning "false" in that case looks wrong (it's not broken because you
do an early check in vma_ksm_mergeable(), it's just semantically weird).
--
Thanks,
David / dhildenb
The va_128TBswitch selftest is designed and implemented for PowerPC and
x86 architectures which support a 128TB switch, up to 256TB of virtual
address space and hugepage sizes of 16MB and 2MB respectively. Arm64
platforms on the other hand support a 256Tb switch, up to 4PB of virtual
address space and a default hugepage size of 512MB when 64k pagesize is
enabled.
These architectural differences require introducing support for arm64
platforms, after which a more generic naming convention is suggested.
The in code comments are amended to provide a more platform independent
explanation of the working of the code and nr_hugepages are configured
as required. Finally, the file running the testcase is modified in order
to prevent skipping of hugetlb testcases of va_high_addr_switch.
This series has been tested on 6.3.0-rc3 kernel, both on arm64 and x86
platforms.
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: linux-mm(a)kvack.org
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Chaitanya S Prakash (5):
selftests/mm: Add support for arm64 platform on va switch
selftests/mm: Rename va_128TBswitch to va_high_addr_switch
selftests/mm: Add platform independent in code comments
selftests/mm: Configure nr_hugepages for arm64
selftests/mm: Run hugetlb testcases of va switch
tools/testing/selftests/mm/Makefile | 4 +-
tools/testing/selftests/mm/run_vmtests.sh | 12 +++++-
...va_128TBswitch.c => va_high_addr_switch.c} | 41 +++++++++++++++----
..._128TBswitch.sh => va_high_addr_switch.sh} | 6 ++-
4 files changed, 49 insertions(+), 14 deletions(-)
rename tools/testing/selftests/mm/{va_128TBswitch.c => va_high_addr_switch.c} (86%)
rename tools/testing/selftests/mm/{va_128TBswitch.sh => va_high_addr_switch.sh} (89%)
--
2.30.2
Hi,
No progress on this bug report, so it is still unpatched in 6.3-rc5 so I am
submitting again.
Please see the relevant data at the bottom:
On 27. 01. 2023. 19:36, Mirsad Goran Todorovac wrote:
> Hi all,
>
> I came across a memory leak with the vanilla mainline Torvalds tree kernel
> with MGLRU and CONFIG_KMEMLEAK enabled:
>
> unreferenced object 0xffff8d7c92ad5180 (size 192):
> comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
> hex dump (first 32 bytes):
> c0 59 ad 92 7c 8d ff ff 60 dd d7 31 7c 8d ff ff .Y..|...`..1|...
> 60 55 df 97 ff ff ff ff 09 00 02 00 00 00 00 00 `U..............
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
> [<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> unreferenced object 0xffff8d7b076be000 (size 32):
> comm "ftracetest", pid 2738512, jiffies 4335176273 (age 4842.976s)
> hex dump (first 32 bytes):
> 0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 61 0a 00 00 . Command: a...
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96557a8d>] __kmalloc+0x4d/0xd0
> [<ffffffff96438314>] tracing_log_err+0x184/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
> unreferenced object 0xffff8d7c92ad59c0 (size 192):
> comm "ftracetest", pid 2738512, jiffies 4335176280 (age 4843.088s)
> hex dump (first 32 bytes):
> c0 5c ad 92 7c 8d ff ff 80 51 ad 92 7c 8d ff ff .\..|....Q..|...
> 60 55 df 97 ff ff ff ff 01 00 0b 00 00 00 00 00 `U..............
> backtrace:
> [<ffffffff965d9bf0>] __kmem_cache_alloc_node+0x1e0/0x340
> [<ffffffff96556dda>] kmalloc_trace+0x2a/0xa0
> [<ffffffff964382fc>] tracing_log_err+0x16c/0x1b0
> [<ffffffff96451963>] append_filter_err+0x113/0x1d0
> [<ffffffff96453c0a>] create_event_filter+0xba/0xe0
> [<ffffffff96454b18>] set_trigger_filter+0x98/0x160
> [<ffffffff96456554>] event_trigger_parse+0x104/0x180
> [<ffffffff96455823>] trigger_process_regex+0xc3/0x110
> [<ffffffff964558f7>] event_trigger_write+0x77/0xe0
> [<ffffffff96623a41>] vfs_write+0xd1/0x420
> [<ffffffff9662413b>] ksys_write+0x7b/0x100
> [<ffffffff966241e9>] __x64_sys_write+0x19/0x20
> [<ffffffff971c9188>] do_syscall_64+0x58/0x80
> [<ffffffff972000aa>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
>
> The bug was noticed on Lenovo desktop 10TX000VCR (LENOVO_MT_10TX_BU_Lenovo_FM_V530S-07ICB)
> running AlmaLinux 8.7 (Stone Smilodon), a CentOS clone, with the compiler:
>
> mtodorov@domac:~/linux/kernel/linux_torvalds$ gcc --version
> gcc (Debian 8.3.0-6) 8.3.0
> Copyright (C) 2018 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> mtodorov@domac:~/linux/kernel/linux_torvalds$
>
> Bisecting gave the following culprit commit:
>
> git bisect good a92ce570c81dc0feaeb12a429b4bc65686d17967
> # good: [c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a] ipmi/watchdog: use strscpy() to instead of strncpy()
> git bisect good c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a
> # good: [90b12f423d3c8a89424c7bdde18e1923dfd0941e] Merge tag 'for-linus-6.2-1' of https://github.com/cminyard/linux-ipmi
> git bisect good 90b12f423d3c8a89424c7bdde18e1923dfd0941e
> # first bad commit: [71946a25f357a51dcce849367501d7fb04c0465b] Merge tag 'mmc-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
>
> The commit was merged on December 13th 2022.
>
> It is a huge commit.
>
> The selftests/ftrace/ftracetest triggers this leak, sometimes several times in a run.
> ftracetest requires root permission to run, but I haven't yet realised whether a non-superuser
> could devise an automated script to abuse this leak exhausting all kernel's memory.
>
> Non-root user gets a EPERM error when trying to access /proc/sys/kernel internals:
>
> [marvin@pc-mtodorov linux_torvalds]$ tools/testing/selftests/ftrace/ftracetest
> Error: this must be run by root user
> tools/testing/selftests/ftrace/ftracetest: line 46: /proc/sys/kernel/sched_rt_runtime_us: Permission denied
> [marvin@pc-mtodorov linux_torvalds]$
>
> Hope this helps.
>
> According to the Code of Conduct, I have Cc:-ed maintainers from get_maintainers.pl and
> I will add Thorsten because this is sort of a regression :-)
The debug output is like follows:
unreferenced object 0xffff93a3dc2d1e18 (size 192):
comm "ftracetest", pid 12451, jiffies 4295087353 (age 463.476s)
hex dump (first 32 bytes):
20 08 2d dc a3 93 ff ff c0 bd 5d cd a3 93 ff ff .-.......].....
c0 bf 85 b6 ff ff ff ff 09 00 02 00 00 00 00 00 ................
backtrace:
[<ffffffffb4afb23c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffffb4b02b19>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffffb4a7693e>] kmalloc_trace+0x2e/0xc0
[<ffffffffb493a8fb>] tracing_log_err+0x18b/0x1d0
[<ffffffffb4959049>] append_filter_err.isra.13+0x119/0x190
[<ffffffffb495a89f>] create_filter+0xbf/0xe0
[<ffffffffb495ab10>] create_event_filter+0x10/0x20
[<ffffffffb495c040>] set_trigger_filter+0xa0/0x180
[<ffffffffb495d745>] event_trigger_parse+0xf5/0x160
[<ffffffffb495c889>] trigger_process_regex+0xc9/0x120
[<ffffffffb495c976>] event_trigger_write+0x86/0xf0
[<ffffffffb4b52dc2>] vfs_write+0xf2/0x520
[<ffffffffb4b533d8>] ksys_write+0x68/0xe0
[<ffffffffb4b5347e>] __x64_sys_write+0x1e/0x30
[<ffffffffb586619c>] do_syscall_64+0x5c/0x90
[<ffffffffb5a000ae>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
unreferenced object 0xffff93a3873dda20 (size 32):
comm "ftracetest", pid 12451, jiffies 4295087353 (age 463.476s)
hex dump (first 32 bytes):
0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 61 0a 00 00 . Command: a...
00 00 cc cc cc cc cc cc cc cc cc cc cc cc cc cc ................
backtrace:
[<ffffffffb4afb23c>] slab_post_alloc_hook+0x8c/0x3e0
[<ffffffffb4b02b19>] __kmem_cache_alloc_node+0x1d9/0x2a0
[<ffffffffb4a77785>] __kmalloc+0x55/0x160
[<ffffffffb493a913>] tracing_log_err+0x1a3/0x1d0
[<ffffffffb4959049>] append_filter_err.isra.13+0x119/0x190
[<ffffffffb495a89f>] create_filter+0xbf/0xe0
[<ffffffffb495ab10>] create_event_filter+0x10/0x20
[<ffffffffb495c040>] set_trigger_filter+0xa0/0x180
[<ffffffffb495d745>] event_trigger_parse+0xf5/0x160
[<ffffffffb495c889>] trigger_process_regex+0xc9/0x120
[<ffffffffb495c976>] event_trigger_write+0x86/0xf0
[<ffffffffb4b52dc2>] vfs_write+0xf2/0x520
[<ffffffffb4b533d8>] ksys_write+0x68/0xe0
[<ffffffffb4b5347e>] __x64_sys_write+0x1e/0x30
[<ffffffffb586619c>] do_syscall_64+0x5c/0x90
[<ffffffffb5a000ae>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
Please find the complete debug info at the URL:
https://domac.alu.unizg.hr/~mtodorov/linux/bugreports/ftracetest/
Bisect log is [edited]:
> git bisect good a92ce570c81dc0feaeb12a429b4bc65686d17967
> # good: [c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a] ipmi/watchdog: use strscpy() to instead of strncpy()
> git bisect good c6f613e5f35b0e2154d5ca12f0e8e0be0c19be9a
> # good: [90b12f423d3c8a89424c7bdde18e1923dfd0941e] Merge tag 'for-linus-6.2-1' of https://github.com/cminyard/linux-ipmi
> git bisect good 90b12f423d3c8a89424c7bdde18e1923dfd0941e
> # first bad commit: [71946a25f357a51dcce849367501d7fb04c0465b] Merge tag 'mmc-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
>
> The commit was merged on December 13th 2022.
The amount of applied diffs in the culprit commit 71946a25f357a51dcce849367501d7fb04c0465b
prevents me from bisecting further - I do not know which changes depend of which, and which
can be tested independently.
Hopefully I might come up with a reproducer, but I need some feedback first. Maybe there
are ways to narrow down the lines of code that could have caused the leaks, yet I am
completely new to the kernel/trace subtree.
Apologies for not Cc:ing Ulf nine weeks ago, but it was an omission, not deliberate act.
Best regards,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
"I see something approaching fast ... Will it be friends with me?"
o6irnndpcv7 writes via Kernel.org Bugzilla:
Hello and good day!
I think I found a missing dependency.
In case of setting CONFIG_FIPS_SIGNATURE_SELFTEST, CONFIG_CRYPTO_SHA256 also needs to be set. But not as module.
Failing to do so results in an early kernel panic during boot.
Tested on linux-6.1.12-gentoo and linux-6.1.19-gentoo.
Thanks,
sephora
View: https://bugzilla.kernel.org/show_bug.cgi?id=217293#c0
You can reply to this message to join the discussion.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)
Hi All,
In TDX guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest.
The TDX guest attestation process consists of two steps:
1. TDREPORT generation
2. Quote generation.
The First step (TDREPORT generation) involves getting the TDX guest
measurement data in the format of TDREPORT which is further used to
validate the authenticity of the TDX guest. The second step involves
sending the TDREPORT to a Quoting Enclave (QE) server to generate a
remotely verifiable Quote. TDREPORT by design can only be verified on
the local platform. To support remote verification of the TDREPORT,
TDX leverages Intel SGX Quoting Enclave to verify the TDREPORT
locally and convert it to a remotely verifiable Quote. Although
attestation software can use communication methods like TCP/IP or
vsock to send the TDREPORT to QE, not all platforms support these
communication models. So TDX GHCI specification [1] defines a method
for Quote generation via hypercalls. Please check the discussion from
Google [2] and Alibaba [3] which clarifies the need for hypercall based
Quote generation support. This patch set adds this support.
Support for TDREPORT generation already exists in the TDX guest driver.
This patchset extends the same driver to add the Quote generation
support.
Following are the details of the patch set:
Patch 1/3 -> Adds event notification IRQ support.
Patch 2/3 -> Adds Quote generation support.
Patch 3/3 -> Adds selftest support for Quote generation feature.
[1] https://cdrdv2.intel.com/v1/dl/getContent/726790, section titled "TDG.VP.VMCALL<GetQuote>".
[2] https://lore.kernel.org/lkml/CAAYXXYxxs2zy_978GJDwKfX5Hud503gPc8=1kQ-+JwG_k…
[3] https://lore.kernel.org/lkml/a69faebb-11e8-b386-d591-dbd08330b008@linux.ali…
Kuppuswamy Sathyanarayanan (3):
x86/tdx: Add TDX Guest event notify interrupt support
virt: tdx-guest: Add Quote generation support
selftests/tdx: Test GetQuote TDX attestation feature
Documentation/virt/coco/tdx-guest.rst | 11 +
arch/x86/coco/tdx/tdx.c | 203 +++++++++++++++
arch/x86/include/asm/tdx.h | 8 +
drivers/virt/coco/tdx-guest/tdx-guest.c | 249 ++++++++++++++++++-
include/uapi/linux/tdx-guest.h | 44 ++++
tools/testing/selftests/tdx/tdx_guest_test.c | 68 ++++-
6 files changed, 575 insertions(+), 8 deletions(-)
--
2.34.1
This change fixes flakiness in the BIDIRECTIONAL test:
# [is_pkt_valid] expected length [60], got length [90]
not ok 1 FAIL: SKB BUSY-POLL BIDIRECTIONAL
When IPv6 is enabled, the interface will periodically send MLDv1 and
MLDv2 packets. These packets can cause the BIDIRECTIONAL test to fail
since it uses VETH0 for RX.
For other tests, this was not a problem since they only receive on VETH1
and IPv6 was already disabled on VETH0.
Fixes: a89052572ebb ("selftests/bpf: Xsk selftests framework")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/test_xsk.sh | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/bpf/test_xsk.sh b/tools/testing/selftests/bpf/test_xsk.sh
index b077cf58f825..377fb157a57c 100755
--- a/tools/testing/selftests/bpf/test_xsk.sh
+++ b/tools/testing/selftests/bpf/test_xsk.sh
@@ -116,6 +116,7 @@ setup_vethPairs() {
ip link add ${VETH0} numtxqueues 4 numrxqueues 4 type veth peer name ${VETH1} numtxqueues 4 numrxqueues 4
if [ -f /proc/net/if_inet6 ]; then
echo 1 > /proc/sys/net/ipv6/conf/${VETH0}/disable_ipv6
+ echo 1 > /proc/sys/net/ipv6/conf/${VETH1}/disable_ipv6
fi
if [[ $verbose -eq 1 ]]; then
echo "setting up ${VETH1}"
--
2.39.2
All related to the pages code, and the latter are reproducible with a
simple test.
Jason Gunthorpe (4):
iommufd: Check for uptr overflow
iommufd: Fix unpinning of pages when an access is present
iommufd: Do not corrupt the pfn list when doing batch carry
iommufd/selftest: Cover domain unmap with huge pages and access
drivers/iommu/iommufd/pages.c | 16 ++++++++++--
tools/testing/selftests/iommu/iommufd.c | 34 +++++++++++++++++++++++++
2 files changed, 48 insertions(+), 2 deletions(-)
base-commit: 9c7d518b9b71f4d5ca3d12952cda3417ac6126c4
--
2.40.0
Dzień dobry,
chcielibyśmy zapewnić Państwu kompleksowe rozwiązania, jeśli chodzi o system monitoringu GPS.
Precyzyjne monitorowanie pojazdów na mapach cyfrowych, śledzenie ich parametrów eksploatacyjnych w czasie rzeczywistym oraz kontrola paliwa to kluczowe funkcjonalności naszego systemu.
Organizowanie pracy pracowników jest dzięki temu prostsze i bardziej efektywne, a oszczędności i optymalizacja w zakresie ponoszonych kosztów, mają dla każdego przedsiębiorcy ogromne znaczenie.
Dopasujemy naszą ofertę do Państwa oczekiwań i potrzeb organizacji. Czy moglibyśmy porozmawiać o naszej propozycji?
Pozdrawiam
Krystian Wieczorek
Hi all,
This patch series adds support to run tests via kunit_tool on the
SuperH-based virtualized r2d platform. As r2d uses the second serial
port as the console, this needs a small modification of the core
infrastructure.
Thanks for your comments!
Geert Uytterhoeven (2):
kunit: tool: Add support for overriding the QEMU serial port
kunit: tool: Add support for SH under QEMU
tools/testing/kunit/kunit_kernel.py | 3 ++-
tools/testing/kunit/qemu_config.py | 1 +
tools/testing/kunit/qemu_configs/sh.py | 17 +++++++++++++++++
3 files changed, 20 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/kunit/qemu_configs/sh.py
--
2.34.1
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert(a)linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
The default timeout for kselftests is 45 seconds, but that isn't enough
time to run pcm-test when there are many PCMs on the device, nor for
mixer-test when slower control buses and fancier CODECs are present.
As data points, running pcm-test on mt8192-asurada-spherion takes about
1m15s, and mixer-test on rk3399-gru-kevin takes about 2m.
Set the timeout to 4 minutes to allow both pcm-test and mixer-test to
run to completion with some slack.
Reviewed-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Changes in v2:
- Reduced timeout from 10 to 4 minutes
- Tweaked commit message to also mention mixer-test and run time for
mixer-test on rk3399-gru-kevin
tools/testing/selftests/alsa/settings | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tools/testing/selftests/alsa/settings
diff --git a/tools/testing/selftests/alsa/settings b/tools/testing/selftests/alsa/settings
new file mode 100644
index 000000000000..b478e684846a
--- /dev/null
+++ b/tools/testing/selftests/alsa/settings
@@ -0,0 +1 @@
+timeout=240
--
2.39.0
Add unaligned descriptor test for frame size of 4001. Using an odd frame
size ensures that the end of the UMEM is not near a page boundary. This
allows testing descriptors that staddle the end of the UMEM but not a
page.
This test used to fail without the previous commit ("xsk: Add check for
unaligned descriptors that overrun UMEM").
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 25 ++++++++++++++++++++++++
tools/testing/selftests/bpf/xskxceiver.h | 1 +
2 files changed, 26 insertions(+)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 1a4bdd5aa78c..9b9efd0e0a4c 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -69,6 +69,7 @@
*/
#define _GNU_SOURCE
+#include <assert.h>
#include <fcntl.h>
#include <errno.h>
#include <getopt.h>
@@ -1876,6 +1877,30 @@ static void run_pkt_test(struct test_spec *test, enum test_mode mode, enum test_
test->ifobj_rx->umem->unaligned_mode = true;
testapp_invalid_desc(test);
break;
+ case TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME:
+ if (!hugepages_present(test->ifobj_tx)) {
+ ksft_test_result_skip("No 2M huge pages present.\n");
+ return;
+ }
+ test_spec_set_name(test, "UNALIGNED_INV_DESC_4K1_FRAME_SIZE");
+ /* Odd frame size so the UMEM doesn't end near a page boundary. */
+ test->ifobj_tx->umem->frame_size = 4001;
+ test->ifobj_rx->umem->frame_size = 4001;
+ test->ifobj_tx->umem->unaligned_mode = true;
+ test->ifobj_rx->umem->unaligned_mode = true;
+ /* This test exists to test descriptors that staddle the end of
+ * the UMEM but not a page.
+ */
+ {
+ u64 umem_size = test->ifobj_tx->umem->num_frames *
+ test->ifobj_tx->umem->frame_size;
+ u64 page_size = sysconf(_SC_PAGESIZE);
+
+ assert(umem_size % page_size > PKT_SIZE);
+ assert(umem_size % page_size < page_size - PKT_SIZE);
+ }
+ testapp_invalid_desc(test);
+ break;
case TEST_TYPE_UNALIGNED:
if (!testapp_unaligned(test))
return;
diff --git a/tools/testing/selftests/bpf/xskxceiver.h b/tools/testing/selftests/bpf/xskxceiver.h
index cc24ab72f3ff..919327807a4e 100644
--- a/tools/testing/selftests/bpf/xskxceiver.h
+++ b/tools/testing/selftests/bpf/xskxceiver.h
@@ -78,6 +78,7 @@ enum test_type {
TEST_TYPE_ALIGNED_INV_DESC,
TEST_TYPE_ALIGNED_INV_DESC_2K_FRAME,
TEST_TYPE_UNALIGNED_INV_DESC,
+ TEST_TYPE_UNALIGNED_INV_DESC_4K1_FRAME,
TEST_TYPE_HEADROOM,
TEST_TYPE_TEARDOWN,
TEST_TYPE_BIDI,
--
2.39.2
Fix flaky STATS_RX_DROPPED test. The receiver calls getsockopt after
receiving the last (valid) packet which is not the final packet sent in
the test (valid and invalid packets are sent in alternating fashion with
the final packet being invalid). Since the last packet may or may not
have been dropped already, both outcomes must be allowed.
This issue could also be fixed by making sure the last packet sent is
valid. This alternative is left as an exercise to the reader (or the
benevolent maintainers of this file).
This problem was quite visible on certain setups. On one machine this
failure was observed 50% of the time.
Also, remove a redundant assignment of pkt_stream->nb_pkts. This field
is already initialized by __pkt_stream_alloc.
Fixes: 27e934bec35b ("selftests: xsk: make stat tests not spin on getsockopt")
Signed-off-by: Kal Conley <kal.conley(a)dectris.com>
---
tools/testing/selftests/bpf/xskxceiver.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/xskxceiver.c b/tools/testing/selftests/bpf/xskxceiver.c
index 34a1f32fe752..1a4bdd5aa78c 100644
--- a/tools/testing/selftests/bpf/xskxceiver.c
+++ b/tools/testing/selftests/bpf/xskxceiver.c
@@ -633,7 +633,6 @@ static struct pkt_stream *pkt_stream_generate(struct xsk_umem_info *umem, u32 nb
if (!pkt_stream)
exit_with_error(ENOMEM);
- pkt_stream->nb_pkts = nb_pkts;
for (i = 0; i < nb_pkts; i++) {
pkt_set(umem, &pkt_stream->pkts[i], (i % umem->num_frames) * umem->frame_size,
pkt_len);
@@ -1141,7 +1140,14 @@ static int validate_rx_dropped(struct ifobject *ifobject)
if (err)
return TEST_FAILURE;
- if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2)
+ /* The receiver calls getsockopt after receiving the last (valid)
+ * packet which is not the final packet sent in this test (valid and
+ * invalid packets are sent in alternating fashion with the final
+ * packet being invalid). Since the last packet may or may not have
+ * been dropped already, both outcomes must be allowed.
+ */
+ if (stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 ||
+ stats.rx_dropped == ifobject->pkt_stream->nb_pkts / 2 - 1)
return TEST_PASS;
return TEST_FAILURE;
--
2.39.2
We're testing usage of vsock as a way to redirect guest-local UDS
requests to the host and this patch series greatly improves the
performance of such a setup.
Compared to copying packets via userspace, this improves throughput by
121% in basic testing.
Tested as follows.
Setup: guest unix dgram sender -> guest vsock redirector -> host vsock
server
Threads: 1
Payload: 64k
No sockmap:
- 76.3 MB/s
- The guest vsock redirector was
"socat VSOCK-CONNECT:2:1234 UNIX-RECV:/path/to/sock"
Using sockmap (this patch):
- 168.8 MB/s (+121%)
- The guest redirector was a simple sockmap echo server,
redirecting unix ingress to vsock 2:1234 egress.
- Same sender and server programs
*Note: these numbers are from RFC v1
Only the virtio transport has been tested. The loopback transport was
used in writing bpf/selftests, but not thoroughly tested otherwise.
This series requires the skb patch.
Changes in v4:
- af_vsock: fix parameter alignment in vsock_dgram_recvmsg()
- af_vsock: add TCP_ESTABLISHED comment in vsock_dgram_connect()
- vsock/bpf: change ret type to bool
Changes in v3:
- vsock/bpf: Refactor wait logic in vsock_bpf_recvmsg() to avoid
backwards goto
- vsock/bpf: Check psock before acquiring slock
- vsock/bpf: Return bool instead of int of 0 or 1
- vsock/bpf: Wrap macro args __sk/__psock in parens
- vsock/bpf: Place comment trailer */ on separate line
Changes in v2:
- vsock/bpf: rename vsock_dgram_* -> vsock_*
- vsock/bpf: change sk_psock_{get,put} and {lock,release}_sock() order
to minimize slock hold time
- vsock/bpf: use "new style" wait
- vsock/bpf: fix bug in wait log
- vsock/bpf: add check that recvmsg sk_type is one dgram, seqpacket, or
stream. Return error if not one of the three.
- virtio/vsock: comment __skb_recv_datagram() usage
- virtio/vsock: do not init copied in read_skb()
- vsock/bpf: add ifdef guard around struct proto in dgram_recvmsg()
- selftests/bpf: add vsock loopback config for aarch64
- selftests/bpf: add vsock loopback config for s390x
- selftests/bpf: remove vsock device from vmtest.sh qemu machine
- selftests/bpf: remove CONFIG_VIRTIO_VSOCKETS=y from config.x86_64
- vsock/bpf: move transport-related (e.g., if (!vsk->transport)) checks
out of fast path
Signed-off-by: Bobby Eshleman <bobby.eshleman(a)bytedance.com>
---
Bobby Eshleman (3):
vsock: support sockmap
selftests/bpf: add vsock to vmtest.sh
selftests/bpf: add a test case for vsock sockmap
drivers/vhost/vsock.c | 1 +
include/linux/virtio_vsock.h | 1 +
include/net/af_vsock.h | 17 ++
net/vmw_vsock/Makefile | 1 +
net/vmw_vsock/af_vsock.c | 64 +++++++-
net/vmw_vsock/virtio_transport.c | 2 +
net/vmw_vsock/virtio_transport_common.c | 25 +++
net/vmw_vsock/vsock_bpf.c | 174 +++++++++++++++++++++
net/vmw_vsock/vsock_loopback.c | 2 +
tools/testing/selftests/bpf/config.aarch64 | 2 +
tools/testing/selftests/bpf/config.s390x | 3 +
tools/testing/selftests/bpf/config.x86_64 | 3 +
.../selftests/bpf/prog_tests/sockmap_listen.c | 163 +++++++++++++++++++
13 files changed, 452 insertions(+), 6 deletions(-)
---
base-commit: e5b42483ccce50d5b957f474fd332afd4ef0c27b
change-id: 20230327-vsock-sockmap-30b090c70cd1
Best regards,
--
Bobby Eshleman <bobby.eshleman(a)bytedance.com>
Hi all,
This is a cleanup series to consolidate a common signal setup code.
Right now quite a bit of duplicated code is there in an unorganized
way. Here is a rework of that signal-related code:
(1) Consolidate the signal handler helpers
They have been exactly copied everywhere. Place them in the shared
code. Then, remove those duplicates.
(2) Simplify altstack code
Most cases require just a usable alternate stack. So, there is a
chance to simplify them all. Abstract the entire setup code to one
setup call. Then, it can reduce the amount of code there.
For testing sigaltstack() specifically, another helper is provided
that excludes the syscall part.
The series also includes some preparatory changes for them:
* Along with the rework, some existing problem was uncovered. A couple
of tests look to free the altstack memory even before the signal
delivery. Adjust the memory cleanup to resolve this issue.
* Also resolve a define conflict separately before including the
refactored header.
Then, there is another selftest fix that I posted:
https://lore.kernel.org/lkml/20230330233520.21937-1-chang.seok.bae@intel.co…
which has a conflict with this. As the fix should go first, this
cleanup series is based on it.
FWIW, at the moment, the new x86 selftest cases -- lam and
test_shadow_stack do not conflict with this.
Here is the repository where this series can be found:
git://github.com/intel/amx-linux.git selftest-signal
Thanks,
Chang
Chang S. Bae (4):
selftests/x86: Fix the altstack free
selftests/x86/mov_ss_trap: Include processor-flags.h
selftests/x86: Consolidate signal handler helpers
selftests/x86: Refactor altstack setup code
tools/testing/selftests/x86/Makefile | 16 ++-
tools/testing/selftests/x86/amx.c | 67 +++--------
.../selftests/x86/corrupt_xstate_header.c | 15 +--
tools/testing/selftests/x86/entry_from_vm86.c | 25 +---
tools/testing/selftests/x86/fsgsbase.c | 25 +---
tools/testing/selftests/x86/helpers.c | 110 ++++++++++++++++++
tools/testing/selftests/x86/helpers.h | 10 ++
tools/testing/selftests/x86/ioperm.c | 26 +----
tools/testing/selftests/x86/iopl.c | 26 +----
tools/testing/selftests/x86/ldt_gdt.c | 19 +--
tools/testing/selftests/x86/mov_ss_trap.c | 26 +----
tools/testing/selftests/x86/ptrace_syscall.c | 24 +---
tools/testing/selftests/x86/sigaltstack.c | 67 +++--------
tools/testing/selftests/x86/sigreturn.c | 35 +-----
.../selftests/x86/single_step_syscall.c | 36 +-----
.../testing/selftests/x86/syscall_arg_fault.c | 24 +---
tools/testing/selftests/x86/syscall_nt.c | 13 ---
tools/testing/selftests/x86/sysret_rip.c | 24 +---
tools/testing/selftests/x86/test_vsyscall.c | 13 ---
tools/testing/selftests/x86/unwind_vdso.c | 13 ---
20 files changed, 205 insertions(+), 409 deletions(-)
create mode 100644 tools/testing/selftests/x86/helpers.c
--
2.17.1
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Changes in v2:
- Include <sys/mman.h> for tests.
- Implement FILE* in terms of integer pointers.
- Provide fdopen() and fileno().
- Link to v1: https://lore.kernel.org/lkml/20230328-nolibc-printf-test-v1-0-d7290ec893dd@…
---
Thomas Weißschuh (3):
tools/nolibc: add wrapper for memfd_create
tools/nolibc: implement fd-based FILE streams
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 60 +++++++++++----------
tools/include/nolibc/sys.h | 23 ++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 78 ++++++++++++++++++++++++++++
3 files changed, 134 insertions(+), 27 deletions(-)
---
base-commit: a63baab5f60110f3631c98b55d59066f1c68c4f7
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
vfprintf() is complex and so far did not have proper tests.
This series is based on the "dev" branch of the RCU tree.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (3):
tools/nolibc: add wrapper for memfd_create
tools/nolibc: let FILE streams contain an fd
tools/nolibc: add testcases for vfprintf
tools/include/nolibc/stdio.h | 36 +++----------
tools/include/nolibc/sys.h | 23 +++++++++
tools/testing/selftests/nolibc/nolibc-test.c | 77 ++++++++++++++++++++++++++++
3 files changed, 107 insertions(+), 29 deletions(-)
---
base-commit: a5333c037de823912dd20e933785c63de7679e64
change-id: 20230328-nolibc-printf-test-052d5abc2118
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
Hi,
This series adds initial KVM selftests support for powerpc
(64-bit, BookS). It spans 3 maintainers but it does not really
affect arch/powerpc, and it is well contained in selftests
code, just touches some makefiles and a tiny bit headers so
conflicts should be unlikely and trivial.
I guess Paolo is the best point to merge these, if no comments
or objections?
Thanks,
Nick
Nicholas Piggin (2):
KVM: PPC: Add kvm selftests support for powerpc
KVM: PPC: Add basic framework tests for kvm selftests
tools/testing/selftests/kvm/Makefile | 14 +
.../selftests/kvm/include/kvm_util_base.h | 13 +
.../selftests/kvm/include/powerpc/hcall.h | 22 ++
.../selftests/kvm/include/powerpc/processor.h | 13 +
tools/testing/selftests/kvm/lib/kvm_util.c | 10 +
.../testing/selftests/kvm/lib/powerpc/hcall.c | 45 +++
.../selftests/kvm/lib/powerpc/processor.c | 355 ++++++++++++++++++
.../testing/selftests/kvm/lib/powerpc/ucall.c | 30 ++
.../testing/selftests/kvm/powerpc/null_test.c | 186 +++++++++
.../selftests/kvm/powerpc/rtas_hcall.c | 146 +++++++
10 files changed, 834 insertions(+)
create mode 100644 tools/testing/selftests/kvm/include/powerpc/hcall.h
create mode 100644 tools/testing/selftests/kvm/include/powerpc/processor.h
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/hcall.c
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/processor.c
create mode 100644 tools/testing/selftests/kvm/lib/powerpc/ucall.c
create mode 100644 tools/testing/selftests/kvm/powerpc/null_test.c
create mode 100644 tools/testing/selftests/kvm/powerpc/rtas_hcall.c
--
2.37.2
This patch set adds support for using FOU or GUE encapsulation with
an ipip device operating in collect-metadata mode and a set of kfuncs
for controlling encap parameters exposed to a BPF tc-hook.
BPF tc-hooks allow us to read tunnel metadata (like remote IP addresses)
in the ingress path of an externally controlled tunnel interface via
the bpf_skb_get_tunnel_{key,opt} bpf-helpers. Packets can then be
redirected to the same or a different externally controlled tunnel
interface by overwriting metadata via the bpf_skb_set_tunnel_{key,opt}
helpers and a call to bpf_redirect. This enables us to redirect packets
between tunnel interfaces - and potentially change the encapsulation
type - using only a single BPF program.
Today this approach works fine for a couple of tunnel combinations.
For example: redirecting packets between Geneve and GRE interfaces or
GRE and plain ipip interfaces. However, redirecting using FOU or GUE is
not supported today. The ip_tunnel module does not allow us to egress
packets using additional UDP encapsulation from an ipip device in
collect-metadata mode.
Patch 1 lifts this restriction by adding a struct ip_tunnel_encap to
the tunnel metadata. It can be filled by a new BPF kfunc introduced
in Patch 2 and evaluated by the ip_tunnel egress path. This will allow
us to use FOU and GUE encap with externally controlled ipip devices.
Patch 2 introduces two new BPF kfuncs: bpf_skb_{set,get}_fou_encap.
These helpers can be used to set and get UDP encap parameters from the
BPF tc-hook doing the packet redirect.
Patch 3 adds BPF tunnel selftests using the two kfuncs.
Christian Ehrig (3):
ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip
devices
bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs
selftests/bpf: Test FOU kfuncs for externally controlled ipip devices
include/net/fou.h | 2 +
include/net/ip_tunnels.h | 27 ++--
net/ipv4/Makefile | 2 +-
net/ipv4/fou_bpf.c | 118 ++++++++++++++++++
net/ipv4/fou_core.c | 5 +
net/ipv4/ip_tunnel.c | 22 +++-
net/ipv4/ipip.c | 1 +
net/ipv6/sit.c | 2 +-
.../selftests/bpf/progs/test_tunnel_kern.c | 117 +++++++++++++++++
tools/testing/selftests/bpf/test_tunnel.sh | 81 ++++++++++++
10 files changed, 360 insertions(+), 17 deletions(-)
create mode 100644 net/ipv4/fou_bpf.c
--
2.39.2
Support ROHM BU27034 ALS sensor
This series adds support for ROHM BU27034 Ambient Light Sensor.
The BU27034 has configurable gain and measurement (integration) time
settings. Both of these have inversely proportional relation to the
sensor's intensity channel scale.
Many users only set the scale, which means that many drivers attempt to
'guess' the best gain+time combination to meet the scale. Usually this
is the biggest integration time which allows setting the requested
scale. Typically, increasing the integration time has better accuracy
than increasing the gain, which often amplifies the noise as well as the
real signal.
However, there may be cases where more responsive sensors are needed.
So, in some cases the longest integration times may not be what the user
prefers. The driver has no way of knowing this.
Hence, the approach taken by this series is to allow user to set both
the scale and the integration time with following logic:
1. When scale is set, the existing integration time is tried to be
maintained as a first priority.
1a) If the requested scale can't be met by current time, then also
other time + gain combinations are searched. If scale can be met
by some other integration time, then the new time may be applied.
If the time setting is common for all channels, then also other
channels must be able to maintain their scale with this new time
(by changing their gain). The new times are scanned in the order
of preference (typically the longest times first).
1b) If the requested scale can be met using current time, then only
the gain for the channel is changed.
2. When the integration time change - scale is tried to be maintained.
When integration time change is requested also gain for all impacted
channels is adjusted so that the scale is not changed, or is chaned
as little as possible. This is different from the RFCv1 where the
request was rejected if suitable gain couldn't be found for some
channel(s).
This logic is simple. When total gain (either caused by time or hw-gain)
is doubled, the scale gets halved. Also, the supported times are given a
'multiplier' value which tells how much they increase the total gain.
However, when I wrote this logic in bu27034 driver, I made quite a few
errors on the way - and driver got pretty big. As I am writing drivers
for two other sensors (RGB C/IR + flicker BU27010 and RGB C/IR BU27008)
with similar gain-time-scale logic I thought that adding common helpers
for these computations might be wise. I hope this way all the bugs will
be concentrated in one place and not in every individual driver ;)
Hence, this series also intriduces IIO gain-time-scale helpers
(abbreviated as gts-helpers) + a couple of KUnit tests for the most
hairy parts.
Speaking of which - testing the devm interfaces requires a 'dummy
device'. There were neat helpers in DRM tests for creating and freeing
such a device. This series moves those helpers to more generic location.
What is worth noting is that there is something similar ongoing in the
CCF territory:
https://lore.kernel.org/all/20230302013822.1808711-1-sboyd@kernel.org/
These efforts should be somehow coordinated in order to avoid any avoid
conflicts.
Finally, these added helpers do provide some value also for drivers
which only:
a) allow gain change
or
b) allow changing both the time and gain while trying to maintain the
scale.
For a) we provide the gain - selector (register value) table format +
selector to gain look-ups, gain <-> scale conversions and the available
scales helpers.
For latter case we also provide the time-tables, and actually all the
APIs should be usable by setting the time multiplier to 1. (not testeted
thoroughly though).
The patch 1/8 introduces the helpers for creating/dropping a test device
for devm-tests. It can be applied alone.
The patches 2/8 (convert DRM tests to use new helper) depends on patch
1/8 but is othervice not part of this series. It can be applied to DRM
tree after the dependency to 1/8 is handled.
The patch 5/8 (IIO GTS tests) also depends on the patch 1/8 (and also
other patches in the series).
Rest of the series should be Ok to be applied with/without the patches
1/8, 2/8, 5/8 - although the 5/8 would be "nice to have" together with
the rest of the series for the testability reasons.
Revision history:
v4 => v5: Mostly fixes to review comments from Andy and Jonathan.
- more accurate change-log in individual patches
- copy code from DRM test helper instead of moving it to simplify
merging
- document all exported GTS helpers.
- inline a few GTS helpers
- use again Milli lux for the bu27034 with RAW IIO_LIGHT channel and scale
- Fix bug from added in v4 bu27034 time setting.
v3 => v4: (Still mostly fixes to review comments from Andy and Jonathan)
- more accurate change-log in individual patches
- dt-binding and maintainer patches unchanged.
- dropped unused helpers and converted ones currently used only internally
to static.
- extracted "dummy device" creation helpers from DRM tests.
- added tests for devm APIs
- dropped scale for PROCESSED channel in BU27034 and converted mLux
values to luxes
- dropped channel 2 GAIN setting which can't be done due to HW
limitations.
v2 => v3: (Mostly fixes to review comments from Andy and Jonathan)
- dt-binding and maintainer patches unchanged.
- iio-gts-helper tests: Use namespaces
- iio-gts-helpers + bu27034 plenty of changes. See more comprehensive
changelog in individual patches.
RFCv1 => v2:
dt-bindings:
- Fix binding file name and id by using comma instead of a hyphen to
separate the vendor and part names.
gts-helpers:
- fix include guardian
- Improve kernel doc for iio_init_iio_gts.
- Add iio_gts_scale_to_total_gain
- Add iio_gts_total_gain_to_scale
- Fix review comments from Jonathan
- add documentation to few functions
- replace 0xffffffffffffffffLLU by U64_MAX
- some styling fixes
- drop unnecessary NULL checks
- order function arguments by in / out purpose
- drop GAIN_SCALE_ITIME_MS()
- Add helpers for available scales and times
- Rename to iio-gts-helpers
gts-tests:
- add tests for available scales/times helpers
- adapt to renamed iio-gts-helpers.h header
bu27034-driver:
- (really) protect read-only registers
- fix get and set gain
- buffered mode
- Protect the whole sequences including meas_en/meas_dis to avoid messing
up the enable / disable order
- typofixes / doc improvements
- change dropped GAIN_SCALE_ITIME_MS() to GAIN_SCALE_ITIME_US()
- use more accurate scale for lux channel (milli lux)
- provide available scales / integration times (using helpers).
- adapt to renamed iio-gts-helpers.h file
- bu27034 - longer lines in Kconfig
- Drop bu27034_meas_en and bu27034_meas_dis wrappers.
- Change device-name from bu27034-als to bu27034
MAINTAINERS:
- Add iio-list
---
Matti Vaittinen (8):
drivers: kunit: Generic helpers for test device creation
drm/tests: helpers: Use generic helpers
dt-bindings: iio: light: Support ROHM BU27034
iio: light: Add gain-time-scale helpers
iio: test: test gain-time-scale helpers
MAINTAINERS: Add IIO gain-time-scale helpers
iio: light: ROHM BU27034 Ambient Light Sensor
MAINTAINERS: Add ROHM BU27034
.../bindings/iio/light/rohm,bu27034.yaml | 46 +
MAINTAINERS | 14 +
drivers/base/test/Kconfig | 5 +
drivers/base/test/Makefile | 2 +
drivers/base/test/test_kunit_device.c | 83 +
drivers/gpu/drm/Kconfig | 2 +
.../gpu/drm/tests/drm_client_modeset_test.c | 5 +-
drivers/gpu/drm/tests/drm_kunit_helpers.c | 69 -
drivers/gpu/drm/tests/drm_managed_test.c | 5 +-
drivers/gpu/drm/tests/drm_modes_test.c | 5 +-
drivers/gpu/drm/tests/drm_probe_helper_test.c | 5 +-
drivers/gpu/drm/vc4/Kconfig | 1 +
drivers/gpu/drm/vc4/tests/vc4_mock.c | 3 +-
.../gpu/drm/vc4/tests/vc4_test_pv_muxing.c | 9 +-
drivers/iio/Kconfig | 3 +
drivers/iio/Makefile | 1 +
drivers/iio/industrialio-gts-helper.c | 1064 ++++++++++++
drivers/iio/light/Kconfig | 14 +
drivers/iio/light/Makefile | 1 +
drivers/iio/light/rohm-bu27034.c | 1482 +++++++++++++++++
drivers/iio/test/Kconfig | 14 +
drivers/iio/test/Makefile | 1 +
drivers/iio/test/iio-test-gts.c | 542 ++++++
include/drm/drm_kunit_helpers.h | 7 +-
include/kunit/platform_device.h | 13 +
include/linux/iio/iio-gts-helper.h | 206 +++
26 files changed, 3515 insertions(+), 87 deletions(-)
create mode 100644 Documentation/devicetree/bindings/iio/light/rohm,bu27034.yaml
create mode 100644 drivers/base/test/test_kunit_device.c
create mode 100644 drivers/iio/industrialio-gts-helper.c
create mode 100644 drivers/iio/light/rohm-bu27034.c
create mode 100644 drivers/iio/test/iio-test-gts.c
create mode 100644 include/kunit/platform_device.h
create mode 100644 include/linux/iio/iio-gts-helper.h
base-commit: eeac8ede17557680855031c6f305ece2378af326
--
2.39.2
--
Matti Vaittinen, Linux device drivers
ROHM Semiconductors, Finland SWDC
Kiviharjunlenkki 1E
90220 OULU
FINLAND
~~~ "I don't think so," said Rene Descartes. Just then he vanished ~~~
Simon says - in Latin please.
~~~ "non cogito me" dixit Rene Descarte, deinde evanescavit ~~~
Thanks to Simon Glass for the translation =]
Change the KTAP v2 spec to allow variable prefixes to KTAP lines,
instead of fixed indentation of two spaces. However, the prefix must be
constant on the same level of testing (besides unknown lines).
This was proposed by Tim Bird in 2021 and then supported by Frank Rowand
in 2022 (see link below).
Link: https://lore.kernel.org/all/bc6e9ed7-d98b-c4da-2a59-ee0915c18f10@gmail.com/
As cited in the original proposal, it is useful in some Fuego tests to
include an identifier in the prefix. This is an example:
KTAP version 1
1..2
[batch_id 4] KTAP version 1
[batch_id 4] 1..2
[batch_id 4] ok 1 cyclictest with 1000 cycles
[batch_id 4] # problem setting CLOCK_REALTIME
[batch_id 4] not ok 2 cyclictest with CLOCK_REALTIME
not ok 1 check realtime
[batch_id 4] KTAP version 1
[batch_id 4] 1..1
[batch_id 4] ok 1 IOZone read/write 4k blocks
ok 2 check I/O performance
Here is a link to a version of the KUnit parser that is able to parse
variable length prefixes for KTAP version 2. Note that the prefix must
be constant at the same level of testing.
Link: https://kunit-review.googlesource.com/c/linux/+/5710
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
This idea has already been proposed but I wanted to potentially
restart the discussion by demonstrating this change can by
implemented in the KUnit parser. Let me know what you think.
Note: this patch is based on Frank's ktap_spec_version_2 branch.
Documentation/dev-tools/ktap.rst | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..ac61fdd97096 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -192,9 +192,11 @@ starting with another KTAP version line and test plan, and end with the overall
result. If one of the subtests fail, for example, the parent test should also
fail.
-Additionally, all lines in a subtest should be indented. One level of
-indentation is two spaces: " ". The indentation should begin at the version
-line and should end before the parent test's result line.
+Additionally, all lines in a subtest should be indented. The standard for one
+level of indentation is two spaces: " ". However, any prefix for indentation
+is allowed as long as the prefix is consistent throughout that level of
+testing. The indentation should begin at the version line and should end
+before the parent test's result line.
"Unknown lines" are not considered to be lines in a subtest and thus are
allowed to be either indented or not indented.
@@ -229,6 +231,19 @@ An example format with multiple levels of nested testing:
not ok 1 example_test_1
ok 2 example_test_2
+An example of a test with two nested subtests using prefixes:
+
+::
+
+ KTAP version 2
+ 1..1
+ [prefix_1] KTAP version 2
+ [prefix_1] 1..2
+ [prefix_1] ok 1 test_1
+ [prefix_1] ok 2 test_2
+ # example passed
+ ok 1 example
+
Major differences between TAP and KTAP
--------------------------------------
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.40.0.rc1.284.g88254d51c5-goog
Add recognition of the test name line ("# Subtest: <name>") to the KTAP v2
spec.
The purpose of this line is to declare the name of a test before its
results. This functionality is especially useful when trying to parse test
results incrementally and when interpretting results after a crash.
This line is already compliant with KTAP v1 as it is interpretted as a
diagnostic line by parsers. Additionally, the line is currently used by
KUnit tests and was derived from the TAP 14 spec:
https://testanything.org/tap-version-14-specification.html.
Recognition of this line would create an accepted way for different test
frameworks to declare the name of a test before its results.
The proposed location for this line is between the version line and the
test plan line. This location ensures that the line would not be
accidentally parsed as a subtest's diagnostic lines. Note this proposed
location would be a slight differentiation from KTAP v1.
Example of test name line:
KTAP version 2
# Subtest: main_test
1..1
KTAP version 2
# Subtest: sub_test
1..2
ok 1 test_1
ok 2 test_2
ok 1 sub_test
Here is a link to a version of the KUnit parser that is able to parse the
test name line for KTAP version 2. Note this includes a test name line for
the main level of KTAP.
Link: https://kunit-review.googlesource.com/c/linux/+/5709
Signed-off-by: Rae Moar <rmoar(a)google.com>
---
This is a RFC. I would like to know what people think and use this as a
platform for discussion on KTAP v2.
Note: this patch is based on Frank's ktap_spec_version_2 branch.
Documentation/dev-tools/ktap.rst | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/Documentation/dev-tools/ktap.rst b/Documentation/dev-tools/ktap.rst
index ff77f4aaa6ef..9c7ed66d9f77 100644
--- a/Documentation/dev-tools/ktap.rst
+++ b/Documentation/dev-tools/ktap.rst
@@ -28,8 +28,7 @@ KTAP output is built from four different types of lines:
In general, valid KTAP output should also form valid TAP output, but some
information, in particular nested test results, may be lost. Also note that
there is a stagnant draft specification for TAP14, KTAP diverges from this in
-a couple of places (notably the "Subtest" header), which are described where
-relevant later in this document.
+a couple of places, which are described where relevant later in this document.
Version lines
-------------
@@ -44,8 +43,8 @@ For example:
- "TAP version 14"
Note that, in KTAP, subtests also begin with a version line, which denotes the
-start of the nested test results. This differs from TAP14, which uses a
-separate "Subtest" line.
+start of the nested test results. This differs from TAP14, which uses only a
+"Subtest" line.
While, going forward, "KTAP version 2" should be used by compliant tests, it
is expected that most parsers and other tooling will accept the other versions
@@ -166,6 +165,12 @@ even if they do not start with a "#": this is to capture any other useful
kernel output which may help debug the test. It is nevertheless recommended
that tests always prefix any diagnostic output they have with a "#" character.
+One recognized diagnostic line is the "# Subtest: <name>" line. This line
+is used to declare the name of a test before subtest results are printed. This
+is helpful for parsing and for providing context during crashes. As a rule,
+this line is placed after the version line and before the plan line. Note
+this line can be used for the main test, as well as subtests.
+
Unknown lines
-------------
@@ -206,6 +211,7 @@ An example of a test with two nested subtests:
KTAP version 2
1..1
KTAP version 2
+ # Subtest: example
1..2
ok 1 test_1
not ok 2 test_2
@@ -219,6 +225,7 @@ An example format with multiple levels of nested testing:
KTAP version 2
1..2
KTAP version 2
+ # Subtest: example_test_1
1..2
KTAP version 2
1..2
@@ -245,7 +252,7 @@ allows an arbitrary number of tests to be nested no yes
The TAP14 specification does permit nested tests, but instead of using another
nested version line, uses a line of the form
-"Subtest: <name>" where <name> is the name of the parent test.
+"Subtest: <name>" where <name> is the name of the parent test as discussed above.
Example KTAP output
--------------------
@@ -254,6 +261,7 @@ Example KTAP output
KTAP version 2
1..1
KTAP version 2
+ # Subtest: main_test
1..3
KTAP version 2
1..1
@@ -266,6 +274,7 @@ Example KTAP output
ok 2 test_2
ok 2 example_test_2
KTAP version 2
+ # Subtest: example_test_3
1..3
ok 1 test_1
# test_2: FAIL
base-commit: 906f02e42adfbd5ae70d328ee71656ecb602aaf5
--
2.40.0.rc1.284.g88254d51c5-goog
On Fri, Mar 31, 2023 at 8:05 AM kernel test robot <yujie.liu(a)intel.com> wrote:
>
> Hello,
>
> kernel test robot noticed kernel-selftests.memfd.run_fuse_test.sh.fail due to commit (built with gcc-11):
>
> commit: 11f75a01448f1b7a739e75dbd8f17b844fcfc510 ("selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-d4cf28ee-1_20230110
> with following parameters:
>
> group: group-02
>
> test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
> test-url: https://www.kernel.org/doc/Documentation/kselftest.txt
>
> on test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> # selftests: memfd: run_fuse_test.sh
> # Aborted
> not ok 2 selftests: memfd: run_fuse_test.sh # exit=134
>
> $ ./run_fuse_test.sh
> opening: ./mnt/memfd
> 8 != 40 = GET_SEALS(4)
> Aborted
Hi Jeff,
I think this is caused by test_sysctl() in memfd_test, which sets
/proc/sys/vm/memfd_noexec to a non-zero value and does not restore it
at the end of the test. If fuse_test runs after that, it will
unexpectedly get F_SEAL_EXEC in its memfd seals in addition to the
F_SEAL_WRITE that it intended to add.
I'm not sure how kernel selftests normally perform cleanup (e.g. an
atexit() hook to make sure it cleans up if a test fails?), but at
least we should probably set /proc/sys/vm/memfd_noexec back to its
original value after test_sysctl().
Thanks,
-- Daniel
The fork function in gcc is considered a built in function due to
being used by libgcov when building with gnu extensions.
Rename fork to sched_process_fork to prevent this conflict.
See details:
https://github.com/gcc-mirror/gcc/commit/d1c38823924506d389ca58d02926ace21b…https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82457
Fixes the following error:
In file included from progs/bench_local_storage_create.c:6:
progs/bench_local_storage_create.c:43:14: error: conflicting types for
built-in function 'fork'; expected 'int(void)'
[-Werror=builtin-declaration-mismatch]
43 | int BPF_PROG(fork, struct task_struct *parent, struct
task_struct *child)
| ^~~~
Fixes: cbe9d93d58b1 ("selftests/bpf: Add bench for task storage creation")
Signed-off-by: James Hilliard <james.hilliard1(a)gmail.com>
Cc: Martin KaFai Lau <martin.lau(a)kernel.org>
---
tools/testing/selftests/bpf/benchs/bench_local_storage_create.c | 2 +-
tools/testing/selftests/bpf/progs/bench_local_storage_create.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c b/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
index abb0321d4f34..cff703f90e95 100644
--- a/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
+++ b/tools/testing/selftests/bpf/benchs/bench_local_storage_create.c
@@ -95,7 +95,7 @@ static void setup(void)
exit(1);
}
} else {
- if (!bpf_program__attach(skel->progs.fork)) {
+ if (!bpf_program__attach(skel->progs.sched_process_fork)) {
fprintf(stderr, "Error attaching bpf program\n");
exit(1);
}
diff --git a/tools/testing/selftests/bpf/progs/bench_local_storage_create.c b/tools/testing/selftests/bpf/progs/bench_local_storage_create.c
index 7c851c9d5e47..e4bfbba6c193 100644
--- a/tools/testing/selftests/bpf/progs/bench_local_storage_create.c
+++ b/tools/testing/selftests/bpf/progs/bench_local_storage_create.c
@@ -40,7 +40,7 @@ int BPF_PROG(kmalloc, unsigned long call_site, const void *ptr,
}
SEC("tp_btf/sched_process_fork")
-int BPF_PROG(fork, struct task_struct *parent, struct task_struct *child)
+int BPF_PROG(sched_process_fork, struct task_struct *parent, struct task_struct *child)
{
struct storage *stg;
--
2.34.1
Commit 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h") moved the
definition of IPPROTO_L2TP from a define to an enum, but since
__stringify doesn't work properly with enums, we ended up breaking the
modalias strings for the l2tp modules:
$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-IPPROTO_L2TP
alias: net-pf-2-proto-2-type-IPPROTO_L2TP
alias: net-pf-10-proto-IPPROTO_L2TP
alias: net-pf-10-proto-2-type-IPPROTO_L2TP
Use the resolved number directly in MODULE_ALIAS_*() macros (as we
already do with SOCK_DGRAM) to fix the alias strings:
$ modinfo l2tp_ip l2tp_ip6 | grep alias
alias: net-pf-2-proto-115
alias: net-pf-2-proto-115-type-2
alias: net-pf-10-proto-115
alias: net-pf-10-proto-115-type-2
Moreover, fix the ordering of the parameters passed to
MODULE_ALIAS_NET_PF_PROTO_TYPE() by switching proto and type.
Fixes: 65b32f801bfb ("uapi: move IPPROTO_L2TP to in.h")
Link: https://lore.kernel.org/lkml/ZCQt7hmodtUaBlCP@righiandr-XPS-13-7390
Signed-off-by: Guillaume Nault <gnault(a)redhat.com>
Signed-off-by: Andrea Righi <andrea.righi(a)canonical.com>
---
net/l2tp/l2tp_ip.c | 8 ++++----
net/l2tp/l2tp_ip6.c | 8 ++++----
2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
index 4db5a554bdbd..41a74fc84ca1 100644
--- a/net/l2tp/l2tp_ip.c
+++ b/net/l2tp/l2tp_ip.c
@@ -677,8 +677,8 @@ MODULE_AUTHOR("James Chapman <jchapman(a)katalix.com>");
MODULE_DESCRIPTION("L2TP over IP");
MODULE_VERSION("1.0");
-/* Use the value of SOCK_DGRAM (2) directory, because __stringify doesn't like
- * enums
+/* Use the values of SOCK_DGRAM (2) as type and IPPROTO_L2TP (115) as protocol,
+ * because __stringify doesn't like enums
*/
-MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 2, IPPROTO_L2TP);
-MODULE_ALIAS_NET_PF_PROTO(PF_INET, IPPROTO_L2TP);
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET, 115, 2);
+MODULE_ALIAS_NET_PF_PROTO(PF_INET, 115);
diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index 2478aa60145f..5137ea1861ce 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -806,8 +806,8 @@ MODULE_AUTHOR("Chris Elston <celston(a)katalix.com>");
MODULE_DESCRIPTION("L2TP IP encapsulation for IPv6");
MODULE_VERSION("1.0");
-/* Use the value of SOCK_DGRAM (2) directory, because __stringify doesn't like
- * enums
+/* Use the values of SOCK_DGRAM (2) as type and IPPROTO_L2TP (115) as protocol,
+ * because __stringify doesn't like enums
*/
-MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 2, IPPROTO_L2TP);
-MODULE_ALIAS_NET_PF_PROTO(PF_INET6, IPPROTO_L2TP);
+MODULE_ALIAS_NET_PF_PROTO_TYPE(PF_INET6, 115, 2);
+MODULE_ALIAS_NET_PF_PROTO(PF_INET6, 115);
--
2.39.2