From: Yi Liu <yi.l.liu(a)intel.com>
[ Upstream commit 1062d81086156e42878d701b816d2f368b53a77c ]
Allocating a domain with a fault ID indicates that the domain is faultable.
However, there is a gap in PRI support for nested parent domains. Some
hardware lacks the capability to distinguish whether PRI occurs at stage 1
or stage 2, and resolving this limitation may require software-based page
table walking. Since no in-tree IOMMU driver currently supports this
functionality, it is disallowed. For more details, refer to the related
discussion at [1].
[1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@in…
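For reference, a minimal userspace sketch of the combination that is now
rejected (not part of this patch; it assumes the <linux/iommufd.h> uapi
with fault support, and dev_id, ioas_id and fault_id are placeholders for
objects created beforehand on the iommufd):

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/ioctl.h>
    #include <linux/iommufd.h>

    /* Ask for a nesting parent that is also faultable; with this patch
     * the kernel rejects the combination with EOPNOTSUPP. */
    static int alloc_faultable_nest_parent(int iommufd, __u32 dev_id,
                                           __u32 ioas_id, __u32 fault_id)
    {
        struct iommu_hwpt_alloc cmd = {
            .size = sizeof(cmd),
            .flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
                     IOMMU_HWPT_FAULT_ID_VALID,
            .dev_id = dev_id,
            .pt_id = ioas_id,
            .fault_id = fault_id,
        };

        if (ioctl(iommufd, IOMMU_HWPT_ALLOC, &cmd)) {
            fprintf(stderr, "HWPT alloc: %s\n", strerror(errno));
            return -errno;  /* expected here: -EOPNOTSUPP */
        }
        return cmd.out_hwpt_id;
    }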
Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu(a)intel.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/iommu/iommufd/hw_pagetable.c | 3 +++
tools/testing/selftests/iommu/iommufd.c | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index d06bf6e6c19fd..2454627a8b61b 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -122,6 +122,9 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
!device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING))
return ERR_PTR(-EOPNOTSUPP);
+ if ((flags & IOMMU_HWPT_FAULT_ID_VALID) &&
+ (flags & IOMMU_HWPT_ALLOC_NEST_PARENT))
+ return ERR_PTR(-EOPNOTSUPP);
hwpt_paging = __iommufd_object_alloc(
ictx, hwpt_paging, IOMMUFD_OBJ_HWPT_PAGING, common.obj);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index 4927b9add5add..06f252733660a 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -289,6 +289,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
&test_hwpt_id);
test_err_hwpt_alloc(EINVAL, self->device_id, self->device_id, 0,
&test_hwpt_id);
+ test_err_hwpt_alloc(EOPNOTSUPP, self->device_id, self->ioas_id,
+ IOMMU_HWPT_ALLOC_NEST_PARENT |
+ IOMMU_HWPT_FAULT_ID_VALID,
+ &test_hwpt_id);
test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
IOMMU_HWPT_ALLOC_NEST_PARENT,
--
2.39.5
From: Petr Machata <petrm(a)nvidia.com>
[ Upstream commit d9e9f6d7b7d0c520bb87f19d2cbc57aeeb2091d5 ]
Attempts to replace an MDB group membership of the host itself are
currently bounced:
# ip link add name br up type bridge vlan_filtering 1
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
# bridge mdb replace dev br port br grp 239.0.0.1 vid 2
Error: bridge: Group is already joined by host.
A similar operation done on a member port would succeed. Ignore the check
for replacement of host group memberships as well.
The bit of code that this enables is br_multicast_host_join(), which, for
already-joined groups, only refreshes the MC group expiration timer (which
is desirable) and sends a userspace notification (also desirable).
Change a selftest that exercises this code path from expecting a rejection
to expecting a pass. The rest of MDB selftests pass without modification.
Signed-off-by: Petr Machata <petrm(a)nvidia.com>
Reviewed-by: Ido Schimmel <idosch(a)nvidia.com>
Acked-by: Nikolay Aleksandrov <razor(a)blackwall.org>
Link: https://patch.msgid.link/e5c5188b9787ae806609e7ca3aa2a0a501b9b5c4.173868564…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
net/bridge/br_mdb.c | 2 +-
tools/testing/selftests/net/forwarding/bridge_mdb.sh | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bridge/br_mdb.c b/net/bridge/br_mdb.c
index 1a52a0bca086d..7e1ad229e1330 100644
--- a/net/bridge/br_mdb.c
+++ b/net/bridge/br_mdb.c
@@ -1040,7 +1040,7 @@ static int br_mdb_add_group(const struct br_mdb_config *cfg,
/* host join */
if (!port) {
- if (mp->host_joined) {
+ if (mp->host_joined && !(cfg->nlflags & NLM_F_REPLACE)) {
NL_SET_ERR_MSG_MOD(extack, "Group is already joined by host");
return -EEXIST;
}
diff --git a/tools/testing/selftests/net/forwarding/bridge_mdb.sh b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
index d9d587454d207..8c1597ebc2d38 100755
--- a/tools/testing/selftests/net/forwarding/bridge_mdb.sh
+++ b/tools/testing/selftests/net/forwarding/bridge_mdb.sh
@@ -149,7 +149,7 @@ cfg_test_host_common()
check_err $? "Failed to add $name host entry"
bridge mdb replace dev br0 port br0 grp $grp $state vid 10 &> /dev/null
- check_fail $? "Managed to replace $name host entry"
+ check_err $? "Failed to replace $name host entry"
bridge mdb del dev br0 port br0 grp $grp $state vid 10
bridge mdb get dev br0 grp $grp vid 10 &> /dev/null
--
2.39.5
From: Brendan Jackman <jackmanb(a)google.com>
[ Upstream commit 08fafac4c9f289a9d9a22d838921e4b3eb22c664 ]
As noted in [0], SeaBIOS (the QEMU default) makes a mess of the terminal;
qboot does not.
It turns out this is actually useful with kunit.py, since the user is
exposed to this issue if they set --raw_output=all.
qboot is also faster than SeaBIOS, but the difference is marginal for this
use case.
[0] https://lore.kernel.org/all/CA+i-1C0wYb-gZ8Mwh3WSVpbk-LF-Uo+njVbASJPe1WXDUR…
Both SeaBIOS and qboot are x86-specific.
Link: https://lore.kernel.org/r/20250124-kunit-qboot-v1-1-815e4d4c6f7c@google.com
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/qemu_configs/x86_64.py | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/kunit/qemu_configs/x86_64.py b/tools/testing/kunit/qemu_configs/x86_64.py
index dc79490768630..4a6bf4e048f5b 100644
--- a/tools/testing/kunit/qemu_configs/x86_64.py
+++ b/tools/testing/kunit/qemu_configs/x86_64.py
@@ -7,4 +7,6 @@ CONFIG_SERIAL_8250_CONSOLE=y''',
qemu_arch='x86_64',
kernel_path='arch/x86/boot/bzImage',
kernel_command_line='console=ttyS0',
- extra_qemu_params=[])
+ # qboot is faster than SeaBIOS and doesn't mess up
+ # the terminal.
+ extra_qemu_params=['-bios', 'qboot.rom'])
--
2.39.5
From: Yi Liu <yi.l.liu(a)intel.com>
[ Upstream commit 1062d81086156e42878d701b816d2f368b53a77c ]
Allocating a domain with a fault ID indicates that the domain is faultable.
However, there is a gap in PRI support for nested parent domains. Some
hardware lacks the capability to distinguish whether PRI occurs at stage 1
or stage 2, and resolving this limitation may require software-based page
table walking. Since no in-tree IOMMU driver currently supports this
functionality, it is disallowed. For more details, refer to the related
discussion at [1].
[1] https://lore.kernel.org/linux-iommu/bd1655c6-8b2f-4cfa-adb1-badc00d01811@in…
Link: https://patch.msgid.link/r/20250226104012.82079-1-yi.l.liu@intel.com
Suggested-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu(a)intel.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/iommu/iommufd/hw_pagetable.c | 3 +++
tools/testing/selftests/iommu/iommufd.c | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 598be26a14e28..9b5b0b8522299 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -126,6 +126,9 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
if ((flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING) &&
!device_iommu_capable(idev->dev, IOMMU_CAP_DIRTY_TRACKING))
return ERR_PTR(-EOPNOTSUPP);
+ if ((flags & IOMMU_HWPT_FAULT_ID_VALID) &&
+ (flags & IOMMU_HWPT_ALLOC_NEST_PARENT))
+ return ERR_PTR(-EOPNOTSUPP);
hwpt_paging = __iommufd_object_alloc(
ictx, hwpt_paging, IOMMUFD_OBJ_HWPT_PAGING, common.obj);
diff --git a/tools/testing/selftests/iommu/iommufd.c b/tools/testing/selftests/iommu/iommufd.c
index a1b2b657999dc..618c03bb6509b 100644
--- a/tools/testing/selftests/iommu/iommufd.c
+++ b/tools/testing/selftests/iommu/iommufd.c
@@ -439,6 +439,10 @@ TEST_F(iommufd_ioas, alloc_hwpt_nested)
&test_hwpt_id);
test_err_hwpt_alloc(EINVAL, self->device_id, self->device_id, 0,
&test_hwpt_id);
+ test_err_hwpt_alloc(EOPNOTSUPP, self->device_id, self->ioas_id,
+ IOMMU_HWPT_ALLOC_NEST_PARENT |
+ IOMMU_HWPT_FAULT_ID_VALID,
+ &test_hwpt_id);
test_cmd_hwpt_alloc(self->device_id, self->ioas_id,
IOMMU_HWPT_ALLOC_NEST_PARENT,
--
2.39.5
From: Niklas Cassel <cassel(a)kernel.org>
[ Upstream commit af1451b6738ec7cf91f2914f53845424959ec4ee ]
Currently BARs that have been disabled by the endpoint controller driver
will result in a test FAIL.
Returning FAIL for a BAR that is disabled seems overly pessimistic.
There are EPCs that disable one or more BARs intentionally.
One reason for this is that certain EPCs are hardwired to expose internal
PCIe controller registers over a specific BAR; the EPC driver disables
such a BAR so that the host will not overwrite random registers during
testing.
Such a BAR will be disabled by the EPC driver's init function, and the
BAR will be marked as BAR_RESERVED, such that it will be unavailable to
endpoint function drivers.
Let's return FAIL only for BARs that are actually enabled and fail the
test, and return SKIP for BARs that are not even enabled.
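To illustrate what a BAR "disabled by the EPC driver" looks like, here is
a hedged sketch (hypothetical "foo" driver names; it assumes the
pci_epc_features/pci_epc_bar_desc layout of recent kernels and is not
taken from any real driver). With BAR 5 marked BAR_RESERVED, pci-epf-test
leaves it unconfigured and the host-side PCITEST_BAR ioctl returns
-ENODATA, which the selftest now reports as a skip:

    #include <linux/pci-epc.h>
    #include <linux/pci-epf.h>
    #include <linux/sizes.h>

    /* Hypothetical EPC driver: BAR 5 is hardwired to internal controller
     * registers, so mark it reserved and unavailable to EPF drivers. */
    static const struct pci_epc_features foo_epc_features = {
        .linkup_notifier = true,
        .msi_capable     = true,
        .bar[BAR_5]      = { .type = BAR_RESERVED },
        .align           = SZ_4K,
    };

    static const struct pci_epc_features *
    foo_epc_get_features(struct pci_epc *epc, u8 func_no, u8 vfunc_no)
    {
        return &foo_epc_features;
    }

    static const struct pci_epc_ops foo_epc_ops = {
        /* ... other ops elided ... */
        .get_features = foo_epc_get_features,
    };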
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20250123120147.3603409-4-cassel@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/pci_endpoint/pci_endpoint_test.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c b/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
index c267b822c1081..576c590b277b1 100644
--- a/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
+++ b/tools/testing/selftests/pci_endpoint/pci_endpoint_test.c
@@ -65,6 +65,8 @@ TEST_F(pci_ep_bar, BAR_TEST)
int ret;
pci_ep_ioctl(PCITEST_BAR, variant->barno);
+ if (ret == -ENODATA)
+ SKIP(return, "BAR is disabled");
EXPECT_FALSE(ret) TH_LOG("Test failed for BAR%d", variant->barno);
}
--
2.39.5
From: Rae Moar <rmoar(a)google.com>
[ Upstream commit 1d4c06d51963f89f67c7b75d5c0c34e9d1bb2ae6 ]
A bug was identified where the KTAP below caused an infinite loop:
TAP version 13
ok 4 test_case
1..4
The infinite loop was caused by the parser not parsing a test plan
when it follows a test result line.
Fix this bug by parsing the test plan line in that case, which avoids the
infinite loop.
Link: https://lore.kernel.org/r/20250313192714.1380005-1-rmoar@google.com
Signed-off-by: Rae Moar <rmoar(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
Signed-off-by: Shuah Khan <shuah(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/kunit/kunit_parser.py | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/tools/testing/kunit/kunit_parser.py b/tools/testing/kunit/kunit_parser.py
index 29fc27e8949bd..da53a709773a2 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -759,7 +759,7 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# If parsing the main/top-level test, parse KTAP version line and
# test plan
test.name = "main"
- ktap_line = parse_ktap_header(lines, test, printer)
+ parse_ktap_header(lines, test, printer)
test.log.extend(parse_diagnostic(lines))
parse_test_plan(lines, test)
parent_test = True
@@ -768,13 +768,12 @@ def parse_test(lines: LineStream, expected_num: int, log: List[str], is_subtest:
# the KTAP version line and/or subtest header line
ktap_line = parse_ktap_header(lines, test, printer)
subtest_line = parse_test_header(lines, test)
+ test.log.extend(parse_diagnostic(lines))
+ parse_test_plan(lines, test)
parent_test = (ktap_line or subtest_line)
if parent_test:
- # If KTAP version line and/or subtest header is found, attempt
- # to parse test plan and print test header
- test.log.extend(parse_diagnostic(lines))
- parse_test_plan(lines, test)
print_test_header(test, printer)
+
expected_count = test.expected_count
subtests = []
test_num = 1
--
2.39.5
After a long delay I'm posting the next iteration of the lockless
/proc/pid/maps reading patchset. Differences from v2 [1]:
- Add a set of tests concurrently modifying address space and checking for
correct reading results;
- Use new mmap_lock_speculate_xxx APIs for concurrent change detection and
retries;
- Add lockless PROCMAP_QUERY execution support;
The new tests are designed to check for any unexpected data tearing while
performing some common address space modifications (vma split, resize and
remap). Even before these changes, reading /proc/pid/maps might have
inconsistent data because the file is read page-by-page with mmap_lock
being dropped between the pages. Such tearing is expected and userspace
is supposed to deal with that possibility. An example of user-visible
inconsistency can be that the same vma is printed twice: once before
it was modified and then after the modification. For example, if a vma
was extended, it might be found and reported twice. What is not expected
is to see a gap where there should have been a vma both before and after
the modification. This patchset increases the chances of such tearing;
therefore it's even more important now to test for unexpected
inconsistencies.
Thanks to Paul McKenney who developed a benchmark to test performance
of concurrent reads and updates, we also have data on performance
benefits:
The test has a pair of processes scanning /proc/PID/maps, and another
process unmapping and remapping 4K pages from a 128MB range of anonymous
memory. At the end of each 10-second run, the latency of each mmap()
or munmap() operation is measured, and for each run the maximum and mean
latency is printed. (Yes, the map/unmap process is started first, its
PID is passed to the scanners, and then the map/unmap process waits until
both scanners are running before starting its timed test. The scanners
keep scanning until the specified /proc/PID/maps file disappears.)
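For concreteness, a rough sketch of the map/unmap side of such a
measurement (not Paul's actual benchmark; sizes and iteration counts are
made up): unmap and re-map a 4K page at a random offset within a 128MB
anonymous region, timing each unmap/remap pair with clock_gettime():

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <sys/mman.h>

    #define RANGE_SZ (128UL << 20)
    #define PAGE_SZ  4096UL

    static long long ns_since(const struct timespec *a,
                              const struct timespec *b)
    {
        return (b->tv_sec - a->tv_sec) * 1000000000LL +
               (b->tv_nsec - a->tv_nsec);
    }

    int main(void)
    {
        char *range = mmap(NULL, RANGE_SZ, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        long long max_ns = 0, sum_ns = 0;
        long iters = 1000000;

        for (long i = 0; i < iters; i++) {
            char *page = range + (rand() % (RANGE_SZ / PAGE_SZ)) * PAGE_SZ;
            struct timespec t0, t1;
            long long ns;

            /* Time one unmap + re-map pair at a random 4K offset. */
            clock_gettime(CLOCK_MONOTONIC, &t0);
            munmap(page, PAGE_SZ);
            mmap(page, PAGE_SZ, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            ns = ns_since(&t0, &t1);
            sum_ns += ns;
            if (ns > max_ns)
                max_ns = ns;
        }
        printf("max %lld ns, mean %lld ns\n", max_ns, sum_ns / iters);
        return 0;
    }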
In summary, with stock mm, 78% of the runs had maximum latencies in
excess of 0.5 milliseconds, with more than half of the runs' latencies
exceeding a full millisecond. In contrast, 98% of the runs with Suren's
patch series applied had maximum latencies of less than 0.5 milliseconds.
From a median-performance viewpoint, Suren's series also looks good,
with stock mm weighing in at 13 microseconds and Suren's series at 10
microseconds, better than a 20% improvement.
[1] https://lore.kernel.org/all/20240123231014.3801041-1-surenb@google.com/
Suren Baghdasaryan (8):
selftests/proc: add /proc/pid/maps tearing from vma split test
selftests/proc: extend /proc/pid/maps tearing test to include vma
resizing
selftests/proc: extend /proc/pid/maps tearing test to include vma
remapping
selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently
modified
selftests/proc: add verbose mode for tests to facilitate debugging
mm: make vm_area_struct anon_name field RCU-safe
mm/maps: read proc/pid/maps under RCU
mm/maps: execute PROCMAP_QUERY ioctl under RCU
fs/proc/internal.h | 6 +
fs/proc/task_mmu.c | 233 +++++-
include/linux/mm_inline.h | 28 +-
include/linux/mm_types.h | 3 +-
mm/madvise.c | 30 +-
tools/testing/selftests/proc/proc-pid-vm.c | 793 ++++++++++++++++++++-
6 files changed, 1061 insertions(+), 32 deletions(-)
base-commit: 79f35c4125a9a3fd98efeed4cce1cd7ce5311a44
--
2.49.0.805.g082f7c87e0-goog