A hds-thresh value is not set correctly if input value is 0.
The cause is that ethtool_ringparam_get_cfg(), which is a internal
function that returns ringparameters from both ->get_ringparam() and
dev->cfg can't return a correct hds-thresh value.
The first patch fixes ethtool_ringparam_get_cfg() to set hds-thresh
value correcltly.
The second patch adds random test for hds-thresh value.
So that we can test 0 value for a hds-thresh properly.
v2:
- Skips set_hds_thresh_random test when hds-thresh-max value is too
small. (2/2)
- Change random range from 1-MAX to 1-(MAX-1). (2/2)
Taehee Yoo (2):
net: ethtool: fix ethtool_ringparam_get_cfg() returns a hds_thresh
value always as 0.
selftests: drv-net: test random value for hds-thresh
net/ethtool/common.c | 1 +
tools/testing/selftests/drivers/net/hds.py | 33 +++++++++++++++++++++-
2 files changed, 33 insertions(+), 1 deletion(-)
--
2.34.1
Bingbu reported an issue in [1] that udmabuf vmap failed and in [2], we
discussed the scenario of folio vmap due to the misuse of vmap_pfn
in udmabuf.
We reached the conclusion that vmap_pfn prohibits the use of page-based
PFNs:
Christoph Hellwig : 'No, vmap_pfn is entirely for memory not backed by
pages or folios, i.e. PCIe BARs and similar memory. This must not be
mixed with proper folio backed memory.'
But udmabuf still need consider HVO based folio's vmap, and need fix
vmap issue. This RFC code want to show the two point that I mentioned
in [2], and more deep talk it:
Point1. simple copy vmap_pfn code, don't bother common vmap_pfn, use by
itself and remove pfn_valid check.
Point2. implement folio array based vmap(vmap_folios), which can given a
range of each folio(offset, nr_pages), so can suit HVO folio's vmap.
Patch 1-2 implement point1, and add a test simple set in udmabuf driver.
Patch 3-5 implement point2, also can test it.
Kasireddy also show that 'another option is to just limit udmabuf's vmap()
to only shmem folios'(This I guess folio_test_hugetlb_vmemmap_optimized
can help.)
But I prefer point2 to solution this issue, and IMO, folio based vmap still
need.
Compare to page based vmap(or pfn based), we need split each large folio
into single page struct, this need more large array struct and more longer
iter. If each tail page struct not exist(like HVO), can only use pfn vmap,
but there are no common api to do this.
In [2], we talked that udmabuf can use hugetlb as the memory
provider, and can give a range use. So if HVO used in hugetlb, each folio's
tail page may freed, so we can't use page based vmap, only can use pfn
based, which show in point1.
Further more, Folio based vmap only need record each folio(and offset,
nr_pages if range need). For 20MB vmap, page based need 5120 pages(40KB),
2MB folios only need 10 folio(80Byte).
Matthew show that Vishal also offered a folio based vmap - vmap_file[3].
This RFC patch want a range based folio, not only a full folio's map(like
file's folio), to resolve some problem like HVO's range folio vmap.
Please give me more suggestion.
Test case:
//enable/disable HVO
1. echo [1|0] > /proc/sys/vm/hugetlb_optimize_vmemmap
//prepare HUGETLB
2. echo 10 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
3. ./udmabuf_vmap
4. check output, and dmesg if any warn.
[1] https://lore.kernel.org/all/9172a601-c360-0d5b-ba1b-33deba430455@linux.inte…
[2] https://lore.kernel.org/lkml/20250312061513.1126496-1-link@vivo.com/
[3] https://lore.kernel.org/linux-mm/20250131001806.92349-1-vishal.moola@gmail.…
Huan Yang (6):
udmabuf: try fix udmabuf vmap
udmabuf: try udmabuf vmap test
mm/vmalloc: try add vmap folios range
udmabuf: use vmap_range_folios
udmabuf: vmap test suit for pages and pfns compare
udmabuf: remove no need code
drivers/dma-buf/udmabuf.c | 29 +++++++++-----------
include/linux/vmalloc.h | 57 +++++++++++++++++++++++++++++++++++++++
mm/vmalloc.c | 47 ++++++++++++++++++++++++++++++++
3 files changed, 117 insertions(+), 16 deletions(-)
--
2.48.1
Hi!
The Cirrus tests keep failing for me when run on x86
./tools/testing/kunit/kunit.py run --alltests --json --arch=x86_64
https://netdev-3.bots.linux.dev/kunit/results/60103/stdout
It seems like new cases continue to appear and we have to keep adding
them to the local ignored list. Is it possible to get these fixed or
can we exclude the cirrus tests from KUNIT_ALL_TESTS?
If no explicit XARCH is specified, use the toolchains default.
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
---
Thomas Weißschuh (2):
selftests/nolibc: drop dependency from sysroot to defconfig
selftests/nolibc: only consider XARCH for CFLAGS when requested
tools/testing/selftests/nolibc/Makefile | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
---
base-commit: 9a9b20007ab833c1aa3791efcfdf67e7e3ea8902
change-id: 20250330-nolibc-nolibc-test-native-6d4d84d764eb
Best regards,
--
Thomas Weißschuh <linux(a)weissschuh.net>
The test_memcontrol selftest consistently fails its test_memcg_low
sub-test due to the fact that two of its test child cgroups which
have a memmory.low of 0 or an effective memory.low of 0 still have low
events generated for them since mem_cgroup_below_low() use the ">="
operator when comparing to elow.
The two failed use cases are as follows:
1) memory.low is set to 0, but low events can still be triggered and
so the cgroup may have a non-zero low event count. I doubt users are
looking for that as they didn't set memory.low at all.
2) memory.low is set to a non-zero value but the cgroup has no task in
it so that it has an effective low value of 0. Again it may have a
non-zero low event count if memory reclaim happens. This is probably
not a result expected by the users and it is really doubtful that
users will check an empty cgroup with no task in it and expecting
some non-zero event counts.
The simple and naive fix of changing the operator to ">", however,
changes the memory reclaim behavior which can lead to other failures
as low events are needed to facilitate memory reclaim. So we can't do
that without some relatively riskier changes in memory reclaim.
Another simpler alternative is to avoid reporting below_low failure
if either memory.low or its effective equivalent is 0 which is done
by this patch specifically for the two failed use cases above.
With this patch applied, the test_memcg_low sub-test finishes
successfully without failure in most cases. Though both test_memcg_low
and test_memcg_min sub-tests may still fail occasionally if the
memory.current values fall outside of the expected ranges.
To be consistent, similar change is appled to mem_cgroup_below_min()
as to avoid the two failed use cases above with low replaced by min.
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
include/linux/memcontrol.h | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 53364526d877..4d4a1f159eaa 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -601,21 +601,31 @@ static inline bool mem_cgroup_unprotected(struct mem_cgroup *target,
static inline bool mem_cgroup_below_low(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long elow;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.elow) >=
- page_counter_read(&memcg->memory);
+ elow = READ_ONCE(memcg->memory.elow);
+ if (!elow || !READ_ONCE(memcg->memory.low))
+ return false;
+
+ return page_counter_read(&memcg->memory) <= elow;
}
static inline bool mem_cgroup_below_min(struct mem_cgroup *target,
struct mem_cgroup *memcg)
{
+ unsigned long emin;
+
if (mem_cgroup_unprotected(target, memcg))
return false;
- return READ_ONCE(memcg->memory.emin) >=
- page_counter_read(&memcg->memory);
+ emin = READ_ONCE(memcg->memory.emin);
+ if (!emin || !READ_ONCE(memcg->memory.min))
+ return false;
+
+ return page_counter_read(&memcg->memory) <= emin;
}
int __mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, gfp_t gfp);
--
2.48.1
┌────────────┐ ┌───────────────────────────────────┐ ┌────────────────┐
│ │ │ │ │ │
│ │ │ PCI Endpoint │ │ PCI Host │
│ │ │ │ │ │
│ │◄──┤ 1.platform_msi_domain_alloc_irqs()│ │ │
│ │ │ │ │ │
│ MSI ├──►│ 2.write_msi_msg() ├──►├─BAR<n> │
│ Controller │ │ update doorbell register address│ │ │
│ │ │ for BAR │ │ │
│ │ │ │ │ 3. Write BAR<n>│
│ │◄──┼───────────────────────────────────┼───┤ │
│ │ │ │ │ │
│ ├──►│ 4.Irq Handle │ │ │
│ │ │ │ │ │
│ │ │ │ │ │
└────────────┘ └───────────────────────────────────┘ └────────────────┘
This patches based on old https://lore.kernel.org/imx/20221124055036.1630573-1-Frank.Li@nxp.com/
Original patch only target to vntb driver. But actually it is common
method.
This patches add new API to pci-epf-core, so any EP driver can use it.
Previous v2 discussion here.
https://lore.kernel.org/imx/20230911220920.1817033-1-Frank.Li@nxp.com/
Changes in v16:
- remove arm64: dts: imx95-19x19-evk: Add PCIe1 endpoint function overlay file
because there are better patches, which under review.
- Add document for pcie-ep msi-map usage
- other change to see each patch's change log
About IMMUTABLE (No change for this part, tglx provide feedback)
> - This IMMUTABLE thing serves no purpose, because you don't randomly
> plug this end-point block on any MSI controller. They come as part
> of an SoC.
"Yes and no. The problem is that the EP implementation is meant to be a
generic library and while GIC-ITS guarantees immutability of the
address/data pair after setup, there are architectures (x86, loongson,
riscv) where the base MSI controller does not and immutability is only
achieved when interrupt remapping is enabled. The latter can be disabled
at boot-time and then the EP implementation becomes a lottery across
affinity changes.
That was my concern about this library implementation and that's why I
asked for a mechanism to ensure that the underlying irqdomain provides a
immutable address/data pair.
So it does not matter for GIC-ITS, but in the larger picture it matters.
Thanks,
tglx
"
So it does not matter for GIC-ITS, but in the larger picture it matters.
- Link to v15: https://lore.kernel.org/r/20250211-ep-msi-v15-0-bcacc1f2b1a9@nxp.com
Changes in v15:
- rebase to v6.14-rc1
- fix build issue find by kernel test robot
- Link to v14: https://lore.kernel.org/r/20250207-ep-msi-v14-0-9671b136f2b8@nxp.com
Changes in v14:
Marc Zyngier raised concerns about adding DOMAIN_BUS_DEVICE_PCI_EP_MSI. As
a result, the approach has been reverted to the v9 method. However, there
are several improvements:
MSI now supports msi-map in addition to msi-parent.
- The struct device: id is used as the endpoint function (EPF) device
identity to map to the stream ID (sideband information).
- The EPC device tree source (DTS) utilizes msi-map to provide such
information.
- The EPF device's of_node is set to the EPC controller’s node. This
approach is commonly used for multi-function device (MFD) platform child
devices, allowing them to inherit properties from the MFD device’s DTS,
such as reset-cells and gpio-cells. This method is well-suited for the
current case, as the EPF is inherently created/binded to the EPC and
should inherit the EPC’s DTS node properties.
Additionally:
Since the basic IMX95 LUT support has already been merged into the
mainline, a DTS and driver increment patch is added to complete the
solution. The patch is rebased onto the latest linux-next tree and
aligned with the new pcitest framework.
- Link to v13: https://lore.kernel.org/r/20241218-ep-msi-v13-0-646e2192dc24@nxp.com
Changes in v13:
- Change to use DOMAIN_BUS_PCI_DEVICE_EP_MSI
- Change request id as func | vfunc << 3
- Remove IRQ_DOMAIN_MSI_IMMUTABLE
Thomas Gleixner:
I hope capture all your points in review comments. If missed, let me know.
- Link to v12: https://lore.kernel.org/r/20241211-ep-msi-v12-0-33d4532fa520@nxp.com
Changes in v12:
- Change to use IRQ_DOMAIN_MSI_IMMUTABLE and add help function
irq_domain_msi_is_immuatble().
- split PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check to 3 patches
- Link to v11: https://lore.kernel.org/r/20241209-ep-msi-v11-0-7434fa8397bd@nxp.com
Changes in v11:
- Change to use MSI_FLAG_MSG_IMMUTABLE
- Link to v10: https://lore.kernel.org/r/20241204-ep-msi-v10-0-87c378dbcd6d@nxp.com
Changes in v10:
Thomas Gleixner:
There are big change in pci-ep-msi.c. I am sure if go on the
corrent path. The key improvement is remove only 1 function devices's
limitation.
I use new patch for imutable check, which relative additional
feature compared to base enablement patch.
- Remove patch Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Add new patch irqchip/gic-v3-its: Avoid overwriting msi_prepare callback if provided by msi_domain_info
- Remove only support 1 endpoint function limiation.
- Create one MSI domain for each endpoint function devices.
- Use "msi-map" in pci ep controler node, instead of of msi-parent. first
argument is
(func_no << 8 | vfunc_no)
- Link to v9: https://lore.kernel.org/r/20241203-ep-msi-v9-0-a60dbc3f15dd@nxp.com
Changes in v9
- Add patch platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
- Remove patch PCI: endpoint: Add pci_epc_get_fn() API for customizable filtering
- Remove API pci_epf_align_inbound_addr_lo_hi
- Move doorbell_alloc in to doorbell_enable function.
- Link to v8: https://lore.kernel.org/r/20241116-ep-msi-v8-0-6f1f68ffd1bb@nxp.com
Changes in v8:
- update helper function name to pci_epf_align_inbound_addr()
- Link to v7: https://lore.kernel.org/r/20241114-ep-msi-v7-0-d4ac7aafbd2c@nxp.com
Changes in v7:
- Add helper function pci_epf_align_addr();
- Link to v6: https://lore.kernel.org/r/20241112-ep-msi-v6-0-45f9722e3c2a@nxp.com
Changes in v6:
- change doorbell_addr to doorbell_offset
- use round_down()
- add Niklas's test by tag
- rebase to pci/endpoint
- Link to v5: https://lore.kernel.org/r/20241108-ep-msi-v5-0-a14951c0d007@nxp.com
Changes in v5:
- Move request_irq to epf test function driver for more flexiable user case
- Add fixed size bar handler
- Some minor improvememtn to see each patches's changelog.
- Link to v4: https://lore.kernel.org/r/20241031-ep-msi-v4-0-717da2d99b28@nxp.com
Changes in v4:
- Remove patch genirq/msi: Add cleanup guard define for msi_lock_descs()/msi_unlock_descs()
- Use new method to avoid compatible problem.
Add new command DOORBELL_ENABLE and DOORBELL_DISABLE.
pcitest -B send DOORBELL_ENABLE first, EP test function driver try to
remap one of BAR_N (except test register bar) to ITS MSI MMIO space. Old
driver don't support new command, so failure return, not side effect.
After test, DOORBELL_DISABLE command send out to recover original map, so
pcitest bar test can pass as normal.
- Other detail change see each patches's change log
- Link to v3: https://lore.kernel.org/r/20241015-ep-msi-v3-0-cedc89a16c1a@nxp.com
Change from v2 to v3
- Fixed manivannan's comments
- Move common part to pci-ep-msi.c and pci-ep-msi.h
- rebase to 6.12-rc1
- use RevID to distingiush old version
mkdir /sys/kernel/config/pci_ep/functions/pci_epf_test/func1
echo 16 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/msi_interrupts
echo 0x080c > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/deviceid
echo 0x1957 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/vendorid
echo 1 > /sys/kernel/config/pci_ep/functions/pci_epf_test/func1/revid
^^^^^^ to enable platform msi support.
ln -s /sys/kernel/config/pci_ep/functions/pci_epf_test/func1 /sys/kernel/config/pci_ep/controllers/4c380000.pcie-ep
- use new device ID, which identify support doorbell to avoid broken
compatility.
Enable doorbell support only for PCI_DEVICE_ID_IMX8_DB, while other devices
keep the same behavior as before.
EP side RC with old driver RC with new driver
PCI_DEVICE_ID_IMX8_DB no probe doorbell enabled
Other device ID doorbell disabled* doorbell disabled*
* Behavior remains unchanged.
Change from v1 to v2
- Add missed patch for endpont/pci-epf-test.c
- Move alloc and free to epc driver from epf.
- Provide general help function for EPC driver to alloc platform msi irq.
- Fixed manivannan's comments.
Signed-off-by: Frank Li <Frank.Li(a)nxp.com>
---
Frank Li (15):
platform-msi: Add msi_remove_device_irq_domain() in platform_device_msi_free_irqs_all()
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_IMMUTABLE and irq_domain_is_msi_immutable()
irqchip/gic-v3-its: Set IRQ_DOMAIN_FLAG_MSI_IMMUTABLE for ITS
dt-bindings: pci: pci-msi: Add support for PCI Endpoint msi-map
irqchip/gic-v3-its: Add support for device tree msi-map and msi-mask
PCI: endpoint: Set ID and of_node for function driver
PCI: endpoint: Add RC-to-EP doorbell support using platform MSI controller
PCI: endpoint: pci-ep-msi: Add MSI address/data pair mutable check
PCI: endpoint: Add pci_epf_align_inbound_addr() helper for address alignment
PCI: endpoint: pci-epf-test: Add doorbell test support
misc: pci_endpoint_test: Add doorbell test case
selftests: pci_endpoint: Add doorbell test case
pci: imx6: Add helper function imx_pcie_add_lut_by_rid()
pci: imx6: Add LUT setting for MSI/IOMMU in Endpoint mode
arm64: dts: imx95: Add msi-map for pci-ep device
Documentation/devicetree/bindings/pci/pci-msi.txt | 51 ++++++++
arch/arm64/boot/dts/freescale/imx95.dtsi | 1 +
drivers/base/platform-msi.c | 1 +
drivers/irqchip/irq-gic-v3-its-msi-parent.c | 8 ++
drivers/irqchip/irq-gic-v3-its.c | 2 +-
drivers/misc/pci_endpoint_test.c | 82 ++++++++++++
drivers/pci/controller/dwc/pci-imx6.c | 25 ++--
drivers/pci/endpoint/Makefile | 1 +
drivers/pci/endpoint/functions/pci-epf-test.c | 142 +++++++++++++++++++++
drivers/pci/endpoint/pci-ep-msi.c | 90 +++++++++++++
drivers/pci/endpoint/pci-epf-core.c | 48 +++++++
include/linux/irqdomain.h | 7 +
include/linux/pci-ep-msi.h | 28 ++++
include/linux/pci-epf.h | 21 +++
include/uapi/linux/pcitest.h | 1 +
.../selftests/pci_endpoint/pci_endpoint_test.c | 28 ++++
16 files changed, 527 insertions(+), 9 deletions(-)
---
base-commit: a4949bd40778aa9beac77c89e4c6a1da52875c8b
change-id: 20241010-ep-msi-8b4cab33b1be
Best regards,
---
Frank Li <Frank.Li(a)nxp.com>
Currently if the filesystem for the cgroups version it wants to use is
not mounted charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh
tests will attempt to mount it on the hard coded path
/dev/cgroup/memory, deleting that directory when the test finishes. This
will fail if there is not a preexisting directory at that path, and
since the directory is deleted subsequent runs of the test will fail.
Instead of relying on this hard coded directory name use mktemp to
generate a temporary directory to use as a mountpoint, fixing both the
assumption and the disruption caused by deleting a preexisting
directory.
This means that if the relevant cgroup filesystem is not already mounted
then we rely on having coreutils (which provides mktemp) installed. I
suspect that many current users are relying on having things automounted
by default, and given that the script relies on bash it's probably not
an unreasonable requirement.
Fixes: 209376ed2a84 ("selftests/vm: make charge_reserved_hugetlb.sh work with existing cgroup setting")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
tools/testing/selftests/mm/charge_reserved_hugetlb.sh | 4 ++--
tools/testing/selftests/mm/hugetlb_reparenting_test.sh | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
index 67df7b47087f..e1fe16bcbbe8 100755
--- a/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
+++ b/tools/testing/selftests/mm/charge_reserved_hugetlb.sh
@@ -29,7 +29,7 @@ fi
if [[ $cgroup2 ]]; then
cgroup_path=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
- cgroup_path=/dev/cgroup/memory
+ cgroup_path=$(mktemp -d)
mount -t cgroup2 none $cgroup_path
do_umount=1
fi
@@ -37,7 +37,7 @@ if [[ $cgroup2 ]]; then
else
cgroup_path=$(mount -t cgroup | grep ",hugetlb" | awk '{print $3}')
if [[ -z "$cgroup_path" ]]; then
- cgroup_path=/dev/cgroup/memory
+ cgroup_path=$(mktemp -d)
mount -t cgroup memory,hugetlb $cgroup_path
do_umount=1
fi
diff --git a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
index 11f9bbe7dc22..0b0d4ba1af27 100755
--- a/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
+++ b/tools/testing/selftests/mm/hugetlb_reparenting_test.sh
@@ -23,7 +23,7 @@ fi
if [[ $cgroup2 ]]; then
CGROUP_ROOT=$(mount -t cgroup2 | head -1 | awk '{print $3}')
if [[ -z "$CGROUP_ROOT" ]]; then
- CGROUP_ROOT=/dev/cgroup/memory
+ CGROUP_ROOT=$(mktemp -d)
mount -t cgroup2 none $CGROUP_ROOT
do_umount=1
fi
---
base-commit: a4cda136f021ad44b8b52286aafd613030a6db5f
change-id: 20250403-kselftest-mm-cgroup2-detection-b761fd232f9d
Best regards,
--
Mark Brown <broonie(a)kernel.org>