On 2021-07-30 12:35, Anders Roxell wrote:
From: Robin Murphy <robin.murphy@arm.com>
Now that PCI inbound window restrictions are handled generically between the of_pci resource parsing and the IOMMU layer, and described in the Juno DT, we can finally enable the PCIe SMMU without the risk of DMA mappings inadvertently allocating unusable addresses.
Similarly, the relevant support for IOMMU mappings for peripheral transfers has been hooked up in the pl330 driver for ages, so we can happily enable the DMA SMMU without that breaking anything either.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
When we build a kernel with a 64k page size and run the LTP syscalls tests, we sporadically see a kernel crash while running mkfs on a connected SATA drive. This happens every third test run on any juno-r2 device in the lab with the same kernel image (stable-rc 5.13.y, mainline and next) built with gcc-11.
Hmm, I guess 64K pages might make a difference in that we'll chew through IOVA space a lot faster with small mappings...
I'll have to try to reproduce this locally, since the interesting thing would be knowing what DMA address it was trying to use that went wrong, but IOMMU tracepoints and/or dma-debug are going to generate a crazy amount of data to sift through and try to correlate - having done it before, it's not something I'd readily ask someone else to do for me :)
On a hunch, though, does it make any difference if you remove the first entry from the PCIe "dma-ranges" (the 0x2c1c0000 one)?
Robin.
Here is a snippet of the boot log [1]:
- mkfs -t ext4 /dev/disk/by-id/ata-SanDisk_SDSSDA120G_165192443611
mke2fs 1.43.8 (1-Jan-2018)
Discarding device blocks: 4096/29305200
[ 55.344291] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 55.351423] ata1.00: irq_stat 0x00020002, failed to transmit command FIS
[ 55.358205] ata1.00: failed command: DATA SET MANAGEMENT
[ 55.363561] ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 12 dma 512 out
[ 55.363561]          res ec/ff:00:00:00:00/00:00:00:00:ec/00 Emask 0x12 (ATA bus error)
[ 55.378955] ata1.00: status: { Busy }
[ 55.382658] ata1.00: error: { ICRC UNC AMNF IDNF ABRT }
[ 55.387947] ata1: hard resetting link
[ 55.391653] ata1: controller in dubious state, performing PORT_RST
[ 57.588447] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 0)
[ 57.613471] ata1.00: configured for UDMA/100
[ 57.617866] ata1.00: device reported invalid CHS sector 0
[ 57.623397] ata1: EH complete
When we revert this patch we don't see any issue.
Reported-by: Linux Kernel Functional Testing <lkft@linaro.org>
Cheers, Anders
[1] https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.13.y/build/v5.13....
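Robin's remark that 64K pages "chew through IOVA space a lot faster with small mappings" can be illustrated with a little arithmetic. The following is a minimal sketch, assuming that each DMA mapping consumes IOVA space in whole multiples of the page size (the granule) and starts at a granule boundary. The function name is hypothetical, and the 512-byte length is taken from the "dma 512 out" line in the log above. It does not model the real allocator.

```python
def iova_bytes_used(length: int, granule: int) -> int:
    """Round a mapping length up to a whole number of granules.

    Assumption: the mapping starts at a granule boundary, so only the
    length needs rounding. A real mapping may also carry an offset.
    """
    if length <= 0 or granule <= 0:
        raise ValueError("length and granule must be positive")
    return -(-length // granule) * granule  # ceiling division, then scale


if __name__ == "__main__":
    for granule in (4 * 1024, 64 * 1024):
        used = iova_bytes_used(512, granule)
        print(f"granule {granule // 1024}K: 512-byte mapping uses {used} bytes of IOVA")
```

Under these assumptions the same 512-byte transfer occupies sixteen times more IOVA space with 64K pages than with 4K pages, which is the effect Robin is speculating about.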
On 2021-07-30 13:17, Robin Murphy wrote:
On a hunch, though, does it make any difference if you remove the first entry from the PCIe "dma-ranges" (the 0x2c1c0000 one)?
I made this change, ran the job 7 times, and could not reproduce the issue.
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index 8e7a66943b01..d3148730e951 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -545,8 +545,7 @@ pcie_ctlr: pcie@40000000 {
                          <0x02000000 0x00 0x50000000 0x00 0x50000000 0x0 0x08000000>,
                          <0x42000000 0x40 0x00000000 0x40 0x00000000 0x1 0x00000000>;
                 /* Standard AXI Translation entries as programmed by EDK2 */
-                dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>,
-                             <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
+                dma-ranges = <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
                              <0x43000000 0x8 0x00000000 0x8 0x00000000 0x2 0x00000000>;
                 #interrupt-cells = <1>;
                 interrupt-map-mask = <0 0 0 7>;
Cheers, Anders
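For readers decoding the "dma-ranges" entries in the diff above, here is a small sketch that splits each entry into its PCI address, CPU address, and size, and counts how many 4K and 64K pages fit in each window. It assumes the usual layout for a PCI host bridge node under a parent with two address cells: three PCI address cells, two parent address cells, and two size cells. The names Window, decode, and describe are hypothetical. It only shows the sizes involved; it does not explain why the failure occurs, and the thread establishes no more than that dropping the first entry made the failure stop reproducing.

```python
from dataclasses import dataclass

# Cells copied from the pre-fix "dma-ranges" property shown in the diff.
# Assumed layout: 3 PCI address cells, 2 parent (CPU) address cells, 2 size cells.
DMA_RANGES = [
    (0x02000000, 0x0, 0x2c1c0000, 0x0, 0x2c1c0000, 0x0, 0x00040000),
    (0x02000000, 0x0, 0x80000000, 0x0, 0x80000000, 0x0, 0x80000000),
    (0x43000000, 0x8, 0x00000000, 0x8, 0x00000000, 0x2, 0x00000000),
]


@dataclass
class Window:
    pci_addr: int
    cpu_addr: int
    size: int


def decode(entry):
    """Combine the 32-bit cells of one dma-ranges entry into 64-bit values."""
    _flags, pci_hi, pci_lo, cpu_hi, cpu_lo, size_hi, size_lo = entry
    return Window(
        pci_addr=(pci_hi << 32) | pci_lo,
        cpu_addr=(cpu_hi << 32) | cpu_lo,
        size=(size_hi << 32) | size_lo,
    )


def describe(entry):
    """Format one window along with its capacity in 4K and 64K pages."""
    w = decode(entry)
    return (
        f"PCI {w.pci_addr:#x} -> CPU {w.cpu_addr:#x}, size {w.size:#x}: "
        f"{w.size // 4096} x 4K pages, {w.size // 65536} x 64K pages"
    )


if __name__ == "__main__":
    for entry in DMA_RANGES:
        print(describe(entry))
```

The first entry (the 0x2c1c0000 one that Robin asked about) is only 0x40000 bytes, which is 64 pages of 4K but just 4 pages of 64K, while the other two windows span gigabytes.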
On 2021-07-30 15:34, Anders Roxell wrote:
I made this change, ran the job 7 times, and could not reproduce the issue.
Thanks! And hold that thought; if it works then I suspect it probably is the best fix, but I'll double-check and write it up properly next week.
Cheers, Robin.
On Fri, 30 Jul 2021 at 16:44, Robin Murphy <robin.murphy@arm.com> wrote:
Thanks! And hold that thought; if it works then I suspect it probably is the best fix, but I'll double-check and write it up properly next week.
Thank you Robin.
Cheers, Anders
On Fri, 30 Jul 2021 at 16:44, Robin Murphy <robin.murphy@arm.com> wrote:
Thanks! And hold that thought; if it works then I suspect it probably is the best fix, but I'll double-check and write it up properly next week.
I just want to send a friendly reminder about this issue, since I haven't seen a patch for it. We still see the issue on v5.13.y and later.
Or have I missed anything?
Cheers, Anders
Hi Robin,
Since we did not get a reply on this email thread, and these intermittent failures are causing a lot of noise in the report summary, we will wait one more week and then stop running 64k page size testing on Juno-r2 devices.
diff --git a/arch/arm64/boot/dts/arm/juno-base.dtsi b/arch/arm64/boot/dts/arm/juno-base.dtsi
index 8e7a66943b01..d3148730e951 100644
--- a/arch/arm64/boot/dts/arm/juno-base.dtsi
+++ b/arch/arm64/boot/dts/arm/juno-base.dtsi
@@ -545,8 +545,7 @@ pcie_ctlr: pcie@40000000 {
                          <0x02000000 0x00 0x50000000 0x00 0x50000000 0x0 0x08000000>,
                          <0x42000000 0x40 0x00000000 0x40 0x00000000 0x1 0x00000000>;
                 /* Standard AXI Translation entries as programmed by EDK2 */
-                dma-ranges = <0x02000000 0x0 0x2c1c0000 0x0 0x2c1c0000 0x0 0x00040000>,
-                             <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
+                dma-ranges = <0x02000000 0x0 0x80000000 0x0 0x80000000 0x0 0x80000000>,
                              <0x43000000 0x8 0x00000000 0x8 0x00000000 0x2 0x00000000>;
                 #interrupt-cells = <1>;
                 interrupt-map-mask = <0 0 0 7>;
Reference email thread, https://lore.kernel.org/stable/0a1d437d-9ea0-de83-3c19-e07f560ad37c@arm.com/
- Naresh
On Mon, Feb 14, 2022 at 07:36:00PM +0530, Naresh Kamboju wrote:
Hi Robin,
Since we did not get a reply on this email thread, and these intermittent failures are causing a lot of noise in the report summary, we will wait one more week and then stop running 64k page size testing on Juno-r2 devices.
I was about to tag the fix for this and was just reading this thread. I will send the pull request soon. Sorry for the delay; the fix has been in next for some time now. Are you seeing the issue even in linux-next?
On Mon, 14 Feb 2022 at 19:43, Sudeep Holla <sudeep.holla@arm.com> wrote:
I was about to tag the fix for this and was just reading this thread. I will send the pull request soon. Sorry for the delay; the fix has been in next for some time now. Are you seeing the issue even in linux-next?
Due to load balancing and test queue maintenance on Juno-r2 devices, we have stopped running 64k page testing on mainline and next, and are running it on stable-rc builds instead.
Allow me a day to test a linux-next 64k page size build on Juno-r2 and get back to you.
- Naresh
Hi Sudeep,
On Mon, 14 Feb 2022 at 20:41, Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
On Mon, 14 Feb 2022 at 19:43, Sudeep Holla <sudeep.holla@arm.com> wrote:
I was about to tag the fix for this and was just reading this thread. I will send the pull request soon. Sorry for the delay; the fix has been in next for some time now. Are you seeing the issue even in linux-next?
I have tested linux-next arm64 64k page size builds on Juno-r2 and can confirm that the reported issue is now fixed.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
- Naresh Kamboju
-- Linaro LKFT https://lkft.linaro.org
On Wed, Feb 16, 2022 at 05:02:53PM +0530, Naresh Kamboju wrote:
Hi Sudeep,
I have tested linux-next arm64 64k page size builds on Juno-r2 and can confirm that the reported issue is now fixed.
Tested-by: Linux Kernel Functional Testing <lkft@linaro.org>
Thanks for testing. I already sent the pull request to Arnd yesterday.