A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges.
The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain:
drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma()
phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map.
There is a bug in the device trees because the DMA ranges are incorrectly specified, and the patch uncovers this bug.
Solution:
- Fix the LB (logic bus) ranges to be 1-to-1 like they should have always been. - Provide a 1:1 dma-ranges attribute to the PL110. - Mark the PL110 display controller as DMA coherent.
This makes the DMA ranges work right and makes the PL110 framebuffer work again.
Cc: Christoph Hellwig hch@lst.de Cc: Arnd Bergmann arnd@arndb.de Cc: stable@vger.kernel.org Fixes: ae626eb97376 ("ARM/dma-mapping: use dma-direct unconditionally") Signed-off-by: Linus Walleij linus.walleij@linaro.org --- SoC folks: please apply this directly for fixes if it seems to be the right thing to do. --- arch/arm/boot/dts/integratorap-im-pd1.dts | 2 ++ arch/arm/boot/dts/integratorap.dts | 8 ++++---- 2 files changed, 6 insertions(+), 4 deletions(-)
diff --git a/arch/arm/boot/dts/integratorap-im-pd1.dts b/arch/arm/boot/dts/integratorap-im-pd1.dts index 31724753d3f3..ecccbd1777a3 100644 --- a/arch/arm/boot/dts/integratorap-im-pd1.dts +++ b/arch/arm/boot/dts/integratorap-im-pd1.dts @@ -248,6 +248,8 @@ display@1000000 { /* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */ max-memory-bandwidth = <40000000>; memory-region = <&impd1_ram>; + dma-ranges; + dma-coherent;
port@0 { #address-cells = <1>; diff --git a/arch/arm/boot/dts/integratorap.dts b/arch/arm/boot/dts/integratorap.dts index c983435ed492..9148287fa0a9 100644 --- a/arch/arm/boot/dts/integratorap.dts +++ b/arch/arm/boot/dts/integratorap.dts @@ -262,7 +262,7 @@ bus@c0000000 { lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>; - dma-ranges = <0x00000000 0x80000000 0x10000000>; + dma-ranges = <0x00000000 0xc0000000 0x10000000>; reg = <0xc0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>; @@ -270,7 +270,7 @@ lm0: bus@c0000000 { lm1: bus@d0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xd0000000 0x10000000>; - dma-ranges = <0x00000000 0x80000000 0x10000000>; + dma-ranges = <0x00000000 0xd0000000 0x10000000>; reg = <0xd0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>; @@ -278,7 +278,7 @@ lm1: bus@d0000000 { lm2: bus@e0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xe0000000 0x10000000>; - dma-ranges = <0x00000000 0x80000000 0x10000000>; + dma-ranges = <0x00000000 0xe0000000 0x10000000>; reg = <0xe0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>; @@ -286,7 +286,7 @@ lm2: bus@e0000000 { lm3: bus@f0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xf0000000 0x10000000>; - dma-ranges = <0x00000000 0x80000000 0x10000000>; + dma-ranges = <0x00000000 0xf0000000 0x10000000>; reg = <0xf0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>;
On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:
A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges.
The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain:
drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma()
phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map.
I don't understand this yet, what did the kernel previously do to that allowed the correct DMA mapping when a wrong address was set in the DT?
There is a bug in the device trees because the DMA ranges are incorrectly specified, and the patch uncovers this bug.
Solution:
- Fix the LB (logic bus) ranges to be 1-to-1 like they should have always been.
- Provide a 1:1 dma-ranges attribute to the PL110.
- Mark the PL110 display controller as DMA coherent.
Are you sure the actually coherent? I'm not aware of any other ARM9 based SoC with cache-coherent DMA. What is the DMA master that accesses
diff --git a/arch/arm/boot/dts/integratorap-im-pd1.dts b/arch/arm/boot/dts/integratorap-im-pd1.dts index 31724753d3f3..ecccbd1777a3 100644 --- a/arch/arm/boot/dts/integratorap-im-pd1.dts +++ b/arch/arm/boot/dts/integratorap-im-pd1.dts @@ -248,6 +248,8 @@ display@1000000 { /* 640x480 16bpp @ 25.175MHz is 36827428 bytes/s */ max-memory-bandwidth = <40000000>; memory-region = <&impd1_ram>;
dma-ranges;
dma-coherent;
port@0 { #address-cells = <1>;
Which device is the actual DMA master here? The "dma-coherent" property sets the pl110 as coherent, but the dma-ranges property would refer to the port, right?
diff --git a/arch/arm/boot/dts/integratorap.dts b/arch/arm/boot/dts/integratorap.dts index c983435ed492..9148287fa0a9 100644 --- a/arch/arm/boot/dts/integratorap.dts +++ b/arch/arm/boot/dts/integratorap.dts @@ -262,7 +262,7 @@ bus@c0000000 { lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>;
dma-ranges = <0x00000000 0x80000000 0x10000000>;
dma-ranges = <0x00000000 0xc0000000 0x10000000>;
In your description, you say that you set a 1:1 map, but this is not what you seem to be setting here. Instead you map DMA address zero to point to the beginning of RAM.
Arnd
On Fri, Sep 23, 2022 at 1:58 PM Arnd Bergmann arnd@arndb.de wrote:
On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:
A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges.
The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain:
drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma()
phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map.
I don't understand this yet, what did the kernel previously do to that allowed the correct DMA mapping when a wrong address was set in the DT?
Ah this goes way back. I think I need a different fixes tag. I hadn't tested this platform in a while.
What happens currently in translate_phys_to_dma() is that it returns DMA_MAPPING_ERROR (0xffffffff) because of the buggy device tree and that causes the problem.
Before the dma direct rework we selected ARCH_HAS_PHYS_TO_DMA and arch/arm/include/asm/dma-direct.h did this:
static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) { if (dev) pfn -= dev->dma_pfn_offset; return (dma_addr_t)__pfn_to_bus(pfn); } (...) static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr) { unsigned int offset = paddr & ~PAGE_MASK; return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset; }
As dma_pfn_offset was 0 this resulted in a 1:1 map.
The actual switchover from this code happens in commit af6f23b88e9508f1cb8d0af008bb175019428f73 "ARM/dma-mapping: use the generic versions of dma_to_phys/phys_to_dma by default"
I'm not sure which commit is the appropriate for Fixes: really.
There is a bug in the device trees because the DMA ranges are incorrectly specified, and the patch uncovers this bug.
Solution:
- Fix the LB (logic bus) ranges to be 1-to-1 like they should have always been.
- Provide a 1:1 dma-ranges attribute to the PL110.
- Mark the PL110 display controller as DMA coherent.
Are you sure the actually coherent? I'm not aware of any other ARM9 based SoC with cache-coherent DMA. What is the DMA master that accesses
No scratch that, this oneliner isn't even needed.
I got my head wrong around what coherent means in this context. Captain hindsight says we should have named the dt flag dma-cache-coherent instead of just dma-coherent.
Odd thing: this flag has no device tree bindings I could find. Maybe it's in the really old DT docs?
Which device is the actual DMA master here? The "dma-coherent" property sets the pl110 as coherent, but the dma-ranges property would refer to the port, right?
Yeah just the latter should be set. Mea culpa.
diff --git a/arch/arm/boot/dts/integratorap.dts b/arch/arm/boot/dts/integratorap.dts index c983435ed492..9148287fa0a9 100644 --- a/arch/arm/boot/dts/integratorap.dts +++ b/arch/arm/boot/dts/integratorap.dts @@ -262,7 +262,7 @@ bus@c0000000 { lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>;
dma-ranges = <0x00000000 0x80000000 0x10000000>;
dma-ranges = <0x00000000 0xc0000000 0x10000000>;
In your description, you say that you set a 1:1 map, but this is not what you seem to be setting here. Instead you map DMA address zero to point to the beginning of RAM.
This is in this context (after my changes):
reserved-memory { #address-cells = <1>; #size-cells = <1>; ranges;
impd1_ram: vram@c2000000 { /* 1 MB of designated video RAM on the IM-PD1 */ compatible = "shared-dma-pool"; reg = <0xc2000000 0x00100000>; no-map; }; };
So reserved-memory has to be in the root of the device tree because we don't support putting reserved memory anywhere else (would be nice to fix this!)
Here the special memory appears in the CPU address map.
bus@c0000000 { compatible = "arm,integrator-ap-lm"; #address-cells = <1>; #size-cells = <1>; ranges = <0xc0000000 0xc0000000 0x40000000>; dma-ranges; (...) lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>; dma-ranges = <0x00000000 0xc0000000 0x10000000>; reg = <0xc0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>;
display@1000000 { compatible = "arm,pl110", "arm,primecell"; reg = <0x01000000 0x1000>; (...) memory-region = <&impd1_ram>; dma-ranges; (...)
Here we find the reg properly at physical address 0xc1000000 thanks to the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.
But the special memory region is in the root of the device tree, outside of the translation.
Now dma-ranges will assume that when we translate the memory region it is in the address space of the display controller, but it isn't, because the phandle goes to something in the root of the device tree.
0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000
Whenever I do ranges my head start spinning :/
If you have a better way to accomodate the DMA ranges I am happy to comply!
We can fix the problem that reserved memory can only be in the root as well I suppose but that is no easy regression fix.
Yours, Linus Walleij
On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:
On Fri, Sep 23, 2022 at 1:58 PM Arnd Bergmann arnd@arndb.de wrote:
On Fri, Sep 23, 2022, at 1:38 PM, Linus Walleij wrote:
A recent change affecting the behaviour of phys_to_dma() to actually require the device tree ranges to work unmasked a bug in the Integrator DMA ranges.
The PL110 uses the CMA allocator to obtain coherent allocations from a dedicated 1MB video memory, leading to the following call chain:
drm_gem_cma_create() dma_alloc_attrs() dma_alloc_from_dev_coherent() __dma_alloc_from_coherent() dma_get_device_base() phys_to_dma() translate_phys_to_dma()
phys_to_dma() by way of translate_phys_to_dma() will nowadays not provide 1:1 mappings unless the ranges are properly defined in the device tree and reflected into the dev->dma_range_map.
I don't understand this yet, what did the kernel previously do to that allowed the correct DMA mapping when a wrong address was set in the DT?
Ah this goes way back. I think I need a different fixes tag. I hadn't tested this platform in a while.
What happens currently in translate_phys_to_dma() is that it returns DMA_MAPPING_ERROR (0xffffffff) because of the buggy device tree and that causes the problem.
Before the dma direct rework we selected ARCH_HAS_PHYS_TO_DMA and arch/arm/include/asm/dma-direct.h did this:
static inline dma_addr_t pfn_to_dma(struct device *dev, unsigned long pfn) { if (dev) pfn -= dev->dma_pfn_offset; return (dma_addr_t)__pfn_to_bus(pfn); } (...) static inline dma_addr_t __phys_to_dma(struct device *dev, phys_addr_t paddr) { unsigned int offset = paddr & ~PAGE_MASK; return pfn_to_dma(dev, __phys_to_pfn(paddr)) + offset; }
As dma_pfn_offset was 0 this resulted in a 1:1 map.
Ah, I see. So the offset was 0 because the dma-ranges did not actually point to RAM then, right?
Odd thing: this flag has no device tree bindings I could find. Maybe it's in the really old DT docs?
No idea. I do remember that in the old times, whether DMA was coherent or not was just a compile-time option, so ppc was usually assumed to be coherent while Arm was not, but then we needed something to identify machines that were different.
Here the special memory appears in the CPU address map.
bus@c0000000 { compatible = "arm,integrator-ap-lm"; #address-cells = <1>; #size-cells = <1>; ranges = <0xc0000000 0xc0000000 0x40000000>; dma-ranges;
(...) lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>; dma-ranges = <0x00000000 0xc0000000 0x10000000>; reg = <0xc0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>;
display@1000000 { compatible = "arm,pl110", "arm,primecell"; reg = <0x01000000 0x1000>;
(...) memory-region = <&impd1_ram>; dma-ranges; (...)
Here we find the reg properly at physical address 0xc1000000 thanks to the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.
But the special memory region is in the root of the device tree, outside of the translation.
Now dma-ranges will assume that when we translate the memory region it is in the address space of the display controller, but it isn't, because the phandle goes to something in the root of the device tree.
0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000
Whenever I do ranges my head start spinning :/
If you have a better way to accomodate the DMA ranges I am happy to comply!
I'm still not sure I understand it. Is that reserved memory area in RAM or on the logic bus? The 'ranges' property makes it appear that all of 0xc0000000-0xcfffffff is the logic bus, but then the vram itself is also in that address range, which would indicate that it is behind that bus. However, using the 'reserved-memory' property indicates that this is actually just normal RAM that is carved out from what the OS manages.
Arnd
On Fri, Sep 23, 2022 at 3:21 PM Arnd Bergmann arnd@arndb.de wrote:
On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:
Here the special memory appears in the CPU address map.
bus@c0000000 { compatible = "arm,integrator-ap-lm"; #address-cells = <1>; #size-cells = <1>; ranges = <0xc0000000 0xc0000000 0x40000000>; dma-ranges;
(...) lm0: bus@c0000000 { compatible = "simple-bus"; ranges = <0x00000000 0xc0000000 0x10000000>; dma-ranges = <0x00000000 0xc0000000 0x10000000>; reg = <0xc0000000 0x10000000>; #address-cells = <1>; #size-cells = <1>;
display@1000000 { compatible = "arm,pl110", "arm,primecell"; reg = <0x01000000 0x1000>;
(...) memory-region = <&impd1_ram>; dma-ranges; (...)
Here we find the reg properly at physical address 0xc1000000 thanks to the ranges: 0xc0000000 + 0x00000000 + 0x01000000 = 0xc1000000.
But the special memory region is in the root of the device tree, outside of the translation.
Now dma-ranges will assume that when we translate the memory region it is in the address space of the display controller, but it isn't, because the phandle goes to something in the root of the device tree.
0xc2000000 + 0x00000000 + 0x00000000 = 0xc2000000
Whenever I do ranges my head start spinning :/
If you have a better way to accomodate the DMA ranges I am happy to comply!
I'm still not sure I understand it. Is that reserved memory area in RAM or on the logic bus?
It's on the logic bus slot "lm0", so the display controller is at 0xc1000000 and right below it at 0xc2000000 is 1MB of dedicated graphics memory.
The 'ranges' property makes it appear that all of 0xc0000000-0xcfffffff is the logic bus,
Yep it is.
but then the vram itself is also in that address range, which would indicate that it is behind that bus.
Yep it is.
However, using the 'reserved-memory' property indicates that this is actually just normal RAM that is carved out from what the OS manages.
The binding document Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml unfortunately does not mention anywhere that the reserved memory must be a subset of the system main RAM "carveout" or anything like that, it says that the memory should be "designed for special usage by various device drivers", and "excluded from normal usage" which it is, hence the misunderstanding.
We have used this for Vexpress like forever, but I did the change from some custom vexpress memory property to shared-dma-pool so I guess that's my fault. Incidentally this also works very well.
Is there some other mechanism we should be using for dedicated framebuffer memory on some odd address?
Yours, Linus Walleij
On Fri, Sep 23, 2022, at 3:54 PM, Linus Walleij wrote:
On Fri, Sep 23, 2022 at 3:21 PM Arnd Bergmann arnd@arndb.de wrote:
On Fri, Sep 23, 2022, at 2:56 PM, Linus Walleij wrote:
but then the vram itself is also in that address range, which would indicate that it is behind that bus.
Yep it is.
Ok.
However, using the 'reserved-memory' property indicates that this is actually just normal RAM that is carved out from what the OS manages.
The binding document Documentation/devicetree/bindings/reserved-memory/reserved-memory.yaml unfortunately does not mention anywhere that the reserved memory must be a subset of the system main RAM "carveout" or anything like that, it says that the memory should be "designed for special usage by various device drivers", and "excluded from normal usage" which it is, hence the misunderstanding.
We have used this for Vexpress like forever, but I did the change from some custom vexpress memory property to shared-dma-pool so I guess that's my fault. Incidentally this also works very well.
Is there some other mechanism we should be using for dedicated framebuffer memory on some odd address?
I think the correct way to do this would have been to have a 'reg' property in the device itself that points to the VRAM, instead of describing it as memory.
This in turn would mean that the driver would use some variant of ioremap_wc() to map the framebuffer into kernel space, rather than using dma_alloc_coherent().
I guess the author of the driver copied the code from another fbdev driver that was meant for an Arm SoC with a shared memory framebuffer rather than a dedicated VRAM...
It's probably too late to change now, and the new dma-ranges property is at least consistent with the usage.
Can you update the changelog text based on the above and send the patch again with the dma-coherent property removed?
Arnd
linux-stable-mirror@lists.linaro.org