On Mon, May 23, 2022 at 04:09:20PM +0000, David Laight wrote:
> From: Petr Malat
> > Sent: 23 May 2022 16:28
> > On Mon, May 23, 2022 at 02:51:41PM +0000, David Laight wrote:
> > > From: Petr Malat
> > > > Sent: 23 May 2022 15:28
> > > > One can't use memcpy() on memory obtained by ioremap(), because IO
> > > > memory may have different alignment and access-size restrictions
> > > > than system memory. Use memremap(), as the phram driver operates
> > > > on RAM.
> > >
> > > Does that actually help? The memcpy() is still likely to issue
> > > unaligned accesses that the hardware can't handle.
> >
> > Yes, it solves the issue. memcpy() can cause unaligned accesses only
> > on platforms which can handle them, and on arm64 they are handled
> > only for normal RAM mappings, not for device memory
> > (__pgprot(PROT_DEVICE_*)).
>
> Does mapping it as memory cause it to be cached?
> So the hardware only sees cache line reads (which are aligned) and
> the cpu support for misaligned memory accesses then stops the faults?
Yes, this is controlled by the MEMREMAP_WB flag, which sets up a mapping
that "matches the default mapping for System RAM on the architecture.
This is usually a read-allocate write-back cache."
> On x86 (which I know a lot more about) memcpy() has a nasty habit of
> getting implemented as 'rep movsb', relying on the cpu to speed it up.
> But that doesn't happen for uncached addresses - so you get very slow
> byte copies. OTOH misaligned PCIe transfers generate TLPs that have
> the correct byte enables for the end words. Provided the PCIe target
> isn't broken they are fine.
With memremap() one should get the same behavior and performance as with
system memory, which makes it a good choice for a "Physical system RAM"
MTD driver. Anyone using the driver for actual IO memory should instead
use ioremap() together with memcpy_toio() and memcpy_fromio(). Using
these prevents the crash on arm64 as well, but could lead to performance
degradation on some platforms.
If you think there could be users using the driver for real IO memory,
I can provide both behaviors and let the user choose with an option.
  Petr
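(If both behaviors were offered, the choice could look roughly like the
sketch below. This is illustrative only, not the actual patch: the
parameter name and the helper are invented, and error handling is
elided.)

```c
/* Hypothetical sketch - the option name and helper are invented. */
static bool phram_iomem;
module_param(phram_iomem, bool, 0444);
MODULE_PARM_DESC(phram_iomem,
		 "Map the region as IO memory instead of System RAM");

static int phram_map(struct phram_mtd_list *phram,
		     phys_addr_t start, size_t len)
{
	if (phram_iomem)
		/* IO memory: callers must then use memcpy_toio() and
		 * memcpy_fromio() instead of plain memcpy(). */
		phram->mtd.priv = (void __force *)ioremap(start, len);
	else
		/* RAM: cacheable mapping, plain memcpy() is safe. */
		phram->mtd.priv = memremap(start, len, MEMREMAP_WB);

	return phram->mtd.priv ? 0 : -ENOMEM;
}
```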