(adding Will who was part of a similar discussion before)
On Tue, 8 Jan 2019 at 19:35, Carsten Haitzler <Carsten.Haitzler@arm.com> wrote:
On 08/01/2019 17:07, Grant Likely wrote:
FYI, I have a Radeon RX550 with amdgpu on my ThunderX2. Yes, it's a server ARM (aarch64) system, but it works a charm, with two screens attached. I did have to do the following:
- patch the kernel DRM code to force uncached mappings (the code apparently assumes x86-style write-combining):
--- ./include/drm/drm_cache.h~	2018-08-12 21:41:04.000000000 +0100
+++ ./include/drm/drm_cache.h	2018-11-16 11:06:16.976842816 +0000
@@ -48,7 +48,7 @@
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
 #else
-	return true;
+	return false;
 #endif
 }
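For reference, here is how drm_arch_can_wc_memory() in include/drm/drm_cache.h reads with that change applied (a sketch based on the 4.19-era layout of the header; the unpatched #else branch returns true, which is what lets drivers like amdgpu ask for write-combined mappings):

static inline bool drm_arch_can_wc_memory(void)
{
#if defined(CONFIG_PPC) && !defined(CONFIG_NOT_COHERENT_CACHE)
	return false;
#elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
	return false;
#else
	/* Patched: upstream returns true here. Returning false forces
	 * uncached (device) mappings instead of write-combining.
	 */
	return false;
#endif
}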
OK, so this is rather interesting. First of all, this is the exact change we apply to the nouveau driver to make it work on SynQuacer, i.e., demote all normal non-cacheable mappings of memory exposed by the PCIe controller via a BAR to device mappings. On SynQuacer, we need this because of a known silicon bug in the integration of the PCIe IP.
However, the fact that even on TX2 you need device mappings to map RAM exposed via PCIe is rather troubling, and it has come up in the past as well. The problem is that the GPU driver stack on Linux, including the VDPAU libraries and other userland pieces, assumes that memory exposed via PCIe has proper memory semantics, including the ability to perform unaligned accesses on it or to use DC ZVA instructions to clear it. As we all know, these driver stacks are rather complex, and adding awareness to each level in the stack of whether a certain piece of memory is real memory or PCI memory is going to be cumbersome.
When we discussed this in the past, an ARM h/w engineer pointed out that Normal-NC is fundamentally incompatible with AMBA or AXI or whatever we use on ARM to integrate these components at the silicon level. If that means we can only use device mappings, we will need to make intrusive changes to a *lot* of code to ensure it doesn't use memcpy() or do other things that device mappings don't tolerate on ARM.
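To give a sense of what those intrusive changes look like on the kernel side alone: every plain memcpy()/memset() on such a mapping would have to become an explicit I/O accessor. A minimal sketch, using the standard memcpy_toio()/memset_io() helpers; the BAR address, size, and function name are made up for illustration:

#include <linux/errno.h>
#include <linux/io.h>

/* Hypothetical values, for illustration only. */
#define GPU_BAR_PHYS	0x90000000UL
#define GPU_BAR_SIZE	0x100000UL

static int copy_to_vram(const void *src, size_t len)
{
	/* A device mapping tolerates neither unaligned accesses nor
	 * DC ZVA on arm64, so plain memcpy()/memset() are not safe.
	 */
	void __iomem *vram = ioremap(GPU_BAR_PHYS, GPU_BAR_SIZE);

	if (!vram)
		return -ENOMEM;

	/* The _io variants only emit accesses that device memory
	 * can handle.
	 */
	memcpy_toio(vram, src, len);
	memset_io(vram + len, 0, GPU_BAR_SIZE - len);

	iounmap(vram);
	return 0;
}

Userland has no equivalent escape hatch, which is why the VDPAU/Mesa side of the stack is the harder problem.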
So, can we get the right people from the ARM side involved to clarify this once and for all?
- ensure I have a proper Xorg conf file for it, to stop it trying to do things with the dumb FB on board:
Section "ServerFlags" Option "AutoAddGPU" "false" EndSection
Section "Device" Identifier "amdgpu" Driver "amdgpu" BusID "137:0:0" Option "DRI" "2" Option "TearFree" "on" EndSection
I put that in /usr/share/X11/xorg.conf.d/10-amdgpu.conf
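One note for anyone adapting this: the BusID in xorg.conf is interpreted as decimal, while lspci prints the bus number in hex, so "137:0:0" above corresponds to the device lspci would list at 89:00.0 (0x89 = 137). Running something like "lspci -nn | grep -i vga" is a quick way to find the card and work out the right value.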
This is needed to ignore the BMC display controller, I take it?
The kernel patch is just about the only really "advanced" thing I had to do. I currently don't really know how to hack that up so it's "consumable in general": should we do this all the time on ARM/aarch64 by default? Only on some SoCs/CPUs? Should we add kernel/module parameters for it?
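For what it's worth, a module parameter seems like the least invasive option. A rough sketch of that approach follows; the parameter name is hypothetical, and note that the real drm_arch_can_wc_memory() lives as a static inline in include/drm/drm_cache.h, so it would first have to move into drm_cache.c or similar for this to work:

#include <linux/module.h>
#include <linux/moduleparam.h>

/* Hypothetical knob, not actual upstream code. */
static bool drm_force_uncached;
module_param(drm_force_uncached, bool, 0444);
MODULE_PARM_DESC(drm_force_uncached,
		 "Map PCIe BAR memory uncached instead of write-combined");

bool drm_arch_can_wc_memory(void)
{
	if (drm_force_uncached)
		return false;
	/* ... existing per-architecture #ifdef logic ... */
	return true;
}

That would at least let affected systems opt in with drm.drm_force_uncached=1 on the kernel command line instead of carrying a local patch.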
[Looping in Carsten who also has an AMD board in his TX2]
g.
On 08/01/2019 16:24, Jeremy Linton wrote:
Hi,
On 01/08/2019 08:28 AM, Ard Biesheuvel wrote:
Hi Bero,
On Tue, 8 Jan 2019 at 15:25, Bero Rosenkränzer <Bernhard.Rosenkranzer@linaro.org> wrote: ...
This is bad for the ARM ecosystem, since it is the only driver we have for NVIDIA hardware, and it has fewer issues than AMD GPU drivers + hardware running on ARM systems.
What's the problem with AMD GPU drivers on ARM? Just missing workarounds for things like the SynQuacer PCIe bug? On x86, AMD drivers tend to work better than NVIDIA, so maybe the simple (and acceptable, given NVIDIA refuses to join) fix would be improving the AMD drivers and telling NVIDIA to go to hell...
AMD gfx cards produce lots of visual corruption under Linux on all of the arm64 boards I tried (including Seattle which has properly working PCIe), and I have never managed to get anyone interested in looking into it (although I didn't try /that/ hard tbh)
Presumably amdgpu, and a more recent board? I had good luck with an HD 5450 and the radeon driver, although in its default configuration it expects to bootstrap the card using PCI PIO. Nouveau is hardly free of failures either, particularly if you happen to be running a 64k page-size kernel.
Given that AMD has been a bit more open with their documentation, has an aarch64 UEFI/GOP driver on their site (https://www.amd.com/en/support/kb/release-notes/rn-aar), and doesn't seem to suffer as much when running non-4k page-size kernels, fixing it sounds well worth the effort.