On Wed, Jan 09, 2019 at 11:41:00AM +0100, Ard Biesheuvel wrote:
> (adding Will who was part of a similar discussion before)
>
> On Tue, 8 Jan 2019 at 19:35, Carsten Haitzler <Carsten.Haitzler@arm.com> wrote:
> > On 08/01/2019 17:07, Grant Likely wrote:
> > FYI, I have a Radeon RX550 with amdgpu on my ThunderX2. Yes, it's a server ARM (aarch64) system, but it works like a charm with two screens attached. I did have to do the following:
> > - patch the kernel DRM code to force uncached mappings (the code
> >   apparently assumes x86-style write-combining):
> > --- ./include/drm/drm_cache.h~	2018-08-12 21:41:04.000000000 +0100
> > +++ ./include/drm/drm_cache.h	2018-11-16 11:06:16.976842816 +0000
> > @@ -48,7 +48,7 @@
> >  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> >  	return false;
> >  #else
> > -	return true;
> > +	return false;
> >  #endif
> >  }
> OK, so this is rather interesting. First of all, this is the exact change
> we apply to the nouveau driver to make it work on SynQuacer, i.e., demote
> all Normal non-cacheable mappings of memory exposed by the PCIe controller
> via a BAR to Device mappings. On SynQuacer, we need this because of a known
> silicon bug in the integration of the PCIe IP.
>
> However, the fact that you need Device mappings even on TX2 to map RAM
> exposed via PCIe is rather troubling, and it has come up in the past as
> well. The problem is that the GPU driver stack on Linux, including the
> VDPAU libraries and other userland pieces, assumes that memory exposed
> via PCIe has proper memory semantics, including the ability to perform
> unaligned accesses on it or to clear it with DC ZVA instructions. As we
> all know, these driver stacks are rather complex, and teaching each level
> of the stack whether a given piece of memory is real memory or PCI memory
> is going to be cumbersome.
>
> When we discussed this in the past, an ARM h/w engineer pointed out that
> Normal-NC is fundamentally incompatible with AMBA or AXI or whatever we
> use on ARM to integrate these components at the silicon level.
FWIW, I still don't understand exactly what the point being made was in that thread, but I do know that many of the assertions along the way were either vague or incorrect. Yes, it's possible to integrate different buses in a way that doesn't work, but I don't see anything "fundamental" about it.
> If that means we can only use Device mappings, we will need to make
> intrusive changes to a *lot* of code to ensure it doesn't use memcpy()
> or do other things that Device mappings don't tolerate on ARM. Even if
> we got it working, it would probably be horribly slow.
>
> So, can we get the right people from the ARM side involved to clarify
> this once and for all?
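For concreteness, the kind of restriction being discussed can be sketched in C. Device mappings on arm64 fault on unaligned accesses, so code touching such memory cannot rely on an optimised memcpy() and must use naturally aligned, fixed-width accesses instead, in the spirit of the kernel's memcpy_toio()/memcpy_fromio() helpers. The helper below is a hypothetical illustration, not kernel code:

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical sketch: an optimised memcpy() may emit unaligned or
 * overlapping wide loads/stores, which Device mappings on arm64 do not
 * tolerate. A device-safe copy restricts itself to one naturally
 * aligned, fixed-width access per iteration.
 */
static void device_safe_copy32(volatile uint32_t *dst,
                               const uint32_t *src, size_t words)
{
    for (size_t i = 0; i < words; i++)
        dst[i] = src[i]; /* one aligned 32-bit store per iteration */
}
```

Auditing an entire GPU driver stack (kernel, Mesa, VDPAU, applications mapping BARs directly) to funnel every access through such helpers is exactly the intrusive change described above.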
Last time I looked at this code, the problem actually seemed to be that the DRM core ends up trying to remap the CPU pages in ttm_set_pages_uc(). This is a NOP for !x86, so I think we end up with the CPU using a cacheable mapping but the device using a non-cacheable mapping, which could explain the hang.
At the time, implementing set_pages_uc() to remap the linear mapping wasn't feasible because it would preclude the use of block mappings, but now that we're using page mappings by default maybe you could give it a try.
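A minimal sketch of the mismatch described above, with hypothetical names (the real code lives in TTM and the x86 arch code): on x86, set_pages_uc() rewrites the linear-map PTEs as uncached, while on other architectures the call compiles to a stub that reports success without changing anything, so the CPU keeps a cacheable alias of pages the device sees as non-cacheable.

```c
#include <stddef.h>

/*
 * Hypothetical sketch, not the kernel's actual implementation: TTM asks
 * for the CPU's linear-map alias of the pages to be remapped uncached,
 * but only x86 implements this. Elsewhere the stub "succeeds" without
 * touching the (cacheable) CPU mapping, leaving CPU and GPU with
 * mismatched memory attributes.
 */
#ifdef CONFIG_X86
int ttm_set_pages_uc_sketch(void *first_page, int numpages)
{
	/* would call set_pages_uc() to rewrite linear-map PTEs as UC */
	return set_pages_uc(first_page, numpages);
}
#else
int ttm_set_pages_uc_sketch(void *first_page, int numpages)
{
	(void)first_page;
	(void)numpages;
	return 0; /* NOP: CPU alias stays cacheable */
}
#endif
```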
Will