On Wed, 9 Jan 2019 at 15:38, Carsten Haitzler <Carsten.Haitzler@arm.com> wrote:
On 09/01/2019 14:36, Ard Biesheuvel wrote:
On Wed, 9 Jan 2019 at 15:34, Carsten Haitzler <Carsten.Haitzler@arm.com> wrote:
On 09/01/2019 14:33, Bero Rosenkränzer wrote:
On Wed, 9 Jan 2019 at 14:55, Carsten Haitzler <Carsten.Haitzler@arm.com> wrote:
My understanding is that ARM can't do "WC" in a guaranteed way like x86, so turning it off is the right thing to do anyway,
My understanding too.
FWIW I've added the fix to the OpenMandriva distro kernel https://github.com/OpenMandrivaSoftware/linux/commit/657041c5665c681d4519cf8... Let's see if any user starts screaming ;)
ttyl bero
let's see. i have put a patch into the internal kernel patch review before i send it off to dri-devel. it's exactly your patch there, just with a commit log explaining why.
So what exactly is it about x86 style wc that ARM cannot do?
From Pavel Shamis here at ARM:
"Short version.
X86 has well-defined behavior for WC memory – it combines multiple consecutive stores (which have to be aligned to the cache line) into 64B cache line writes over PCIe.
On Arm, WC corresponds to Normal NC. Arm uarch does not do combining to cache line size. On some uarch we do 16B combining, but not cache line.
The first uarch that will be doing cache line size combining is Aries.
It is important to note that WC is an opportunistic optimization and the software/hardware should not make an assumption that it always “combines” (true for x86 and arm)"
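
To make the granule difference above concrete, here is a minimal user-space sketch in C. It only counts how many whole combining granules a run of consecutive stores covers, using the 64B (x86 cache line) and 16B (some Arm uarch) sizes quoted above. The helper name full_granules() is hypothetical and not a kernel API, and nothing in it models what any hardware actually merges, since combining stays opportunistic on both architectures.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only (not a kernel API).
 * Returns how many whole, granule-aligned units the byte range
 * [off, off + len) covers. Only such fully covered units are even
 * candidates for being emitted as a single combined write; the
 * hardware remains free not to combine them at all.
 */
static size_t full_granules(size_t off, size_t len, size_t granule)
{
    assert(granule != 0);

    size_t first = (off + granule - 1) / granule; /* first fully covered unit */
    size_t last = (off + len) / granule;          /* one past the last one */

    return last > first ? last - first : 0;
}
```

For example, 64 bytes of consecutive stores starting at offset 8 cover no complete 64B line, but do cover three complete 16B units, so the same store sequence can look quite different to the two combining schemes.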
OK, so that only means that ARM WC mappings may behave more like x86 uncached mappings than x86 WC mappings. It does not explain why things break if we use them.
The problem with using uncached mappings here is that it breaks use cases that expect memory semantics, such as unaligned accesses or DC ZVA instructions. At least VDPAU on nouveau breaks due to this, and likely many other use cases as well.
So *please*, do not send that patch to dri-devel. Let's instead fix the root cause, which may be related to the thing pointed out by Will, i.e., that ttm_set_pages_uc() is not implemented correctly.
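
As a rough illustration of the unaligned-access point, the sketch below assumes the uncached mapping ends up as an arm64 Device memory type, where an access that is not naturally aligned takes an alignment fault, while Normal NC memory tolerates it. The helper naturally_aligned() is hypothetical and only models the alignment rule in user space; it does not touch any real mapping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns true if an access of `size` bytes at `addr` is naturally
 * aligned. `size` must be a non-zero power of two (1, 2, 4, 8, ...).
 * Assumption: on a Device-type mapping, accesses for which this
 * returns false would fault; on Normal NC they would not.
 */
static bool naturally_aligned(uintptr_t addr, size_t size)
{
    bool pow2 = size != 0 && (size & (size - 1)) == 0;

    return pow2 && (addr & (size - 1)) == 0;
}
```

An 8-byte load at 0x1004 fails this check, which is the kind of access that ordinary memcpy-style userspace code may issue freely against a buffer it believes has memory semantics.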