On 10/01/2019 11:49, Will Deacon wrote:
On Thu, Jan 10, 2019 at 11:36:41AM +0000, Carsten Haitzler wrote:
On 09/01/2019 19:07, Ard Biesheuvel wrote:
I can confirm that this change fixes all the issues I observed on AMD Seattle with HD5450 and HD7450 cards which use the Radeon driver (not the amdgpu one).
Hooray. Another happy user. :) I suspect Bero will report success too. At worst this is the "tip of the iceberg" of the problem and that patch fixes it with a sledgehammer. At best it's the exact right fix. :) That's another topic. I see another mail with some patch so I'll continue there.
Thanks.
So I will attempt to dig into this a bit further myself, and hopefully find something that carries over to amdgpu as well, so I may ask you to test something if I do.
It may not be perfect, but it is better than it was, and other MIPS/PPC and even x86 32-bit systems already need this kind of fix. In the same way, it seems ARM needs it too and no one to date has bothered upstream. I'd rather things improve for at least some set of people than not improve at all for an undefined amount of time. Note that working is an improvement over "fast but doesn't work" in my book. :) Don't get me wrong. Looking for a better fix in the meantime, if one could exist, is a positive thing. It's not something I can get stuck into as above.
I'd just like to see if we can fix properly before we upstream a hack.
If we find a significantly better fix in short order - sure. If this is going to drag out into weeks and weeks of back and forth, I think we should consider getting a fix out until something better can be found. Just keep in mind, for every day no fix is available someone somewhere is yelling at some system that doesn't work and they don't know why. They may not know C or how to even compile things... but they are unhappy. :)
This patch perpetuates the unfounded accusation that the Arm architecture is fundamentally incompatible with write-combining and PCI. If we don't bother to diagnose the reported failures correctly, removing hacks such as this when we are forced to understand the problem properly tends to be considerably more effort in my experience, particularly if the same hack has been adopted by other drivers or subsystems.
So I don't think this patch is anything more than a short-term hack, which isn't something we should commit to maintaining upstream. I'm over the moon that it allows you to use your workstation effectively, but please let's try to root-cause this (as Ard is doing) before we rush something in that we're unable to reason about.
I know you're not a fan of rebooting, but I'd appreciate it if you could please help with testing (or throw me an AMD card for a few days so I can do it myself).
I don't have any spare cards. :( Remember that this is my primary machine for everything, not a test machine.
I just had my tx2 (ThunderX2) lock up. It took 2 hours to have a working machine again as it refused to reboot (admittedly with lunch in between, hoping a cool-down might fix things, but without it it'd still have taken an hour or so to fix). Anything involving reboots is a risk on my tx2. Going through the BIOS means multiple minutes of waiting for stuff, hoping it finally boots, during which time I don't have a workstation to do email or anything else on. I should time the boot but I think it's about 5-10 mins (and I'm too scared to reboot just to time it).
It's not about not being a fan or not of rebooting. It's about not sitting around potentially for hours or even days waiting on reboots and being unable to work on anything else. If I had a spare machine I could let boot on the side and get on with stuff on my main... I'd be fine. :) It's probably OK to reboot a bit and take the risk on my machine, but not frequently. :)