On Fri, May 1, 2020 at 11:26 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
Writing to the devfreq sysfs nodes while the GPU is powered down can result in a system crash (on a5xx) or a nasty GMU error (on a6xx):
$ /sys/class/devfreq/5000000.gpu# echo 500000000 > min_freq
[ 104.841625] platform 506a000.gmu: [drm:a6xx_gmu_set_oob] *ERROR* Timeout waiting for GMU OOB set GPU_DCVS: 0x0
Despite the fact that we carefully try to suspend the devfreq device when the hardware is powered down, there are lots of holes in the governors that don't check for the suspend state and blindly call into the devfreq callbacks that end up triggering hardware reads in the GPU driver.
Check the power state in the gpu_busy() and gpu_set_freq() callbacks for a5xx and a6xx to make sure that the hardware is active before trying to access it.
Chatted on IRC -- while this avoids the instaboot on db820c when setting /sys/class/devfreq/devfreq1/min_freq, I think we should be using pm_runtime_get_if_in_use() to avoid the races while still avoiding bringing up the GPU.
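To illustrate the pattern I have in mind, here is a rough userspace model, not driver code. Every mock_* name and the struct are hypothetical stand-ins invented for this sketch. The only real API it imitates is pm_runtime_get_if_in_use(), which takes a runtime PM reference only when the device is already active and in use, and never powers it up. In the kernel that check and increment happen atomically under the device's power lock, which is what closes the race; the model below is single-threaded and does not show that part.

```c
/* Userspace model only: the mock_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>

struct mock_gpu {
	bool active;        /* models the runtime PM status being active */
	int usage_count;    /* models the runtime PM usage counter */
	unsigned long freq; /* last frequency written to the "hardware" */
	int hw_accesses;    /* how many times the "hardware" was touched */
};

/*
 * Models the behavior of pm_runtime_get_if_in_use(): take a reference
 * only if the device is active and already has users. Returns 1 if a
 * reference was taken, 0 otherwise. It never powers the device up.
 */
static int mock_get_if_in_use(struct mock_gpu *gpu)
{
	if (!gpu->active || gpu->usage_count == 0)
		return 0;

	gpu->usage_count++;
	return 1;
}

/* Models dropping the reference taken above. */
static void mock_put(struct mock_gpu *gpu)
{
	assert(gpu->usage_count > 0);
	gpu->usage_count--;
}

/*
 * Sketch of a devfreq-style set_freq callback: skip the hardware
 * entirely when the GPU is powered down, otherwise hold a reference
 * for the duration of the access. Returns 1 if the hardware was
 * programmed, 0 if the request was skipped.
 */
static int mock_gpu_set_freq(struct mock_gpu *gpu, unsigned long freq)
{
	if (mock_get_if_in_use(gpu) <= 0)
		return 0;

	gpu->freq = freq;
	gpu->hw_accesses++;

	mock_put(gpu);
	return 1;
}
```

With the model GPU powered down (active false, usage_count 0) a call to mock_gpu_set_freq() returns without touching the hardware fields. With it active and in use, the frequency is written and the usage count ends up where it started.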