From: Rob Clark <robdclark@chromium.org>
Inspired by https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vetter@ffw... it seemed like a good idea to get rid of memory allocation in the job_run() fence signaling path, and to use lockdep annotations to yell at us about anything that could deadlock against the shrinker/reclaim. Anything that can trigger reclaim, or block on any other thread that has triggered reclaim, can block the GPU shrinker from releasing memory if the shrinker is waiting for the job to complete, causing a deadlock.
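To make the failure mode concrete, here is a hedged sketch using the dma-fence signaling annotations from danvet's series (not code from these patches): a GFP_KERNEL allocation inside a fence signaling critical section can recurse into reclaim, and reclaim may be waiting on that very fence, so lockdep flags the dependency even if the deadlock never actually fires:

#include <linux/dma-fence.h>
#include <linux/slab.h>

static void job_run_sketch(struct dma_fence *hw_fence)
{
        bool cookie = dma_fence_begin_signalling();
        void *tmp;

        /*
         * BAD: GFP_KERNEL may enter reclaim, and the GPU shrinker may be
         * blocked waiting on hw_fence.  lockdep reports the fs_reclaim vs
         * dma_fence_map inversion here, whether or not it deadlocks.
         */
        tmp = kmalloc(64, GFP_KERNEL);
        kfree(tmp);

        dma_fence_signal(hw_fence);
        dma_fence_end_signalling(cookie);
}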
The first patch pre-allocates the hw_fence, splitting allocation from initialization, so that no allocation happens in the job_run() path. The next eight decouple the obj lock from job_run(), since the obj lock is required to pin/unpin backing pages (ie. holding an obj lock in job_run() could deadlock the shrinker by blocking forward progress towards pinned buffers becoming idle). These are followed by two patches that switch to idr_preload(), so that memory allocations are not made under locks indirectly connected to the shrinker path.
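As a rough sketch of the pre-allocation split and the idr_preload() pattern (the names here are illustrative, not the exact API the patches end up with): allocate with GFP_KERNEL in the submit ioctl path, where recursing into reclaim is fine, and do the allocation-free dma_fence_init() only in job_run(); likewise, preload the idr before taking the spinlock so the insertion under the lock can be atomic:

#include <linux/dma-fence.h>
#include <linux/idr.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct example_fence {
        struct dma_fence base;
        /* ... driver-specific bits ... */
};

extern const struct dma_fence_ops example_fence_ops;
extern spinlock_t example_fence_spinlock;

/* submit ioctl path: GFP_KERNEL allocation may recurse into reclaim */
static struct dma_fence *example_fence_alloc(void)
{
        struct example_fence *f = kzalloc(sizeof(*f), GFP_KERNEL);

        return f ? &f->base : NULL;
}

/* job_run() path: no allocation, just bind the fence to a context/seqno */
static void example_fence_init(struct dma_fence *fence, u64 context, u64 seqno)
{
        dma_fence_init(fence, &example_fence_ops, &example_fence_spinlock,
                       context, seqno);
}

/* idr: preload outside the spinlock, then allocate atomically under it */
static int example_fence_idr_store(struct idr *fence_idr, spinlock_t *idr_lock,
                                   struct dma_fence *fence)
{
        int id;

        idr_preload(GFP_KERNEL);        /* may sleep/reclaim; lock not yet held */
        spin_lock(idr_lock);
        id = idr_alloc(fence_idr, fence, 1, 0, GFP_NOWAIT); /* end=0: no limit */
        spin_unlock(idr_lock);
        idr_preload_end();

        return id;
}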
Next are three patches to decouple initialization (where allocations are needed) from GPU runpm and devfreq, avoiding allocations in the fence signaling path. These are followed by various PM devfreq/QoS and interconnect locking fixes to decouple initialization (allocation) from runtime.
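The "teach lockdep" patches use the usual priming trick: at init time, take the lock once inside a dummy fs_reclaim section, so lockdep records the intended ordering up front and any later allocation made while holding the lock produces an immediate splat rather than a rare deadlock. A generic sketch of the pattern (the lock name is borrowed from the interconnect patch; the exact hunks may differ):

#include <linux/gfp.h>
#include <linux/mutex.h>
#include <linux/sched/mm.h>

static DEFINE_MUTEX(icc_bw_lock);

static void teach_lockdep_order(void)
{
        /*
         * Record "fs_reclaim -> icc_bw_lock" as the valid ordering.  If
         * anyone later allocates (and thus may enter reclaim) while
         * holding icc_bw_lock, that is the inverse order and lockdep
         * complains right away.
         */
        fs_reclaim_acquire(GFP_KERNEL);
        mutex_lock(&icc_bw_lock);
        mutex_unlock(&icc_bw_lock);
        fs_reclaim_release(GFP_KERNEL);
}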
And finally, the last patch is a modified version of danvet's patch to add lockdep annotations to the GPU scheduler, but it does so conditionally so that drivers can opt in.
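Roughly, the scheduler change wraps run_job() in the signaling annotation only when the driver opts in. In this sketch the opt-in is passed as a parameter for illustration; the actual patch presumably records it on the scheduler itself:

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

/* Sketch only: how an opt-in annotation around run_job() could look. */
static struct dma_fence *run_job_annotated(struct drm_sched_job *job,
                                           bool fence_signalling)
{
        struct drm_gpu_scheduler *sched = job->sched;
        struct dma_fence *fence;
        bool cookie = false;

        if (fence_signalling)           /* driver opted in */
                cookie = dma_fence_begin_signalling();

        /*
         * With the annotation active, anything in run_job() that could
         * deadlock against reclaim is reported by lockdep.
         */
        fence = sched->ops->run_job(job);

        if (fence_signalling)
                dma_fence_end_signalling(cookie);

        return fence;
}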
v2: Switch from embedding the hw_fence in the submit/job object to pre-allocating the hw_fence. Rework the "fenced unpin" locking to drop the obj lock from the fence signaling path (ie. the part that was still WIP in the first iteration of the patchset). Add the final patch to enable fence signaling annotations now that job_run() and job_free() are safe. The PM devfreq/QoS and interconnect patches are unchanged.
Rob Clark (23):
  drm/msm: Pre-allocate hw_fence
  drm/msm: Move submit bo flags update from obj lock
  drm/msm/gem: Tidy up VMA API
  drm/msm: Decouple vma tracking from obj lock
  drm/msm/gem: Simplify vmap vs LRU tracking
  drm/gem: Export drm_gem_lru_move_tail_locked()
  drm/msm/gem: Move update_lru()
  drm/msm/gem: Protect pin_count/madv by LRU lock
  drm/msm/gem: Avoid obj lock in job_run()
  drm/msm: Switch idr_lock to spinlock
  drm/msm: Use idr_preload()
  drm/msm/gpu: Move fw loading out of hw_init() path
  drm/msm/gpu: Move BO allocation out of hw_init
  drm/msm/a6xx: Move ioremap out of hw_init path
  PM / devfreq: Drop unneed locking to appease lockdep
  PM / devfreq: Teach lockdep about locking order
  PM / QoS: Fix constraints alloc vs reclaim locking
  PM / QoS: Decouple request alloc from dev_pm_qos_mtx
  PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order
  soc: qcom: smd-rpm: Use GFP_ATOMIC in write path
  interconnect: Fix locking for runpm vs reclaim
  interconnect: Teach lockdep about icc_bw_lock order
  drm/sched: Add (optional) fence signaling annotation
 drivers/base/power/qos.c                   |  83 +++++++++---
 drivers/devfreq/devfreq.c                  |  52 ++++----
 drivers/gpu/drm/drm_gem.c                  |  11 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c      |  48 ++++---
 drivers/gpu/drm/msm/adreno/a6xx_gmu.c      |  18 ++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c      |  46 ++++---
 drivers/gpu/drm/msm/adreno/adreno_device.c |   6 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c    |   9 +-
 drivers/gpu/drm/msm/msm_drv.c              |   6 +-
 drivers/gpu/drm/msm/msm_fence.c            |  12 +-
 drivers/gpu/drm/msm/msm_fence.h            |   3 +-
 drivers/gpu/drm/msm/msm_gem.c              | 145 ++++++++++++++-------
 drivers/gpu/drm/msm/msm_gem.h              |  29 +++--
 drivers/gpu/drm/msm/msm_gem_submit.c       |  27 ++--
 drivers/gpu/drm/msm/msm_gem_vma.c          |  91 ++++++++++---
 drivers/gpu/drm/msm/msm_gpu.h              |   8 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c       |   9 +-
 drivers/gpu/drm/msm/msm_submitqueue.c      |   2 +-
 drivers/gpu/drm/scheduler/sched_main.c     |   9 ++
 drivers/interconnect/core.c                |  18 ++-
 drivers/soc/qcom/smd-rpm.c                 |   2 +-
 include/drm/drm_gem.h                      |   1 +
 include/drm/gpu_scheduler.h                |   2 +
 23 files changed, 416 insertions(+), 221 deletions(-)