Flush the g2h worker explicitly if TLB timeout happens which is observed on LNL and that points to the recent scheduling issue with E-cores on LNL.
This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix.
v2: Add platform check(Himal) v3: Remove gfx platform check as the issue related to cpu platform(John) Use the common WA macro(John) and print when the flush resolves timeout(Matt B)
Cc: Badal Nilawar badal.nilawar@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: Himal Prasad Ghimiray himal.prasad.ghimiray@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com Cc: stable@vger.kernel.org # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687 Signed-off-by: Nirmoy Das nirmoy.das@intel.com --- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 773de1f08db9..0bdb3ba5220a 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -81,6 +81,15 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work) if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt)) break;
+ LNL_FLUSH_WORK(>->uc.guc.ct.g2h_worker); + since_inval_ms = ktime_ms_delta(ktime_get(), + fence->invalidation_time); + if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt)) { + xe_gt_dbg(gt, "LNL_FLUSH_WORK resolved TLB invalidation fence timeout, seqno=%d recv=%d", + fence->seqno, gt->tlb_invalidation.seqno_recv); + break; + } + trace_xe_gt_tlb_invalidation_fence_timeout(xe, fence); xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d", fence->seqno, gt->tlb_invalidation.seqno_recv);