[PATCH v4 3/3] drm/xe/guc/tlb: Flush g2h worker in case of tlb timeout

29 Oct 2024

Flush the g2h worker explicitly when a TLB invalidation timeout occurs.
The timeout is observed on LNL and points to the recent scheduling
issue with E-cores on LNL.

This is similar to the recent fix, commit e51527233804 ("drm/xe/guc/ct:
Flush g2h worker in case of g2h response timeout"), and should be
removed once there is an E-core scheduling fix.

v2: Add platform check (Himal)
v3: Remove gfx platform check as the issue is related to the CPU
    platform (John)
    Use the common WA macro (John) and print when the flush
    resolves the timeout (Matt B)

Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: stable@vger.kernel.org # v6.11+
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
---
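Note for readers: the flush-then-recheck idea can be illustrated with a
minimal userspace sketch. It is not the driver code. The types
`deferred_work` and `tlb_state` and the helpers `flush_deferred()` and
`fence_timed_out()` are hypothetical stand-ins. The sketch rechecks the
received seqno after the flush, whereas the hunk below rechecks the
elapsed time, and seqno wraparound is ignored.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a response that arrived but whose handler
 * (the deferred worker) has not been scheduled yet. */
struct deferred_work {
	bool pending;       /* unprocessed response is queued */
	int pending_seqno;  /* seqno carried by that response */
};

/* Hypothetical stand-in for the per-GT invalidation bookkeeping. */
struct tlb_state {
	int seqno_recv;     /* last seqno whose response was processed */
	struct deferred_work g2h;
};

/* Run the deferred handler synchronously, as a work flush would. */
static void flush_deferred(struct tlb_state *st)
{
	if (st->g2h.pending) {
		st->seqno_recv = st->g2h.pending_seqno;
		st->g2h.pending = false;
	}
}

/*
 * Called when a fence looks expired. Flush first, then recheck, and
 * report a timeout only if the fence is still outstanding.
 * Assumption of this sketch: seqnos do not wrap.
 */
static bool fence_timed_out(struct tlb_state *st, int fence_seqno)
{
	flush_deferred(st);

	if (st->seqno_recv >= fence_seqno) {
		printf("flush resolved timeout, seqno=%d recv=%d\n",
		       fence_seqno, st->seqno_recv);
		return false;
	}

	fprintf(stderr, "fence timeout, seqno=%d recv=%d\n",
		fence_seqno, st->seqno_recv);
	return true;
}
```

With a response queued but unprocessed, `fence_timed_out()` returns
false after the flush. With nothing queued it returns true and the
timeout is reported as before.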
 drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
index 773de1f08db9..0bdb3ba5220a 100644
--- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
+++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c
@@ -81,6 +81,15 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work)
 		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt))
 			break;
+		LNL_FLUSH_WORK(&gt->uc.guc.ct.g2h_worker);
+		since_inval_ms = ktime_ms_delta(ktime_get(),
+						fence->invalidation_time);
+		if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt)) {
+			xe_gt_dbg(gt, "LNL_FLUSH_WORK resolved TLB invalidation fence timeout, seqno=%d recv=%d",
+				  fence->seqno, gt->tlb_invalidation.seqno_recv);
+			break;
+		}
+
 		trace_xe_gt_tlb_invalidation_fence_timeout(xe, fence);
 		xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d",
 			  fence->seqno, gt->tlb_invalidation.seqno_recv);
-- 
2.46.0
