On 11/1/2024 7:51 PM, John Harrison wrote:
On 10/29/2024 05:01, Nirmoy Das wrote:
Flush the g2h worker explicitly if TLB timeout happens which is observed on LNL and that points to the recent scheduling issue with E-cores on LNL.
This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix.
v2: Add platform check(Himal) v3: Remove gfx platform check as the issue related to cpu platform(John) Use the common WA macro(John) and print when the flush resolves timeout(Matt B) v4: Remove the resolves log and do the flush before taking pending_lock(Matt A)
Cc: Badal Nilawar badal.nilawar@intel.com Cc: Matthew Brost matthew.brost@intel.com Cc: Matthew Auld matthew.auld@intel.com Cc: John Harrison John.C.Harrison@Intel.com Cc: Himal Prasad Ghimiray himal.prasad.ghimiray@intel.com Cc: Lucas De Marchi lucas.demarchi@intel.com Cc: stable@vger.kernel.org # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687 Signed-off-by: Nirmoy Das nirmoy.das@intel.com
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 773de1f08db9..3cb228c773cd 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -72,6 +72,8 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work) struct xe_device *xe = gt_to_xe(gt); struct xe_gt_tlb_invalidation_fence *fence, *next; + LNL_FLUSH_WORK(>->uc.guc.ct.g2h_worker);
Do we not want some kind of 'success required flush' message here as per the other instances?
This flush resolves TLB timeout but in this function, I can't think of a simple way to detect that and log a message.
Regards,
Nirmoy
John.
spin_lock_irq(>->tlb_invalidation.pending_lock); list_for_each_entry_safe(fence, next, >->tlb_invalidation.pending_fences, link) {