[PATCH v2 7/7] accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW

5 May 2025

From: Karol Wachowski <karol.wachowski@intel.com>
commit dad945c27a42dfadddff1049cf5ae417209a8996 upstream.

Mark the context of a job that returned a HW context violation error as
invalid, and queue work that aborts jobs from the faulty context.
Add an engine reset to the context abort thread handler so that it not
only aborts currently executing jobs but also ensures recovery of the
NPU from an invalid state.

Cc: stable@vger.kernel.org # v6.12
Signed-off-by: Karol Wachowski <karol.wachowski@intel.com>
Signed-off-by: Maciej Falkowski <maciej.falkowski@linux.intel.com>
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250107173238.381120-13-macie...
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>
---
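A minimal userspace model of the deferral added to ivpu_job_signal_and_destroy(), for illustration only and not the driver code. It assumes hypothetical stand-ins: struct ctx for struct ivpu_file_priv, JOB_STATUS_* for the VPU_JSM_STATUS_* codes (real values not used), a pthread mutex for file_priv->lock (the patch itself uses the kernel's scoped guard(mutex)), and counters in place of queue_work() and job destruction. It shows that only the first violation report on a context queues the abort work, and that the job is left for the worker to free.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical status codes; not the real VPU_JSM_STATUS_* values. */
enum job_status { JOB_STATUS_SUCCESS = 0, JOB_STATUS_CONTEXT_VIOLATION_HW = 1 };

/* Hypothetical, simplified stand-in for struct ivpu_file_priv. */
struct ctx {
	pthread_mutex_t lock;
	bool has_mmu_faults;
};

struct job {
	struct ctx *ctx;
};

/* Counters standing in for queue_work() and normal job destruction. */
static int abort_work_queued;
static int jobs_destroyed;

static void queue_abort_work(void) { abort_work_queued++; }
static void destroy_job(struct job *job) { (void)job; jobs_destroyed++; }

/* Models the shape of the new branch; job_done() is a hypothetical name. */
static int job_done(struct job *job, enum job_status status)
{
	if (status == JOB_STATUS_CONTEXT_VIOLATION_HW) {
		pthread_mutex_lock(&job->ctx->lock);
		if (!job->ctx->has_mmu_faults) {
			/* First report: mark faulty and defer cleanup to the abort worker. */
			job->ctx->has_mmu_faults = true;
			queue_abort_work();
		}
		pthread_mutex_unlock(&job->ctx->lock);
		return 0; /* job stays tracked; the worker is expected to free it */
	}

	/* Any other status takes the normal destroy path. */
	destroy_job(job);
	return 0;
}
```

Calling job_done() twice with JOB_STATUS_CONTEXT_VIOLATION_HW on jobs sharing one context marks it faulty once and queues the work once, while a JOB_STATUS_SUCCESS completion is destroyed immediately.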
 drivers/accel/ivpu/ivpu_job.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
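The ordering this patch gives ivpu_context_abort_thread_handler() in hardware scheduling mode (reset the engine, abort faulty contexts, resume the engine, then free job resources) can be sketched as a userspace model that records each step. Everything here is hypothetical: SCHED_MODE_HW stands in for VPU_SCHEDULING_MODE_HW and SCHED_MODE_OS for any other mode, string events stand in for ivpu_jsm_reset_engine(), the per-context abort and ivpu_jsm_hws_resume_engine(), and a plain array loop replaces xa_for_each() under context_list_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_EVENTS 16

/* Hypothetical scheduling modes standing in for VPU_SCHEDULING_MODE_*. */
enum sched_mode { SCHED_MODE_OS, SCHED_MODE_HW };

/* Hypothetical per-context state. */
struct ctx {
	bool has_mmu_faults;
	bool aborted;
};

/* Records the order of operations so it can be checked. */
static const char *events[MAX_EVENTS];
static int n_events;

static void log_event(const char *e)
{
	if (n_events < MAX_EVENTS)
		events[n_events++] = e;
}

/* Models the step ordering of the abort handler after this patch. */
static void context_abort_work(enum sched_mode mode, struct ctx *ctxs, int n)
{
	if (mode == SCHED_MODE_HW)
		log_event("reset_engine");

	for (int i = 0; i < n; i++) {
		/* Skip healthy contexts and contexts that were already aborted. */
		if (!ctxs[i].has_mmu_faults || ctxs[i].aborted)
			continue;
		ctxs[i].aborted = true;
		log_event("abort_context");
	}

	if (mode != SCHED_MODE_HW)
		return;

	log_event("resume_engine");
	/* In HW scheduling mode no further notifications arrive,
	 * so job resources are freed here. */
	log_event("free_jobs");
}
```

With one faulty, one healthy, and one already-aborted context in HW mode, the log reads reset_engine, abort_context, resume_engine, free_jobs; in the other mode only the abort step runs.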

diff --git a/drivers/accel/ivpu/ivpu_job.c b/drivers/accel/ivpu/ivpu_job.c
index a8e3eca14989c..27121c66e48f8 100644
--- a/drivers/accel/ivpu/ivpu_job.c
+++ b/drivers/accel/ivpu/ivpu_job.c
@@ -486,6 +486,26 @@ static int ivpu_job_signal_and_destroy(struct ivpu_device *vdev, u32 job_id, u32
 
 	lockdep_assert_held(&vdev->submitted_jobs_lock);
 
+	job = xa_load(&vdev->submitted_jobs_xa, job_id);
+	if (!job)
+		return -ENOENT;
+
+	if (job_status == VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW) {
+		guard(mutex)(&job->file_priv->lock);
+
+		if (job->file_priv->has_mmu_faults)
+			return 0;
+
+		/*
+		 * Mark context as faulty and defer destruction of the job to jobs abort thread
+		 * handler to synchronize between both faults and jobs returning context violation
+		 * status and ensure both are handled in the same way
+		 */
+		job->file_priv->has_mmu_faults = true;
+		queue_work(system_wq, &vdev->context_abort_work);
+		return 0;
+	}
+
 	job = ivpu_job_remove_from_submitted_jobs(vdev, job_id);
 	if (!job)
 		return -ENOENT;
@@ -795,6 +815,9 @@ void ivpu_context_abort_thread_handler(struct work_struct *work)
 	struct ivpu_job *job;
 	unsigned long id;
 
+	if (vdev->fw->sched_mode == VPU_SCHEDULING_MODE_HW)
+		ivpu_jsm_reset_engine(vdev, 0);
+
 	mutex_lock(&vdev->context_list_lock);
 	xa_for_each(&vdev->context_xa, ctx_id, file_priv) {
 		if (!file_priv->has_mmu_faults || file_priv->aborted)
@@ -808,6 +831,8 @@ void ivpu_context_abort_thread_handler(struct work_struct *work)
 
 	if (vdev->fw->sched_mode != VPU_SCHEDULING_MODE_HW)
 		return;
+
+	ivpu_jsm_hws_resume_engine(vdev, 0);
 	/*
 	 * In hardware scheduling mode NPU already has stopped processing jobs
 	 * and won't send us any further notifications, thus we have to free job related resources
-- 
2.45.1
