On Thu, Dec 12, 2024 at 09:34:32AM -0800, Lucas De Marchi wrote:
This partially reverts commit fe4f5d4b6616 ("drm/xe: Clean up VM / exec queue file lock usage."). While it's desired to have the mutex to protect only the reference to the exec queue, getting and dropping each mutex and then later getting the GPU timestamp, doesn't produce a correct result: it introduces multiple opportunities for the task to be scheduled out and thus wrecking havoc the deltas reported to userspace.
Also, to better correlate the timestamp from the exec queues with the GPU, disable preemption so they can be updated without allowing the task to be scheduled out. We leave interrupts enabled as that shouldn't be enough disturbance for the deltas to matter to userspace.
Assuming
- timestamp from exec queues = xe_exec_queue_update_run_ticks() - GPU timestamp = xe_hw_engine_read_timestamp()
shouldn't you also move the xe_hw_engine_read_timestamp() within the preempt_disable/preempt_enable section?
Something like this ..
preempt_disable();
xa_for_each(&xef->exec_queue.xa, i, q) xe_exec_queue_update_run_ticks(q);
gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
preempt_enable();
Thanks, Umesh
Test scenario:
- IGT'S `xe_drm_fdinfo --r --r utilization-single-full-load`
- Platform: LNL, where CI occasionally reports failures
- `stress -c $(nproc)` running in parallel to disturb the system
This brings a first failure from "after ~150 executions" to "never occurs after 1000 attempts".
Cc: stable@vger.kernel.org # v6.11+ Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3512 Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
drivers/gpu/drm/xe/xe_drm_client.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c index 298a587da7f17..e307b4d6bab5a 100644 --- a/drivers/gpu/drm/xe/xe_drm_client.c +++ b/drivers/gpu/drm/xe/xe_drm_client.c @@ -338,15 +338,12 @@ static void show_run_ticks(struct drm_printer *p, struct drm_file *file)
/* Accumulate all the exec queues from this client */ mutex_lock(&xef->exec_queue.lock);
- xa_for_each(&xef->exec_queue.xa, i, q) {
xe_exec_queue_get(q);
mutex_unlock(&xef->exec_queue.lock);
preempt_disable();
xa_for_each(&xef->exec_queue.xa, i, q) xe_exec_queue_update_run_ticks(q);
mutex_lock(&xef->exec_queue.lock);
xe_exec_queue_put(q);
- }
preempt_enable(); mutex_unlock(&xef->exec_queue.lock);
gpu_timestamp = xe_hw_engine_read_timestamp(hwe);
-- 2.47.0