Ensure we serialize with the completion side to prevent a UAF with the
fence going out of scope on the stack, since we have no clue if it will
fire after the timeout but before we can erase it from the xa. Also we
have some dependent loads and stores for which we need the correct
ordering, and we lack the needed barriers. Fix this by grabbing the
ct->lock after the wait, which is also held by the completion side.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_guc_ct.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 4b95f75b1546..232eb69bd8e4 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -903,16 +903,26 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
 	}
 
 	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
+
+	/*
+	 * Ensure we serialize with completion side to prevent UAF with fence going out of scope on
+	 * the stack, since we have no clue if it will fire after the timeout before we can erase
+	 * from the xa. Also we have some dependent loads and stores below for which we need the
+	 * correct ordering, and we lack the needed barriers.
+	 */
+	mutex_lock(&ct->lock);
 	if (!ret) {
 		xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x",
 			  g2h_fence.seqno, action[0]);
 		xa_erase_irq(&ct->fence_lookup, g2h_fence.seqno);
+		mutex_unlock(&ct->lock);
 		return -ETIME;
 	}
 
 	if (g2h_fence.retry) {
 		xe_gt_dbg(gt, "H2G action %#x retrying: reason %#x\n",
 			  action[0], g2h_fence.reason);
+		mutex_unlock(&ct->lock);
 		goto retry;
 	}
 	if (g2h_fence.fail) {
@@ -921,7 +931,12 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
 		ret = -EIO;
 	}
 
-	return ret > 0 ? response_buffer ? g2h_fence.response_len : g2h_fence.response_data : ret;
+	if (ret > 0)
+		ret = response_buffer ? g2h_fence.response_len : g2h_fence.response_data;
+
+	mutex_unlock(&ct->lock);
+
+	return ret;
 }
 
 /**
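As a side note for readers without the driver in front of them, here is a
minimal standalone sketch of the pattern the fix relies on, written with
pthreads rather than the xe primitives: the waiter's fence lives on its
stack, and taking the same lock the completion side holds before erasing
or reading the fence is what makes the timeout path safe. Note one
difference: a condvar wait holds the mutex around the condition check,
whereas wait_event_timeout() does not, which is exactly why the patch has
to take ct->lock explicitly after the wait. Everything below (names, the
lookup pointer standing in for the fence_lookup xa) is invented for
illustration and is not xe code.

/* Build with: cc race.c -lpthread */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <time.h>

struct fence {
	bool done;
	int response;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; /* plays ct->lock */
static pthread_cond_t waitq = PTHREAD_COND_INITIALIZER;  /* plays g2h_fence_wq */
static struct fence *lookup;                             /* plays the fence_lookup xa */

/* Completion side: only ever touches the waiter's fence under the lock. */
static void *completer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	if (lookup) {                  /* fence still registered? */
		lookup->response = 42; /* dependent store */
		lookup->done = true;
		lookup = NULL;         /* "erase" it */
		pthread_cond_broadcast(&waitq);
	}
	pthread_mutex_unlock(&lock);
	return NULL;
}

static int send_recv(void)
{
	struct fence f = { false, 0 }; /* lives on this stack frame */
	struct timespec to;
	int err = 0, ret;

	pthread_mutex_lock(&lock);
	lookup = &f;                   /* register the fence */

	clock_gettime(CLOCK_REALTIME, &to);
	to.tv_sec += 1;                /* roughly the HZ wait in the patch */
	while (!f.done && err == 0)
		err = pthread_cond_timedwait(&waitq, &lock, &to);

	/*
	 * Still under the lock here, mirroring the patch: the completer
	 * cannot be mid-update on &f while we erase it or read from it,
	 * and the lock orders its stores against our loads.
	 */
	if (!f.done) {
		lookup = NULL;         /* erase on timeout */
		pthread_mutex_unlock(&lock);
		return -1;
	}
	ret = f.response;              /* dependent load */
	pthread_mutex_unlock(&lock);
	return ret;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, completer, NULL);
	/* Prints 42, or -1 if the completer ran before we registered. */
	printf("send_recv() = %d\n", send_recv());
	pthread_join(t, NULL);
	return 0;
}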
Hi Matthew,
On 30-09-2024 17:59, Matthew Auld wrote:
> Ensure we serialize with the completion side to prevent a UAF with the
> fence going out of scope on the stack, since we have no clue if it will
> fire after the timeout but before we can erase it from the xa. Also we
> have some dependent loads and stores for which we need the correct
> ordering, and we lack the needed barriers. Fix this by grabbing the
> ct->lock after the wait, which is also held by the completion side.
>
> Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Matthew Brost <matthew.brost@intel.com>
> Cc: Badal Nilawar <badal.nilawar@intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
>
> diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
> index 4b95f75b1546..232eb69bd8e4 100644
> --- a/drivers/gpu/drm/xe/xe_guc_ct.c
> +++ b/drivers/gpu/drm/xe/xe_guc_ct.c
> @@ -903,16 +903,26 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>  	}
>
>  	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
> +
> +	/*
> +	 * Ensure we serialize with completion side to prevent UAF with fence going out of scope on
> +	 * the stack, since we have no clue if it will fire after the timeout before we can erase
> +	 * from the xa. Also we have some dependent loads and stores below for which we need the
> +	 * correct ordering, and we lack the needed barriers.
> +	 */
Before acquiring the lock it is still possible that the fence will have
fired. To know that, it would be good to print g2h_fence.done in the
error message below.
Regards,
Badal
> +	mutex_lock(&ct->lock);
>  	if (!ret) {
>  		xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x",
>  			  g2h_fence.seqno, action[0]);
>  		xa_erase_irq(&ct->fence_lookup, g2h_fence.seqno);
> +		mutex_unlock(&ct->lock);
>  		return -ETIME;
>  	}
>
>  	if (g2h_fence.retry) {
>  		xe_gt_dbg(gt, "H2G action %#x retrying: reason %#x\n",
>  			  action[0], g2h_fence.reason);
> +		mutex_unlock(&ct->lock);
>  		goto retry;
>  	}
>  	if (g2h_fence.fail) {
> @@ -921,7 +931,12 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
>  		ret = -EIO;
>  	}
>
> -	return ret > 0 ? response_buffer ? g2h_fence.response_len : g2h_fence.response_data : ret;
> +	if (ret > 0)
> +		ret = response_buffer ? g2h_fence.response_len : g2h_fence.response_data;
> +
> +	mutex_unlock(&ct->lock);
> +
> +	return ret;
>  }
>
>  /**
On 01/10/2024 06:22, Nilawar, Badal wrote:
> Hi Matthew,
> On 30-09-2024 17:59, Matthew Auld wrote:
>> Ensure we serialize with the completion side to prevent a UAF with the
>> fence going out of scope on the stack, since we have no clue if it will
>> fire after the timeout but before we can erase it from the xa. Also we
>> have some dependent loads and stores for which we need the correct
>> ordering, and we lack the needed barriers. Fix this by grabbing the
>> ct->lock after the wait, which is also held by the completion side.
> Before acquiring the lock it is still possible that the fence will have
> fired. To know that, it would be good to print g2h_fence.done in the
> error message below.
Ok, will add.
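For reference, one possible shape of the updated timeout path, assuming
the done flag is printed via str_yes_no() from <linux/string_helpers.h>
(the exact wording is up to the next revision):

	if (!ret) {
		xe_gt_err(gt, "Timed out wait for G2H, fence %u, action %04x, done %s",
			  g2h_fence.seqno, action[0], str_yes_no(g2h_fence.done));
		xa_erase_irq(&ct->fence_lookup, g2h_fence.seqno);
		mutex_unlock(&ct->lock);
		return -ETIME;
	}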