Initialize eb->vma[].vma pointers to NULL when the eb structure is first set up.
During the execution of eb_lookup_vmas(), the eb->vma array is successively filled up with struct eb_vma objects. This process includes calling eb_add_vma(), which might fail; however, even in the event of failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which prompts a call to eb_release_vmas() to clean up the mess. Since eb_lookup_vmas() might fail during processing any (possibly not first) buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know at what point did the lookup function fail.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is set to NULL in case i915_gem_object_userptr_submit_init() fails; the current one needs to be cleaned up by eb_release_vmas() at this point, so the next one is set. If eb_add_vma() fails, neither the current nor the next vma is nullified, which is a source of a NULL deref bug described in [1].
When entering eb_lookup_vmas(), the vma pointers are set to the slab poison value, instead of NULL. This doesn't matter for the actual lookup, since it gets overwritten anyway, however the eb_release_vmas() function only recognizes NULL as the stopping value, hence the pointers are being nullified as they go in case of intermediate failure. This patch changes the approach to filling them all with NULL at the start instead, rather than handling that manually during failure.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15062 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Reported-by: Gangmin Kim km.kim1503@gmail.com Cc: stable@vger.kernel.org # 5.16.x Signed-off-by: Krzysztof Niemiec krzysztof.niemiec@intel.com --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 27 ++++++------------- 1 file changed, 8 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b057c2fa03a4..02120203af55 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -951,13 +951,13 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) vma = eb_lookup_vma(eb, eb->exec[i].handle); if (IS_ERR(vma)) { err = PTR_ERR(vma); - goto err; + return err; }
err = eb_validate_vma(eb, &eb->exec[i], vma); if (unlikely(err)) { i915_vma_put(vma); - goto err; + return err; }
err = eb_add_vma(eb, ¤t_batch, i, vma); @@ -966,19 +966,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
if (i915_gem_object_is_userptr(vma->obj)) { err = i915_gem_object_userptr_submit_init(vma->obj); - if (err) { - if (i + 1 < eb->buffer_count) { - /* - * Execbuffer code expects last vma entry to be NULL, - * since we already initialized this entry, - * set the next value to NULL or we mess up - * cleanup handling. - */ - eb->vma[i + 1].vma = NULL; - } - + if (err) return err; - }
eb->vma[i].flags |= __EXEC_OBJECT_USERPTR_INIT; eb->args->flags |= __EXEC_USERPTR_USED; @@ -986,10 +975,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) }
return 0; - -err: - eb->vma[i].vma = NULL; - return err; }
static int eb_lock_vmas(struct i915_execbuffer *eb) @@ -3362,6 +3347,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, struct sync_file *out_fence = NULL; int out_fence_fd = -1; int err; + int i;
BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS); BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & @@ -3375,7 +3361,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
eb.exec = exec; eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1); - eb.vma[0].vma = NULL; + + for (i = 0; i < args->buffer_count; i++) + eb.vma[i].vma = NULL; + eb.batch_pool = NULL;
eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
Re-sending because of my response unintentionally HTML formatted, with correct email address of Tvrtko by the way.
Hi Krzysztof,
On Tuesday, 25 November 2025 14:33:38 CET Krzysztof Niemiec wrote:
Initialize eb->vma[].vma pointers to NULL when the eb structure is first set up.
During the execution of eb_lookup_vmas(), the eb->vma array is successively filled up with struct eb_vma objects. This process includes calling eb_add_vma(), which might fail; however, even in the event of failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which prompts a call to eb_release_vmas() to clean up the mess. Since eb_lookup_vmas() might fail during processing any (possibly not first) buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know at what point did the lookup function fail.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is set to NULL in case i915_gem_object_userptr_submit_init() fails; the current one needs to be cleaned up by eb_release_vmas() at this point, so the next one is set. If eb_add_vma() fails, neither the current nor the next vma is nullified, which is a source of a NULL deref bug described in [1].
When entering eb_lookup_vmas(), the vma pointers are set to the slab poison value, instead of NULL.
Your commit description still doesn't answer my question why the whole memory area allocated to the table of VMAs is not initialized to 0 on allocation, only left populated with poison values.
Thanks, Janusz
This doesn't matter for the actual lookup, since it gets overwritten anyway, however the eb_release_vmas() function only recognizes NULL as the stopping value, hence the pointers are being nullified as they go in case of intermediate failure. This patch changes the approach to filling them all with NULL at the start instead, rather than handling that manually during failure.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15062 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Reported-by: Gangmin Kim km.kim1503@gmail.com Cc: stable@vger.kernel.org # 5.16.x Signed-off-by: Krzysztof Niemiec krzysztof.niemiec@intel.com
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 27 ++++++------------- 1 file changed, 8 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b057c2fa03a4..02120203af55 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -951,13 +951,13 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) vma = eb_lookup_vma(eb, eb->exec[i].handle); if (IS_ERR(vma)) { err = PTR_ERR(vma);
goto err;
}return err;err = eb_validate_vma(eb, &eb->exec[i], vma); if (unlikely(err)) { i915_vma_put(vma);
goto err;
}return err;err = eb_add_vma(eb, ¤t_batch, i, vma); @@ -966,19 +966,8 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) if (i915_gem_object_is_userptr(vma->obj)) { err = i915_gem_object_userptr_submit_init(vma->obj);
if (err) {if (i + 1 < eb->buffer_count) {/** Execbuffer code expects last vma entry to be NULL,* since we already initialized this entry,* set the next value to NULL or we mess up* cleanup handling.*/eb->vma[i + 1].vma = NULL;}
if (err) return err;
}eb->vma[i].flags |= __EXEC_OBJECT_USERPTR_INIT; eb->args->flags |= __EXEC_USERPTR_USED; @@ -986,10 +975,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) } return 0;
-err:
- eb->vma[i].vma = NULL;
- return err;
} static int eb_lock_vmas(struct i915_execbuffer *eb) @@ -3362,6 +3347,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, struct sync_file *out_fence = NULL; int out_fence_fd = -1; int err;
- int i;
BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS); BUILD_BUG_ON(__EXEC_OBJECT_INTERNAL_FLAGS & @@ -3375,7 +3361,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, eb.exec = exec; eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
- eb.vma[0].vma = NULL;
- for (i = 0; i < args->buffer_count; i++)
eb.vma[i].vma = NULL;- eb.batch_pool = NULL;
eb.invalid_flags = __EXEC_OBJECT_UNKNOWN_FLAGS;
On 2025-11-25 at 19:06:32 GMT, Janusz Krzysztofik wrote:
Re-sending because of my response unintentionally HTML formatted, with correct email address of Tvrtko by the way.
Hi Krzysztof,
On Tuesday, 25 November 2025 14:33:38 CET Krzysztof Niemiec wrote:
Initialize eb->vma[].vma pointers to NULL when the eb structure is first set up.
During the execution of eb_lookup_vmas(), the eb->vma array is successively filled up with struct eb_vma objects. This process includes calling eb_add_vma(), which might fail; however, even in the event of failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which prompts a call to eb_release_vmas() to clean up the mess. Since eb_lookup_vmas() might fail during processing any (possibly not first) buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know at what point did the lookup function fail.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is set to NULL in case i915_gem_object_userptr_submit_init() fails; the current one needs to be cleaned up by eb_release_vmas() at this point, so the next one is set. If eb_add_vma() fails, neither the current nor the next vma is nullified, which is a source of a NULL deref bug described in [1].
When entering eb_lookup_vmas(), the vma pointers are set to the slab poison value, instead of NULL.
Your commit description still doesn't answer my question why the whole memory area allocated to the table of VMAs is not initialized to 0 on allocation, only left populated with poison values.
Becuase kvmalloc_array() is used. [1]
I guess one could swap it to a call to kvcalloc() or something similar; the thing is that the call actually handles both allocations of exec_list2 and the eb_vma array, the former doesn't need to be zero-initialized, the latter technically also doesn't but it simplifies error paths (and fixes the linked bug). I'm not sure if a zero-initializing *alloc() would be more readable or not here.
Thanks Krzysztof
[1] https://elixir.bootlin.com/linux/v6.17.9/source/drivers/gpu/drm/i915/gem/i91...
Thanks, Janusz
Hi Krzysztof,
Initialize eb->vma[].vma pointers to NULL when the eb structure is first set up.
During the execution of eb_lookup_vmas(), the eb->vma array is successively filled up with struct eb_vma objects. This process includes calling eb_add_vma(), which might fail; however, even in the event of failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which prompts a call to eb_release_vmas() to clean up the mess. Since eb_lookup_vmas() might fail during processing any (possibly not first) buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know at what point did the lookup function fail.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is set to NULL in case i915_gem_object_userptr_submit_init() fails; the current one needs to be cleaned up by eb_release_vmas() at this point, so the next one is set. If eb_add_vma() fails, neither the current nor the next vma is nullified, which is a source of a NULL deref bug described in [1].
When entering eb_lookup_vmas(), the vma pointers are set to the slab poison value, instead of NULL. This doesn't matter for the actual lookup, since it gets overwritten anyway, however the eb_release_vmas() function only recognizes NULL as the stopping value, hence the pointers are being nullified as they go in case of intermediate failure. This patch changes the approach to filling them all with NULL at the start instead, rather than handling that manually during failure.
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15062 Fixes: 544460c33821 ("drm/i915: Multi-BB execbuf") Reported-by: Gangmin Kim km.kim1503@gmail.com Cc: stable@vger.kernel.org # 5.16.x Signed-off-by: Krzysztof Niemiec krzysztof.niemiec@intel.com
LGTM: Reviewed-by: Krzysztof Karas krzysztof.karas@intel.com
On Wednesday, 26 November 2025 18:28:55 CET Krzysztof Niemiec wrote:
On 2025-11-25 at 19:06:32 GMT, Janusz Krzysztofik wrote:
Re-sending because of my response unintentionally HTML formatted, with correct email address of Tvrtko by the way.
Hi Krzysztof,
On Tuesday, 25 November 2025 14:33:38 CET Krzysztof Niemiec wrote:
Initialize eb->vma[].vma pointers to NULL when the eb structure is first set up.
During the execution of eb_lookup_vmas(), the eb->vma array is successively filled up with struct eb_vma objects. This process includes calling eb_add_vma(), which might fail; however, even in the event of failure, eb->vma[i].vma is set for the currently processed buffer.
If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which prompts a call to eb_release_vmas() to clean up the mess. Since eb_lookup_vmas() might fail during processing any (possibly not first) buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know at what point did the lookup function fail.
In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is set to NULL in case i915_gem_object_userptr_submit_init() fails; the current one needs to be cleaned up by eb_release_vmas() at this point, so the next one is set. If eb_add_vma() fails, neither the current nor the next vma is nullified, which is a source of a NULL deref bug described in [1].
When entering eb_lookup_vmas(), the vma pointers are set to the slab poison value, instead of NULL.
Your commit description still doesn't answer my question why the whole memory area allocated to the table of VMAs is not initialized to 0 on allocation, only left populated with poison values.
Becuase kvmalloc_array() is used. [1]
I guess one could swap it to a call to kvcalloc() or something similar; the thing is that the call actually handles both allocations of exec_list2 and the eb_vma array, the former doesn't need to be zero-initialized, the latter technically also doesn't but it simplifies error paths (and fixes the linked bug). I'm not sure if a zero-initializing *alloc() would be more readable or not here.
To my taste, zeroing on allocation would be a more clean solution.
But, while being at it, please have a still closer look, especially at these two statements:
at drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:i915_gem_execbuffer2_ioctl():3588
/* Allocate extra slots for use by the command parser */ exec2_list = kvmalloc_array(count + 2, eb_element_size(), __GFP_NOWARN | GFP_KERNEL);
at drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:i915_gem_do_execbuffer():3354
eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
Why do we allocate space for 2 tables of count size plus 2 extra pairs of their elements, but then place the second table at count + 1 offset, leaving space for only one extra element of the first type?
Looking at git history, there was a couple of excessively complex patches and reverts that apparently introduced that discrepancy. Unfortunately, none of them, with exception of the one that introduced the above shown inline comment, provided a clear justification why we need to switch from 1 to 2 or vice versa.
Anyway, depending on how that extra space is actually used by the command parser, we may or may not get into troubles with that, so we should better fix it, I believe.
Thanks, Janusz
Thanks Krzysztof
[1] https://elixir.bootlin.com/linux/v6.17.9/source/drivers/gpu/drm/i915/gem/i91...
Thanks, Janusz
On Thu, 27 Nov 2025, Janusz Krzysztofik janusz.krzysztofik@linux.intel.com wrote:
To my taste, zeroing on allocation would be a more clean solution.
IIUC there are micro optimizations to not clear on allocation when you don't strictly have to...
I'm not advocating one or the other approach, just stating what I believe is the reason.
BR, Jani.
On Thursday, 27 November 2025 11:46:05 CET Jani Nikula wrote:
On Thu, 27 Nov 2025, Janusz Krzysztofik janusz.krzysztofik@linux.intel.com wrote:
To my taste, zeroing on allocation would be a more clean solution.
IIUC there are micro optimizations to not clear on allocation when you don't strictly have to...
I'm not advocating one or the other approach, just stating what I believe is the reason.
OK, good to hear there is still someone who is able to recall what the reason could be when no hints can be found in git history nor inline comments.
If that's the case, but we agree on pre-zeroing only the sub-area dedicated to the vma table rather than doing that on failure and limited to one element that follows the one that failed, as Krzysztof initially proposed, then I'd vote for restoring memset() that was dropped with commit 170fa29b14fad ("drm/ i915: Simplify eb_lookup_vmas()"). In any case, a clarification (in commit description or inline comment) on why we chose one solutions and not the another wouldn't hurt.
Thanks, Janusz
BR, Jani.
On 2025-11-27 at 12:21:11 GMT, Janusz Krzysztofik wrote:
On Thursday, 27 November 2025 11:46:05 CET Jani Nikula wrote:
On Thu, 27 Nov 2025, Janusz Krzysztofik janusz.krzysztofik@linux.intel.com wrote:
To my taste, zeroing on allocation would be a more clean solution.
IIUC there are micro optimizations to not clear on allocation when you don't strictly have to...
I'm not advocating one or the other approach, just stating what I believe is the reason.
OK, good to hear there is still someone who is able to recall what the reason could be when no hints can be found in git history nor inline comments.
Both approaches are suboptimal in a way - if we zero on allocation, there's redundant writes to the eb_list2 part of the array, but if we don't, then we need to make an additional call to memset() (which of course is the better way to do compared to a for loop). I have no earthly idea which would be faster though, unless some obvious observation eludes me. I vote for the additional memset() for the clarity of intent.
If that's the case, but we agree on pre-zeroing only the sub-area dedicated to the vma table rather than doing that on failure and limited to one element that follows the one that failed, as Krzysztof initially proposed, then I'd vote for restoring memset() that was dropped with commit 170fa29b14fad ("drm/ i915: Simplify eb_lookup_vmas()"). In any case, a clarification (in commit description or inline comment) on why we chose one solutions and not the another wouldn't hurt.
The way I see it done now given the comments: - keep the kvmalloc() - zero the array like in the mentioned commit (e.g. memset()) - add comments to the new code so no one gets lost in the future (I'd prefer putting explanations in comments instead of the commit log)
Thanks Krzysztof
Thanks, Janusz
BR, Jani.
linux-stable-mirror@lists.linaro.org