If we hand out cleared blocks to users, they are expected to write at least some non-zero values somewhere. If we keep the CLEAR bit set on the block, amdgpu_fill_buffer will assume there is nothing to do and incorrectly skip clearing the block. Ultimately, the (still dirty) block will be reused as if it were cleared, without any wiping of the memory contents.
Most severely, this means that any buffer allocated with AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE (which is the case for **all userspace buffers**) is neither guaranteed to contain cleared VRAM, nor is it wiped on release, potentially leaking application memory to arbitrary other applications.
Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
Cc: stable@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
 1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2d7f82e98df9..cecc67d0f0b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	list_for_each_entry(block, &vres->blocks, link) {
 		unsigned long start;
 
+		/*
+		 * Allocated blocks may be dirtied as soon as we return.
+		 * Mark all blocks as dirty here, otherwise we might
+		 * incorrectly assume the memory is still zeroed.
+		 */
+		drm_buddy_block_set_dirty(block);
+
 		start = amdgpu_vram_mgr_block_start(block) +
 			amdgpu_vram_mgr_block_size(block);
 		start >>= PAGE_SHIFT;
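For reference, the skip described in the commit message comes down to the fill path consulting the per-block clear state before emitting any GPU commands. A simplified sketch of that logic (not the exact kernel code):

	static bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
	{
		/* Only VRAM resources are backed by drm_buddy blocks. */
		if (cur->mem_type != TTM_PL_VRAM)
			return false;

		/* True while the block's CLEAR bit is still set. */
		return drm_buddy_block_is_clear(cur->node);
	}

	/* ...inside amdgpu_fill_buffer(), for each cursor step: */
	if (src_data == 0 && amdgpu_res_cleared(&dst)) {
		/*
		 * The block claims to already contain zeros, so no fill
		 * is emitted. With the CLEAR bit wrongly left set on
		 * blocks handed to userspace, this skips both the clear
		 * on allocation and the wipe on release.
		 */
		amdgpu_res_next(&dst, cur_size);
		continue;
	}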
On 5/27/25 21:43, Natalie Vock wrote:
> [...]
> +		/*
> +		 * Allocated blocks may be dirtied as soon as we return.
> +		 * Mark all blocks as dirty here, otherwise we might
> +		 * incorrectly assume the memory is still zeroed.
> +		 */
> +		drm_buddy_block_set_dirty(block);
Exactly that makes no sense.
We need the information whether it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely lose the advantage of the clear tracking.
So we should set them dirty as soon as we are done with the clearing.
But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still mark them as cleared.
Regards, Christian.
- start = amdgpu_vram_mgr_block_start(block) + amdgpu_vram_mgr_block_size(block); start >>= PAGE_SHIFT;
Hi,
On 5/28/25 09:07, Christian König wrote:
> On 5/27/25 21:43, Natalie Vock wrote:
>> [...]
>> +		drm_buddy_block_set_dirty(block);
> Exactly that makes no sense.
>
> We need the information whether it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely lose the advantage of the clear tracking.
Right, I missed that separate clear on allocation. I was put a bit off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee cleared pages, when in reality it's more like a preference.
> So we should set them dirty as soon as we are done with the clearing.
>
> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still mark them as cleared.
Yes, precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function performs the same checks as the clear on allocation does - that means, if a block is marked cleared, it will skip emitting any actual clears.
If we don't mark the blocks as dirty after allocating, then the amdgpu_clear_buffer call on release will skip actually performing the clear like it did during allocation - this is obviously really broken.
After calling amdgpu_clear_buffer, we call amdgpu_vram_mgr_set_cleared which causes the drm_buddy blocks to be marked as "cleared" when freed. This part is correct in itself, but obviously breaks if amdgpu_clear_buffer didn't actually clear the buffer. That's how the dirty blocks end up in the buddy allocator as cleared ones.
I'm testing a v2 that sets the dirty flags after the initial clear, I'll send it once I've confirmed it works.
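Roughly, the v2 approach looks like this (sketch only; amdgpu_vram_mgr_set_dirty() is a hypothetical helper name, the actual patch may differ):

	/* In the allocation path, after amdgpu_vram_mgr_new() returned: */
	if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED) {
		/* May legitimately skip blocks whose CLEAR bit is set. */
		r = amdgpu_ttm_clear_buffer(bo, bo->tbo.base.resv, &fence);
		if (r)
			goto error;
	}

	/*
	 * The memory is handed out now and userspace may write to it at
	 * any time; mark the backing blocks dirty so the wipe on release
	 * can never be skipped again.
	 */
	amdgpu_vram_mgr_set_dirty(bo->tbo.resource);

That way the allocation-time clear still benefits from the clear tracking, but the cleared state can't outlive the hand-off to userspace.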
Thanks, Natalie
On 5/28/25 11:29, Natalie Vock wrote:
> Hi,
>
> On 5/28/25 09:07, Christian König wrote:
>> On 5/27/25 21:43, Natalie Vock wrote:
>>> [...]
>>> +		drm_buddy_block_set_dirty(block);
>> Exactly that makes no sense.
>>
>> We need the information whether it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely lose the advantage of the clear tracking.
> Right, I missed that separate clear on allocation. I was put a bit off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee cleared pages, when in reality it's more like a preference.
>> So we should set them dirty as soon as we are done with the clearing.
>>
>> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still mark them as cleared.
> Yes, precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function performs the same checks as the clear on allocation does - that means, if a block is marked cleared, it will skip emitting any actual clears.
>
> If we don't mark the blocks as dirty after allocating, then the amdgpu_clear_buffer call on release will skip actually performing the clear like it did during allocation - this is obviously really broken.
>
> After calling amdgpu_clear_buffer, we call amdgpu_vram_mgr_set_cleared which causes the drm_buddy blocks to be marked as "cleared" when freed. This part is correct in itself, but obviously breaks if amdgpu_clear_buffer didn't actually clear the buffer. That's how the dirty blocks end up in the buddy allocator as cleared ones.
IIRC I pointed out exactly that during internal discussions as well, and suggested renaming amdgpu_vram_mgr_set_cleared() to amdgpu_vram_mgr_set_state() and giving it a boolean flag to indicate the state.

So the clear on allocation would clear the dirty blocks and set the state to dirty, while the clear on release would clear everything and set the state to cleared.

Sounds like this was never implemented as originally planned.
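For illustration, that interface could look roughly like this (a sketch, not actual kernel code; drm_buddy_block_set_clear() is assumed here as the counterpart of the setter used in this patch):

	static void amdgpu_vram_mgr_set_state(struct ttm_resource *res,
					      bool cleared)
	{
		struct amdgpu_vram_mgr_resource *vres =
			to_amdgpu_vram_mgr_resource(res);
		struct drm_buddy_block *block;

		list_for_each_entry(block, &vres->blocks, link) {
			if (cleared)
				drm_buddy_block_set_clear(block); /* assumed */
			else
				drm_buddy_block_set_dirty(block);
		}
	}

	/* Clear on allocation: the user gets the memory, treat it as dirty. */
	amdgpu_vram_mgr_set_state(bo->tbo.resource, false);

	/* Clear on release: everything was wiped, safe to recycle as cleared. */
	amdgpu_vram_mgr_set_state(bo->tbo.resource, true);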
> I'm testing a v2 that sets the dirty flags after the initial clear, I'll send it once I've confirmed it works.
Thanks a lot for looking into that!
Regards, Christian.
On 5/28/2025 2:59 PM, Natalie Vock wrote:
> Hi,
>
> On 5/28/25 09:07, Christian König wrote:
>> On 5/27/25 21:43, Natalie Vock wrote:
>>> [...]
>>> +		drm_buddy_block_set_dirty(block);
>> Exactly that makes no sense.
>>
>> We need the information whether it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely lose the advantage of the clear tracking.
> Right, I missed that separate clear on allocation. I was put a bit off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee cleared pages, when in reality it's more like a preference.
>> So we should set them dirty as soon as we are done with the clearing.
>>
>> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still mark them as cleared.
> Yes, precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function performs the same checks as the clear on allocation does - that means, if a block is marked cleared, it will skip emitting any actual clears.
On buffer release [https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/amd/amdgpu/amd...], we call amdgpu_fill_buffer() and not amdgpu_clear_buffer() (in the amdgpu_bo_release_notify() function), so the buffers are expected to be cleared without fail.

When user space doesn't set the AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE flag and only has AMDGPU_GEM_CREATE_VRAM_CLEARED, we don't call amdgpu_fill_buffer() and amdgpu_vram_mgr_set_cleared(), and that kind of makes sense. I think the problem here is that when we don't clear the buffer during BO release, the state remains as cleared, and that's why these blocks are skipped during the clear on allocation (in the amdgpu_bo_create() function).

Therefore, if the release path clear is skipped for any reason (for example, when AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is not set), we should set all buffers to dirty. Somehow, that is missed.
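In other words, something along these lines in the release path (illustrative sketch only, not the real code; amdgpu_vram_mgr_set_dirty() is a hypothetical helper):

	/* Simplified sketch of amdgpu_bo_release_notify(): */
	if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE)) {
		/*
		 * No wipe will happen: downgrade any stale "cleared"
		 * state so future allocations re-clear this memory.
		 */
		amdgpu_vram_mgr_set_dirty(bo->resource);
		return;
	}

	r = amdgpu_fill_buffer(abo, 0, &bo->base._resv, &fence, true);
	if (!r) {
		/* Only valid because the fill above was actually issued. */
		amdgpu_vram_mgr_set_cleared(bo->resource);
		amdgpu_bo_fence(abo, fence, false);
		dma_fence_put(fence);
	}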
Thanks, Arun.
On 2025-05-28 14:14, Paneer Selvam, Arunpravin wrote:
> [...]
>
> Therefore, if the release path clear is skipped for any reason (for example, when AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE is not set), we should set all buffers to dirty. Somehow, that is missed.
BTW, I asked this before, but didn't get an answer:
Now that VRAM is always cleared before handing it out to user space, does AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE really need to do anything anymore? How can user space access the contents of a destroyed BO?
On 5/28/25 14:39, Michel Dänzer wrote:
> [...]
> BTW, I asked this before, but didn't get an answer:
>
> Now that VRAM is always cleared before handing it out to user space, does AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE really need to do anything anymore? How can user space access the contents of a destroyed BO?
The flag is now used to communicate to the backend that we want to wipe on release. E.g. we still don't do that for kernel allocations or during suspend.
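Schematically, the release path gates the wipe like this (simplified sketch; not the exact condition in the kernel):

	/*
	 * In amdgpu_bo_release_notify(): skip the wipe when the flag is
	 * absent (kernel BOs never set it) or when the device is
	 * suspending or unplugged, where issuing GPU fills is unwanted.
	 */
	if (!(abo->flags & AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE) ||
	    adev->in_suspend || drm_dev_is_unplugged(adev_to_drm(adev)))
		return;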
Regards, Christian.