Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
```
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
 drm_gem_put_pages+0x186/0x250
 drm_gem_shmem_put_pages_locked+0x43/0xc0
 drm_gem_shmem_object_vunmap+0x83/0xe0
 drm_gem_vunmap_unlocked+0x46/0xb0
 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k

allocated by task 51 on cpu 0 at 14.668667s:
 drm_gem_get_pages+0x94/0x2b0
 drm_gem_shmem_get_pages+0x5d/0x110
 drm_gem_shmem_object_vmap+0xc4/0x1e0
 drm_gem_vmap_unlocked+0x3c/0x70
 drm_client_buffer_vmap+0x23/0x50
 drm_fbdev_generic_helper_fb_dirty+0xae/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

freed by task 51 on cpu 0 at 14.668697s:
 drm_gem_put_pages+0x186/0x250
 drm_gem_shmem_put_pages_locked+0x43/0xc0
 drm_gem_shmem_object_vunmap+0x83/0xe0
 drm_gem_vunmap_unlocked+0x46/0xb0
 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events drm_fb_helper_damage_work
```
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how the VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Please check.
Thanks.
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Thanks.
On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Then proceed with kernel bisection. You can refer to Documentation/admin-guide/bug-bisect.rst in the kernel sources for the process.
/cc Matthew, Andrew (please see below)
On pondělí 2. října 2023 12:42:42 CEST Bagas Sanjaya wrote:
On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Then proceed with kernel bisection. You can refer to Documentation/admin-guide/bug-bisect.rst in the kernel sources for the process.
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
In the git log between v6.4 and v6.5 I see this:
```
commit 3291e09a463870610b8227f32b16b19a587edf33
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:49 2023 +0100

    drm: convert drm_gem_put_pages() to use a folio_batch

    Remove a few hidden compound_head() calls by converting the returned
    page to a folio once and using the folio APIs.
```
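For orientation, the batching that commit introduces in drm_gem_put_pages() follows roughly the pattern sketched below. This is a simplified, hedged sketch rather than a verbatim copy of drivers/gpu/drm/drm_gem.c (the real code factors the batch flush into its own helper), but the calls shown are the standard folio/folio_batch APIs.

```c
#include <linux/mm.h>
#include <linux/pagevec.h>
#include <linux/swap.h>

/*
 * Sketch of a folio_batch-based release loop: mark each folio dirty/accessed
 * once, queue it in a batch, and release full batches in one go instead of
 * dropping every 4 KiB page individually.
 */
static void put_pages_sketch(struct page **pages, unsigned long npages,
                             bool dirty, bool accessed)
{
        struct folio_batch fbatch;
        unsigned long i;

        folio_batch_init(&fbatch);

        for (i = 0; i < npages; i++) {
                struct folio *folio;

                if (!pages[i])
                        continue;
                folio = page_folio(pages[i]);

                if (dirty)
                        folio_mark_dirty(folio);
                if (accessed)
                        folio_mark_accessed(folio);

                /* Queue the folio; flush the batch once it is full. */
                if (!folio_batch_add(&fbatch, folio)) {
                        check_move_unevictable_folios(&fbatch);
                        folio_batch_release(&fbatch);
                }

                /* Skip the tail pages belonging to this folio. */
                i += folio_nr_pages(folio) - 1;
        }

        if (folio_batch_count(&fbatch)) {
                check_move_unevictable_folios(&fbatch);
                folio_batch_release(&fbatch);
        }
}
```

The same commit also reworked the fill loop in drm_gem_get_pages() to work folio by folio via shmem_read_folio_gfp() (the folio and folio_batch locals visible in the fix further down this thread come from that conversion), and that fill loop is where the corruption reported above originates.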
Thanks.
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
```
git log --oneline HEAD~3..
7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
```
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Thanks.
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>
> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Best regards Thomas
Thanks.
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>
>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>> drm_gem_put_pages+0x186/0x250
>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>> drm_gem_shmem_object_vunmap+0x83/0xe0
>> drm_gem_vunmap_unlocked+0x46/0xb0
>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
Thanks.
Best regards Thomas
Thanks.
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>
>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>> drm_gem_put_pages+0x186/0x250
>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>>> drm_gem_shmem_object_vunmap+0x83/0xe0
>>> drm_gem_vunmap_unlocked+0x46/0xb0
>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
I don't understand; you said reverting those DRM commits fixed the problem, so 863a8eb3f270 is the solution. No?
Hello.
On čtvrtek 5. října 2023 14:19:44 CEST Matthew Wilcox wrote:
On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>>
>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>>> drm_gem_put_pages+0x186/0x250
>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
>>>> drm_gem_vunmap_unlocked+0x46/0xb0
>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
I don't understand; you said reverting those DRM commits fixed the problem, so 863a8eb3f270 is the solution. No?
No-no, sorry for possible confusion. Let me explain again:
1. We had an issue with i915, which was introduced by 0b62af28f249 and later fixed by 863a8eb3f270.
2. Now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even with 863a8eb3f270 applied.
3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but no fix for it has been discussed yet.
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Hope this gets clear.
Thanks.
On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
No-no, sorry for possible confusion. Let me explain again:
- we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
- now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
- I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Thank you! Sorry about the misunderstanding. Try this:
```
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
```
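The clamp matters because shmem may hand back a folio that covers more pages than the object still needs; the unpatched loop advances by folio_nr_pages() regardless and writes page pointers past the end of pages[] (here a 3072-byte kmalloc-4k object, i.e. room for 384 eight-byte pointers), which KFENCE then reports when the array is freed in drm_gem_put_pages(). Below is a tiny standalone C model of that arithmetic, with made-up sizes that are not taken from the report:

```c
/* overflow_model.c - standalone toy, not kernel code; all sizes are made up. */
#include <stdio.h>

#define NPAGES      100 /* pages the object actually needs */
#define FOLIO_PAGES 16  /* pages covered by each (hypothetical) large folio */

static long count_writes(int clamp)
{
        long i = 0, writes = 0;

        while (i < NPAGES) {
                long nr = FOLIO_PAGES;          /* stands in for folio_nr_pages(folio) */

                if (clamp && nr > NPAGES - i)
                        nr = NPAGES - i;        /* the patch: min(npages - i, nr) */

                for (long j = 0; j < nr; j++, i++)
                        writes++;               /* stands in for pages[i] = folio_file_page(...) */
        }
        return writes;
}

int main(void)
{
        printf("slots in pages[]:     %d\n", NPAGES);
        printf("writes without clamp: %ld\n", count_writes(0)); /* 112: 12 past the end */
        printf("writes with clamp:    %ld\n", count_writes(1)); /* 100: exactly fits */
        return 0;
}
```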
On čtvrtek 5. října 2023 15:05:27 CEST Matthew Wilcox wrote:
On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
No-no, sorry for possible confusion. Let me explain again:
- we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
- now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
- I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Thank you! Sorry about the misunderstanding. Try this:
```
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
```
No issues after five reboots with this patch applied on top of v6.5.5.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/lkml/13360591.uLZWGnKmhe@natalenko.name/
Fixes: 3291e09a4638 ("drm: convert drm_gem_put_pages() to use a folio_batch")
Cc: stable@vger.kernel.org # 6.5.x
Thank you!
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: v6.4..v6.5
Hi
Am 01.10.23 um 18:32 schrieb Oleksandr Natalenko:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
There's nothing special about the cirrus driver. Can you please provide the full output of 'lspci -v'?
Would you be able to bisect this bug?
Best regards Thomas
Please check.
Thanks.