Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
```
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250

Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
 drm_gem_put_pages+0x186/0x250
 drm_gem_shmem_put_pages_locked+0x43/0xc0
 drm_gem_shmem_object_vunmap+0x83/0xe0
 drm_gem_vunmap_unlocked+0x46/0xb0
 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k

allocated by task 51 on cpu 0 at 14.668667s:
 drm_gem_get_pages+0x94/0x2b0
 drm_gem_shmem_get_pages+0x5d/0x110
 drm_gem_shmem_object_vmap+0xc4/0x1e0
 drm_gem_vmap_unlocked+0x3c/0x70
 drm_client_buffer_vmap+0x23/0x50
 drm_fbdev_generic_helper_fb_dirty+0xae/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

freed by task 51 on cpu 0 at 14.668697s:
 drm_gem_put_pages+0x186/0x250
 drm_gem_shmem_put_pages_locked+0x43/0xc0
 drm_gem_shmem_object_vunmap+0x83/0xe0
 drm_gem_vunmap_unlocked+0x46/0xb0
 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
 drm_fb_helper_damage_work+0x96/0x170
 process_one_work+0x254/0x470
 worker_thread+0x55/0x4f0
 kthread+0xe8/0x120
 ret_from_fork+0x34/0x50
 ret_from_fork_asm+0x1b/0x30

CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e
Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
Workqueue: events drm_fb_helper_damage_work
```
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how the VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Please check.
Thanks.
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Thanks.
On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Then proceed with kernel bisection. You can refer to Documentation/admin-guide/bug-bisect.rst in the kernel sources for the process.
/cc Matthew, Andrew (please see below)
On pondělí 2. října 2023 12:42:42 CEST Bagas Sanjaya wrote:
On Mon, Oct 02, 2023 at 08:20:15AM +0200, Oleksandr Natalenko wrote:
Hello.
On pondělí 2. října 2023 1:45:44 CEST Bagas Sanjaya wrote:
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Do you have this issue on v6.4?
No, I did not have this issue with v6.4.
Then proceed with kernel bisection. You can refer to Documentation/admin-guide/bug-bisect.rst in the kernel sources for the process.
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
In the git log between v6.4 and v6.5 I see this:
```
commit 3291e09a463870610b8227f32b16b19a587edf33
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Wed Jun 21 17:45:49 2023 +0100

    drm: convert drm_gem_put_pages() to use a folio_batch

    Remove a few hidden compound_head() calls by converting the returned
    page to a folio once and using the folio APIs.
```
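For orientation, the batching that commit introduces in drm_gem_put_pages() follows roughly the pattern sketched below. This is a simplified, hedged sketch rather than a verbatim copy of drivers/gpu/drm/drm_gem.c (the real code factors the batch flush into its own helper), but the calls shown are the standard folio/folio_batch APIs.

```c
#include <linux/mm.h>
#include <linux/pagevec.h>
#include <linux/swap.h>

/*
 * Sketch of a folio_batch-based release loop: mark each folio dirty/accessed
 * once, queue it in a batch, and release full batches in one go instead of
 * dropping every 4 KiB page individually.
 */
static void put_pages_sketch(struct page **pages, unsigned long npages,
                             bool dirty, bool accessed)
{
        struct folio_batch fbatch;
        unsigned long i;

        folio_batch_init(&fbatch);

        for (i = 0; i < npages; i++) {
                struct folio *folio;

                if (!pages[i])
                        continue;
                folio = page_folio(pages[i]);

                if (dirty)
                        folio_mark_dirty(folio);
                if (accessed)
                        folio_mark_accessed(folio);

                /* Queue the folio; flush the batch once it is full. */
                if (!folio_batch_add(&fbatch, folio)) {
                        check_move_unevictable_folios(&fbatch);
                        folio_batch_release(&fbatch);
                }

                /* Skip the tail pages belonging to this folio. */
                i += folio_nr_pages(folio) - 1;
        }

        if (folio_batch_count(&fbatch)) {
                check_move_unevictable_folios(&fbatch);
                folio_batch_release(&fbatch);
        }
}
```

The same commit also reworked the fill loop in drm_gem_get_pages() to work folio by folio via shmem_read_folio_gfp() (the folio and folio_batch locals visible in the fix further down this thread come from that conversion), and that fill loop is where the corruption reported above originates.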
Thanks.
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
```
git log --oneline HEAD~3..
7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec"
8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()"
fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
```
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Thanks.
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>
> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
> drm_gem_put_pages+0x186/0x250
> drm_gem_shmem_put_pages_locked+0x43/0xc0
> drm_gem_shmem_object_vunmap+0x83/0xe0
> drm_gem_vunmap_unlocked+0x46/0xb0
> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Best regards Thomas
Thanks.
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>
>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>> drm_gem_put_pages+0x186/0x250
>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>> drm_gem_shmem_object_vunmap+0x83/0xe0
>> drm_gem_vunmap_unlocked+0x46/0xb0
>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
Thanks.
Best regards Thomas
Thanks.
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>
>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>> drm_gem_put_pages+0x186/0x250
>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>>> drm_gem_shmem_object_vunmap+0x83/0xe0
>>> drm_gem_vunmap_unlocked+0x46/0xb0
>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
I don't understand; you said reverting those DRM commits fixed the problem, so 863a8eb3f270 is the solution. No?
Hello.
On čtvrtek 5. října 2023 14:19:44 CEST Matthew Wilcox wrote:
On Thu, Oct 05, 2023 at 09:56:03AM +0200, Oleksandr Natalenko wrote:
Hello.
On čtvrtek 5. října 2023 9:44:42 CEST Thomas Zimmermann wrote:
Hi
Am 02.10.23 um 17:38 schrieb Oleksandr Natalenko:
On pondělí 2. října 2023 16:32:45 CEST Matthew Wilcox wrote:
On Mon, Oct 02, 2023 at 01:02:52PM +0200, Oleksandr Natalenko wrote:
>>>> BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250
>>>>
>>>> Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108):
>>>> drm_gem_put_pages+0x186/0x250
>>>> drm_gem_shmem_put_pages_locked+0x43/0xc0
>>>> drm_gem_shmem_object_vunmap+0x83/0xe0
>>>> drm_gem_vunmap_unlocked+0x46/0xb0
>>>> drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310
>>>> drm_fb_helper_damage_work+0x96/0x170
Matthew, before I start dancing around, do you think ^^ could have the same cause as 0b62af28f249b9c4036a05acfb053058dc02e2e2 which got fixed by 863a8eb3f27098b42772f668e3977ff4cae10b04?
Yes, entirely plausible. I think you have two useful points to look at before delving into a full bisect -- 863a8e and the parent of 0b62af. If either of them work, I think you have no more work to do.
OK, I've done this against v6.5.5:
git log --oneline HEAD~3.. 7c1e7695ca9b8 (HEAD -> test) Revert "mm: remove struct pagevec" 8f2ad53b6eac6 Revert "mm: remove check_move_unevictable_pages()" fa1e3c0b5453c Revert "drm: convert drm_gem_put_pages() to use a folio_batch"
then rebooted the host multiple times, and the issue is not seen any more.
So I guess 3291e09a463870610b8227f32b16b19a587edf33 is the culprit.
Ignore my other email. It's apparently been fixed already. Thanks!
Has it? I think I was able to identify the offending commit, but I'm not aware of any fix for it.
I don't understand; you said reverting those DRM commits fixed the problem, so 863a8eb3f270 is the solution. No?
No-no, sorry for possible confusion. Let me explain again:
1. We had an issue with i915, which was introduced by 0b62af28f249 and later fixed by 863a8eb3f270.
2. Now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even with 863a8eb3f270 applied.
3. I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but no fix for it has been discussed yet.
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Hope this gets clear.
Thanks.
On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
No-no, sorry for possible confusion. Let me explain again:
- we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
- now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
- I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Thank you! Sorry about the misunderstanding. Try this:
```
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
```
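The clamp matters because shmem may hand back a folio that covers more pages than the object still needs; the unpatched loop advances by folio_nr_pages() regardless and writes page pointers past the end of pages[] (here a 3072-byte kmalloc-4k object, i.e. room for 384 eight-byte pointers), which KFENCE then reports when the array is freed in drm_gem_put_pages(). Below is a tiny standalone C model of that arithmetic, with made-up sizes that are not taken from the report:

```c
/* overflow_model.c - standalone toy, not kernel code; all sizes are made up. */
#include <stdio.h>

#define NPAGES      100 /* pages the object actually needs */
#define FOLIO_PAGES 16  /* pages covered by each (hypothetical) large folio */

static long count_writes(int clamp)
{
        long i = 0, writes = 0;

        while (i < NPAGES) {
                long nr = FOLIO_PAGES;          /* stands in for folio_nr_pages(folio) */

                if (clamp && nr > NPAGES - i)
                        nr = NPAGES - i;        /* the patch: min(npages - i, nr) */

                for (long j = 0; j < nr; j++, i++)
                        writes++;               /* stands in for pages[i] = folio_file_page(...) */
        }
        return writes;
}

int main(void)
{
        printf("slots in pages[]:     %d\n", NPAGES);
        printf("writes without clamp: %ld\n", count_writes(0)); /* 112: 12 past the end */
        printf("writes with clamp:    %ld\n", count_writes(1)); /* 100: exactly fits */
        return 0;
}
```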
On čtvrtek 5. října 2023 15:05:27 CEST Matthew Wilcox wrote:
On Thu, Oct 05, 2023 at 02:30:55PM +0200, Oleksandr Natalenko wrote:
No-no, sorry for possible confusion. Let me explain again:
- we had an issue with i915, which was introduced by 0b62af28f249, and later was fixed by 863a8eb3f270
- now I've discovered another issue, which looks very similar to 1., but in a VM with Cirrus VGA, and it happens even while having 863a8eb3f270 applied
- I've tried reverting 3291e09a4638, after which I cannot reproduce the issue with Cirrus VGA, but clearly there was no fix for it discussed
IOW, 863a8eb3f270 is the fix for 0b62af28f249, but not for 3291e09a4638. It looks like 3291e09a4638 requires a separate fix.
Thank you! Sorry about the misunderstanding. Try this:
```
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 6129b89bb366..44a948b80ee1 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -540,7 +540,7 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 	struct page **pages;
 	struct folio *folio;
 	struct folio_batch fbatch;
-	int i, j, npages;
+	long i, j, npages;
 
 	if (WARN_ON(!obj->filp))
 		return ERR_PTR(-EINVAL);
@@ -564,11 +564,13 @@ struct page **drm_gem_get_pages(struct drm_gem_object *obj)
 
 	i = 0;
 	while (i < npages) {
+		long nr;
 		folio = shmem_read_folio_gfp(mapping, i,
 				mapping_gfp_mask(mapping));
 		if (IS_ERR(folio))
 			goto fail;
-		for (j = 0; j < folio_nr_pages(folio); j++, i++)
+		nr = min(npages - i, folio_nr_pages(folio));
+		for (j = 0; j < nr; j++, i++)
 			pages[i] = folio_file_page(folio, i);
 
 		/* Make sure shmem keeps __GFP_DMA32 allocated pages in the
```
No issues after five reboots with this patch applied on top of v6.5.5.
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Link: https://lore.kernel.org/lkml/13360591.uLZWGnKmhe@natalenko.name/
Fixes: 3291e09a4638 ("drm: convert drm_gem_put_pages() to use a folio_batch")
Cc: stable@vger.kernel.org # 6.5.x
Thank you!
On Sun, Oct 01, 2023 at 06:32:34PM +0200, Oleksandr Natalenko wrote:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
Thanks for the regression report. I'm adding it to regzbot:
#regzbot ^introduced: v6.4..v6.5
Hi
Am 01.10.23 um 18:32 schrieb Oleksandr Natalenko:
Hello.
I've got a VM from a cloud provider, and since v6.5 I observe the following kfence splat in dmesg during boot:
BUG: KFENCE: memory corruption in drm_gem_put_pages+0x186/0x250 Corrupted memory at 0x00000000e173a294 [ ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ] (in kfence-#108): drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 kfence-#108: 0x00000000cda343af-0x00000000aec2c095, size=3072, cache=kmalloc-4k allocated by task 51 on cpu 0 at 14.668667s: drm_gem_get_pages+0x94/0x2b0 drm_gem_shmem_get_pages+0x5d/0x110 drm_gem_shmem_object_vmap+0xc4/0x1e0 drm_gem_vmap_unlocked+0x3c/0x70 drm_client_buffer_vmap+0x23/0x50 drm_fbdev_generic_helper_fb_dirty+0xae/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 freed by task 51 on cpu 0 at 14.668697s: drm_gem_put_pages+0x186/0x250 drm_gem_shmem_put_pages_locked+0x43/0xc0 drm_gem_shmem_object_vunmap+0x83/0xe0 drm_gem_vunmap_unlocked+0x46/0xb0 drm_fbdev_generic_helper_fb_dirty+0x1dc/0x310 drm_fb_helper_damage_work+0x96/0x170 process_one_work+0x254/0x470 worker_thread+0x55/0x4f0 kthread+0xe8/0x120 ret_from_fork+0x34/0x50 ret_from_fork_asm+0x1b/0x30 CPU: 0 PID: 51 Comm: kworker/0:2 Not tainted 6.5.0-pf4 #1 8b557a4173114d86eef7240f7a080080cfc4617e Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014 Workqueue: events drm_fb_helper_damage_work
This repeats a couple of times and then stops.
Currently, I'm running v6.5.5. So far, there's no impact on how VM functions for me.
The VGA adapter is as follows: 00:02.0 VGA compatible controller: Cirrus Logic GD 5446
There's nothing special about the cirrus driver. Can you please provide the full output of 'lspci -v'?
Would you be able to bisect this bug?
Best regards Thomas
Please check.
Thanks.