There looks to be an issue in our compression handling when the BO pages are very fragmented. In that case we skip the identity map and instead fall back to emitting the PTEs by hand when migrating memory, so that we can hopefully do more work per blit operation. However, on xe2+ dGPU we then need to ensure that the src PTEs are tagged with a compression-enabled PAT index; otherwise the copy simply treats the src memory as uncompressed, leading to corruption if the memory was compressed by the user.
To fix this, it looks like we can also pass use_comp_pat into emit_pte() on the src side, OR-ed with copy_system_ccs.
There are reports of VRAM corruption in some heavy user workloads, which might be related: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4495
Fixes: 523f191cc0c7 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: stable@vger.kernel.org # v6.12+
---
 drivers/gpu/drm/xe/xe_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index 8f8e9fdfb2a8..16788ecf924a 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -863,7 +863,7 @@ struct dma_fence *xe_migrate_copy(struct xe_migrate *m,
 		if (src_is_vram && xe_migrate_allow_identity(src_L0, &src_it))
 			xe_res_next(&src_it, src_L0);
 		else
-			emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs,
+			emit_pte(m, bb, src_L0_pt, src_is_vram, copy_system_ccs || use_comp_pat,
 				 &src_it, src_L0, src);
 
 		if (dst_is_vram && xe_migrate_allow_identity(src_L0, &dst_it))
-----Original Message-----
From: Intel-xe <intel-xe-bounces@lists.freedesktop.org> On Behalf Of Matthew Auld
Sent: Wednesday, June 4, 2025 11:15 AM
To: intel-xe@lists.freedesktop.org
Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas Hellström <thomas.hellstrom@linux.intel.com>; Jahagirdar, Akshata <akshata.jahagirdar@intel.com>; stable@vger.kernel.org
Subject: [PATCH] drm/xe/bmg: fix compressed VRAM handling
It would be better if we had more confidence here beyond "it looks like" (maybe just drop that part) and "There looks to be" (maybe "There is" instead), but if we're not comfortable making definitive statements about our compression handling, then I won't block this on some minor passive voice issues.

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt
On 04/06/2025 19:21, Cavitt, Jonathan wrote:
> It would be better if we had more confidence here beyond "it looks like" (maybe just drop that part) and "There looks to be" (maybe "There is" instead), but if we're not comfortable making definitive statements about our compression handling, then I won't block this on some minor passive voice issues.
Yeah, this was only really based on code inspection, so it's unclear whether this is even a real issue, or whether it's related to the user report. Once I'm more certain of either, I will update the commit message.
> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Thanks.
linux-stable-mirror@lists.linaro.org