This is a workaround for s3 hang for r7340(amdgpu).
When we test s3 with r7340 on arm64 platform, graphics card will hang up, the error message are as follows:
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.599374][ 7] [ T291] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.612869][ 7] [ T291] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.623392][ 7] [ T291] amdgpu 0000:02:00.0: amdgpu_device_ip_late_init failed
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.630696][ 7] [ T291] amdgpu 0000:02:00.0: Fatal error during GPU init
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.637477][ 7] [ T291] [drm] amdgpu: finishing device.
Change-Id: I5048b3894c0ca9faf2f4847ddab61f9eb17b4823
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3987ecb24ef4..1eced991b5b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2903,6 +2903,8 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)
 		container_of(work, struct amdgpu_device, delayed_init_work.work);
 	int r;

+	mdelay(1);
+
 	r = amdgpu_ib_ring_tests(adev);
 	if (r)
 		DRM_ERROR("ib ring test failed (%d).\n", r);
Dear Zhenneng,
Thank you for your patch.
On 28.03.22 at 06:05, Zhenneng Li wrote:
This is a workaround for s3 hang for r7340(amdgpu).
Is it hanging when resuming from S3? Maybe also use the line below for the commit message summary:
drm/amdgpu: Add 1 ms delay to init handler to fix s3 resume hang
Also, please add a space before the ( in “r7340(amdgpu)”.
When we test s3 with r7340 on arm64 platform, graphics card will hang up, the error message are as follows:
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.599374][ 7] [ T291] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.612869][ 7] [ T291] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.623392][ 7] [ T291] amdgpu 0000:02:00.0: amdgpu_device_ip_late_init failed
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.630696][ 7] [ T291] amdgpu 0000:02:00.0: Fatal error during GPU init
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.637477][ 7] [ T291] [drm] amdgpu: finishing device.
The prefix in the beginning is not really needed. Only the stuff after `kernel: `.
Maybe also add the output of `lspci -nn -s …` for that r7340 device.
Change-Id: I5048b3894c0ca9faf2f4847ddab61f9eb17b4823
Without the Gerrit instance this belongs to, the Change-Id is of no use in the public.
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3987ecb24ef4..1eced991b5b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2903,6 +2903,8 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)
 		container_of(work, struct amdgpu_device, delayed_init_work.work);
 	int r;

+	mdelay(1);
Wow, I wonder how long it took you to find that workaround.
 	r = amdgpu_ib_ring_tests(adev);
 	if (r)
 		DRM_ERROR("ib ring test failed (%d).\n", r);
Kind regards,
Paul
[Cc: -Jack Zhang (invalid address)]
Dear 李真,
[Your mailer formatted the message oddly. Maybe configure it to send plain text email only, with no HTML parts, as is common in the Linux kernel community, or, if that is not possible, switch to another mailer (Mozilla Thunderbird, …).]
On 29.03.22 at 04:54, 李真能 wrote:
[…]
*Date:* 2022-03-28 15:38 *From:* Paul Menzel
[…]
On 28.03.22 at 09:36, Paul Menzel wrote:
Dear Zhenneng,
Thank you for your patch.
On 28.03.22 at 06:05, Zhenneng Li wrote:
This is a workaround for s3 hang for r7340(amdgpu).
Is it hanging when resuming from S3?
Yes, this function is delayed work that runs after the graphics card has been initialized.
Thanks for clarifying.
Maybe also use the line below for the commit message summary:
drm/amdgpu: Add 1 ms delay to init handler to fix s3 resume hang
Also, please add a space before the ( in “r7340(amdgpu)”.
When we test s3 with r7340 on arm64 platform, graphics card will hang up, the error message are as follows:
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.599374][ 7] [ T291] amdgpu 0000:02:00.0: fb0: amdgpudrmfb frame buffer device
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.612869][ 7] [ T291] [drm:amdgpu_device_ip_late_init [amdgpu]] *ERROR* late_init of IP block <si_dpm> failed -22
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.623392][ 7] [ T291] amdgpu 0000:02:00.0: amdgpu_device_ip_late_init failed
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.630696][ 7] [ T291] amdgpu 0000:02:00.0: Fatal error during GPU init
Mar 4 01:14:11 greatwall-GW-XXXXXX-XXX kernel: [ 1.637477][ 7] [ T291] [drm] amdgpu: finishing device.
The prefix in the beginning is not really needed. Only the stuff after `kernel: `.
Maybe also add the output of `lspci -nn -s …` for that r7340 device.
Change-Id: I5048b3894c0ca9faf2f4847ddab61f9eb17b4823
Without the Gerrit instance this belongs to, the Change-Id is of no use in the public.
Signed-off-by: Zhenneng Li <lizhenneng@kylinos.cn>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3987ecb24ef4..1eced991b5b2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2903,6 +2903,8 @@ static void amdgpu_device_delayed_init_work_handler(struct work_struct *work)
 		container_of(work, struct amdgpu_device, delayed_init_work.work);
 	int r;

+	mdelay(1);
Wow, I wonder how long it took you to find that workaround.
About 3 months. I tried increasing the delay of this delayed work (amdgpu_device_delayed_init_work_handler) from 2000 ms to 2500 ms, and I tried using mb() instead of mdelay(1), but neither helped. I don't know the reason. The bug occurs with a probability of about one in ten thousand. Do you know the possible reasons?
Oh, it’s not even always reproducible. That is hard. Did you try another graphics card or another ARM board to rule out hardware-specific issues?
Sorry, I do not. Maybe the developers with access to non-public datasheets and errata know.
 	r = amdgpu_ib_ring_tests(adev);
 	if (r)
 		DRM_ERROR("ib ring test failed (%d).\n", r);
Kind regards,
Paul