Leonard Lausen reported an issue with suspend/resume of the sc7180 devices. Fix the WB atomic check, which caused the issue. Also make sure that DPU debugging logs are always directed to the drm_debug / DRIVER so that usual drm.debug masks work in an expected way.
Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org --- Changes in v2: - Reworked the writeback to just drop the connector->status check. - Expanded commit message for the debugging patch. - Link to v1: https://lore.kernel.org/r/20240709-dpu-fix-wb-v1-0-448348bfd4cb@linaro.org
--- Dmitry Baryshkov (2): drm/msm/dpu1: don't choke on disabling the writeback connector drm/msm/dpu: don't play tricks with debug macros
drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 ++------------ drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 2 files changed, 2 insertions(+), 15 deletions(-) --- base-commit: 668d33c9ff922c4590c58754ab064aaf53c387dd change-id: 20240709-dpu-fix-wb-6cd57e3eb182
Best regards,
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL; - } else if (conn_state->connector->status != connector_status_connected) { - DPU_ERROR("connector not connected %d\n", conn_state->connector->status); - return -EINVAL; }
crtc = conn_state->crtc;
Dear Dmitry,
Thank you for the patch. Unfortunately, the patch triggers a regression with respect to DRM CRTC state handling. With the patch applied, suspending and resuming a lazor sc7180 with external display connected, looses CRTC state on resume and prevents applying a new CRTC state. Without the patch, CRTC state is preserved across suspend and resume and it remains possible to change CRTC settings after resume. This means the patch regresses the user experience, preventing "Night Light" mode to work as expected. I've validated this on v6.10.2 vs. v6.10.2 with this patch applied.
While the cause for the bug uncovered by this change is likely separate, given it's impact, would it be prudent to delay the application of this patch until the related bug is identified and fixed? Otherwise we would be fixing a dmesg error message "[dpu error]connector not connected 3" that appears to do no harm but thereby break more critical user visible behavior.
Best regards Leonard
On 8/2/24 15:47, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
crtc = conn_state->crtc;
On August 5, 2024 9:27:39 AM GMT+07:00, Leonard Lausen leonard@lausen.nl wrote:
Dear Dmitry,
Thank you for the patch. Unfortunately, the patch triggers a regression with respect to DRM CRTC state handling. With the patch applied, suspending and resuming a lazor sc7180 with external display connected, looses CRTC state on resume and prevents applying a new CRTC state. Without the patch, CRTC state is preserved across suspend and resume and it remains possible to change CRTC settings after resume. This means the patch regresses the user experience, preventing "Night Light" mode to work as expected. I've validated this on v6.10.2 vs. v6.10.2 with this patch applied.
Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.
While the cause for the bug uncovered by this change is likely separate, given it's impact, would it be prudent to delay the application of this patch until the related bug is identified and fixed? Otherwise we would be fixing a dmesg error message "[dpu error]connector not connected 3" that appears to do no harm but thereby break more critical user visible behavior.
Best regards Leonard
On 8/2/24 15:47, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
crtc = conn_state->crtc;
On 8/7/24 06:44, Dmitry Baryshkov wrote:> Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.
71174f362d67 specifically, or v6.9 more broadly regress in that we get "[dpu error]connector not connected 3" and "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" if suspending and resuming the system while external display is connected over USB-C DP. Suspend and resume itself still works, and the external display also works after the resume, albeit perhaps with a small delay due to the dpu error. This is also mentioned in the issue description of https://gitlab.freedesktop.org/drm/msm/-/issues/57. So while suspend/resume isn't fully broken, the error is still unexpected and I thus bisected and identified 71174f362d67 as the first commit to trigger it. While your patch avoids the dpu/drm error, it triggers issue with the CRTC state, breaking the CRTC functionality after resume.
Might we be facing a race condition here, which is accidentally exposed by 71174f362d67 but requires a separate fix?
On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
For this issue, do we hit the connector->force = DRM_FORCE_OFF path?
Because otherwise, writeback does not implement .detect() callback today so its always connected.
But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check()
Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here.
crtc = conn_state->crtc;
On August 6, 2024 2:19:46 AM GMT+07:00, Abhinav Kumar quic_abhinavk@quicinc.com wrote:
On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c") Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57 Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c index 16f144cbc0c9..8ff496082902 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector, if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
For this issue, do we hit the connector->force = DRM_FORCE_OFF path?
It was hit during the suspend/resume, so yes, it is a forced off, but by the different means.
Because otherwise, writeback does not implement .detect() callback today so its always connected.
It is undefined/unkown (3), not connected (1)
But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check()
Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here.
crtc = conn_state->crtc;
Dear Dmitry,
For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
[ 1185.831970] [dpu error]connector not connected 3
I can confirm that I was seeing this exact error message as well.
if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
crtc = conn_state->crtc;
After applying this patch, the screen now resumes successfully, and the errors are gone.
(For future reference, I am using this kernel: https://github.com/jhovold/linux/commits/wip/x1e80100-6.11-rc5/, commit cc2cb95cc77fec43edd407c993122085fa8c2dd7.)
Tested-by: György Kurucz me@kuruczgy.com
Best regards, György
Dear György,
On 8/30/24 13:36, György Kurucz wrote:
For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Best regards Leonard
(Resend as there was an email SPF record issue)
Dear György,
On 8/30/24 13:36, György Kurucz wrote:
For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Best regards Leonard
Dear Leonard,
Do you observe this issue on every suspend-resume cycle?
I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
pair of error messages also consistently appears after all resumes.
Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x.
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings:
Warning: Zero outputs support gamma adjustment. Warning: 1/1 output(s) do not support gamma adjustment.
I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place.
Best regards, György
Dear György,
Do you observe this issue on every suspend-resume cycle?
I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
pair of error messages also consistently appears after all resumes.
Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x.
Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?
On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings:
Warning: Zero outputs support gamma adjustment. Warning: 1/1 output(s) do not support gamma adjustment.
I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place.
The cause is simple: Qualcomm SoCs don't implement GAMMA_LUT support. Your desktop environment needs to use Color Transform Matrix (CTM) on ARM/QCom devices. You can refer to https://bugs.kde.org/show_bug.cgi?id=455720 for further details. It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE. Gnome does not support CTM yet (https://gitlab.gnome.org/GNOME/mutter/-/issues/2318).
Best regards Leonard
Dear Leonard,
I installed KDE. First, I ran it with the my regular kernel without this patch. The first interesting thing I notice is that the screen *does* come back after resume. (The error messages are still present though.)
Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?
Yes, at least that's what I gathered from our conversations on IRC. But with the above in mind, I now suspect that this comes down to desktop environment differences.
It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE.
Well "Night Light" seems to be even more broken under KDE. I went into System Settings, set it to "Always on night light", and tried to adjust the temperature slider. While adjusting the slider, the screen goes black, and only comes back after a few seconds. The color temperature does not change, no matter what I change the slider to. Afterwards I tried with this patch as well, but it produces the exact same behavior.
Best regards, György
On Fri, Aug 30, 2024 at 07:36:32PM +0200, György Kurucz wrote:
For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.
[ 1185.831970] [dpu error]connector not connected 3
I can confirm that I was seeing this exact error message as well.
if (!conn_state || !conn_state->connector) { DPU_ERROR("invalid connector state\n"); return -EINVAL;
- } else if (conn_state->connector->status != connector_status_connected) {
DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
}return -EINVAL;
crtc = conn_state->crtc;
After applying this patch, the screen now resumes successfully, and the errors are gone.
I'm seeing the same issue as György on the x1e80100 CRD and Lenovo ThinkPad T14s. Without this patch, the internal display fails to resume properly (switching VT brings it back) and the following errors are logged:
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
I see the same symptoms with Xorg as well as sway.
Can we please get this fixed and backported as soon as possible?
Even if there are further issues with some "Night Light" functionality on one machine, keeping this bug as workaround does not seem warranted given that it breaks basic functionality for users.
The x1e80100 is the only platform I have access to with a writeback connector, but this regression potentially affects a whole host of older platforms as well.
Johan
Hi Johan,
I'm seeing the same issue as György on the x1e80100 CRD and Lenovo ThinkPad T14s. Without this patch, the internal display fails to resume properly (switching VT brings it back) and the following errors are logged:
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
I see the same symptoms with Xorg as well as sway.
The issue of "internal display fails to resume properly (switching VT brings it back)" also affects sc7180 platform during some resumes. Do you see the issue consistently during every resume?
Can we please get this fixed and backported as soon as possible?
Even if there are further issues with some "Night Light" functionality on one machine, keeping this bug as workaround does not seem warranted given that it breaks basic functionality for users.
I suspect this is not about "further issues with some 'Night Light' functionality on one machine", but rather a more fundamental issue or race condition in the qcom DRM devices stack, that is exposed when applying this patch. With this patch applied DRM device state is lost after resume and setting the state is no longer possible. Lots of kernel errors are printed if attempting to set DRM state such as the Color Transform Matrix, when running a kernel with this patch applied. Back in July 2024 I tested this patch on top of 6.9.8 and next-20240709, observing below snippet being logged tens of times:
[drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0 [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
Full logs are attached at https://gitlab.freedesktop.org/drm/msm/-/issues/58.
The x1e80100 is the only platform I have access to with a writeback connector, but this regression potentially affects a whole host of older platforms as well.
Have you attempted setting CTM or other DRM state when running with this patch?
Best regards Leonard
On Tue, Nov 19, 2024 at 09:33:26AM -0500, Leonard Lausen wrote:
I'm seeing the same issue as György on the x1e80100 CRD and Lenovo ThinkPad T14s. Without this patch, the internal display fails to resume properly (switching VT brings it back) and the following errors are logged:
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
I see the same symptoms with Xorg as well as sway.
The issue of "internal display fails to resume properly (switching VT brings it back)" also affects sc7180 platform during some resumes. Do you see the issue consistently during every resume?
Yes, it happens on every suspend cycle here.
I didn't notice the issue initially as fbdev does not seem to be affected, and I've been running with this patch applied to suppress the resume errors since it was posted.
Can we please get this fixed and backported as soon as possible?
Even if there are further issues with some "Night Light" functionality on one machine, keeping this bug as workaround does not seem warranted given that it breaks basic functionality for users.
I suspect this is not about "further issues with some 'Night Light' functionality on one machine", but rather a more fundamental issue or race condition in the qcom DRM devices stack, that is exposed when applying this patch. With this patch applied DRM device state is lost after resume and setting the state is no longer possible. Lots of kernel errors are printed if attempting to set DRM state such as the Color Transform Matrix, when running a kernel with this patch applied. Back in July 2024 I tested this patch on top of 6.9.8 and next-20240709, observing below snippet being logged tens of times:
[drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0 [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
Full logs are attached at https://gitlab.freedesktop.org/drm/msm/-/issues/58.
I would not be surprised if there are further issues here, but we can't just leave things completely broken as they currently are.
The x1e80100 is the only platform I have access to with a writeback connector, but this regression potentially affects a whole host of older platforms as well.
Have you attempted setting CTM or other DRM state when running with this patch?
Nope, I just want basic suspend to work.
Johan
The issue of "internal display fails to resume properly (switching VT brings it back)" also affects sc7180 platform during some resumes. Do you see the issue consistently during every resume?
Yes, it happens on every suspend cycle here.
I didn't notice the issue initially as fbdev does not seem to be affected, and I've been running with this patch applied to suppress the resume errors since it was posted.
Ok. Then situation is worse on x1e80100 than sc7180, where the issue only occurs sporadically.
The x1e80100 is the only platform I have access to with a writeback connector, but this regression potentially affects a whole host of older platforms as well.
Have you attempted setting CTM or other DRM state when running with this patch?
Nope, I just want basic suspend to work.
Ok.
Given my previous testing and finding of this patch causing unexplained CRTC CTM regression was back in July on 6.9.8 and next-20240709, I went ahead and tested on 6.10.14, 6.11.9 and 6.12 as well. To recall, the problematic behavior observed with this patch before was that CRTC CTM state would be lost after suspend with external display attached and re-setting the state was no longer possible after resume. (The "failed to get dspp on lm 0" etc. errors mentioned in my earlier email today were not associated with CRTC state issue, but actually occur with and without this patch. Apologies for misstating, given the elapsed time):
The finding is that while 6.10.14 with this patch applied still suffers from that regression, 6.11.9 and 6.12 do not face the CRTC state regression. Therefore, whatever issue the patch uncovered in older kernels and which justified not merging it before due to regressing basic CTM functionality, is now fixed. The patch should be good to merge and backport to 6.11, but from my perspective should not be backported to older kernels unless the interaction with the DRM CRTC state issue is understood and an associated fix backported as well.
I also confirmed that the patch (still) fixes the "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" error upon resume.
Tested-by: Leonard Lausen leonard@lausen.nl # on sc7180 lazor
Thank you Leonard
On Tue, Nov 19, 2024 at 10:02:33PM -0500, Leonard Lausen wrote:
The finding is that while 6.10.14 with this patch applied still suffers from that regression, 6.11.9 and 6.12 do not face the CRTC state regression. Therefore, whatever issue the patch uncovered in older kernels and which justified not merging it before due to regressing basic CTM functionality, is now fixed. The patch should be good to merge and backport to 6.11, but from my perspective should not be backported to older kernels unless the interaction with the DRM CRTC state issue is understood and an associated fix backported as well.
Thanks for testing. The 6.9 and 6.10 stable trees are EOL and backporting to 6.11 should not cause any trouble then.
Johan
On Fri, Aug 02, 2024 at 10:47:33PM +0300, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
Please also include the follow-on resume error. I'm seeing:
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
and say something about that this can prevent displays from being enabled on resume in some setups (preferably with an explanation why if you have one).
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
I noticed that the implementation had this status check also before 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c").
Why did this not cause any trouble back then? Or is this not the right Fixes tag?
Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Perhaps you can include mine an György's reports here too.
Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
With the above addressed:
Reviewed-by: Johan Hovold johan+linaro@kernel.org Tested-by: Johan Hovold johan+linaro@kernel.org
Johan
Hi Dmitry,
On Wed, Nov 20, 2024 at 09:39:27AM +0100, Johan Hovold wrote:
On Fri, Aug 02, 2024 at 10:47:33PM +0300, Dmitry Baryshkov wrote:
During suspend/resume process all connectors are explicitly disabled and then reenabled. However resume fails because of the connector_status check:
[ 1185.831970] [dpu error]connector not connected 3
Please also include the follow-on resume error. I'm seeing:
[dpu error]connector not connected 3 [drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
and say something about that this can prevent displays from being enabled on resume in some setups (preferably with an explanation why if you have one).
It doesn't make sense to check for the Writeback connected status (and other drivers don't perform such check), so drop the check.
Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
I noticed that the implementation had this status check also before 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c").
Why did this not cause any trouble back then? Or is this not the right Fixes tag?
Cc: stable@vger.kernel.org Reported-by: Leonard Lausen leonard@lausen.nl Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Perhaps you can include mine an György's reports here too.
Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org
With the above addressed:
Reviewed-by: Johan Hovold johan+linaro@kernel.org Tested-by: Johan Hovold johan+linaro@kernel.org
It's been over two weeks and I'm still waiting on a reply from you.
Can you please respin the patch as suggested above so that we can get this merged ASAP to fix suspend on X1E which has been broken since at least early summer.
Johan
linux-stable-mirror@lists.linaro.org