On 06.04.23 at 22:09, Greg KH wrote:
On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
[CCing the regression list, as it should be in the loop for regressions: https://docs.kernel.org/admin-guide/reporting-regressions.html]
On 06.04.23 14:06, Rainer Fiebig wrote:
Hi! Since kernel 6.1.22, starting a resume from hibernate by hitting a key on the keyboard fails. However, if the PC was switched off and on again (or reset), the resume is OK. The APU is a Ryzen 5600G.
Bisecting between 6.1.21/22 turned up this:
Author: Tim Huang <tim.huang@amd.com>
Date:   Thu Mar 9 16:27:51 2023 +0800

    drm/amdgpu: skip ASIC reset for APUs when go to S4

    commit b589626674de94d977e81c99bf7905872b991197 upstream.

    For GC IP v11.0.4/11, PSP TMR need to be reserved for ASIC mode2 reset.
    But for S4, when psp suspend, it will destroy the TMR that fails the
    ASIC reset.
[...]
Reverting the commit solves the problem. Thanks.
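For readers unfamiliar with bisecting, the search described above can be sketched as a plain binary search. This is only an illustration under simplifying assumptions: the history is treated as a linear list (real `git bisect` walks a commit graph), and the commit labels and the `resume_fails` check are hypothetical placeholders, not the actual commits between 6.1.21 and 6.1.22.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


# Hypothetical linear history between a good and a bad release.
history = ["good-release", "c1", "c2", "c3", "c4", "c5", "bad-release"]
culprit = "c3"  # placeholder for the commit that broke resume
tested = []     # records which commits had to be built and tested


def resume_fails(commit: str) -> bool:
    # Stand-in for "build this kernel, hibernate, try to resume".
    tested.append(commit)
    return history.index(commit) >= history.index(culprit)


print(first_bad(history, resume_fails))
```

With git itself, the equivalent session would start with `git bisect start`, `git bisect bad v6.1.22` and `git bisect good v6.1.21`, followed by one build-and-test round per step.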
Please try 6.1.23 and report back, because from the thread https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/ it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when go to S4" might be fixing this, which went into 6.1.23.
Yes, 6.1.23 seems OK so far.
I think, however, that rc-kernels and LTS-kernels are different matters. With a bleeding edge kernel, problems are to be expected. But an LTS-kernel is chosen for stability. And this is the second time within just a few weeks that I've been bitten by a time-consuming hibernate-bug caused by a backport of a commit in amdgpu.
So I'm asking the devs to either test their patches more thoroughly or to be a bit more conservative with what they recommend for backporting to LTS-kernels. Thanks.
Please feel free to suggest better ways to have automated tests for stuff like this, or to help provide testing for the -rc LTS/stable kernel releases.
Well, I'm afraid I can't offer a panacea or the ultimate automated quality assurance system. But for the two cases that I've encountered lately, a simple hibernate/resume would have shown that there's a problem. After all, that's how I and other users noticed it.
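A simple hibernate/resume check of the kind meant here might look like the sketch below. It is an assumption-laden illustration, not an established test: it relies on `rtcwake` from util-linux being installed and run as root, and it wakes the machine via the RTC alarm, which may not exercise exactly the same path as a resume triggered from the keyboard. The helper names `rtcwake_cmd` and `run_cycles` are made up for this example.

```python
import subprocess
from typing import Callable, List


def rtcwake_cmd(wake_after_s: int) -> List[str]:
    # rtcwake (util-linux): "-m disk" requests suspend-to-disk (hibernate),
    # "-s N" programs the RTC alarm to fire N seconds from now.
    return ["rtcwake", "-m", "disk", "-s", str(wake_after_s)]


def run_cycles(cycles: int, wake_after_s: int = 60,
               runner: Callable[[List[str]], int] = subprocess.call) -> int:
    """Run hibernate/resume cycles; return how many completed cleanly.

    The runner is injectable so the loop can be tried without actually
    hibernating. If a resume hangs, the call never returns, which is
    itself the failure signal.
    """
    completed = 0
    for _ in range(cycles):
        if runner(rtcwake_cmd(wake_after_s)) != 0:
            break  # rtcwake reported an error; stop early
        completed += 1
    return completed


# Example (as root, on a machine where hibernation is configured):
#   run_cycles(3, wake_after_s=60)
```

The runner is passed in as a parameter so the loop logic can be checked with a stub before it is pointed at real hardware.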
So I think the primary line of defence against regressions remains the developer himself, who should try hard to imagine what ramifications his patch might have and test it accordingly. But I'm aware of the fact that we are all only human.
Another idea might be to give patches that introduce new features or only minimal improvements ample time to mature in the latest stable kernel before backporting them to LTS-kernels, say three or four point-releases. Or to only backport fixes for bugs or security issues.
We can't do this alone :)
Right. For now I can't commit to testing release candidates because of a lack of time. But I try to bisect and report problems as soon as possible so that they can be resolved quickly.
To avoid a false impression: kernel-wise, and that includes amdgpu, I'm rather happy with the current state of affairs. Thanks to all!
Rainer Fiebig