On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
Ok I think I was a bit slow here, and it makes sense. Except this now means we lose this check, and I'm also not sure whether we really want drivers to implement this all.
What I think we want here is a bit more:
- for the self-refresh case check that the vblank all still works
You mean, keep the WARN_ONCE(), but invert it to ensure that 'ret == 0'? I did consider that, but I don't know why I stopped.
Yeah, so that we check that vblanks keep working in the self-refresh case.
- check that drivers which use self_refresh are not using drm_atomic_helper_wait_for_vblanks(), because that would defeat the point
I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part of the common drm_atomic_helper_commit_tail*() helpers, and so it's naturally used in many cases (including Rockchip/PSR). And how does it defeat the point?
Yeah, but that's for backwards compat reasons, the much better function is drm_atomic_helper_wait_for_flip_done(). And if you go into self refresh that's really the better one.
- have a drm_crtc_vblank_off/on which take the crtc state, so they can look at the self-refresh state
And I suppose you mean this helper variant would kick off the next step (fake vblank timer)?
Yeah, I figured that's the better way to implement this since it would be driver agnostic. But rockchip is still the only driver using the self-refresh helpers, so I guess it doesn't really matter.
- fake vblanks with hrtimer, because on most hw when you turn off the crtc the vblanks are also turned off, and so your compositor would still hang. The vblank machinery already has all the code to make this happen (and if it's not all, then i915 psr code should have it).
Is a timer better than an interrupt? I'm pretty sure the vblank interrupts still can fire on Rockchip CRTC (VOP) (see also the other branch of this thread), so this isn't really necessary. (IGT vblank tests pass without hanging.) Unless you simply prefer a fake timer for some reason.
Also, I still haven't found that fake timer machinery, but maybe I just don't know what I'm looking for.
I ... didn't find it either. I'm honestly not sure whether this works for intel, or whether we do something silly like disable self-refresh when a vblank interrupt is pending :-/
- I think kunit tests for this all would be really good, it's a rather complex state machinery between modesets and vblank functionality. You can speed up the kunit tests with some really high refresh rate, which isn't possible on real hw.
Last time I tried my hand at kunit in a subsystem with no prior kunit tests, I had a miserable time and gave up. At least DRM has a few already, so maybe this wouldn't be as terrible. Perhaps I can give this a shot, but there's a chance this will kick things to the back burner far enough that I simply don't get around to it at all. (So far, I'm only addressing this because KernelCI complained.)
Nah if we don't solve this in a generic way then we don't need kunit to make sure it keeps working.
I'm also wondering why we've had this code for years and only hit issues now?
I'd guess a few reasons:
- drm_self_refresh_helper_init() is only used by one driver -- Rockchip
- Rockchip systems are most commonly either Chromebooks, or else otherwise cheap embedded things, and may not have displays at all, let alone displays with PSR
- Rockchip Chromebooks shipped with a kernel forked off of the earlier PSR support, before everything got refactored (and vblank handling regressed) for the self-refresh "helpers". They only upgraded to a newer upstream kernel within the last few months.
- AFAICT, ChromeOS user space doesn't even exercise the vblank-related ioctls, so we don't actually notice that this is "broken". I suppose it would only be IGT tests that notice.
- I fixed up various upstream PSR bugs as part of #3 [0]; along the way I unborked PSR enough that KernelCI finally caught the bug. See my explanation in [1] for why the vblank bug was masked, and appeared to be a "regression" due to my more recent fixes.
Yeah I thought we had more drivers using self-refresh helpers, but that's not the case :-/
I think new proposal from me is to just respin this patch here with our discussion all summarized (it's good to record this stuff for the next person that comes around), and the WARN_ON adjusted so it also checks that vblank interrupts keep working (per the ret value at least, it's not a real functional check). And call that good enough.
Also maybe look into switching from wait_for_vblanks to wait_for_flip_done, it's the right thing to do (see kerneldoc, it should explain things a bit). -Daniel
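For reference, the adjusted check proposed above can be modelled in plain C outside the kernel. This is only a sketch under stated assumptions: `vblank_check_would_warn()` is a hypothetical helper, not a DRM function, and `vblank_get_ret` stands in for the return value of drm_crtc_vblank_get(), assumed to be 0 when vblank is still usable and -EINVAL once the driver has called drm_crtc_vblank_off(). As noted above, it only looks at the return value and is not a real functional check.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the check; not kernel code.
 * vblank_get_ret stands in for the return value of drm_crtc_vblank_get().
 * Returns true when the check would warn.
 */
bool vblank_check_would_warn(bool self_refresh_active, int vblank_get_ret)
{
	if (self_refresh_active)
		/* Self-refresh is not a real disable: vblank must still work. */
		return vblank_get_ret != 0;

	/* Real disable: the driver must have called drm_crtc_vblank_off(). */
	return vblank_get_ret != -EINVAL;
}
```

In the helper itself, each branch would presumably become a WARN_ONCE() with its own message, so that a self-refresh "disable" with working vblanks no longer warns while a real disable keeps the existing check.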
Brian
[0] Combined with point #2: ChromeOS would be the first serious user of the refactored PSR support. All this was needed to make it actually usable:
(2021) c4c6ef229593 drm/bridge: analogix_dp: Make PSR-exit block less
(2022) ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable transition
       <--- KernelCI "blamed" this one, because PSR was less broken
(2022) e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC switch
[1] https://lore.kernel.org/dri-devel/Y6OCg9BPnJvimQLT@google.com/
    Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin