Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote:
On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 1:38 a.m., Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Based on discussion from a previous series[1] to add a "boost" mechanism when, for example, vblank deadlines are missed. Instead of a boost callback, this approach adds a way to set a deadline on the fence, by which the waiter would like to see the fence signalled.
I've not yet had a chance to re-work the drm/msm part of this, but wanted to send this out as an RFC in case I don't have a chance to finish the drm/msm part this week.
Original description:
In some cases, like double-buffered rendering, missing vblanks can trick the GPU into running at a lower frequence, when really we want to be running at a higher frequency to not miss the vblanks in the first place.
This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:
- To continue to be able to use the atomic helpers
- To support cases where display and gpu are different drivers
Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
I guess you mean "no effect at all *except* for fullscreen..."?
I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Regards, Christian.
BR, -R
-- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote:
On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 1:38 a.m., Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Based on discussion from a previous series[1] to add a "boost" mechanism when, for example, vblank deadlines are missed. Instead of a boost callback, this approach adds a way to set a deadline on the fence, by which the waiter would like to see the fence signalled.
I've not yet had a chance to re-work the drm/msm part of this, but wanted to send this out as an RFC in case I don't have a chance to finish the drm/msm part this week.
Original description:
In some cases, like double-buffered rendering, missing vblanks can trick the GPU into running at a lower frequence, when really we want to be running at a higher frequency to not miss the vblanks in the first place.
This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:
- To continue to be able to use the atomic helpers
- To support cases where display and gpu are different drivers
Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
I guess you mean "no effect at all *except* for fullscreen..."?
I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote:
On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 1:38 a.m., Rob Clark wrote: > From: Rob Clark robdclark@chromium.org > > Based on discussion from a previous series[1] to add a "boost" mechanism > when, for example, vblank deadlines are missed. Instead of a boost > callback, this approach adds a way to set a deadline on the fence, by > which the waiter would like to see the fence signalled. > > I've not yet had a chance to re-work the drm/msm part of this, but > wanted to send this out as an RFC in case I don't have a chance to > finish the drm/msm part this week. > > Original description: > > In some cases, like double-buffered rendering, missing vblanks can > trick the GPU into running at a lower frequence, when really we > want to be running at a higher frequency to not miss the vblanks > in the first place. > > This is partially inspired by a trick i915 does, but implemented > via dma-fence for a couple of reasons: > > 1) To continue to be able to use the atomic helpers > 2) To support cases where display and gpu are different drivers > > [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
I guess you mean "no effect at all *except* for fullscreen..."?
I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
Making such decisions is the responsibility of the kernel and not userspace in my opinion.
E.g. we could for example also need to reshuffle BOs so that a BO is even scanout able. Userspace can't know about such stuff before hand because the memory usage can change at any time.
Regards, Christian.
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote:
On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: > On 2021-07-27 1:38 a.m., Rob Clark wrote: >> From: Rob Clark robdclark@chromium.org >> >> Based on discussion from a previous series[1] to add a "boost" mechanism >> when, for example, vblank deadlines are missed. Instead of a boost >> callback, this approach adds a way to set a deadline on the fence, by >> which the waiter would like to see the fence signalled. >> >> I've not yet had a chance to re-work the drm/msm part of this, but >> wanted to send this out as an RFC in case I don't have a chance to >> finish the drm/msm part this week. >> >> Original description: >> >> In some cases, like double-buffered rendering, missing vblanks can >> trick the GPU into running at a lower frequence, when really we >> want to be running at a higher frequency to not miss the vblanks >> in the first place. >> >> This is partially inspired by a trick i915 does, but implemented >> via dma-fence for a couple of reasons: >> >> 1) To continue to be able to use the atomic helpers >> 2) To support cases where display and gpu are different drivers >> >> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... > Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). > I guess you mean "no effect at all *except* for fullscreen..."?
I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote: > On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>> From: Rob Clark robdclark@chromium.org >>> >>> Based on discussion from a previous series[1] to add a "boost" mechanism >>> when, for example, vblank deadlines are missed. Instead of a boost >>> callback, this approach adds a way to set a deadline on the fence, by >>> which the waiter would like to see the fence signalled. >>> >>> I've not yet had a chance to re-work the drm/msm part of this, but >>> wanted to send this out as an RFC in case I don't have a chance to >>> finish the drm/msm part this week. >>> >>> Original description: >>> >>> In some cases, like double-buffered rendering, missing vblanks can >>> trick the GPU into running at a lower frequence, when really we >>> want to be running at a higher frequency to not miss the vblanks >>> in the first place. >>> >>> This is partially inspired by a trick i915 does, but implemented >>> via dma-fence for a couple of reasons: >>> >>> 1) To continue to be able to use the atomic helpers >>> 2) To support cases where display and gpu are different drivers >>> >>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... >> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). >> > I guess you mean "no effect at all *except* for fullscreen..."? I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
> I'd perhaps recommend that wayland compositors, in cases where only a > single layer is changing, not try to be clever and just push the > update down to the kernel. Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
Christian.
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Thanks, pq
Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.
At AMD hardware can already do this, we just don't have an OpenGL extension for it (but maybe already in Vulkan).
Regards, Christian.
Thanks, pq
On 2021-07-28 4:30 p.m., Christian König wrote:
Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote: > At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). > > By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. > > For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.
While something like that might be a possible optimization for (probably common) cases where the new buffer does not come with other state changes which affect the output frame beyond the buffer contents, there will always be cases (at least until a Wayland compositor can fully run on the GPU, as Pekka noted somewhat jokingly :) where this needs to be handled on the CPU.
I'm currently focusing on that general case. Optimizations for special cases may follow later.
On Wed, 28 Jul 2021 16:30:13 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote: > At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). > > By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. > > For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.
Except that a window content update in Wayland must be synchronised with all the possible and arbitrary other window system state changes, that will affect how and where other windows will get drawn *this frame*, how input events are routed, and more.
But, if the window manager made sure that *only* window contents are about to change and *all* other state remains as it was, then it would be possible to let the GPU decide which frame it uses. As long as it also tells back which one it actually did, so that presentation feedback etc. can trigger the right Wayland events.
Wayland has "atomic commits" to windows, and arbitrary protocol extensions can add arbitrary state to be tracked with it. A bit like KMS properties. Even atomic commits affecting multiple windows together are a thing, and they must be latched either all or none.
So it's quite a lot of work to determine if one can allow the GPU to choose the buffer it will texture from, or not.
Thanks, pq
Am 29.07.21 um 10:23 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 16:30:13 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer: > On 2021-07-28 1:36 p.m., Christian König wrote: >> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). >> >> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. >> >> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. > Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering. > > (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle) Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.
Except that a window content update in Wayland must be synchronised with all the possible and arbitrary other window system state changes, that will affect how and where other windows will get drawn *this frame*, how input events are routed, and more.
But, if the window manager made sure that *only* window contents are about to change and *all* other state remains as it was, then it would be possible to let the GPU decide which frame it uses. As long as it also tells back which one it actually did, so that presentation feedback etc. can trigger the right Wayland events.
Wayland has "atomic commits" to windows, and arbitrary protocol extensions can add arbitrary state to be tracked with it. A bit like KMS properties. Even atomic commits affecting multiple windows together are a thing, and they must be latched either all or none.
So it's quite a lot of work to determine if one can allow the GPU to choose the buffer it will texture from, or not.
But how does it then help to wait on the CPU instead?
See what I'm proposing is to either render the next state of the window or compose from the old state (including all atomic properties).
E.g. what do you do if you timeout and can't have the new window content on time? What's the fallback here?
Regards, Christian.
Thanks, pq
On Thu, 29 Jul 2021 10:43:16 +0200 Christian König christian.koenig@amd.com wrote:
Am 29.07.21 um 10:23 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 16:30:13 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:57 schrieb Pekka Paalanen:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote: > Am 28.07.21 um 15:08 schrieb Michel Dänzer: >> On 2021-07-28 1:36 p.m., Christian König wrote: >>> At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). >>> >>> By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. >>> >>> For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. >> Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering. >> >> (Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle) > Well then let's extend the KMS API instead of hacking together workarounds in userspace. That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Not really. You only need to decide if you want to use the new backing store or the old one based on if the new surface is ready or not.
Except that a window content update in Wayland must be synchronised with all the possible and arbitrary other window system state changes, that will affect how and where other windows will get drawn *this frame*, how input events are routed, and more.
But, if the window manager made sure that *only* window contents are about to change and *all* other state remains as it was, then it would be possible to let the GPU decide which frame it uses. As long as it also tells back which one it actually did, so that presentation feedback etc. can trigger the right Wayland events.
Wayland has "atomic commits" to windows, and arbitrary protocol extensions can add arbitrary state to be tracked with it. A bit like KMS properties. Even atomic commits affecting multiple windows together are a thing, and they must be latched either all or none.
So it's quite a lot of work to determine if one can allow the GPU to choose the buffer it will texture from, or not.
But how does it then help to wait on the CPU instead?
A compositor does not "wait" literally. It would only check which state set is ready to be used, and uses the most recent set that is ready. Any state sets that are not ready are ignored and reconsidered the next time the compositor updates the screen.
Depending on which state sets are selected for a screen update, the global window manager state may be updated accordingly, before the drawing commands for the composition can be created.
See what I'm proposing is to either render the next state of the window or compose from the old state (including all atomic properties).
Yes, that's exactly how it would work. It's just that state for a window is not an independent thing, it can affect how unrelated windows are managed.
A simplified example would be two windows side by side where the resizing of one causes the other to move. You can't resize the window or move the other until the buffer with the new size is ready. Until then the compositor uses the old state.
E.g. what do you do if you timeout and can't have the new window content on time? What's the fallback here?
As there is no wait, there is no timeout either.
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
Thanks, pq
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
[SNIP]
But how does it then help to wait on the CPU instead?
A compositor does not "wait" literally. It would only check which state set is ready to be used, and uses the most recent set that is ready. Any state sets that are not ready are ignored and reconsidered the next time the compositor updates the screen.
Mhm, then I'm not understanding what Michel's changes are actually doing.
Depending on which state sets are selected for a screen update, the global window manager state may be updated accordingly, before the drawing commands for the composition can be created.
See what I'm proposing is to either render the next state of the window or compose from the old state (including all atomic properties).
Yes, that's exactly how it would work. It's just that state for a window is not an independent thing, it can affect how unrelated windows are managed.
A simplified example would be two windows side by side where the resizing of one causes the other to move. You can't resize the window or move the other until the buffer with the new size is ready. Until then the compositor uses the old state.
E.g. what do you do if you timeout and can't have the new window content on time? What's the fallback here?
As there is no wait, there is no timeout either.
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
Regards, Christian.
Thanks, pq
On 2021-07-29 12:14 p.m., Christian König wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
[SNIP]
But how does it then help to wait on the CPU instead?
A compositor does not "wait" literally. It would only check which state set is ready to be used, and uses the most recent set that is ready. Any state sets that are not ready are ignored and reconsidered the next time the compositor updates the screen.
Mhm, then I'm not understanding what Michel's changes are actually doing.
In a nutshell, my mutter MR holds back all Wayland state changes which were committed together with a new buffer (and dependent later ones) until the dma-buf file descriptors for that buffer have become readable. This is achieved by adding the fds to the main event loop (if they aren't readable already when the buffer is committed), and when they become readable, all corresponding state changes are propagated such that they will be taken into account for drawing the next frame.
Depending on which state sets are selected for a screen update, the global window manager state may be updated accordingly, before the drawing commands for the composition can be created.
See what I'm proposing is to either render the next state of the window or compose from the old state (including all atomic properties).
Yes, that's exactly how it would work. It's just that state for a window is not an independent thing, it can affect how unrelated windows are managed.
A simplified example would be two windows side by side where the resizing of one causes the other to move. You can't resize the window or move the other until the buffer with the new size is ready. Until then the compositor uses the old state.
E.g. what do you do if you timeout and can't have the new window content on time? What's the fallback here?
As there is no wait, there is no timeout either.
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
Basically, the compositor draws its output as if the new buffer and all connected Wayland state changes had not been committed yet.
On Thu, 29 Jul 2021 12:14:18 +0200 Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
No, all the other windows will continue behaving normally just like they always did. It's just that one frozen window there that won't update; it won't resize, so there is no reason to move that other window either.
Everything continues as if the frozen window never even sent anything to the compositor after its last good update.
We have a principle in Wayland: the compositor cannot afford to wait for clients, the desktop as a whole must remain responsive. So there is always a backup plan even for cases where the compositor expects the client to change something. For resizes, in a floating-window manager it's easy: just let things continue as they are; in a tiling window manager they may have a timeout after which... whatever is appropriate.
Another example: If a compositor decides to make a window maximized, it tells the client the new size and state it must have. Until the client acks that specific state change, the compositor will continue managing that window as if nothing changed. Given the asynchronous nature of Wayland, the client might even continue submitting updates non-maximized for a while, and that will go through as if the compositor didn't ask for maximized. But at some point the client acks the window state change, and from that point on if it doesn't behave like maximized state requires, it will get a protocol error and be disconnected.
Thanks, pq
Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 12:14:18 +0200 Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
No, all the other windows will continue behaving normally just like they always did. It's just that one frozen window there that won't update; it won't resize, so there is no reason to move that other window either.
Everything continues as if the frozen window never even sent anything to the compositor after its last good update.
We have a principle in Wayland: the compositor cannot afford to wait for clients, the desktop as a whole must remain responsive. So there is always a backup plan even for cases where the compositor expects the client to change something. For resizes, in a floating-window manager it's easy: just let things continue as they are; in a tiling window manager they may have a timeout after which... whatever is appropriate.
Another example: If a compositor decides to make a window maximized, it tells the client the new size and state it must have. Until the client acks that specific state change, the compositor will continue managing that window as if nothing changed. Given the asynchronous nature of Wayland, the client might even continue submitting updates non-maximized for a while, and that will go through as if the compositor didn't ask for maximized. But at some point the client acks the window state change, and from that point on if it doesn't behave like maximized state requires, it will get a protocol error and be disconnected.
Yeah and all of this totally makes sense.
The problem is that not forwarding the state changes to the hardware adds a CPU round trip which is rather bad for the driver design, especially power management.
E.g. when you submit the work only after everybody becomes available the GPU becomes idle in between and might think it is a good idea to reduce clocks etc...
How about doing this instead:
1. As soon as at least one window has new committed state you submit the rendering. As far as I understand it that is already the case anyway.
2. Before starting rendering the hardware driver waits with a timeout for all the window content to become ready. The timeout is picked in a way so that we at least reach a reasonable fps. Making that depending on the maximum refresh rate of the display device sounds reasonable to me.
3a. If all windows become ready on time we draw the frame as expected. 3b. If a timeout occurs the compositor is noted of this and goes on a fallback path rendering only the content known to be ready.
4. Repeat.
This way we should be able to handle all use cases gracefully, e.g. the hanging client won't cause the server to block and when everything becomes ready on time we just render as expected.
Regards, Christian.
Thanks, pq
On Thu, 29 Jul 2021 13:43:20 +0200 Christian König christian.koenig@amd.com wrote:
Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 12:14:18 +0200 Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
No, all the other windows will continue behaving normally just like they always did. It's just that one frozen window there that won't update; it won't resize, so there is no reason to move that other window either.
Everything continues as if the frozen window never even sent anything to the compositor after its last good update.
We have a principle in Wayland: the compositor cannot afford to wait for clients, the desktop as a whole must remain responsive. So there is always a backup plan even for cases where the compositor expects the client to change something. For resizes, in a floating-window manager it's easy: just let things continue as they are; in a tiling window manager they may have a timeout after which... whatever is appropriate.
Another example: If a compositor decides to make a window maximized, it tells the client the new size and state it must have. Until the client acks that specific state change, the compositor will continue managing that window as if nothing changed. Given the asynchronous nature of Wayland, the client might even continue submitting updates non-maximized for a while, and that will go through as if the compositor didn't ask for maximized. But at some point the client acks the window state change, and from that point on if it doesn't behave like maximized state requires, it will get a protocol error and be disconnected.
Yeah and all of this totally makes sense.
The problem is that not forwarding the state changes to the hardware adds a CPU round trip which is rather bad for the driver design, especially power management.
E.g. when you submit the work only after everybody becomes available the GPU becomes idle in between and might think it is a good idea to reduce clocks etc...
Everybody does not need to be available. The compositor can submit its work anyway, it just uses old state for some of the windows.
But if everybody happens to be ready before the compositor repaints, then the GPU will be idle anyway, whether the compositor looked at the buffer readyness at all or not.
Given that Wayland clients are not expected (but can if they want) to draw again until the frame callback which ensures that their previous frame is definitely going to be used on screen, this idling of GPU might happen regularly with well-behaved clients I guess?
The aim is that co-operative clients never draw a frame that will only get discarded.
How about doing this instead:
- As soon as at least one window has new committed state you submit the
rendering. As far as I understand it that is already the case anyway.
At least Weston does not work like that. Doing that means that the first client to send a new frame will lock all other client updates out of that update cycle.
Hence, a compositor usually waits until some point before the target vblank before it starts the repaint, which locks the window state in place for the frame.
Any client update could contain window state changes that prevents the GPU from choosing the content buffer to use.
- Before starting rendering the hardware driver waits with a timeout
for all the window content to become ready. The timeout is picked in a way so that we at least reach a reasonable fps. Making that depending on the maximum refresh rate of the display device sounds reasonable to me.
3a. If all windows become ready on time we draw the frame as expected. 3b. If a timeout occurs the compositor is noted of this and goes on a fallback path rendering only the content known to be ready.
Sounds like the fallback path, where the compositor's rendering is already late, would need to re-do all the rendering with an extremely tight schedule just before the KMS submission deadline. IOW, when you're not going to make it in time, you have to do even more work and ping-pong even more between CPU and GPU after being a bit late already. Is that really a good idea?
It also means the compositor cannot submit the KMS atomic commit until the GPU is done or timed out the compositing job, which is another GPU-CPU ping-pong.
- Repeat.
This way we should be able to handle all use cases gracefully, e.g. the hanging client won't cause the server to block and when everything becomes ready on time we just render as expected.
Thanks, pq
Am 29.07.21 um 14:49 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 13:43:20 +0200 Christian König christian.koenig@amd.com wrote:
Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 12:14:18 +0200 Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
No, all the other windows will continue behaving normally just like they always did. It's just that one frozen window there that won't update; it won't resize, so there is no reason to move that other window either.
Everything continues as if the frozen window never even sent anything to the compositor after its last good update.
We have a principle in Wayland: the compositor cannot afford to wait for clients, the desktop as a whole must remain responsive. So there is always a backup plan even for cases where the compositor expects the client to change something. For resizes, in a floating-window manager it's easy: just let things continue as they are; in a tiling window manager they may have a timeout after which... whatever is appropriate.
Another example: If a compositor decides to make a window maximized, it tells the client the new size and state it must have. Until the client acks that specific state change, the compositor will continue managing that window as if nothing changed. Given the asynchronous nature of Wayland, the client might even continue submitting updates non-maximized for a while, and that will go through as if the compositor didn't ask for maximized. But at some point the client acks the window state change, and from that point on if it doesn't behave like maximized state requires, it will get a protocol error and be disconnected.
Yeah and all of this totally makes sense.
The problem is that not forwarding the state changes to the hardware adds a CPU round trip which is rather bad for the driver design, especially power management.
E.g. when you submit the work only after everybody becomes available the GPU becomes idle in between and might think it is a good idea to reduce clocks etc...
Everybody does not need to be available. The compositor can submit its work anyway, it just uses old state for some of the windows.
But if everybody happens to be ready before the compositor repaints, then the GPU will be idle anyway, whether the compositor looked at the buffer readyness at all or not.
Ok good point.
Given that Wayland clients are not expected (but can if they want) to draw again until the frame callback which ensures that their previous frame is definitely going to be used on screen, this idling of GPU might happen regularly with well-behaved clients I guess?
Maybe I wasn't clear what the problem is: That the GPU goes idle is expected, but it should it should just not go idle multiple times.
The aim is that co-operative clients never draw a frame that will only get discarded.
How about doing this instead:
- As soon as at least one window has new committed state you submit the
rendering. As far as I understand it that is already the case anyway.
At least Weston does not work like that. Doing that means that the first client to send a new frame will lock all other client updates out of that update cycle.
Hence, a compositor usually waits until some point before the target vblank before it starts the repaint, which locks the window state in place for the frame.
Uff, that means we have lost this game anyway.
See you get the best energy utilization if the hardware wakes up as few as possible and still get everything done.
So what happens in the case you describes is that the hardware comes out of sleep at least twice, once for the client and once for the server which is rather sub optimal.
Any client update could contain window state changes that prevents the GPU from choosing the content buffer to use.
- Before starting rendering the hardware driver waits with a timeout
for all the window content to become ready. The timeout is picked in a way so that we at least reach a reasonable fps. Making that depending on the maximum refresh rate of the display device sounds reasonable to me.
3a. If all windows become ready on time we draw the frame as expected. 3b. If a timeout occurs the compositor is noted of this and goes on a fallback path rendering only the content known to be ready.
Sounds like the fallback path, where the compositor's rendering is already late, would need to re-do all the rendering with an extremely tight schedule just before the KMS submission deadline. IOW, when you're not going to make it in time, you have to do even more work and ping-pong even more between CPU and GPU after being a bit late already. Is that really a good idea?
My idea is that both the fallback path and the normal rendering are submitted at the same time, just with a big if/then/else around it. E.g. the timeout happens on the GPU hardware and not on the CPU.
But I think that stuff is just to complicated to implement.
I want to describe once more what the ideal configuration would be: 1. When you render a frame one or more clients submit jobs to the hardware. 2. Those jobs then execute on the hardware asynchronously to the CPU. 3. At the same time the CPU prepares a composition job which takes all the window content from clients and renders a new frame. 4. This new frame gets submitted to the hardware driver as new content on the screen. 5. The hardware driver waits for all the rendering to be completed and flips the screen.
The idea is that you have only one block of activity on the hardware, e.g. something like this: _------------_______flip_-------------_____flip.....
But what happens with Wayland currently is that you end up with: _--------_______-__flip_------------___-__flip.....
Or even worse when you have multiple clients rendering at random times: _---_---_---____-__flip_---_---_---___-__flip.....
I'm actually not that of a power management guy, but it is rather obvious that this is not optimal.
Regards, Christian.
It also means the compositor cannot submit the KMS atomic commit until the GPU is done or timed out the compositing job, which is another GPU-CPU ping-pong.
- Repeat.
This way we should be able to handle all use cases gracefully, e.g. the hanging client won't cause the server to block and when everything becomes ready on time we just render as expected.
Thanks, pq
On Thu, 29 Jul 2021 15:41:09 +0200 Christian König christian.koenig@amd.com wrote:
Am 29.07.21 um 14:49 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 13:43:20 +0200 Christian König christian.koenig@amd.com wrote:
Am 29.07.21 um 13:00 schrieb Pekka Paalanen:
On Thu, 29 Jul 2021 12:14:18 +0200 Christian König ckoenig.leichtzumerken@gmail.com wrote:
Am 29.07.21 um 11:15 schrieb Pekka Paalanen:
If the app happens to be frozen (e.g. some weird bug in fence handling to make it never ready, or maybe it's just bugged itself and never drawing again), then the app is frozen, and all the rest of the desktop continues running normally without a glitch.
But that is in contradict to what you told me before.
See when the window should move but fails to draw it's new content what happens?
Are the other windows which would be affected by the move not drawn as well?
No, all the other windows will continue behaving normally just like they always did. It's just that one frozen window there that won't update; it won't resize, so there is no reason to move that other window either.
Everything continues as if the frozen window never even sent anything to the compositor after its last good update.
We have a principle in Wayland: the compositor cannot afford to wait for clients, the desktop as a whole must remain responsive. So there is always a backup plan even for cases where the compositor expects the client to change something. For resizes, in a floating-window manager it's easy: just let things continue as they are; in a tiling window manager they may have a timeout after which... whatever is appropriate.
Another example: If a compositor decides to make a window maximized, it tells the client the new size and state it must have. Until the client acks that specific state change, the compositor will continue managing that window as if nothing changed. Given the asynchronous nature of Wayland, the client might even continue submitting updates non-maximized for a while, and that will go through as if the compositor didn't ask for maximized. But at some point the client acks the window state change, and from that point on if it doesn't behave like maximized state requires, it will get a protocol error and be disconnected.
Yeah and all of this totally makes sense.
The problem is that not forwarding the state changes to the hardware adds a CPU round trip which is rather bad for the driver design, especially power management.
E.g. when you submit the work only after everybody becomes available the GPU becomes idle in between and might think it is a good idea to reduce clocks etc...
Everybody does not need to be available. The compositor can submit its work anyway, it just uses old state for some of the windows.
But if everybody happens to be ready before the compositor repaints, then the GPU will be idle anyway, whether the compositor looked at the buffer readyness at all or not.
Ok good point.
Given that Wayland clients are not expected (but can if they want) to draw again until the frame callback which ensures that their previous frame is definitely going to be used on screen, this idling of GPU might happen regularly with well-behaved clients I guess?
Maybe I wasn't clear what the problem is: That the GPU goes idle is expected, but it should it should just not go idle multiple times.
The aim is that co-operative clients never draw a frame that will only get discarded.
How about doing this instead:
- As soon as at least one window has new committed state you submit the
rendering. As far as I understand it that is already the case anyway.
At least Weston does not work like that. Doing that means that the first client to send a new frame will lock all other client updates out of that update cycle.
Hence, a compositor usually waits until some point before the target vblank before it starts the repaint, which locks the window state in place for the frame.
Uff, that means we have lost this game anyway.
See you get the best energy utilization if the hardware wakes up as few as possible and still get everything done.
So what happens in the case you describes is that the hardware comes out of sleep at least twice, once for the client and once for the server which is rather sub optimal.
I can see the point, but what do we know about its significance?
If the alternative is the first-to-win and everyone else gets postponed by another full refresh cycle, isn't that worse? It could even cause jitter rather than just "high" latency to screen.
Is there any approach that would not have either disadvantage?
Here is an analysis of why Weston does what it does right now (the new algorithm): https://ppaalanen.blogspot.com/2015/02/weston-repaint-scheduling.html
Are we talking about desktops in general, or games, or fullscreen use case?
It's not unthinkable to have a different compositor scheduling policy for outputs that happen have only one fullscreen window.
Any client update could contain window state changes that prevents the GPU from choosing the content buffer to use.
- Before starting rendering the hardware driver waits with a timeout
for all the window content to become ready. The timeout is picked in a way so that we at least reach a reasonable fps. Making that depending on the maximum refresh rate of the display device sounds reasonable to me.
3a. If all windows become ready on time we draw the frame as expected. 3b. If a timeout occurs the compositor is noted of this and goes on a fallback path rendering only the content known to be ready.
Sounds like the fallback path, where the compositor's rendering is already late, would need to re-do all the rendering with an extremely tight schedule just before the KMS submission deadline. IOW, when you're not going to make it in time, you have to do even more work and ping-pong even more between CPU and GPU after being a bit late already. Is that really a good idea?
My idea is that both the fallback path and the normal rendering are submitted at the same time, just with a big if/then/else around it. E.g. the timeout happens on the GPU hardware and not on the CPU.
So for every refresh, the compositor needs to prepare a combinatorial explosion number of possible compositions to be rendered?
Or do we have the assumption that everything we talk about here is conditional to not having any window state changes other than content change?
Remember the example where one window is pending a resize, and if/when that happens another window needs to move.
But I think that stuff is just to complicated to implement.
I want to describe once more what the ideal configuration would be:
- When you render a frame one or more clients submit jobs to the hardware.
- Those jobs then execute on the hardware asynchronously to the CPU.
- At the same time the CPU prepares a composition job which takes all
the window content from clients and renders a new frame. 4. This new frame gets submitted to the hardware driver as new content on the screen. 5. The hardware driver waits for all the rendering to be completed and flips the screen.
I believe this is what happens right now, when compositors do not take into account that client buffers might not be ready, with the problem that any client GPU job that takes ages will stall the whole desktop's refresh.
The idea is that you have only one block of activity on the hardware, e.g. something like this: _------------_______flip_-------------_____flip.....
But what happens with Wayland currently is that you end up with: _--------_______-__flip_------------___-__flip.....
Or even worse when you have multiple clients rendering at random times: _---_---_---____-__flip_---_---_---___-__flip.....
I'm actually not that of a power management guy, but it is rather obvious that this is not optimal.
Possibly, but I haven't seen anyone come up with a better plan given the constraints that Wayland window state management raises.
Seems like it all boils down to the fundamental trade-off between latency and throughput, or latency and power efficiency.
Thanks, pq
Regards, Christian.
It also means the compositor cannot submit the KMS atomic commit until the GPU is done or timed out the compositing job, which is another GPU-CPU ping-pong.
- Repeat.
This way we should be able to handle all use cases gracefully, e.g. the hanging client won't cause the server to block and when everything becomes ready on time we just render as expected.
Thanks, pq
On Wed, Jul 28, 2021 at 6:57 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Hmm, seems like we could come up with a way for a shader to figure out if a fence has signaled or not on the GPU, and then either sample from the current or previous window surface?
BR, -R
Am 28.07.21 um 17:27 schrieb Rob Clark:
On Wed, Jul 28, 2021 at 6:57 AM Pekka Paalanen ppaalanen@gmail.com wrote:
On Wed, 28 Jul 2021 15:31:41 +0200 Christian König christian.koenig@amd.com wrote:
Am 28.07.21 um 15:24 schrieb Michel Dänzer:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote: > At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). > > By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. > > For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
Yeah, that's true as well.
At least as long as nobody invents a mechanism to do this decision on the GPU instead.
That would mean putting the whole window manager into the GPU.
Hmm, seems like we could come up with a way for a shader to figure out if a fence has signaled or not on the GPU, and then either sample from the current or previous window surface?
Yeah, exactly that's my thinking as well.
Christian.
BR, -R
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-27 5:12 p.m., Rob Clark wrote: > On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>> From: Rob Clark robdclark@chromium.org >>> >>> Based on discussion from a previous series[1] to add a "boost" mechanism >>> when, for example, vblank deadlines are missed. Instead of a boost >>> callback, this approach adds a way to set a deadline on the fence, by >>> which the waiter would like to see the fence signalled. >>> >>> I've not yet had a chance to re-work the drm/msm part of this, but >>> wanted to send this out as an RFC in case I don't have a chance to >>> finish the drm/msm part this week. >>> >>> Original description: >>> >>> In some cases, like double-buffered rendering, missing vblanks can >>> trick the GPU into running at a lower frequence, when really we >>> want to be running at a higher frequency to not miss the vblanks >>> in the first place. >>> >>> This is partially inspired by a trick i915 does, but implemented >>> via dma-fence for a couple of reasons: >>> >>> 1) To continue to be able to use the atomic helpers >>> 2) To support cases where display and gpu are different drivers >>> >>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... >> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). >> > I guess you mean "no effect at all *except* for fullscreen..."? I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
> I'd perhaps recommend that wayland compositors, in cases where only a > single layer is changing, not try to be clever and just push the > update down to the kernel. Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
I think solving the fullscreen game case is sufficient enough forward progress to be useful. And the results I'm seeing[1] are sufficiently positive to convince me that dma-fence deadline support is the right thing to do.
But maybe the solution to make this also useful for mutter is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
[1] https://patchwork.freedesktop.org/patch/447138/
BR, -R
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark:
On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: > On 2021-07-27 5:12 p.m., Rob Clark wrote: >> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>> From: Rob Clark robdclark@chromium.org >>>> >>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>> when, for example, vblank deadlines are missed. Instead of a boost >>>> callback, this approach adds a way to set a deadline on the fence, by >>>> which the waiter would like to see the fence signalled. >>>> >>>> I've not yet had a chance to re-work the drm/msm part of this, but >>>> wanted to send this out as an RFC in case I don't have a chance to >>>> finish the drm/msm part this week. >>>> >>>> Original description: >>>> >>>> In some cases, like double-buffered rendering, missing vblanks can >>>> trick the GPU into running at a lower frequence, when really we >>>> want to be running at a higher frequency to not miss the vblanks >>>> in the first place. >>>> >>>> This is partially inspired by a trick i915 does, but implemented >>>> via dma-fence for a couple of reasons: >>>> >>>> 1) To continue to be able to use the atomic helpers >>>> 2) To support cases where display and gpu are different drivers >>>> >>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... >>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). >>> >> I guess you mean "no effect at all *except* for fullscreen..."? > I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either. > > >> I'd perhaps recommend that wayland compositors, in cases where only a >> single layer is changing, not try to be clever and just push the >> update down to the kernel. > Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC. > > For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well. > Well, in the end, there is more than one compositor out there.. and if some wayland compositors are going this route, they can also implement the same mechanism in userspace using the sysfs that devfreq exports.
But it sounds simpler to me for the compositor to have a sort of "game mode" for fullscreen games.. I'm less worried about UI interactive workloads, boosting the GPU freq upon sudden activity after a period of inactivity seems to work reasonably well there.
At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
I think solving the fullscreen game case is sufficient enough forward progress to be useful. And the results I'm seeing[1] are sufficiently positive to convince me that dma-fence deadline support is the right thing to do.
But maybe the solution to make this also useful for mutter is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
Also mutter would need to supply the deadline, because we need to fit the rendering in still before the actual flip. So gets a bit quirky maybe ... -Daniel
[1] https://patchwork.freedesktop.org/patch/447138/
BR, -R
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote:
Am 27.07.21 um 17:37 schrieb Rob Clark: > On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>> From: Rob Clark robdclark@chromium.org >>>>> >>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>> which the waiter would like to see the fence signalled. >>>>> >>>>> I've not yet had a chance to re-work the drm/msm part of this, but >>>>> wanted to send this out as an RFC in case I don't have a chance to >>>>> finish the drm/msm part this week. >>>>> >>>>> Original description: >>>>> >>>>> In some cases, like double-buffered rendering, missing vblanks can >>>>> trick the GPU into running at a lower frequence, when really we >>>>> want to be running at a higher frequency to not miss the vblanks >>>>> in the first place. >>>>> >>>>> This is partially inspired by a trick i915 does, but implemented >>>>> via dma-fence for a couple of reasons: >>>>> >>>>> 1) To continue to be able to use the atomic helpers >>>>> 2) To support cases where display and gpu are different drivers >>>>> >>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... >>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). >>>> >>> I guess you mean "no effect at all *except* for fullscreen..."? >> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either. >> >> >>> I'd perhaps recommend that wayland compositors, in cases where only a >>> single layer is changing, not try to be clever and just push the >>> update down to the kernel. >> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC. >> >> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well. >> > Well, in the end, there is more than one compositor out there.. and if > some wayland compositors are going this route, they can also implement > the same mechanism in userspace using the sysfs that devfreq exports. > > But it sounds simpler to me for the compositor to have a sort of "game > mode" for fullscreen games.. I'm less worried about UI interactive > workloads, boosting the GPU freq upon sudden activity after a period > of inactivity seems to work reasonably well there. At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc).
By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features.
For composing use cases that makes sense, but certainly not for full screen applications as far as I can see.
Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
I think solving the fullscreen game case is sufficient enough forward progress to be useful. And the results I'm seeing[1] are sufficiently positive to convince me that dma-fence deadline support is the right thing to do.
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
Also mutter would need to supply the deadline, because we need to fit the rendering in still before the actual flip. So gets a bit quirky maybe ...
That should be fine. mutter is already keeping track of how long its rendering takes.
On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer:
On 2021-07-28 1:36 p.m., Christian König wrote: > Am 27.07.21 um 17:37 schrieb Rob Clark: >> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >>> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>>> From: Rob Clark robdclark@chromium.org >>>>>> >>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>>> which the waiter would like to see the fence signalled. >>>>>> >>>>>> I've not yet had a chance to re-work the drm/msm part of this, but >>>>>> wanted to send this out as an RFC in case I don't have a chance to >>>>>> finish the drm/msm part this week. >>>>>> >>>>>> Original description: >>>>>> >>>>>> In some cases, like double-buffered rendering, missing vblanks can >>>>>> trick the GPU into running at a lower frequence, when really we >>>>>> want to be running at a higher frequency to not miss the vblanks >>>>>> in the first place. >>>>>> >>>>>> This is partially inspired by a trick i915 does, but implemented >>>>>> via dma-fence for a couple of reasons: >>>>>> >>>>>> 1) To continue to be able to use the atomic helpers >>>>>> 2) To support cases where display and gpu are different drivers >>>>>> >>>>>> [1] https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.... >>>>> Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gno... (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly). >>>>> >>>> I guess you mean "no effect at all *except* for fullscreen..."? >>> I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either. >>> >>> >>>> I'd perhaps recommend that wayland compositors, in cases where only a >>>> single layer is changing, not try to be clever and just push the >>>> update down to the kernel. >>> Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC. >>> >>> For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well. >>> >> Well, in the end, there is more than one compositor out there.. and if >> some wayland compositors are going this route, they can also implement >> the same mechanism in userspace using the sysfs that devfreq exports. >> >> But it sounds simpler to me for the compositor to have a sort of "game >> mode" for fullscreen games.. I'm less worried about UI interactive >> workloads, boosting the GPU freq upon sudden activity after a period >> of inactivity seems to work reasonably well there. > At least AMD hardware is already capable of flipping frames on GPU events like finishing rendering (or uploading etc). > > By waiting in userspace on the CPU before send the frame to the hardware you are completely killing of such features. > > For composing use cases that makes sense, but certainly not for full screen applications as far as I can see. Even for fullscreen, the current KMS API only allows queuing a single page flip per CRTC, with no way to cancel or otherwise modify it. Therefore, a Wayland compositor has to set a deadline for the next refresh cycle, and when the deadline passes, it has to select the best buffer available for the fullscreen surface. To make sure the flip will not miss the next refresh cycle, the compositor has to pick an idle buffer. If it picks a non-idle buffer, and the pending rendering does not finish in time for vertical blank, the flip will be delayed by at least one refresh cycle, which results in visible stuttering.
(Until the deadline passes, the Wayland compositor can't even know if a previously fullscreen surface will still be fullscreen for the next refresh cycle)
Well then let's extend the KMS API instead of hacking together workarounds in userspace.
That's indeed a possible solution for the fullscreen / direct scanout case.
Not for the general compositing case though, since a compositor does not want to composite multiple output frames per display refresh cycle, so it has to make sure the one frame hits the target.
I think solving the fullscreen game case is sufficient enough forward progress to be useful. And the results I'm seeing[1] are sufficiently positive to convince me that dma-fence deadline support is the right thing to do.
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
So essentially when the mutter compositor guesstimator is fairly confident about the next frame's composition (recall you're keeping track of clients to estimate their usual latency or something like that), then it does a TEST_ONLY commit to check it all works and prep the rendering, but _not_ yet fire it off.
Instead it waits until all buffers complete, and if some don't, pick the previous one. Which I guess in an extreme case would mean you need a different window tree configuration and maybe different TEST_ONLY check and all that, not sure how you solve that.
Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the in-fences you expect to depend upon (maybe we need an additional list of in-fences for your rendering job), plus a deadline when you want to have them done (so that there's enough time for your render job still). And the kernel then calls dma_fence_set_deadline on all of them.
Pondering this more, maybe a separate ioctl is simpler where you just supply a list of in-fences and deadlines.
The real reason I want to tie this to atomic is for priviledge checking reasons. I don't think normal userspace should have the power to set arbitrary deadlines like this - at least on i915 it will also give you a slight priority boost and stuff like that, to make sure your rendering for the current frame goes in ahead of the next frame's prep work.
So maybe just a new ioctl that does this which is limited to the current kms owner (aka drm_master)?
In i915 we also do a mild boost for when userspace waits on a buffer (assuming it's blocking the cpu), but that boost has a pretty sharp decay/cooldown to prevent abuse and keeping the gpu artificially upclocked. That really is just meant to avoid the tailspin when you have a ping-pong workload between gpu and cpu and both downclock in turn because the other side is too slow and the gpu/cpu aren't really busy enough. Until you're crawling at idle clocks getting nothing done.
I think on the windows side they "fix" this by making the clock adjustments extremely conservative and slow (except when they detect that it's a game/benchmark). Good enough for battery tests, not so great in reality. -Daniel
Also mutter would need to supply the deadline, because we need to fit the rendering in still before the actual flip. So gets a bit quirky maybe ...
That should be fine. mutter is already keeping track of how long its rendering takes.
-- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and X developer
On Thu, 29 Jul 2021 11:03:36 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote:
Am 28.07.21 um 15:08 schrieb Michel Dänzer: > On 2021-07-28 1:36 p.m., Christian König wrote: >> Am 27.07.21 um 17:37 schrieb Rob Clark: >>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >>>> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>>>> From: Rob Clark robdclark@chromium.org >>>>>>> >>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>>>> which the waiter would like to see the fence signalled.
...
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
So essentially when the mutter compositor guesstimator is fairly confident about the next frame's composition (recall you're keeping track of clients to estimate their usual latency or something like that), then it does a TEST_ONLY commit to check it all works and prep the rendering, but _not_ yet fire it off.
Instead it waits until all buffers complete, and if some don't, pick the previous one. Which I guess in an extreme case would mean you need a different window tree configuration and maybe different TEST_ONLY check and all that, not sure how you solve that.
Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the in-fences you expect to depend upon (maybe we need an additional list of in-fences for your rendering job), plus a deadline when you want to have them done (so that there's enough time for your render job still). And the kernel then calls dma_fence_set_deadline on all of them.
Pondering this more, maybe a separate ioctl is simpler where you just supply a list of in-fences and deadlines.
The real reason I want to tie this to atomic is for priviledge checking reasons. I don't think normal userspace should have the power to set arbitrary deadlines like this - at least on i915 it will also give you a slight priority boost and stuff like that, to make sure your rendering for the current frame goes in ahead of the next frame's prep work.
So maybe just a new ioctl that does this which is limited to the current kms owner (aka drm_master)?
Yeah.
Why not have a Wayland compositor *always* "set the deadlines" for the next screen update as soon as it gets the wl_surface.commit with the new buffer and fences (a simplified description of what is actually necessary to take a new window state set into use)?
The Wayland client posted the frame to the compositor, so surely it wants it ready and displayed ASAP. If we happen to have a Wayland frame queuing extension, then also take that into account when setting the deadline.
Then, *independently* of that, the compositor will choose which frames it will actually use in its composition when the time comes.
No need for any KMS atomic commit fiddling, userspace just explicitly sets the deadline on the fence and that's it. You could tie the privilege of setting deadlines to simply holding DRM master on whatever device? So the ioctl would need both the fence and any DRM device fd.
A rogue application opening a DRM device and becoming DRM master on it just to be able to abuse deadlines feels both unlikely and with insignificant consequences. It stops the obvious abuse, and if someone actually goes the extra effort, then so what.
Thanks, pq
On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
On Thu, 29 Jul 2021 11:03:36 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote:
On 2021-07-28 3:13 p.m., Christian König wrote: > Am 28.07.21 um 15:08 schrieb Michel Dänzer: >> On 2021-07-28 1:36 p.m., Christian König wrote: >>> Am 27.07.21 um 17:37 schrieb Rob Clark: >>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>>>>> From: Rob Clark robdclark@chromium.org >>>>>>>> >>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>>>>> which the waiter would like to see the fence signalled.
...
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
So essentially when the mutter compositor guesstimator is fairly confident about the next frame's composition (recall you're keeping track of clients to estimate their usual latency or something like that), then it does a TEST_ONLY commit to check it all works and prep the rendering, but _not_ yet fire it off.
Instead it waits until all buffers complete, and if some don't, pick the previous one. Which I guess in an extreme case would mean you need a different window tree configuration and maybe different TEST_ONLY check and all that, not sure how you solve that.
Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the in-fences you expect to depend upon (maybe we need an additional list of in-fences for your rendering job), plus a deadline when you want to have them done (so that there's enough time for your render job still). And the kernel then calls dma_fence_set_deadline on all of them.
Pondering this more, maybe a separate ioctl is simpler where you just supply a list of in-fences and deadlines.
The real reason I want to tie this to atomic is for priviledge checking reasons. I don't think normal userspace should have the power to set arbitrary deadlines like this - at least on i915 it will also give you a slight priority boost and stuff like that, to make sure your rendering for the current frame goes in ahead of the next frame's prep work.
So maybe just a new ioctl that does this which is limited to the current kms owner (aka drm_master)?
Yeah.
Why not have a Wayland compositor *always* "set the deadlines" for the next screen update as soon as it gets the wl_surface.commit with the new buffer and fences (a simplified description of what is actually necessary to take a new window state set into use)?
Yeah taht's probably best. And if the frame is scheduled (video at 24fps or whatever) you can also immediately set the deadline for that too, just a few frames later. Always minus compositor budget taken into account.
The Wayland client posted the frame to the compositor, so surely it wants it ready and displayed ASAP. If we happen to have a Wayland frame queuing extension, then also take that into account when setting the deadline.
Then, *independently* of that, the compositor will choose which frames it will actually use in its composition when the time comes.
No need for any KMS atomic commit fiddling, userspace just explicitly sets the deadline on the fence and that's it. You could tie the privilege of setting deadlines to simply holding DRM master on whatever device? So the ioctl would need both the fence and any DRM device fd.
Yeah tying that up with atomic doesn't make sense.
A rogue application opening a DRM device and becoming DRM master on it just to be able to abuse deadlines feels both unlikely and with insignificant consequences. It stops the obvious abuse, and if someone actually goes the extra effort, then so what.
With logind you can't become drm master just for lolz anymore, so I'm not worried about that. On such systems only logind has the rights to access the primary node, everyone doing headless goes through the render node.
So just limiting the deadline ioctl to current kms owner is imo perfectly good enough for a security model. -Daniel
On Thu, 29 Jul 2021 14:18:29 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
On Thu, 29 Jul 2021 11:03:36 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote:
On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote: > On 2021-07-28 3:13 p.m., Christian König wrote: >> Am 28.07.21 um 15:08 schrieb Michel Dänzer: >>> On 2021-07-28 1:36 p.m., Christian König wrote: >>>> Am 27.07.21 um 17:37 schrieb Rob Clark: >>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>>>>>> From: Rob Clark robdclark@chromium.org >>>>>>>>> >>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>>>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>>>>>> which the waiter would like to see the fence signalled.
...
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
is to, once we have deadline support, extend it with an ioctl to the dma-fence fd so userspace can be the one setting the deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
So essentially when the mutter compositor guesstimator is fairly confident about the next frame's composition (recall you're keeping track of clients to estimate their usual latency or something like that), then it does a TEST_ONLY commit to check it all works and prep the rendering, but _not_ yet fire it off.
Instead it waits until all buffers complete, and if some don't, pick the previous one. Which I guess in an extreme case would mean you need a different window tree configuration and maybe different TEST_ONLY check and all that, not sure how you solve that.
Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the in-fences you expect to depend upon (maybe we need an additional list of in-fences for your rendering job), plus a deadline when you want to have them done (so that there's enough time for your render job still). And the kernel then calls dma_fence_set_deadline on all of them.
Pondering this more, maybe a separate ioctl is simpler where you just supply a list of in-fences and deadlines.
The real reason I want to tie this to atomic is for priviledge checking reasons. I don't think normal userspace should have the power to set arbitrary deadlines like this - at least on i915 it will also give you a slight priority boost and stuff like that, to make sure your rendering for the current frame goes in ahead of the next frame's prep work.
So maybe just a new ioctl that does this which is limited to the current kms owner (aka drm_master)?
Yeah.
Why not have a Wayland compositor *always* "set the deadlines" for the next screen update as soon as it gets the wl_surface.commit with the new buffer and fences (a simplified description of what is actually necessary to take a new window state set into use)?
Yeah taht's probably best. And if the frame is scheduled (video at 24fps or whatever) you can also immediately set the deadline for that too, just a few frames later. Always minus compositor budget taken into account.
The Wayland client posted the frame to the compositor, so surely it wants it ready and displayed ASAP. If we happen to have a Wayland frame queuing extension, then also take that into account when setting the deadline.
Then, *independently* of that, the compositor will choose which frames it will actually use in its composition when the time comes.
No need for any KMS atomic commit fiddling, userspace just explicitly sets the deadline on the fence and that's it. You could tie the privilege of setting deadlines to simply holding DRM master on whatever device? So the ioctl would need both the fence and any DRM device fd.
Yeah tying that up with atomic doesn't make sense.
A rogue application opening a DRM device and becoming DRM master on it just to be able to abuse deadlines feels both unlikely and with insignificant consequences. It stops the obvious abuse, and if someone actually goes the extra effort, then so what.
With logind you can't become drm master just for lolz anymore, so I'm not worried about that. On such systems only logind has the rights to access the primary node, everyone doing headless goes through the render node.
Mm, I hope the DRM leasing protocols don't rely on clients being able to open KMS nodes anymore... they used to at some point, I think, for the initial resource discovery before actually leasing anything.
"only logind has rights" might be a bit off still.
So just limiting the deadline ioctl to current kms owner is imo perfectly good enough for a security model.
There could be multiple DRM devices. Including VKMS. Some of them not used. The deadline setting ioctl can't guarantee the fenced buffer is going to be used on the same DRM device the ioctl was called with. Or used at all with KMS.
Anyway, even if that is not completely secure, I wouldn't think that setting deadlines can do more than change GPU job priorities and power consumption, which seem quite benign. It's enough hoops to jump through that I think it stops everything we care to stop.
Thanks, pq
On Thu, Jul 29, 2021 at 3:00 PM Pekka Paalanen ppaalanen@gmail.com wrote:
On Thu, 29 Jul 2021 14:18:29 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 12:37:32PM +0300, Pekka Paalanen wrote:
On Thu, 29 Jul 2021 11:03:36 +0200 Daniel Vetter daniel@ffwll.ch wrote:
On Thu, Jul 29, 2021 at 10:17:43AM +0200, Michel Dänzer wrote:
On 2021-07-29 9:09 a.m., Daniel Vetter wrote:
On Wed, Jul 28, 2021 at 08:34:13AM -0700, Rob Clark wrote: > On Wed, Jul 28, 2021 at 6:24 AM Michel Dänzer michel@daenzer.net wrote: >> On 2021-07-28 3:13 p.m., Christian König wrote: >>> Am 28.07.21 um 15:08 schrieb Michel Dänzer: >>>> On 2021-07-28 1:36 p.m., Christian König wrote: >>>>> Am 27.07.21 um 17:37 schrieb Rob Clark: >>>>>> On Tue, Jul 27, 2021 at 8:19 AM Michel Dänzer michel@daenzer.net wrote: >>>>>>> On 2021-07-27 5:12 p.m., Rob Clark wrote: >>>>>>>> On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote: >>>>>>>>> On 2021-07-27 1:38 a.m., Rob Clark wrote: >>>>>>>>>> From: Rob Clark robdclark@chromium.org >>>>>>>>>> >>>>>>>>>> Based on discussion from a previous series[1] to add a "boost" mechanism >>>>>>>>>> when, for example, vblank deadlines are missed. Instead of a boost >>>>>>>>>> callback, this approach adds a way to set a deadline on the fence, by >>>>>>>>>> which the waiter would like to see the fence signalled.
...
I'm not questioning that this approach helps when there's a direct chain of fences from the client to the page flip. I'm pointing out there will not always be such a chain.
> But maybe the solution to make this also useful for mutter
It's not just mutter BTW. I understand gamescope has been doing this for some time already. And there seems to be consensus among developers of Wayland compositors that this is needed, so I expect at least all the major compositors to do this longer term.
> is to, once we have deadline support, extend it with an ioctl to > the dma-fence fd so userspace can be the one setting the > deadline.
I was thinking in a similar direction.
atomic ioctl with TEST_ONLY and SET_DEADLINES? Still gives mutter the option to bail out with an old frame if it's too late?
This is a bit cryptic though, can you elaborate?
So essentially when the mutter compositor guesstimator is fairly confident about the next frame's composition (recall you're keeping track of clients to estimate their usual latency or something like that), then it does a TEST_ONLY commit to check it all works and prep the rendering, but _not_ yet fire it off.
Instead it waits until all buffers complete, and if some don't, pick the previous one. Which I guess in an extreme case would mean you need a different window tree configuration and maybe different TEST_ONLY check and all that, not sure how you solve that.
Anyway, in that TEST_ONLY commit my idea is that you'd also supply all the in-fences you expect to depend upon (maybe we need an additional list of in-fences for your rendering job), plus a deadline when you want to have them done (so that there's enough time for your render job still). And the kernel then calls dma_fence_set_deadline on all of them.
Pondering this more, maybe a separate ioctl is simpler where you just supply a list of in-fences and deadlines.
The real reason I want to tie this to atomic is for priviledge checking reasons. I don't think normal userspace should have the power to set arbitrary deadlines like this - at least on i915 it will also give you a slight priority boost and stuff like that, to make sure your rendering for the current frame goes in ahead of the next frame's prep work.
So maybe just a new ioctl that does this which is limited to the current kms owner (aka drm_master)?
Yeah.
Why not have a Wayland compositor *always* "set the deadlines" for the next screen update as soon as it gets the wl_surface.commit with the new buffer and fences (a simplified description of what is actually necessary to take a new window state set into use)?
Yeah taht's probably best. And if the frame is scheduled (video at 24fps or whatever) you can also immediately set the deadline for that too, just a few frames later. Always minus compositor budget taken into account.
The Wayland client posted the frame to the compositor, so surely it wants it ready and displayed ASAP. If we happen to have a Wayland frame queuing extension, then also take that into account when setting the deadline.
Then, *independently* of that, the compositor will choose which frames it will actually use in its composition when the time comes.
No need for any KMS atomic commit fiddling, userspace just explicitly sets the deadline on the fence and that's it. You could tie the privilege of setting deadlines to simply holding DRM master on whatever device? So the ioctl would need both the fence and any DRM device fd.
Yeah tying that up with atomic doesn't make sense.
A rogue application opening a DRM device and becoming DRM master on it just to be able to abuse deadlines feels both unlikely and with insignificant consequences. It stops the obvious abuse, and if someone actually goes the extra effort, then so what.
With logind you can't become drm master just for lolz anymore, so I'm not worried about that. On such systems only logind has the rights to access the primary node, everyone doing headless goes through the render node.
Mm, I hope the DRM leasing protocols don't rely on clients being able to open KMS nodes anymore... they used to at some point, I think, for the initial resource discovery before actually leasing anything.
Yeah I thought that was fixed with additional xrandr/wayland discovery protocols. It doesn't work anyone on systems with display/render split. I think that was just to get it all going.
"only logind has rights" might be a bit off still.
So just limiting the deadline ioctl to current kms owner is imo perfectly good enough for a security model.
There could be multiple DRM devices. Including VKMS. Some of them not used. The deadline setting ioctl can't guarantee the fenced buffer is going to be used on the same DRM device the ioctl was called with. Or used at all with KMS.
That's not a problem, fence deadline interface is cross-driver.
Anyway, even if that is not completely secure, I wouldn't think that setting deadlines can do more than change GPU job priorities and power consumption, which seem quite benign. It's enough hoops to jump through that I think it stops everything we care to stop.
Yeah. Plus with this patch set you can do this already, just need to send out an atomic flip with all the fences merged together into your in-fence slots. -Daniel
linaro-mm-sig@lists.linaro.org