From: Rob Clark robdclark@chromium.org
Based on discussion from a previous series[1] to add a "boost" mechanism when, for example, vblank deadlines are missed. Instead of a boost callback, this approach adds a way to set a deadline on the fence, by which the waiter would like to see the fence signalled.
I've not yet had a chance to re-work the drm/msm part of this, but wanted to send this out as an RFC in case I don't have a chance to finish the drm/msm part this week.
Original description:
In some cases, like double-buffered rendering, missing vblanks can trick the GPU into running at a lower frequency, when really we want to be running at a higher frequency so as not to miss the vblanks in the first place.
This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:
1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers
[1] https://patchwork.freedesktop.org/series/90331/
v1: https://patchwork.freedesktop.org/series/93035/
v2: Move filtering out of later deadlines to fence implementation to avoid increasing the size of dma_fence
Rob Clark (5):
  dma-fence: Add deadline awareness
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/scheduler: Add fence deadline support
  drm/msm: Add deadline based boost support
 drivers/dma-buf/dma-fence.c             | 20 +++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 ++++++++++++
 drivers/gpu/drm/drm_vblank.c            | 31 ++++++++++
 drivers/gpu/drm/msm/msm_fence.c         | 76 +++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h         | 20 +++++++
 drivers/gpu/drm/msm/msm_gpu.h           |  1 +
 drivers/gpu/drm/msm/msm_gpu_devfreq.c   | 20 +++++++
 drivers/gpu/drm/scheduler/sched_fence.c | 25 ++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 +
 include/drm/drm_vblank.h                |  1 +
 include/drm/gpu_scheduler.h             |  6 ++
 include/linux/dma-fence.h               | 16 ++++++
 12 files changed, 255 insertions(+)
From: Rob Clark robdclark@chromium.org
Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, or becoming aware of missed deadlines so that they can be factored into the frequency scaling.
v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig]
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/dma-buf/dma-fence.c | 20 ++++++++++++++++++++
 include/linux/dma-fence.h   | 16 ++++++++++++++++
 2 files changed, 36 insertions(+)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ce0f5eff575d..1f444863b94d 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -910,6 +910,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline
+ * @fence:    the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ *            signaled
+ *
+ * Inform the fence signaler of an upcoming deadline, such as vblank, by
+ * which point the waiter would prefer the fence to be signaled by. This
+ * is intended to give feedback to the fence signaler to aid in power
+ * management decisions, such as boosting GPU frequency if a periodic
+ * vblank deadline is approaching.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
+		fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
 /**
  * dma_fence_init - Initialize a custom fence.
  * @fence: the fence to initialize
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6ffb4b2c6371..9c809f0d5d0a 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
 	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
 };
 
@@ -261,6 +262,19 @@ struct dma_fence_ops {
 	 */
 	void (*timeline_value_str)(struct dma_fence *fence, char *str, int size);
+
+	/**
+	 * @set_deadline:
+	 *
+	 * Callback to allow a fence waiter to inform the fence signaler of an
+	 * upcoming deadline, such as vblank, by which point the waiter would
+	 * prefer the fence to be signaled by. This is intended to give feedback
+	 * to the fence signaler to aid in power management decisions, such as
+	 * boosting GPU frequency.
+	 *
+	 * This callback is optional.
+	 */
+	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
 };
 
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -586,6 +600,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
+
 struct dma_fence *dma_fence_get_stub(void);
 struct dma_fence *dma_fence_allocate_private_stub(void);
 u64 dma_fence_context_alloc(unsigned num);
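For illustration of the waiter side (this is not part of the patch; the function and the way the vblank time is obtained are invented for the example), a driver about to wait on a fence it would like signaled by the next vblank could do roughly the following:

static int example_wait_with_deadline(struct dma_fence *fence,
				      ktime_t next_vblank)
{
	/* Advisory hint: lets the signaler boost clocks if the deadline is near */
	dma_fence_set_deadline(fence, next_vblank);

	/* Then wait as usual; missing the deadline is not treated as an error */
	return dma_fence_wait(fence, false);
}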
On 07.08.21 at 20:37, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, or becoming aware of missed deadlines so that they can be factored into the frequency scaling.
v2: Drop dma_fence::deadline and related logic to filter duplicate deadlines, to avoid increasing dma_fence size. The fence-context implementation will need similar logic to track deadlines of all the fences on the same timeline. [ckoenig]
Signed-off-by: Rob Clark robdclark@chromium.org
Reviewed-by: Christian König christian.koenig@amd.com
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/gpu/drm/scheduler/sched_fence.c | 25 +++++++++++++++++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 +++
 include/drm/gpu_scheduler.h             |  6 ++++++
 3 files changed, 34 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 69de2c76731f..f389dca44185 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
 	dma_fence_put(&fence->scheduled);
 }
 
+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+						  ktime_t deadline)
+{
+	struct drm_sched_fence *fence = to_drm_sched_fence(f);
+	unsigned long flags;
+
+	spin_lock_irqsave(&fence->lock, flags);
+
+	/* If we already have an earlier deadline, keep it: */
+	if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
+	    ktime_before(fence->deadline, deadline)) {
+		spin_unlock_irqrestore(&fence->lock, flags);
+		return;
+	}
+
+	fence->deadline = deadline;
+	set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
+
+	spin_unlock_irqrestore(&fence->lock, flags);
+
+	if (fence->parent)
+		dma_fence_set_deadline(fence->parent, deadline);
+}
+
 static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -138,6 +162,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
 	.release = drm_sched_fence_release_finished,
+	.set_deadline = drm_sched_fence_set_deadline_finished,
 };
 
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..3ab0900d3596 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -818,6 +818,9 @@ static int drm_sched_main(void *param)
 
 		if (!IS_ERR_OR_NULL(fence)) {
 			s_fence->parent = dma_fence_get(fence);
+			if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
+				     &s_fence->finished.flags))
+				dma_fence_set_deadline(fence, s_fence->deadline);
 			r = dma_fence_add_callback(fence, &sched_job->cb,
 						   drm_sched_job_done_cb);
 			if (r == -ENOENT)
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index d18af49fd009..0f08ade614ae 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -144,6 +144,12 @@ struct drm_sched_fence {
 	 */
 	struct dma_fence		finished;
 
+	/**
+	 * @deadline: deadline set on &drm_sched_fence.finished which
+	 * potentially needs to be propagated to &drm_sched_fence.parent
+	 */
+	ktime_t				deadline;
+
 	/**
 	 * @parent: the fence returned by &drm_sched_backend_ops.run_job
 	 * when scheduling the job on hardware. We signal the
On 07.08.21 at 20:37, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
drivers/gpu/drm/scheduler/sched_fence.c | 25 +++++++++++++++++++++++++ drivers/gpu/drm/scheduler/sched_main.c | 3 +++ include/drm/gpu_scheduler.h | 6 ++++++ 3 files changed, 34 insertions(+)
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..3ab0900d3596 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -818,6 +818,9 @@ static int drm_sched_main(void *param)
 		if (!IS_ERR_OR_NULL(fence)) {
 			s_fence->parent = dma_fence_get(fence);
+			if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
+				     &s_fence->finished.flags))
+				dma_fence_set_deadline(fence, s_fence->deadline);
Maybe move this into a dma_sched_fence_set_parent() function.
Apart from that looks good to me.
Regards, Christian.
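For illustration, a helper along the lines of that suggestion could look roughly like this (the name and exact shape are assumptions based on the comment, not code from the series):

static void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
				       struct dma_fence *fence)
{
	s_fence->parent = dma_fence_get(fence);

	/* Propagate a deadline that was set before the job had a hw fence */
	if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
		     &s_fence->finished.flags))
		dma_fence_set_deadline(fence, s_fence->deadline);
}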
On Mon, Aug 16, 2021 at 12:14:35PM +0200, Christian König wrote:
On 07.08.21 at 20:37, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
I guess you're already letting the compositor run at a higher gpu priority so that your deadline'd drm_sched_job isn't stuck behind the app rendering the next frame?
I'm not sure whether you wire that one up as part of the conversion to drm/sched. Without that I think we might need to ponder how we can do a prio-boost for these, e.g. within a scheduling class we pick the jobs with the nearest deadline first, before we pick others. -Daniel
On Mon, Aug 16, 2021 at 8:38 AM Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Aug 16, 2021 at 12:14:35PM +0200, Christian König wrote:
On 07.08.21 at 20:37, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
I guess you're already letting the compositor run at a higher gpu priority so that your deadline'd drm_sched_job isn't stuck behind the app rendering the next frame?
With the scheduler conversion we do have multiple priorities (provided by scheduler) for all generations.. but not yet preemption for all generations.
But the most common use-case where we need this ends up being display composition (either fullscreen app/game or foreground app/game composited via overlay) so I haven't thought too much about the next step of boosting job priority. I might leave that to someone who already has preemption wired up ;-)
BR, -R
I'm not sure whether you wire that one up as part of the conversion to drm/sched. Without that I think we might need to ponder how we can do a prio-boost for these, e.g. within a scheduling class we pick the jobs with the nearest deadline first, before we pick others. -Daniel
On Mon, Aug 16, 2021 at 03:25:20PM -0700, Rob Clark wrote:
On Mon, Aug 16, 2021 at 8:38 AM Daniel Vetter daniel@ffwll.ch wrote:
On Mon, Aug 16, 2021 at 12:14:35PM +0200, Christian König wrote:
On 07.08.21 at 20:37, Rob Clark wrote:
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
I guess you're already letting the compositor run at a higher gpu priority so that your deadline'd drm_sched_job isn't stuck behind the app rendering the next frame?
With the scheduler conversion we do have multiple priorities (provided by scheduler) for all generations.. but not yet preemption for all generations.
But the most common use-case where we need this ends up being display composition (either fullscreen app/game or foreground app/game composited via overlay) so I haven't thought too much about the next step of boosting job priority. I might leave that to someone who already has preemption wired up ;-)
Atm no-one, drm/sched isn't really aware that's a concept. I was more thinking of just boosting that request as a first step. Maybe within the same priority class we pick jobs with deadlines first, or something like that.
Preempting is an entire can of worms on top. -Daniel
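To make that policy concrete, picking the job with the nearest deadline within a single scheduling class could look something like the sketch below; the job struct and queue here are invented for the example and are not drm/sched types:

struct example_job {
	struct list_head node;
	bool has_deadline;
	ktime_t deadline;
};

/* Prefer jobs with a deadline, earliest first; otherwise fall back to queue order */
static struct example_job *example_pick_job(struct list_head *queue)
{
	struct example_job *job, *best = NULL;

	list_for_each_entry(job, queue, node) {
		if (!best) {
			best = job;
		} else if (job->has_deadline &&
			   (!best->has_deadline ||
			    ktime_before(job->deadline, best->deadline))) {
			best = job;
		}
	}

	return best;
}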
BR, -R
I'm not sure whether you wire that one up as part of the conversion to drm/sched. Without that I think we might need to ponder how we can do a prio-boost for these, e.g. within a scheduling class we pick the jobs with the nearest deadline first, before we pick others. -Daniel
From: Rob Clark robdclark@chromium.org
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/gpu/drm/msm/msm_fence.c       | 76 +++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h       | 20 +++++++
 drivers/gpu/drm/msm/msm_gpu.h         |  1 +
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++++++
 4 files changed, 117 insertions(+)
diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index f2cece542c3f..67c2a96e1c85 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,37 @@
 
 #include "msm_drv.h"
 #include "msm_fence.h"
+#include "msm_gpu.h"
+
+static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence);
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+	struct msm_drm_private *priv = fctx->dev->dev_private;
+	return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+	struct msm_fence_context *fctx = container_of(t,
+			struct msm_fence_context, deadline_timer);
+
+	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+	return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+	struct msm_fence_context *fctx = container_of(work,
+			struct msm_fence_context, deadline_work);
+
+	/* If deadline fence has already passed, nothing to do: */
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		return;
+
+	msm_devfreq_boost(fctx2gpu(fctx), 2);
+}
+
 
 struct msm_fence_context *
@@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
 	fctx->fenceptr = fenceptr;
 	spin_lock_init(&fctx->spinlock);
 
+	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	fctx->deadline_timer.function = deadline_timer;
+
+	kthread_init_work(&fctx->deadline_work, deadline_work);
+
+	fctx->next_deadline = ktime_get();
+
 	return fctx;
 }
 
@@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
 {
 	spin_lock(&fctx->spinlock);
 	fctx->completed_fence = max(fence, fctx->completed_fence);
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		hrtimer_cancel(&fctx->deadline_timer);
 	spin_unlock(&fctx->spinlock);
 }
 
@@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
 	return fence_completed(f->fctx, f->base.seqno);
 }
 
+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct msm_fence *f = to_msm_fence(fence);
+	struct msm_fence_context *fctx = f->fctx;
+	unsigned long flags;
+	ktime_t now;
+
+	spin_lock_irqsave(&fctx->spinlock, flags);
+	now = ktime_get();
+
+	if (ktime_after(now, fctx->next_deadline) ||
+	    ktime_before(deadline, fctx->next_deadline)) {
+		fctx->next_deadline = deadline;
+		fctx->next_deadline_fence =
+			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+		/*
+		 * Set timer to trigger boost 3ms before deadline, or
+		 * if we are already less than 3ms before the deadline
+		 * schedule boost work immediately.
+		 */
+		deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+		if (ktime_after(now, deadline)) {
+			kthread_queue_work(fctx2gpu(fctx)->worker,
+					&fctx->deadline_work);
+		} else {
+			hrtimer_start(&fctx->deadline_timer, deadline,
+					HRTIMER_MODE_ABS);
+		}
+	}
+
+	spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
 	.get_driver_name = msm_fence_get_driver_name,
 	.get_timeline_name = msm_fence_get_timeline_name,
 	.signaled = msm_fence_signaled,
+	.set_deadline = msm_fence_set_deadline,
 };
 
 struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 4783db528bcc..d34e853c555a 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -50,6 +50,26 @@ struct msm_fence_context {
 	volatile uint32_t *fenceptr;
 
 	spinlock_t spinlock;
+
+	/*
+	 * TODO this doesn't really deal with multiple deadlines, like
+	 * if userspace got multiple frames ahead.. OTOH atomic updates
+	 * don't queue, so maybe that is ok
+	 */
+
+	/** next_deadline: Time of next deadline */
+	ktime_t next_deadline;
+
+	/**
+	 * next_deadline_fence:
+	 *
+	 * Fence value for next pending deadline. The deadline timer is
+	 * canceled when this fence is signaled.
+	 */
+	uint32_t next_deadline_fence;
+
+	struct hrtimer deadline_timer;
+	struct kthread_work deadline_work;
 };
 
 struct msm_fence_context *
 msm_fence_context_alloc(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 0e4b45bff2e6..e031c9b495ed 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -425,6 +425,7 @@ void msm_devfreq_init(struct msm_gpu *gpu);
 void msm_devfreq_cleanup(struct msm_gpu *gpu);
 void msm_devfreq_resume(struct msm_gpu *gpu);
 void msm_devfreq_suspend(struct msm_gpu *gpu);
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor);
 void msm_devfreq_active(struct msm_gpu *gpu);
 void msm_devfreq_idle(struct msm_gpu *gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 0a1ee20296a2..8a8d7b9028a3 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -144,6 +144,26 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
 	devfreq_suspend_device(gpu->devfreq.devfreq);
 }
 
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor)
+{
+	struct msm_gpu_devfreq *df = &gpu->devfreq;
+	unsigned long freq;
+
+	/*
+	 * Hold devfreq lock to synchronize with get_dev_status()/
+	 * target() callbacks
+	 */
+	mutex_lock(&df->devfreq->lock);
+
+	freq = get_freq(gpu);
+
+	freq *= factor;
+
+	msm_devfreq_target(&gpu->pdev->dev, &freq, 0);
+
+	mutex_unlock(&df->devfreq->lock);
+}
+
 void msm_devfreq_active(struct msm_gpu *gpu)
 {
 	struct msm_gpu_devfreq *df = &gpu->devfreq;
The general approach seems to make sense now I think.
One minor thing which I'm missing is adding support for this to the dma_fence_array and dma_fence_chain containers.
Regards, Christian.
dma_fence_array looks simple enough, just propagate the deadline to all children.
I guess dma_fence_chain is similar (ie. fence is signalled when all children are signalled), the difference being simply that children are added dynamically?
BR, -R
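A minimal sketch of that propagation for dma_fence_array, using the container's existing fields (this is not code from the series, only an illustration of the idea):

static void dma_fence_array_set_deadline(struct dma_fence *fence,
					 ktime_t deadline)
{
	struct dma_fence_array *array = to_dma_fence_array(fence);
	unsigned i;

	/* An array signals once its children do, so hint all of them */
	for (i = 0; i < array->num_fences; i++)
		dma_fence_set_deadline(array->fences[i], deadline);
}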
On 17.08.21 at 00:29, Rob Clark wrote:
dma_fence_array looks simple enough, just propagate the deadline to all children.
I guess dma_fence_chain is similar (ie. fence is signalled when all children are signalled), the difference being simply that children are added dynamically?
No, new chain nodes are always added at the top.
So when you have a dma_fence_chain as a starting point, the linked nodes after it will stay the same (except for garbage collection).
The tricky part is that you can't use recursion, because that would easily exceed the kernel's stack depth. So you need something similar to dma_fence_chain_signaled().
Something like this should do it:
static void dma_fence_chain_set_deadline(struct dma_fence *fence,
					 ktime_t deadline)
{
	dma_fence_chain_for_each(fence, fence) {
		struct dma_fence_chain *chain = to_dma_fence_chain(fence);
		struct dma_fence *f = chain ? chain->fence : fence;

		dma_fence_set_deadline(f, deadline);
	}
}
Regards, Christian.