From: Rob Clark <robdclark@chromium.org>
Based on discussion from a previous series[1] to add a "boost" mechanism when, for example, vblank deadlines are missed. Instead of a boost callback, this approach adds a way to set a deadline on the fence, by which the waiter would like to see the fence signaled.
I've not yet had a chance to re-work the drm/msm part of this, but wanted to send this out as an RFC in case I don't have a chance to finish the drm/msm part this week.
Original description:
In some cases, like double-buffered rendering, missing vblanks can trick the GPU into running at a lower frequency, when really we want to be running at a higher frequency to not miss the vblanks in the first place.
This is partially inspired by a trick i915 does, but implemented via dma-fence for a couple of reasons:
1) To continue to be able to use the atomic helpers
2) To support cases where display and gpu are different drivers
[1] https://patchwork.freedesktop.org/series/90331/
Rob Clark (4):
  dma-fence: Add deadline awareness
  drm/vblank: Add helper to get next vblank time
  drm/atomic-helper: Set fence deadline for vblank
  drm/scheduler: Add fence deadline support
 drivers/dma-buf/dma-fence.c             | 39 +++++++++++++++++++++++++
 drivers/gpu/drm/drm_atomic_helper.c     | 36 +++++++++++++++++++++++
 drivers/gpu/drm/drm_vblank.c            | 31 ++++++++++++++++++++
 drivers/gpu/drm/scheduler/sched_fence.c | 10 +++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 ++
 include/drm/drm_vblank.h                |  1 +
 include/linux/dma-fence.h               | 17 +++++++++++
 7 files changed, 137 insertions(+)
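[For orientation, a rough waiter-side sketch of how patches 2-3 fit together in the atomic commit path. This is illustrative only, not code from the series: drm_crtc_next_vblank_time() stands in for the helper that patch 2 adds, and its exact name and signature are assumptions here.]

#include <drm/drm_atomic.h>
#include <linux/dma-fence.h>

/* Before waiting on plane fences in the atomic commit path, tell each
 * fence when the next vblank is expected: */
static void set_fence_deadlines(struct drm_atomic_state *state)
{
	struct drm_plane *plane;
	struct drm_plane_state *new_plane_state;
	ktime_t vbltime;
	int i;

	for_each_new_plane_in_state(state, plane, new_plane_state, i) {
		if (!new_plane_state->crtc || !new_plane_state->fence)
			continue;

		/* Hypothetical helper from patch 2 (name assumed): */
		if (drm_crtc_next_vblank_time(new_plane_state->crtc, &vbltime))
			continue;

		dma_fence_set_deadline(new_plane_state->fence, vbltime);
	}
}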
From: Rob Clark <robdclark@chromium.org>
Add a way to hint to the fence signaler of an upcoming deadline, such as vblank, which the fence waiter would prefer not to miss. This is to aid the fence signaler in making power management decisions, like boosting frequency as the deadline approaches, and awareness of missing deadlines so that it can be factored into the frequency scaling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/dma-buf/dma-fence.c | 39 +++++++++++++++++++++++++++++++++++++
 include/linux/dma-fence.h   | 17 ++++++++++++++++
 2 files changed, 56 insertions(+)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index ce0f5eff575d..2e0d25ab457e 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -910,6 +910,45 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
 }
 EXPORT_SYMBOL(dma_fence_wait_any_timeout);
 
+
+/**
+ * dma_fence_set_deadline - set desired fence-wait deadline
+ * @fence:    the fence that is to be waited on
+ * @deadline: the time by which the waiter hopes for the fence to be
+ *            signaled
+ *
+ * Inform the fence signaler of an upcoming deadline, such as vblank, by
+ * which point the waiter would prefer the fence to be signaled by.  This
+ * is intended to give feedback to the fence signaler to aid in power
+ * management decisions, such as boosting GPU frequency if a periodic
+ * vblank deadline is approaching.
+ */
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	unsigned long flags;
+
+	if (dma_fence_is_signaled(fence))
+		return;
+
+	spin_lock_irqsave(fence->lock, flags);
+
+	/* If we already have an earlier deadline, keep it: */
+	if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags) &&
+	    ktime_before(fence->deadline, deadline)) {
+		spin_unlock_irqrestore(fence->lock, flags);
+		return;
+	}
+
+	fence->deadline = deadline;
+	set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &fence->flags);
+
+	spin_unlock_irqrestore(fence->lock, flags);
+
+	if (fence->ops->set_deadline)
+		fence->ops->set_deadline(fence, deadline);
+}
+EXPORT_SYMBOL(dma_fence_set_deadline);
+
 /**
  * dma_fence_init - Initialize a custom fence.
  * @fence: the fence to initialize
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 6ffb4b2c6371..4e6cfe4e6fbc 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -88,6 +88,7 @@ struct dma_fence {
 		/* @timestamp replaced by @rcu on dma_fence_release() */
 		struct rcu_head rcu;
 	};
+	ktime_t deadline;
 	u64 context;
 	u64 seqno;
 	unsigned long flags;
@@ -99,6 +100,7 @@ enum dma_fence_flag_bits {
 	DMA_FENCE_FLAG_SIGNALED_BIT,
 	DMA_FENCE_FLAG_TIMESTAMP_BIT,
 	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+	DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
 	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
 };
 
@@ -261,6 +263,19 @@ struct dma_fence_ops {
 	 */
 	void (*timeline_value_str)(struct dma_fence *fence,
 				   char *str, int size);
+
+	/**
+	 * @set_deadline:
+	 *
+	 * Callback to allow a fence waiter to inform the fence signaler of an
+	 * upcoming deadline, such as vblank, by which point the waiter would
+	 * prefer the fence to be signaled by.  This is intended to give feedback
+	 * to the fence signaler to aid in power management decisions, such as
+	 * boosting GPU frequency.
+	 *
+	 * This callback is optional.
+	 */
+	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
 };
 
 void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
@@ -586,6 +601,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
 	return ret < 0 ? ret : 0;
 }
 
+void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
+
 struct dma_fence *dma_fence_get_stub(void);
 struct dma_fence *dma_fence_allocate_private_stub(void);
 u64 dma_fence_context_alloc(unsigned num);
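[To make the new hook concrete, a hypothetical signaler-side implementation; this is not part of the series. struct my_fence, the boost timer, and the 3ms margin are all invented for illustration.]

#include <linux/dma-fence.h>
#include <linux/hrtimer.h>

struct my_fence {
	struct dma_fence base;
	struct hrtimer boost_timer;	/* its handler would bump GPU freq */
};

static void my_fence_set_deadline(struct dma_fence *f, ktime_t deadline)
{
	struct my_fence *mf = container_of(f, struct my_fence, base);

	/* Fire a bit before the deadline so a frequency boost can still
	 * take effect in time (3ms margin is an assumption): */
	hrtimer_start(&mf->boost_timer,
		      ktime_sub(deadline, ms_to_ktime(3)),
		      HRTIMER_MODE_ABS);
}

static const struct dma_fence_ops my_fence_ops = {
	/* .get_driver_name, .get_timeline_name, etc. elided */
	.set_deadline = my_fence_set_deadline,
};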
On 27.07.21 01:38, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> [snip]
>
> +	ktime_t deadline;
Mhm, adding the flag sounds ok to me, but I'm a bit hesitant to add the deadline as an extra field here.
We tuned the dma_fence structure intentionally so that it is only 64 bytes.
Regards, Christian.
On Tue, Jul 27, 2021 at 12:11 AM Christian König ckoenig.leichtzumerken@gmail.com wrote:
> On 27.07.21 01:38, Rob Clark wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> [snip]
>>
>> +	ktime_t deadline;
> Mhm, adding the flag sounds ok to me, but I'm a bit hesitant to add the deadline as an extra field here.
>
> We tuned the dma_fence structure intentionally so that it is only 64 bytes.
Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
We could push the ktime_t (and timer) down into the derived fence class, but I think there is going to need to be some extra storage *somewhere*.. maybe the fence signaler could get away with just storing the nearest upcoming deadline per fence-context instead?
BR, -R
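[What "storing the nearest upcoming deadline per fence-context" could look like; a minimal sketch under that assumption, not code from any posted patch:]

struct my_fence_context {
	spinlock_t lock;
	bool has_deadline;
	ktime_t earliest_deadline;	/* only track the soonest request */
};

static void my_context_note_deadline(struct my_fence_context *ctx,
				     ktime_t deadline)
{
	unsigned long flags;

	spin_lock_irqsave(&ctx->lock, flags);
	if (!ctx->has_deadline ||
	    ktime_before(deadline, ctx->earliest_deadline)) {
		ctx->earliest_deadline = deadline;
		ctx->has_deadline = true;
		/* ...kick devfreq/boost logic here... */
	}
	spin_unlock_irqrestore(&ctx->lock, flags);
}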
On 27.07.21 16:25, Rob Clark wrote:
> On Tue, Jul 27, 2021 at 12:11 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>> [snip]
>>
>> We tuned the dma_fence structure intentionally so that it is only 64 bytes.
>
> Hmm, then I guess you wouldn't be a fan of also adding an hrtimer?
>
> We could push the ktime_t (and timer) down into the derived fence class, but I think there is going to need to be some extra storage *somewhere*.. maybe the fence signaler could get away with just storing the nearest upcoming deadline per fence-context instead?
I would just push that into the driver instead.
You most likely don't want the deadline per fence anyway in complex scenarios, but rather per frame. And a frame is usually composed from multiple fences.
Regards, Christian.
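[To illustrate the per-frame point: if the fence a compositor waits on is really a container such as a dma_fence_array, one frame deadline would fan out to every component fence. A sketch only, assuming a dma_fence_array is in play:]

#include <linux/dma-fence-array.h>

static void frame_set_deadline(struct dma_fence *frame_done, ktime_t deadline)
{
	struct dma_fence_array *array = to_dma_fence_array(frame_done);
	unsigned i;

	if (!array) {
		/* Plain fence, just pass the deadline through: */
		dma_fence_set_deadline(frame_done, deadline);
		return;
	}

	for (i = 0; i < array->num_fences; i++)
		dma_fence_set_deadline(array->fences[i], deadline);
}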
On 28.07.21 09:03, Christian König wrote:
> On 27.07.21 16:25, Rob Clark wrote:
>> [snip]
>
> I would just push that into the driver instead.
>
> You most likely don't want the deadline per fence anyway in complex scenarios, but rather per frame. And a frame is usually composed from multiple fences.
Thinking more about it we could probably kill the spinlock pointer and make the flags 32bit if we absolutely need that here.
But I still don't see the need for that, especially since most drivers probably won't implement it.
Regards, Christian.
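[For context, the 64-byte argument: on a 64-bit build the fields of struct dma_fence at this point roughly pack as below, filling exactly one cacheline, so any new field pushes it to two. The size annotations are illustrative, not authoritative.]

struct dma_fence {
	spinlock_t *lock;			/*  8 bytes */
	const struct dma_fence_ops *ops;	/*  8 bytes */
	union {					/* 16 bytes */
		struct list_head cb_list;
		ktime_t timestamp;
		struct rcu_head rcu;
	};
	u64 context;				/*  8 bytes */
	u64 seqno;				/*  8 bytes */
	unsigned long flags;			/*  8 bytes */
	struct kref refcount;			/*  4 bytes */
	int error;				/*  4 bytes */
};						/* = 64 bytes */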
On Wed, Jul 28, 2021 at 4:37 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
> On 28.07.21 09:03, Christian König wrote:
>> [snip]
>>
>> I would just push that into the driver instead.
>>
>> You most likely don't want the deadline per fence anyway in complex scenarios, but rather per frame. And a frame is usually composed from multiple fences.
Right, I ended up keeping track of the nearest deadline in patch 5/4 which added drm/msm support:
https://patchwork.freedesktop.org/patch/447138/
But if we do have the ktime_t in dma_fence, we can add some checks and avoid calling back to the driver if a later deadline is set on a fence that already has an earlier deadline. OTOH I suppose I can push all that back to the driver to start, and we can revisit once we have more drivers implementing deadline support.
> Thinking more about it we could probably kill the spinlock pointer and make the flags 32bit if we absolutely need that here.
If we had a 'struct dma_fence_context' we could push the spinlock, ops pointer, and u64 context into that and replace with a single dma_fence_context ptr, fwiw
BR, -R
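[A sketch of what that hypothetical 'struct dma_fence_context' could look like; this was never implemented, and the layout below is purely illustrative.]

/* Shared state for all fences on one timeline: */
struct dma_fence_context {
	spinlock_t lock;
	const struct dma_fence_ops *ops;
	u64 context;
};

/* dma_fence itself would then shrink to a single back-pointer: */
struct dma_fence_slim {
	struct dma_fence_context *ctx;	/* replaces lock, ops, context */
	u64 seqno;
	u32 flags;			/* 32-bit, per the suggestion above */
	struct kref refcount;
	int error;
	union {
		struct list_head cb_list;
		ktime_t timestamp;
		struct rcu_head rcu;
	};
};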
On 28.07.21 17:15, Rob Clark wrote:
> On Wed, Jul 28, 2021 at 4:37 AM Christian König <ckoenig.leichtzumerken@gmail.com> wrote:
>> [snip]
>
> Right, I ended up keeping track of the nearest deadline in patch 5/4 which added drm/msm support:
>
> https://patchwork.freedesktop.org/patch/447138/
>
> But if we do have the ktime_t in dma_fence, we can add some checks and avoid calling back to the driver if a later deadline is set on a fence that already has an earlier deadline. OTOH I suppose I can push all that back to the driver to start, and we can revisit once we have more drivers implementing deadline support.
I still think that all of this is rather specific to your use case and have strong doubts that anybody else will implement it.
>> Thinking more about it we could probably kill the spinlock pointer and make the flags 32bit if we absolutely need that here.
>
> If we had a 'struct dma_fence_context' we could push the spinlock, ops pointer, and u64 context into that and replace with a single dma_fence_context ptr, fwiw
That won't work. We have a lot of use cases where you can't allocate memory, but must allocate a context.
Christian.
On Wed, Jul 28, 2021 at 10:23 AM Christian König <christian.koenig@amd.com> wrote:
> On 28.07.21 17:15, Rob Clark wrote:
>> [snip]
>
> I still think that all of this is rather specific to your use case and have strong doubts that anybody else will implement it.
i915 does already have a similar thing in its hand-rolled atomic commit path. So I think msm won't be the only one. It should also be useful to the other mobile GPUs with a gpu vs kms driver split, although looking at the other gpu devfreq implementations, I don't think they've yet gotten to this point in the fine tuning..
BR, -R
On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
> On Wed, Jul 28, 2021 at 10:23 AM Christian König <christian.koenig@amd.com> wrote:
>> [snip]
>>
>> I still think that all of this is rather specific to your use case and have strong doubts that anybody else will implement it.
>
> i915 does already have a similar thing in its hand-rolled atomic commit path. So I think msm won't be the only one. It should also be useful to the other mobile GPUs with a gpu vs kms driver split, although looking at the other gpu devfreq implementations, I don't think they've yet gotten to this point in the fine tuning..
Yeah I have a dream that maybe i915 will use the atomic commit helpers, I originally wrote them with i915 in mind :-) even had patches!
I also think we'll need this eventually in other areas; Android also has some hacks like this to make sure idle->first touch doesn't suck, and similar things.
-Daniel
On Thu, Jul 29, 2021 at 12:03 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Wed, Jul 28, 2021 at 10:58:51AM -0700, Rob Clark wrote:
>> [snip]
>
> Yeah I have a dream that maybe i915 will use the atomic commit helpers, I originally wrote them with i915 in mind :-) even had patches!
>
> I also think we'll need this eventually in other areas; Android also has some hacks like this to make sure idle->first touch doesn't suck, and similar things.
input-boost is another thing I have on my roadmap.. part of the solution is:
commit 9bc95570175a7fbca29d86d22c54bbf399f4ad5a
Author:     Rob Clark <robdclark@chromium.org>
AuthorDate: Mon Jul 26 07:46:50 2021 -0700
Commit:     Rob Clark <robdclark@chromium.org>
CommitDate: Tue Jul 27 17:54:36 2021 -0700

    drm/msm: Devfreq tuning
which gives the freq a bit of a nudge if the GPU has been idle for longer than a certain threshold.
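[The idle-boost part of that idea boils down to something like the sketch below; the threshold, struct, and helper names are assumptions, not the actual drm/msm code.]

#define IDLE_BOOST_THRESHOLD_NS (25 * NSEC_PER_MSEC)	/* assumed value */

static void my_gpu_active(struct my_gpu *gpu)
{
	ktime_t now = ktime_get();

	/* After a long idle period the devfreq governor's load history is
	 * stale and it ramps up too slowly, so nudge the frequency: */
	if (ktime_to_ns(ktime_sub(now, gpu->last_active)) >
	    IDLE_BOOST_THRESHOLD_NS)
		my_gpu_boost_freq(gpu);

	gpu->last_active = now;
}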
But the other part is that if the GPU has been idle for more than 66ms (typical autosuspend delay for adreno) it will suspend. For modern adrenos it takes ~2ms to "boot up" the GPU from suspend, which is something you want to take out of the submit/execbuf path if you are trying to reduce input-to-pageflip latency.
We have a downstream patch that boosts the CPUs on input events (with a cooldown period to prevent spacebar-heater) and I have been thinking of something along those lines to trigger resuming the GPU.. it is straightforward enough for touch based devices, but gets more complicated with keyboard input.

In particular, some keys you want to trigger boost on key-release. Ie. modifier keys (ctrl/shift/alt/etc.. the "search" key on chromebooks, etc) you want to boost on key-release, not on key-press, because unless you type *really* fast you'll be in the cooldown period when the key-release event happens. Unfortunately the kernel doesn't really know this "policy" sort of information about which keys should boost on press vs release.

So I think the long-term/upstream solution is to do input-boost in userspace.. sysfs already has all the knobs that a userspace input-boost daemon would need to twiddle, so no real need for this to be in the kernel. I guess the only drawback is the sysfs knobs are a bit less standardized on the "desktop GPUs" which don't use devfreq.
BR, -R
On Thu, Jul 29, 2021 at 5:19 PM Rob Clark robdclark@gmail.com wrote:
[...]
I think we could do a standard interface for this, either on the drm owner/master or somewhere in sysfs. Essentially "I expect to use the gpu for the very next frame, get it going". Across all hw there's a lot of things we can do. I think abuse is pretty easy to prevent with a cooldown or similar. -Daniel
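As a sketch of what the sysfs flavor of that could look like (entirely hypothetical, including the example_gpu naming; nothing like this is in the series):

#include <linux/device.h>
#include <linux/jiffies.h>

struct example_gpu {
	unsigned long last_boost;	/* jiffies of last accepted boost */
	/* ... */
};

/* "echo 1 > .../boost": userspace expects to use the GPU immediately;
 * registered with sysfs_create_file(&dev->kobj, &dev_attr_boost.attr) */
static ssize_t boost_store(struct device *dev, struct device_attribute *attr,
			   const char *buf, size_t count)
{
	struct example_gpu *gpu = dev_get_drvdata(dev);

	/* cooldown makes abuse mostly pointless: */
	if (time_before(jiffies, gpu->last_boost + msecs_to_jiffies(100)))
		return count;

	gpu->last_boost = jiffies;

	/* resume the GPU and give the clocks a nudge, driver specific */

	return count;
}
static DEVICE_ATTR_WO(boost);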
On Thu, Jul 29, 2021 at 9:18 AM Daniel Vetter daniel@ffwll.ch wrote:
[...]
I think we could do a standard interface for this, either on the drm owner/master or somewhere in sysfs. Essentially "I expect to use the gpu for the very next frame, get it going". Across all hw there's a lot of things we can do. I think abuse is pretty easy to prevent with a cooldown or similar.
The userspace input-boost ends up needing to be either part of the compositor, or a privileged process, in order to sniff input events, so I don't think the kernel needs to try to prevent abuse here (but the userspace part definitely wants a cooldown period).
BR, -R
From: Rob Clark robdclark@chromium.org
As the finished fence is the one that is exposed to userspace, and therefore the one that other operations, like atomic update, would block on, we need to propagate the deadline from the finished fence to the actual hw fence.
Signed-off-by: Rob Clark robdclark@chromium.org
---
 drivers/gpu/drm/scheduler/sched_fence.c | 10 ++++++++++
 drivers/gpu/drm/scheduler/sched_main.c  |  3 +++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index 69de2c76731f..3aa6351d2101 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -128,6 +128,15 @@ static void drm_sched_fence_release_finished(struct dma_fence *f)
 	dma_fence_put(&fence->scheduled);
 }
 
+static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
+						  ktime_t deadline)
+{
+	struct drm_sched_fence *fence = to_drm_sched_fence(f);
+
+	if (fence->parent)
+		dma_fence_set_deadline(fence->parent, deadline);
+}
+
 static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
@@ -138,6 +147,7 @@ static const struct dma_fence_ops drm_sched_fence_ops_finished = {
 	.get_driver_name = drm_sched_fence_get_driver_name,
 	.get_timeline_name = drm_sched_fence_get_timeline_name,
 	.release = drm_sched_fence_release_finished,
+	.set_deadline = drm_sched_fence_set_deadline_finished,
 };
 
 struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a953693b45..fcc601962e92 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -818,6 +818,9 @@ static int drm_sched_main(void *param)
 
 		if (!IS_ERR_OR_NULL(fence)) {
 			s_fence->parent = dma_fence_get(fence);
+			if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
+				     &s_fence->finished.flags))
+				dma_fence_set_deadline(fence, s_fence->finished.deadline);
 			r = dma_fence_add_callback(fence, &sched_job->cb,
 						   drm_sched_job_done_cb);
 			if (r == -ENOENT)
On Mon, Jul 26, 2021 at 4:34 PM Rob Clark robdclark@gmail.com wrote:
I've not yet had a chance to re-work the drm/msm part of this, but wanted to send this out as an RFC in case I don't have a chance to finish the drm/msm part this week.
Fwiw, what I'm thinking for the drm/msm part is a timer set to expire a bit (couple ms?) before the deadline, which boosts if the timer expires before the fence is signaled.
Assuming this is roughly in line with what other drivers would do, possibly there is some room to build this timer into dma-fence itself?
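Something along these lines, maybe (purely hypothetical sketch of a dma-fence-owned timer; the driver-supplied boost hook and the 2ms margin are placeholders):

#include <linux/dma-fence.h>
#include <linux/hrtimer.h>
#include <linux/ktime.h>

struct dma_fence_deadline_timer {
	struct hrtimer timer;
	struct dma_fence *fence;
	/* driver-supplied hook; a real version would likely punt to a
	 * worker, since this fires in hardirq context and boosting can
	 * need to sleep (as the msm patch later in this thread does): */
	void (*boost)(struct dma_fence *fence);
};

static enum hrtimer_restart dma_fence_deadline_expired(struct hrtimer *t)
{
	struct dma_fence_deadline_timer *dt =
		container_of(t, struct dma_fence_deadline_timer, timer);

	/* only boost if the fence hasn't signaled by (deadline - margin): */
	if (!dma_fence_is_signaled(dt->fence))
		dt->boost(dt->fence);

	return HRTIMER_NORESTART;
}

static void dma_fence_deadline_timer_start(struct dma_fence_deadline_timer *dt,
					   struct dma_fence *fence,
					   void (*boost)(struct dma_fence *),
					   ktime_t deadline)
{
	hrtimer_init(&dt->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
	dt->timer.function = dma_fence_deadline_expired;
	dt->fence = fence;
	dt->boost = boost;

	/* fire a couple ms early so the boost can take effect in time: */
	hrtimer_start(&dt->timer, ktime_sub(deadline, ms_to_ktime(2)),
		      HRTIMER_MODE_ABS);
}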
BR, -R
On 2021-07-27 1:38 a.m., Rob Clark wrote:
[...]
Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
On Tue, Jul 27, 2021 at 7:50 AM Michel Dänzer michel@daenzer.net wrote:
[...]
Unfortunately, none of these approaches will have the full intended effect once Wayland compositors start waiting for client buffers to become idle before using them for an output frame (to prevent output frames from getting delayed by client work). See https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 (shameless plug :) for a proof of concept of this for mutter. The boost will only affect the compositor's own GPU work, not the client work (which means no effect at all for fullscreen apps where the compositor can scan out the client buffers directly).
I guess you mean "no effect at all *except* for fullscreen..."? Games are usually running fullscreen, so that's a case I care about a lot ;-)
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
BR, -R
On 2021-07-27 5:12 p.m., Rob Clark wrote:
[...]
I guess you mean "no effect at all *except* for fullscreen..."?
I meant what I wrote: The compositor will wait for the next buffer to become idle, so there's no boost from this mechanism for the client drawing to that buffer. And since the compositor does no drawing of its own in this case, there's no boost from that either.
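(The waiting itself needs no new uAPI, fwiw; a compositor can poll the dma-buf fd, roughly like this sketch, where dmabuf_fd is assumed to come from the client buffer:)

#include <poll.h>

/* Wait until all fences on the buffer have signaled (POLLOUT on a
 * dma-buf means "safe to write", i.e. the buffer is idle).
 * Returns >0 when idle, 0 on timeout, <0 on error. */
static int wait_buffer_idle(int dmabuf_fd, int timeout_ms)
{
	struct pollfd pfd = {
		.fd     = dmabuf_fd,
		.events = POLLOUT,
	};

	return poll(&pfd, 1, timeout_ms);
}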
I'd perhaps recommend that wayland compositors, in cases where only a single layer is changing, not try to be clever and just push the update down to the kernel.
Even just for the fullscreen direct scanout case, that would require some kind of atomic KMS API extension to allow queuing multiple page flips for the same CRTC.
For other cases, this would also require a mechanism to cancel a pending atomic commit, for when another surface update comes in before the compositor's deadline, which affects the previously single updating surface as well.
From: Rob Clark robdclark@chromium.org
Signed-off-by: Rob Clark robdclark@chromium.org
---
This is a quick implementation of what I had in mind for the driver side of deadline boost. For a couple games with bad gpu devfreq behavior this boosts "Render quality" from ~35% to ~95%. (The "Render quality" metric in chrome://arc-overview-tracing is basically a measure of the deviation in frame/commit time, so 100% would be a consistent fps with no variation.) Not quite 100%, as this is still a bit of a reactive mechanism.
A similar result can be had by tuning devfreq to boost to max OPP at a much lower threshold of busyness, with the obvious downside that you then spend a lot of time running the GPU much faster than needed.
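For comparison, that alternative is just tuning data for the simple_ondemand governor; a sketch with made-up thresholds (the governor defaults are 90/5):

#include <linux/devfreq.h>
#include <linux/err.h>

/* ramp to max OPP already at 20% busy instead of the default 90%;
 * values here are illustrative only: */
static struct devfreq_simple_ondemand_data aggressive_tuning = {
	.upthreshold	  = 20,
	.downdifferential = 10,
};

/* at init time, pass the tuning data instead of the governor defaults: */
static int example_devfreq_init(struct device *dev,
				struct devfreq_dev_profile *profile)
{
	struct devfreq *df;

	df = devm_devfreq_add_device(dev, profile,
				     DEVFREQ_GOV_SIMPLE_ONDEMAND,
				     &aggressive_tuning);

	return PTR_ERR_OR_ZERO(df);
}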
 drivers/gpu/drm/msm/msm_fence.c       | 76 +++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_fence.h       | 20 +++++++
 drivers/gpu/drm/msm/msm_gpu.h         |  1 +
 drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++++++
 4 files changed, 117 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index f2cece542c3f..67c2a96e1c85 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -8,6 +8,37 @@
 
 #include "msm_drv.h"
 #include "msm_fence.h"
+#include "msm_gpu.h"
+
+static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence);
+
+static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
+{
+	struct msm_drm_private *priv = fctx->dev->dev_private;
+	return priv->gpu;
+}
+
+static enum hrtimer_restart deadline_timer(struct hrtimer *t)
+{
+	struct msm_fence_context *fctx = container_of(t,
+			struct msm_fence_context, deadline_timer);
+
+	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
+
+	return HRTIMER_NORESTART;
+}
+
+static void deadline_work(struct kthread_work *work)
+{
+	struct msm_fence_context *fctx = container_of(work,
+			struct msm_fence_context, deadline_work);
+
+	/* If deadline fence has already passed, nothing to do: */
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		return;
+
+	msm_devfreq_boost(fctx2gpu(fctx), 2);
+}
 
 struct msm_fence_context *
@@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
 	fctx->fenceptr = fenceptr;
 	spin_lock_init(&fctx->spinlock);
 
+	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	fctx->deadline_timer.function = deadline_timer;
+
+	kthread_init_work(&fctx->deadline_work, deadline_work);
+
+	fctx->next_deadline = ktime_get();
+
 	return fctx;
 }
 
@@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
 {
 	spin_lock(&fctx->spinlock);
 	fctx->completed_fence = max(fence, fctx->completed_fence);
+	if (fence_completed(fctx, fctx->next_deadline_fence))
+		hrtimer_cancel(&fctx->deadline_timer);
 	spin_unlock(&fctx->spinlock);
 }
 
@@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
 	return fence_completed(f->fctx, f->base.seqno);
 }
 
+static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
+{
+	struct msm_fence *f = to_msm_fence(fence);
+	struct msm_fence_context *fctx = f->fctx;
+	unsigned long flags;
+	ktime_t now;
+
+	spin_lock_irqsave(&fctx->spinlock, flags);
+	now = ktime_get();
+
+	if (ktime_after(now, fctx->next_deadline) ||
+			ktime_before(deadline, fctx->next_deadline)) {
+		fctx->next_deadline = deadline;
+		fctx->next_deadline_fence =
+			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
+
+		/*
+		 * Set timer to trigger boost 3ms before deadline, or
+		 * if we are already less than 3ms before the deadline
+		 * schedule boost work immediately.
+		 */
+		deadline = ktime_sub(deadline, ms_to_ktime(3));
+
+		if (ktime_after(now, deadline)) {
+			kthread_queue_work(fctx2gpu(fctx)->worker,
+					&fctx->deadline_work);
+		} else {
+			hrtimer_start(&fctx->deadline_timer, deadline,
+					HRTIMER_MODE_ABS);
+		}
+	}
+
+	spin_unlock_irqrestore(&fctx->spinlock, flags);
+}
+
 static const struct dma_fence_ops msm_fence_ops = {
 	.get_driver_name = msm_fence_get_driver_name,
 	.get_timeline_name = msm_fence_get_timeline_name,
 	.signaled = msm_fence_signaled,
+	.set_deadline = msm_fence_set_deadline,
 };
 
 struct dma_fence *
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 4783db528bcc..d34e853c555a 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -50,6 +50,26 @@ struct msm_fence_context {
 	volatile uint32_t *fenceptr;
 
 	spinlock_t spinlock;
+
+	/*
+	 * TODO this doesn't really deal with multiple deadlines, like
+	 * if userspace got multiple frames ahead.. OTOH atomic updates
+	 * don't queue, so maybe that is ok
+	 */
+
+	/** next_deadline: Time of next deadline */
+	ktime_t next_deadline;
+
+	/**
+	 * next_deadline_fence:
+	 *
+	 * Fence value for next pending deadline. The deadline timer is
+	 * canceled when this fence is signaled.
+	 */
+	uint32_t next_deadline_fence;
+
+	struct hrtimer deadline_timer;
+	struct kthread_work deadline_work;
 };
 
 struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 0e4b45bff2e6..e031c9b495ed 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -425,6 +425,7 @@ void msm_devfreq_init(struct msm_gpu *gpu);
 void msm_devfreq_cleanup(struct msm_gpu *gpu);
 void msm_devfreq_resume(struct msm_gpu *gpu);
 void msm_devfreq_suspend(struct msm_gpu *gpu);
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor);
 void msm_devfreq_active(struct msm_gpu *gpu);
 void msm_devfreq_idle(struct msm_gpu *gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_gpu_devfreq.c b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
index 0a1ee20296a2..8a8d7b9028a3 100644
--- a/drivers/gpu/drm/msm/msm_gpu_devfreq.c
+++ b/drivers/gpu/drm/msm/msm_gpu_devfreq.c
@@ -144,6 +144,26 @@ void msm_devfreq_suspend(struct msm_gpu *gpu)
 	devfreq_suspend_device(gpu->devfreq.devfreq);
 }
 
+void msm_devfreq_boost(struct msm_gpu *gpu, unsigned factor)
+{
+	struct msm_gpu_devfreq *df = &gpu->devfreq;
+	unsigned long freq;
+
+	/*
+	 * Hold devfreq lock to synchronize with get_dev_status()/
+	 * target() callbacks
+	 */
+	mutex_lock(&df->devfreq->lock);
+
+	freq = get_freq(gpu);
+
+	freq *= factor;
+
+	msm_devfreq_target(&gpu->pdev->dev, &freq, 0);
+
+	mutex_unlock(&df->devfreq->lock);
+}
+
 void msm_devfreq_active(struct msm_gpu *gpu)
 {
 	struct msm_gpu_devfreq *df = &gpu->devfreq;