We treat idling the GT (intel_rps_park) as a downclock event, and reduce the frequency we intend to restart the GT with. Since the two workloads are likely related (e.g. a compositor rendering every 16ms), we want to carry the frequency and load information from across the idling. However, we do also need to update the frequencies so that workloads that run for less than 1ms are autotuned by RPS (otherwise we leave compositors running at max clocks, draining excess power). Conversely, if we try to run too slowly, the next workload has to run longer. Since there is a hysteresis in the power graph, below a certain frequency running a short workload for longer consumes more energy than running it slightly higher for less time. The exact balance point is unknown beforehand, but measurements with 30fps media playback indicate that RPe is a better choice.
Reported-by: Edward Baker <edward.baker@intel.com>
Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org # v5.8+
---
 drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index b13e7845d483..f74d5e09e176 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
 		adj = -2;
 	rps->last_adj = adj;
 	rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
+	if (rps->cur_freq < rps->efficient_freq) {
+		rps->cur_freq = rps->efficient_freq;
+		rps->last_adj = 0;
+	}
 
 	GT_TRACE(rps_to_gt(rps), "park:%x\n", rps->cur_freq);
 }
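As a rough illustration of what the hunk does, here is a minimal userspace sketch of the park step. `struct rps_sketch` and `park_sketch()` are hypothetical stand-ins and not the driver's code; the fixed step of -2 is an assumption taken from the single visible context line (`adj = -2;`), and the kernel's max_t() clamp is written out by hand.

```c
/* Hypothetical stand-in for the few struct intel_rps fields the patch touches. */
struct rps_sketch {
	int cur_freq;       /* frequency the GT will restart with */
	int min_freq;       /* lowest permitted frequency */
	int efficient_freq; /* RPe */
	int last_adj;       /* last frequency step applied */
};

/*
 * Sketch of the park step. Assumption: the step is always -2 here; the
 * real function computes adj before the lines shown in the diff.
 */
void park_sketch(struct rps_sketch *rps)
{
	int adj = -2;

	rps->last_adj = adj;

	/* Hand-written equivalent of max_t(int, cur_freq + adj, min_freq). */
	rps->cur_freq += adj;
	if (rps->cur_freq < rps->min_freq)
		rps->cur_freq = rps->min_freq;

	/* The four added lines: never restart below RPe, and forget the step. */
	if (rps->cur_freq < rps->efficient_freq) {
		rps->cur_freq = rps->efficient_freq;
		rps->last_adj = 0;
	}
}
```

With these made-up numbers, parking from cur_freq = 8 with RPe = 7 leaves cur_freq at 7 and last_adj at 0, whereas parking from cur_freq = 12 simply steps down to 10.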
On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> +	if (rps->cur_freq < rps->efficient_freq) {
> +		rps->cur_freq = rps->efficient_freq;
> +		rps->last_adj = 0;

this is indeed the smallest fix we can propagate:

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

but I wonder now if we couldn't simply kill the last_adj now and always go with the rpe on park/unpark

> +	}
>
> 	GT_TRACE(rps_to_gt(rps), "park:%x\n", rps->cur_freq);
> }
> --
> 2.20.1
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Rodrigo Vivi (2020-11-24 19:46:29)
On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> > +	if (rps->cur_freq < rps->efficient_freq) {
> > +		rps->cur_freq = rps->efficient_freq;
> > +		rps->last_adj = 0;
>
> this is indeed the smallest fix we can propagate:
>
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
>
> but I wonder now if we couldn't simply kill the last_adj now and always go with the rpe on park/unpark
Since we often have very bursty workloads that run for less than 1ms, we do want to keep the frequency across idling, or else we incur more latency than the user wants (unpark latency is no joke, although most of it is the context switches). The compromise for workloads that always run shorter than an RPS interval is to "gradually" reduce the frequency, so that compositors do not get stuck at max clocks; those very same compositors also require very quick autotuning so that animations are smooth from idle. Compute is another case with both sustained and bursty workloads, and the shorter-than-RPS bursty workloads are naturally expected to be low latency.
So I still think keeping cur_freq is most often the best approach.
-Chris
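To make the trade-off in this exchange concrete, the following sketch compares the two park policies under discussion: keeping cur_freq and stepping it down (floored at RPe, as in the patch) versus always restarting from RPe. Everything here is hypothetical: the function names, `struct rps_limits`, the fixed step of 2, and the frequency values are illustrative assumptions, not driver code or measurements.

```c
/* Hypothetical frequency limits, in arbitrary frequency steps. */
struct rps_limits {
	int min_freq;
	int efficient_freq; /* RPe */
};

#define PARK_STEP 2 /* assumed fixed downclock applied on each park */

/* Patch behaviour: keep cur_freq across idling, step down, floor at RPe. */
int park_keep_and_decay(int cur_freq, const struct rps_limits *lim)
{
	int next = cur_freq - PARK_STEP;

	if (next < lim->min_freq)
		next = lim->min_freq;
	if (next < lim->efficient_freq)
		next = lim->efficient_freq;
	return next;
}

/* Alternative raised in review: always restart from RPe. */
int park_reset_to_rpe(int cur_freq, const struct rps_limits *lim)
{
	(void)cur_freq; /* previous frequency is deliberately discarded */
	return lim->efficient_freq;
}

/* Count consecutive parks before the restart frequency settles at RPe. */
int parks_until_rpe(int cur_freq, const struct rps_limits *lim,
		    int (*park)(int, const struct rps_limits *))
{
	int n = 0;

	while (cur_freq > lim->efficient_freq) {
		int next = park(cur_freq, lim);

		n++;
		if (next == cur_freq) /* no progress, e.g. min_freq above RPe */
			break;
		cur_freq = next;
	}
	return n;
}
```

With min_freq = 3, RPe = 7 and a burst that last ran at 20, the decaying policy restarts the next burst at 18 and needs seven consecutive parks to settle at RPe, while the reset policy restarts at 7 immediately. The first keeps short bursts responsive; the second gives up the carried-over frequency on every idle.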
Hi Chris,
On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> +	if (rps->cur_freq < rps->efficient_freq) {
> +		rps->cur_freq = rps->efficient_freq;
> +		rps->last_adj = 0;
> +	}

looks OK to me, makes sense:

Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Thanks, Andi
linux-stable-mirror@lists.linaro.org