-----Original Message-----
From: Chris Wilson chris@chris-wilson.co.uk
Sent: Saturday, January 09, 2021 7:49 AM
To: intel-gfx@lists.freedesktop.org
Cc: Chris Wilson chris@chris-wilson.co.uk; Mika Kuoppala mika.kuoppala@linux.intel.com; Kumar Valsan, Prathap prathap.kumar.valsan@intel.com; Abodunrin, Akeem G akeem.g.abodunrin@intel.com; Bloomfield, Jon jon.bloomfield@intel.com; Vivi, Rodrigo rodrigo.vivi@intel.com; Randy Wright rwright@hpe.com; stable@vger.kernel.org
Subject: [PATCH 1/3] drm/i915/gt: Limit VFE threads based on GT

MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on platform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device.
v2: Oversaturate the system with tasks to force execution on every HW thread; if the thread idles it is returned to the pool and may be reused again before an unused thread.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024
Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk
Cc: Mika Kuoppala mika.kuoppala@linux.intel.com
Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com
Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com
Cc: Jon Bloomfield jon.bloomfield@intel.com
Cc: Rodrigo Vivi rodrigo.vivi@intel.com
Cc: Randy Wright rwright@hpe.com
Cc: stable@vger.kernel.org # v5.7+
 drivers/gpu/drm/i915/gt/gen7_renderclear.c | 91 ++++++++++++----------
 1 file changed, 49 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
index d93d85cd3027..3ea7c9cc0f3d 100644
--- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c
+++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c
@@ -7,8 +7,6 @@
 #include "i915_drv.h"
 #include "intel_gpu_commands.h"

-#define MAX_URB_ENTRIES 64
-#define STATE_SIZE (4 * 1024)
 #define GT3_INLINE_DATA_DELAYS 0x1E00
 #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS))

@@ -34,38 +32,57 @@ struct batch_chunk {
 };

 struct batch_vals {
-	u32 max_primitives;
-	u32 max_urb_entries;
-	u32 cmd_size;
-	u32 state_size;
+	u32 max_threads;
 	u32 state_start;
-	u32 batch_size;
+	u32 surface_start;
 	u32 surface_height;
 	u32 surface_width;
-	u32 scratch_size;
-	u32 max_size;
+	u32 size;
 };

+static inline int num_primitives(const struct batch_vals *bv)
+{
+	/*
+	 * We need to oversaturate the GPU with work in order to dispatch
+	 * a shader on every HW thread.
+	 */
+	return bv->max_threads + 2;
+}
+
 static void
 batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv)
 {
 	if (IS_HASWELL(i915)) {
-		bv->max_primitives = 280;
-		bv->max_urb_entries = MAX_URB_ENTRIES;
+		switch (INTEL_INFO(i915)->gt) {
+		default:
+		case 1:
+			bv->max_threads = 70;
+			break;
+		case 2:
+			bv->max_threads = 140;
+			break;
+		case 3:
+			bv->max_threads = 280;
+			break;
+		}
 		bv->surface_height = 16 * 16;
 		bv->surface_width = 32 * 2 * 16;
 	} else {
-		bv->max_primitives = 128;
-		bv->max_urb_entries = MAX_URB_ENTRIES / 2;
+		switch (INTEL_INFO(i915)->gt) {
+		default:
+		case 1: /* including vlv */
+			bv->max_threads = 36;
+			break;
+		case 2:
+			bv->max_threads = 128;
+			break;
+		}

Do we really need to hardcode the max number of threads per gt/platform? Why not calculate the number of active threads from no_of_slices * 1024? Also, is "64" not the minimum number of threads supported?

Thanks, ~Akeem
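
For reference, the relation stated in the commit message (n = #EU * (#threads/EU), with the MEDIA_STATE_VFE field limited to [0, n-1]) and the v2 oversaturation by two primitives can be sketched in standalone C. This is only an illustration under stated assumptions, not i915 code: struct gt_topology, hw_thread_count(), vfe_max_threads_field() and oversaturated_primitives() are hypothetical names, and the EU and thread counts are plain inputs rather than values read from the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GT description; field names are illustrative, not i915's. */
struct gt_topology {
	uint32_t num_eus;        /* EUs enabled across all slices/subslices */
	uint32_t threads_per_eu; /* HW threads per EU, fixed per platform */
};

/* n = #EU * (#threads/EU), as given in the commit message. */
static inline uint32_t hw_thread_count(const struct gt_topology *gt)
{
	return gt->num_eus * gt->threads_per_eu;
}

/* The 'maximum number of threads' field must lie in [0, n-1]. */
static inline uint32_t vfe_max_threads_field(const struct gt_topology *gt)
{
	uint32_t n = hw_thread_count(gt);

	assert(n > 0); /* an empty GT has no valid encoding */
	return n - 1;
}

/*
 * Same arithmetic as num_primitives() in the patch (max_threads + 2):
 * submit slightly more work than there are HW threads so that every
 * thread gets a shader dispatched on it.
 */
static inline uint32_t oversaturated_primitives(const struct gt_topology *gt)
{
	return hw_thread_count(gt) + 2;
}
```

With an assumed input of num_eus = 10 and threads_per_eu = 7, picked only because the product matches the 70 used for the Haswell GT1 case in the patch, hw_thread_count() gives 70, the field value would be 69, and oversaturated_primitives() gives 72.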