MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on plaform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com Cc: Balestrieri Francesco francesco.balestrieri@intel.com Cc: Bloomfield Jon jon.bloomfield@intel.com Cc: stable@vger.kernel.org # v5.7+ --- drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++------- 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c index d93d85cd3027..f3b8fea6226e 100644 --- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c @@ -7,8 +7,6 @@ #include "i915_drv.h" #include "intel_gpu_commands.h"
-#define MAX_URB_ENTRIES 64 -#define STATE_SIZE (4 * 1024) #define GT3_INLINE_DATA_DELAYS 0x1E00 #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS))
@@ -34,8 +32,7 @@ struct batch_chunk { };
struct batch_vals { - u32 max_primitives; - u32 max_urb_entries; + u32 max_primitives; /* == number of VFE threads */ u32 cmd_size; u32 state_size; u32 state_start; @@ -50,18 +47,35 @@ static void batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv) { if (IS_HASWELL(i915)) { - bv->max_primitives = 280; - bv->max_urb_entries = MAX_URB_ENTRIES; + switch (INTEL_INFO(i915)->gt) { + default: + case 1: + bv->max_primitives = 70; + break; + case 2: + bv->max_primitives = 140; + break; + case 3: + bv->max_primitives = 280; + break; + } bv->surface_height = 16 * 16; bv->surface_width = 32 * 2 * 16; } else { - bv->max_primitives = 128; - bv->max_urb_entries = MAX_URB_ENTRIES / 2; + switch (INTEL_INFO(i915)->gt) { + default: + case 1: /* including vlv */ + bv->max_primitives = 36; + break; + case 2: + bv->max_primitives = 128; + break; + } bv->surface_height = 16 * 8; bv->surface_width = 32 * 16; } bv->cmd_size = bv->max_primitives * 4096; - bv->state_size = STATE_SIZE; + bv->state_size = SZ_4K; bv->state_start = bv->cmd_size; bv->batch_size = bv->cmd_size + bv->state_size; bv->scratch_size = bv->surface_height * bv->surface_width; @@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, u32 urb_size, u32 curbe_size, u32 mode) { - u32 urb_entries = bv->max_urb_entries; u32 threads = bv->max_primitives - 1; u32 *cs = batch_alloc_items(batch, 32, 8);
@@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, *cs++ = 0;
/* number of threads & urb entries for GPGPU vs Media Mode */ - *cs++ = threads << 16 | urb_entries << 8 | mode << 2; + *cs++ = threads << 16 | 1 << 8 | mode << 2;
*cs++ = 0;
On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:
MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on plaform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024 Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com> Cc: Prathap Kumar Valsan <prathap.kumar.valsan at intel.com> Cc: Akeem G Abodunrin <akeem.g.abodunrin at intel.com> Cc: Balestrieri Francesco <francesco.balestrieri at intel.com> Cc: Bloomfield Jon <jon.bloomfield at intel.com> Cc: <stable at vger.kernel.org> # v5.7+
...
I tested this patch and found that it prevents the GPU hang I had reported on the HP Pavilion Mini 300-020 in https://gitlab.freedesktop.org/drm/intel/-/issues/2413.
In more detail: I built linux-next at tag next-20201106 without the patch, and booted the result on an Ubuntu 20.04 base system. As expected, I observed the hang that I had previously reported as soon as Cinnnamon started when I entered graphical.target.
I then applied this patch - that being the only change to my kernel - and I was able to boot to graphical.target 5 times consecutively without any GPU hang.
You may add my endorsements:
Tested-by: Randy Wright rwright@hpe.com Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2413 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2106
On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:
MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on plaform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024
we need to get this closed...
Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com Cc: Balestrieri Francesco francesco.balestrieri@intel.com Cc: Bloomfield Jon jon.bloomfield@intel.com Cc: stable@vger.kernel.org # v5.7+
drivers/gpu/drm/i915/gt/gen7_renderclear.c | 35 +++++++++++++++------- 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c b/drivers/gpu/drm/i915/gt/gen7_renderclear.c index d93d85cd3027..f3b8fea6226e 100644 --- a/drivers/gpu/drm/i915/gt/gen7_renderclear.c +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c @@ -7,8 +7,6 @@ #include "i915_drv.h" #include "intel_gpu_commands.h" -#define MAX_URB_ENTRIES 64 -#define STATE_SIZE (4 * 1024) #define GT3_INLINE_DATA_DELAYS 0x1E00 #define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS)) @@ -34,8 +32,7 @@ struct batch_chunk { }; struct batch_vals {
- u32 max_primitives;
- u32 max_urb_entries;
- u32 max_primitives; /* == number of VFE threads */ u32 cmd_size; u32 state_size; u32 state_start;
@@ -50,18 +47,35 @@ static void batch_get_defaults(struct drm_i915_private *i915, struct batch_vals *bv) { if (IS_HASWELL(i915)) {
bv->max_primitives = 280;
bv->max_urb_entries = MAX_URB_ENTRIES;
switch (INTEL_INFO(i915)->gt) {
default:
case 1:
bv->max_primitives = 70;
break;
case 2:
bv->max_primitives = 140;
break;
case 3:
bv->max_primitives = 280;
break;
bv->surface_height = 16 * 16; bv->surface_width = 32 * 2 * 16; } else {}
bv->max_primitives = 128;
bv->max_urb_entries = MAX_URB_ENTRIES / 2;
switch (INTEL_INFO(i915)->gt) {
default:
case 1: /* including vlv */
bv->max_primitives = 36;
break;
case 2:
bv->max_primitives = 128;
break;
bv->surface_height = 16 * 8; bv->surface_width = 32 * 16; } bv->cmd_size = bv->max_primitives * 4096;}
- bv->state_size = STATE_SIZE;
- bv->state_size = SZ_4K; bv->state_start = bv->cmd_size; bv->batch_size = bv->cmd_size + bv->state_size; bv->scratch_size = bv->surface_height * bv->surface_width;
@@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, u32 urb_size, u32 curbe_size, u32 mode) {
- u32 urb_entries = bv->max_urb_entries; u32 threads = bv->max_primitives - 1; u32 *cs = batch_alloc_items(batch, 32, 8);
@@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, *cs++ = 0; /* number of threads & urb entries for GPGPU vs Media Mode */
- *cs++ = threads << 16 | urb_entries << 8 | mode << 2;
- *cs++ = threads << 16 | 1 << 8 | mode << 2;
why urb_entries = 1 ?
the range is 0,64 and 0,128 depending on the sku.
in general there's a min of 32 URBs
*cs++ = 0; -- 2.20.1
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Rodrigo Vivi (2021-01-07 19:50:37)
On Fri, Oct 16, 2020 at 06:54:11PM +0100, Chris Wilson wrote:
MEDIA_STATE_VFE only accepts the 'maximum number of threads' in the range [0, n-1] where n is #EU * (#threads/EU) with the number of threads based on plaform and the number of EU based on the number of slices and subslices. This is a fixed number per platform/gt, so appropriately limit the number of threads we spawn to match the device.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2024
we need to get this closed...
Unfortunately this failed the validation test. And as that test is still not in CI, I cannot say why. My vote would be to remove the clear_residuals until it works on all target platforms. Plus we clearly need a hsw-gt1 in CI.
bv->scratch_size = bv->surface_height * bv->surface_width;
@@ -244,7 +258,6 @@ gen7_emit_vfe_state(struct batch_chunk *batch, u32 urb_size, u32 curbe_size, u32 mode) {
u32 urb_entries = bv->max_urb_entries; u32 threads = bv->max_primitives - 1; u32 *cs = batch_alloc_items(batch, 32, 8);
@@ -254,7 +267,7 @@ gen7_emit_vfe_state(struct batch_chunk *batch, *cs++ = 0; /* number of threads & urb entries for GPGPU vs Media Mode */
*cs++ = threads << 16 | urb_entries << 8 | mode << 2;
*cs++ = threads << 16 | 1 << 8 | mode << 2;
why urb_entries = 1 ?
We only used a single entry. There was no measurable benefit from assigning more entries, and the importance of any side effects from doing so unknown.
the range is 0,64 and 0,128 depending on the sku.
in general there's a min of 32 URBs
Don't forget num_entries * entry_size must fit within the URB allocation/allotment. -Chris
linux-stable-mirror@lists.linaro.org