We've seen recent regression with host and windows VM running simultaneously that cause gpu hang or even crash. Finally bisect to 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems cached atomics behavior difference caused regression issue.
This trys to add new scratch register handler and add those in mmio save/restore list for context switch. No gpu hang produced with this one.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com --- drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 98eb48c24c46..345b4be5ebad 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt) MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW); + MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write); diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index b8ac80765461..f776c470914d 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */ + {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */ + {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang zhenyuw@linux.intel.com wrote:
We've seen recent regression with host and windows VM running simultaneously that cause gpu hang or even crash. Finally bisect to 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems cached atomics behavior difference caused regression issue.
This trys to add new scratch register handler and add those in mmio save/restore list for context switch. No gpu hang produced with this one.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
Adding Jon Bloomfield, since different settings between linux and windows for something that can hard-hang the machine on gen9 sounds ... not good. -Daniel
drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 98eb48c24c46..345b4be5ebad 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt) MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW);
MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS); MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write);
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index b8ac80765461..f776c470914d 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
-- 2.32.0.rc2
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Wed, Jul 21, 2021 at 4:08 AM Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Jul 21, 2021 at 8:21 AM Zhenyu Wang zhenyuw@linux.intel.com wrote:
We've seen recent regression with host and windows VM running simultaneously that cause gpu hang or even crash. Finally bisect to 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems cached atomics behavior difference caused regression issue.
This trys to add new scratch register handler and add those in mmio save/restore list for context switch. No gpu hang produced with this one.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
Adding Jon Bloomfield, since different settings between linux and windows for something that can hard-hang the machine on gen9 sounds ... not good.
The difference there is legit and intentional.
As far as what we do about it for GVT, if we can safely smash L3 atomics off underneath Windows without causing problems for the VM, we should do that. If not, we need to discuss this internally before proceeding.
--Jason
-Daniel
drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 98eb48c24c46..345b4be5ebad 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3134,6 +3134,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt) MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW);
MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS); MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write);
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index b8ac80765461..f776c470914d 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
{RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
{RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
-- 2.32.0.rc2
Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
We've seen recent regression with host and windows VM running simultaneously that cause gpu hang or even crash. Finally bisect to commit 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems cached atomics behavior difference caused regression issue.
This tries to add new scratch register handler and add those in mmio save/restore list for context switch. No gpu hang produced with this one.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Cc: "Bloomfield, Jon" jon.bloomfield@intel.com Cc: "Ekstrand, Jason" jason.ekstrand@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com --- drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 06024d321a1a..cde0a477fb49 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3149,6 +3149,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt) MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW); + MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write); diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index b8ac80765461..f776c470914d 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */ + {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */ + {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
On Fri, 6 Aug 2021, Zhenyu Wang wrote:
Thanks for the fix! Otherwise Windows VM is unusable with recent kernel.
Reviewed-by: Colin Xu colin.xu@intel.com
We've seen recent regression with host and windows VM running simultaneously that cause gpu hang or even crash. Finally bisect to commit 58586680ffad ("drm/i915: Disable atomics in L3 for gen9"), which seems cached atomics behavior difference caused regression issue.
This tries to add new scratch register handler and add those in mmio save/restore list for context switch. No gpu hang produced with this one.
Cc: stable@vger.kernel.org # 5.12+ Cc: "Xu, Terrence" terrence.xu@intel.com Cc: "Bloomfield, Jon" jon.bloomfield@intel.com Cc: "Ekstrand, Jason" jason.ekstrand@intel.com Fixes: 58586680ffad ("drm/i915: Disable atomics in L3 for gen9") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com
drivers/gpu/drm/i915/gvt/handlers.c | 1 + drivers/gpu/drm/i915/gvt/mmio_context.c | 2 ++ 2 files changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c index 06024d321a1a..cde0a477fb49 100644 --- a/drivers/gpu/drm/i915/gvt/handlers.c +++ b/drivers/gpu/drm/i915/gvt/handlers.c @@ -3149,6 +3149,7 @@ static int init_bdw_mmio_info(struct intel_gvt *gvt) MMIO_DFH(_MMIO(0xb100), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_DFH(_MMIO(0xb10c), D_BDW, F_CMD_ACCESS, NULL, NULL); MMIO_D(_MMIO(0xb110), D_BDW);
MMIO_D(GEN9_SCRATCH_LNCF1, D_BDW_PLUS);
MMIO_F(_MMIO(0x24d0), 48, F_CMD_ACCESS | F_CMD_WRITE_PATCH, 0, 0, D_BDW_PLUS, NULL, force_nonpriv_write);
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index b8ac80765461..f776c470914d 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -105,6 +105,8 @@ static struct engine_mmio gen9_engine_mmio_list[] __cacheline_aligned = { {RCS0, COMMON_SLICE_CHICKEN2, 0xffff, true}, /* 0x7014 */ {RCS0, GEN9_CS_DEBUG_MODE1, 0xffff, false}, /* 0x20ec */ {RCS0, GEN8_L3SQCREG4, 0, false}, /* 0xb118 */
- {RCS0, GEN9_SCRATCH1, 0, false}, /* 0xb11c */
- {RCS0, GEN9_SCRATCH_LNCF1, 0, false}, /* 0xb008 */ {RCS0, GEN7_HALF_SLICE_CHICKEN1, 0xffff, true}, /* 0xe100 */ {RCS0, HALF_SLICE_CHICKEN2, 0xffff, true}, /* 0xe180 */ {RCS0, HALF_SLICE_CHICKEN3, 0xffff, true}, /* 0xe184 */
-- 2.32.0.rc2
-- Best Regards, Colin Xu
linux-stable-mirror@lists.linaro.org