[PATCH 5/6] drm/i915/gt: Serialize GRDOM access between multiple engine resets


Mauro Carvalho Chehab

15 Jun 2022

3:27 p.m.

From: Chris Wilson <chris.p.wilson@intel.com>

Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.

Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Andi Shyti <andi.shyti@intel.com>
Cc: stable@vger.kernel.org
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
---

See [PATCH 0/6] at: https://lore.kernel.org/all/cover.1655306128.git.mchehab@kernel.org/

 drivers/gpu/drm/i915/gt/intel_reset.c | 37 ++++++++++++++++++++-------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index a5338c3fde7a..c68d36fb5bbd 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -300,9 +300,9 @@ static int gen6_hw_domain_reset(struct intel_gt *gt, u32 hw_domain_mask)
 	return err;
 }
 
-static int gen6_reset_engines(struct intel_gt *gt,
-			      intel_engine_mask_t engine_mask,
-			      unsigned int retry)
+static int __gen6_reset_engines(struct intel_gt *gt,
+				intel_engine_mask_t engine_mask,
+				unsigned int retry)
 {
 	struct intel_engine_cs *engine;
 	u32 hw_mask;
@@ -321,6 +321,20 @@ static int gen6_reset_engines(struct intel_gt *gt,
 	return gen6_hw_domain_reset(gt, hw_mask);
 }
 
+static int gen6_reset_engines(struct intel_gt *gt,
+			      intel_engine_mask_t engine_mask,
+			      unsigned int retry)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(&gt->uncore->lock, flags);
+	ret = __gen6_reset_engines(gt, engine_mask, retry);
+	spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
+	return ret;
+}
+
 static struct intel_engine_cs *find_sfc_paired_vecs_engine(struct intel_engine_cs *engine)
 {
 	int vecs_id;
@@ -487,9 +501,9 @@ static void gen11_unlock_sfc(struct intel_engine_cs *engine)
 	rmw_clear_fw(uncore, sfc_lock.lock_reg, sfc_lock.lock_bit);
 }
 
-static int gen11_reset_engines(struct intel_gt *gt,
-			       intel_engine_mask_t engine_mask,
-			       unsigned int retry)
+static int __gen11_reset_engines(struct intel_gt *gt,
+				 intel_engine_mask_t engine_mask,
+				 unsigned int retry)
 {
 	struct intel_engine_cs *engine;
 	intel_engine_mask_t tmp;
@@ -583,8 +597,11 @@ static int gen8_reset_engines(struct intel_gt *gt,
 	struct intel_engine_cs *engine;
 	const bool reset_non_ready = retry >= 1;
 	intel_engine_mask_t tmp;
+	unsigned long flags;
 	int ret;
 
+	spin_lock_irqsave(&gt->uncore->lock, flags);
+
 	for_each_engine_masked(engine, gt, engine_mask, tmp) {
 		ret = gen8_engine_reset_prepare(engine);
 		if (ret && !reset_non_ready)
@@ -612,17 +629,19 @@ static int gen8_reset_engines(struct intel_gt *gt,
 	 * This is best effort, so ignore any error from the initial reset.
 	 */
 	if (IS_DG2(gt->i915) && engine_mask == ALL_ENGINES)
-		gen11_reset_engines(gt, gt->info.engine_mask, 0);
+		__gen11_reset_engines(gt, gt->info.engine_mask, 0);
 
 	if (GRAPHICS_VER(gt->i915) >= 11)
-		ret = gen11_reset_engines(gt, engine_mask, retry);
+		ret = __gen11_reset_engines(gt, engine_mask, retry);
 	else
-		ret = gen6_reset_engines(gt, engine_mask, retry);
+		ret = __gen6_reset_engines(gt, engine_mask, retry);
 
 skip_reset:
 	for_each_engine_masked(engine, gt, engine_mask, tmp)
 		gen8_engine_reset_cancel(engine);
 
+	spin_unlock_irqrestore(&gt->uncore->lock, flags);
+
 	return ret;
 }

-- 2.36.1
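
The locking shape of this patch can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the driver code: `struct toy_gt` and every `toy_*` function are hypothetical names, a pthread mutex stands in for the uncore->lock spinlock, and the `_locked` suffix plays the role of the `__` prefix used in the patch. It illustrates why each reset function is split into a helper that expects the lock to be held and a thin wrapper that takes it: a caller that already holds the lock, as gen8_reset_engines() does after this patch, has to call the helper, because taking a non-recursive lock twice would deadlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the GT; the mutex models uncore->lock. */
struct toy_gt {
	pthread_mutex_t lock;
	bool lock_held;      /* debugging aid, only changed with the lock held */
	unsigned int resets; /* number of domain resets issued so far */
};

#define TOY_GT_INIT { PTHREAD_MUTEX_INITIALIZER, false, 0 }

static void toy_lock(struct toy_gt *gt)
{
	pthread_mutex_lock(&gt->lock);
	gt->lock_held = true;
}

static void toy_unlock(struct toy_gt *gt)
{
	gt->lock_held = false;
	pthread_mutex_unlock(&gt->lock);
}

/* Unlocked helper: the caller must already hold gt->lock. */
static int toy_reset_engines_locked(struct toy_gt *gt, unsigned int engine_mask)
{
	assert(gt->lock_held);
	if (!engine_mask)
		return -1; /* nothing selected, treat as an error in this model */
	gt->resets++;
	return 0;
}

/* Locked wrapper: for callers that do not hold the lock. */
static int toy_reset_engines(struct toy_gt *gt, unsigned int engine_mask)
{
	int ret;

	toy_lock(gt);
	ret = toy_reset_engines_locked(gt, engine_mask);
	toy_unlock(gt);

	return ret;
}

/*
 * Holds the lock across prepare, reset and cancel, so it calls the
 * helper directly instead of the locking wrapper.
 */
static int toy_gen8_reset_engines(struct toy_gt *gt, unsigned int engine_mask)
{
	int ret;

	toy_lock(gt);
	/* the per-engine prepare step would go here */
	ret = toy_reset_engines_locked(gt, engine_mask);
	/* the per-engine cancel step would go here */
	toy_unlock(gt);

	return ret;
}
```

In this model, calling toy_reset_engines() from inside toy_gen8_reset_engines() would block forever on the second toy_lock(), which is the situation the renamed helpers avoid.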


Tvrtko Ursulin

16 Jun 2022

7:35 a.m.

On 15/06/2022 16:27, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
>
> Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.
>
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")

Ah okay I get it, the fixes tag was applied indiscriminately to the whole series. :) It definitely does not belong in this patch.

Otherwise LGTM:

Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com

Regards,

Tvrtko


Andi Shyti

23 Jun 2022

11:17 a.m.

Hi Mauro,

On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> From: Chris Wilson <chris.p.wilson@intel.com>
>
> Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.
>
> Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Andi Shyti <andi.shyti@intel.com>
> Cc: stable@vger.kernel.org
> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>

Reviewed-by: Andi Shyti andi.shyti@linux.intel.com

Thanks, Andi

Tvrtko Ursulin

24 Jun 2022

8:34 a.m.

On 23/06/2022 12:17, Andi Shyti wrote:
> Hi Mauro,
>
> On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> > From: Chris Wilson <chris.p.wilson@intel.com>
> >
> > Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.
> >
> > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> > Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > Cc: Andi Shyti <andi.shyti@intel.com>
> > Cc: stable@vger.kernel.org
> > Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>

Notice I had a bunch of questions and asks in this series so please do not merge until those are addressed.

In this particular patch (and some others) for instance Fixes: tag, at least against that sha, shouldn't be there.

Regards,

Tvrtko

Mauro Carvalho Chehab

28 Jun 2022

2:08 p.m.

Hi Tvrtko,

On Fri, 24 Jun 2022 09:34:21 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> On 23/06/2022 12:17, Andi Shyti wrote:
> > Hi Mauro,
> >
> > On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> > > From: Chris Wilson <chris.p.wilson@intel.com>
> > >
> > > Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.
> > >
> > > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> > > Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > Cc: Andi Shyti <andi.shyti@intel.com>
> > > Cc: stable@vger.kernel.org
> > > Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> >
> > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
>
> Notice I had a bunch of questions and asks in this series so please do not merge until those are addressed.
>
> In this particular patch (and some others) for instance Fixes: tag, at least against that sha, shouldn't be there.

Hmm... I sent an answer to your points, but I can't see it at:

https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel...

Maybe it got lost somewhere, I dunno.

Yeah, indeed the fixes tag on patch 5/6 should be removed as this is not directly related to changeset 7938d61591d3. Yet, this one is required for patch 6 to work.

The other patches on this series, though, are modifying the code introduced by changeset 7938d61591d3.

Patch 2 is clearly a workaround needed for TLB cache invalidation to work on some GPUs. So, while not related to Broadwell, they're also fixing some TLB cache issues. So, IMO, it should keep the fixes.

I tried to port just the two serialize patches to drm-tip, in order to solve the issues on Broadwell, but it didn't work, as the logic inside the spinlock could be calling schedule() with a spinlock held:

Jun 14 17:38:48 silver kernel: [ 23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
Jun 14 17:38:48 silver kernel: [ 23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
Jun 14 17:38:48 silver kernel: [ 23.227818] preempt_count: 1, expected: 0
Jun 14 17:38:48 silver kernel: [ 23.227819] RCU nest depth: 0, expected: 0
Jun 14 17:38:48 silver kernel: [ 23.227820] 5 locks held by kworker/u8:1/37:
Jun 14 17:38:48 silver kernel: [ 23.227822] #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
Jun 14 17:38:48 silver kernel: [ 23.227831] #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
Jun 14 17:38:48 silver kernel: [ 23.227837] #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
Jun 14 17:38:48 silver kernel: [ 23.228283] #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
Jun 14 17:38:48 silver kernel: [ 23.228663] #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]

I didn't investigate the root cause, but it seems related to PM, so patches 1 and 3 seem to be required for the serialization logic to actually work.
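
The "sleeping function called from invalid context" report in this message has the form printed by the kernel's might_sleep() debugging check: preempt_count is 1 where 0 was expected, and lock #4, uncore->lock, is a spinlock taken with interrupts disabled. A toy user-space model can sketch why a wait that may sleep trips such a check inside the locked region but not after the unlock. Every `toy_*` name below is hypothetical, and a plain counter stands in for the preempt count; this is an illustration of the check, not of the i915 code path.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model: a non-zero count means "atomic context". */
static int toy_preempt_count;
static int toy_bug_reports;

static void toy_spin_lock(void)   { toy_preempt_count++; }
static void toy_spin_unlock(void) { toy_preempt_count--; }

/* Toy analogue of might_sleep(): report a call made while atomic. */
static bool toy_might_sleep(const char *caller)
{
	if (toy_preempt_count == 0)
		return true;

	toy_bug_reports++;
	fprintf(stderr, "toy BUG: %s may sleep, preempt_count: %d, expected: 0\n",
		caller, toy_preempt_count);
	return false;
}

/* Stand-in for a register wait that is allowed to sleep. */
static bool toy_wait_for_register(void)
{
	return toy_might_sleep("toy_wait_for_register");
}

/* Waiting with the lock still held: the check fires. */
static bool toy_wait_under_lock(void)
{
	bool ok;

	toy_spin_lock();
	ok = toy_wait_for_register();
	toy_spin_unlock();

	return ok;
}

/* Writing under the lock, waiting after dropping it: the check stays quiet. */
static bool toy_wait_after_unlock(void)
{
	toy_spin_lock();
	/* the register write would go here */
	toy_spin_unlock();

	return toy_wait_for_register();
}
```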

So, I would keep the Fixes: tag mentioning changeset 7938d61591d3 on patches: 1, 2, 3 and 6.

Yet, IMO the entire series should be merged on -stable.

If that's OK for you and there's no additional issues to be addressed, I'll submit a v2 of this series removing the Fixes tag from patches 4 and 5.

Regards, Mauro

Tvrtko Ursulin

3:49 p.m.

Hi,

On 27/06/2022 10:00, Mauro Carvalho Chehab (by way of Mauro Carvalho Chehab <mauro.chehab@linux.intel.com>) wrote:
> Hi Tvrtko,
>
> On Fri, 24 Jun 2022 09:34:21 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > On 23/06/2022 12:17, Andi Shyti wrote:
> > > Hi Mauro,
> > >
> > > On Wed, Jun 15, 2022 at 04:27:39PM +0100, Mauro Carvalho Chehab wrote:
> > > > From: Chris Wilson <chris.p.wilson@intel.com>
> > > >
> > > > Don't allow two engines to be reset in parallel, as they would both try to select a reset bit (and send requests to common registers) and wait on that register, at the same time. Serialize control of the reset requests/acks using the uncore->lock, which will also ensure that no other GT state changes at the same time as the actual reset.
> > > >
> > > > Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
> > > > Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> > > > Cc: Andi Shyti <andi.shyti@intel.com>
> > > > Cc: stable@vger.kernel.org
> > > > Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
> > > > Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
> > >
> > > Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> >
> > Notice I had a bunch of questions and asks in this series so please do not merge until those are addressed.
> >
> > In this particular patch (and some others) for instance Fixes: tag, at least against that sha, shouldn't be there.
>
> Hmm... I sent an answer to your points, but I can't see it at:
>
> https://lore.kernel.org/all/160e613f-a0a8-18ff-5d4b-249d4280caa8@linux.intel...
>
> Maybe it got lost somewhere, I dunno.

Yeah, no replies received on my end I'm afraid.

> Yeah, indeed the fixes tag on patch 5/6 should be removed as this is not directly related to changeset 7938d61591d3. Yet, this one is required for patch 6 to work.
>
> The other patches on this series, though, are modifying the code introduced by changeset 7938d61591d3.

Modifying the code does not strictly mean something is a fix for a certain patch.

> Patch 2 is clearly a workaround needed for TLB cache invalidation to work on some GPUs. So, while not related to Broadwell, they're also fixing some TLB cache issues. So, IMO, it should keep the fixes.

Umesh commented that patch 2 is not needed - who is right then? :)

> I tried to port just the two serialize patches to drm-tip, in order to solve the issues on Broadwell, but it didn't work, as the logic inside the spinlock could be calling schedule() with a spinlock held:
>
> Jun 14 17:38:48 silver kernel: [ 23.227813] BUG: sleeping function called from invalid context at drivers/gpu/drm/i915/intel_uncore.c:2496
> Jun 14 17:38:48 silver kernel: [ 23.227816] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 37, name: kworker/u8:1
> Jun 14 17:38:48 silver kernel: [ 23.227818] preempt_count: 1, expected: 0
> Jun 14 17:38:48 silver kernel: [ 23.227819] RCU nest depth: 0, expected: 0
> Jun 14 17:38:48 silver kernel: [ 23.227820] 5 locks held by kworker/u8:1/37:
> Jun 14 17:38:48 silver kernel: [ 23.227822] #0: ffff88811159b538 ((wq_completion)i915){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> Jun 14 17:38:48 silver kernel: [ 23.227831] #1: ffffc90000183e60 ((work_completion)(&(&i915->mm.free_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1e0/0x580
> Jun 14 17:38:48 silver kernel: [ 23.227837] #2: ffff88811b34c5e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: __i915_gem_free_objects+0xba/0x210 [i915]
> Jun 14 17:38:48 silver kernel: [ 23.228283] #3: ffff88810a66c2d8 (&gt->tlb_invalidate_lock){+.+.}-{3:3}, at: intel_gt_invalidate_tlbs+0xe7/0x4d0 [i915]
> Jun 14 17:38:48 silver kernel: [ 23.228663] #4: ffff88810a668f28 (&uncore->lock){-.-.}-{2:2}, at: intel_gt_invalidate_tlbs+0x115/0x4d0 [i915]
>
> I didn't investigate the root cause, but it seems related to PM, so patches 1 and 3 seem to be required for the serialization logic to actually work.

Yes that is clear, what is needed is the split of the for_each_engine loop into request and wait.

But the question is how much backporting trouble the _extra_ changes that patch 1 brings will create.

In the ideal world patch 1 wouldn't be an optimising one, I mean adding skipping of TLB invalidations on idle engines but just the loop split. That would make it smaller and more suitable for Cc: stable. Because both i915_gem_pages.c and intel_gt_pm.h hunks wouldn't even be there. And the refactor in intel_gt_invalidate_tlbs would be smaller since it wouldn't be adding the engine awake checks...

> So, I would keep the Fixes: tag mentioning changeset 7938d61591d3 on patches: 1, 2, 3 and 6.

... which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.

Then we have the current patch 2 which is open/unknown (to me at least).

And the rest seem like optimisations which shouldn't be tagged as fixes.

Apart from patch 5 which should be cc: stable, but no fixes as agreed.

Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?

Maybe it even makes sense to squash such 1&2 into a single patch.

Again, since the original TLB flush was backported quite far back into long term stable releases I think it would be much easier to really have a minimal patch/series to fix Broadwell in those kernels.

Regards,

Tvrtko

> Yet, IMO the entire series should be merged on -stable.
>
> If that's OK for you and there's no additional issues to be addressed, I'll submit a v2 of this series removing the Fixes tag from patches 4 and 5.
>
> Regards, Mauro

Mauro Carvalho Chehab

29 Jun 2022

3:30 p.m.

On Tue, 28 Jun 2022 16:49:23 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> ... which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.
>
> Then we have the current patch 2 which is open/unknown (to me at least).
>
> And the rest seem like optimisations which shouldn't be tagged as fixes.
>
> Apart from patch 5 which should be cc: stable, but no fixes as agreed.
>
> Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?

Tested, and porting just those 3 patches is enough to fix the Broadwell bug.

So, I submitted a v2 of this series with just those. They all need to be backported to stable.

I still think that other TLB patches are needed/desired upstream, but I'll submit them on a separate series. Let's fix the regression first ;-)

Regards, Mauro

Tvrtko Ursulin

4:02 p.m.

On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> On Tue, 28 Jun 2022 16:49:23 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > ... which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.
> >
> > Then we have the current patch 2 which is open/unknown (to me at least).
> >
> > And the rest seem like optimisations which shouldn't be tagged as fixes.
> >
> > Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> >
> > Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?
>
> Tested, and porting just those 3 patches is enough to fix the Broadwell bug.
>
> So, I submitted a v2 of this series with just those. They all need to be backported to stable.

I would really like to give even a smaller fix a try. Something like, although not even compile tested:

commit 4d5e94aef164772f4d85b3b4c1a46eac9a2bd680
Author: Chris Wilson <chris.p.wilson@intel.com>
Date:   Wed Jun 29 16:25:24 2022 +0100

    drm/i915/gt: Serialize TLB invalidates with GT resets

    Avoid trying to invalidate the TLB in the middle of performing an
    engine reset, as this may result in the reset timing out. Currently,
    the TLB invalidate is only serialised by its own mutex, forgoing the
    uncore lock, but we can take the uncore->lock as well to serialise
    the mmio access, thereby serialising with the GDRST.

    Tested on a NUC5i7RYB, BIOS RYBDWi35.86A.0380.2019.0517.1530 with
    i915 selftest/hangcheck.

    Cc: stable@vger.kernel.org
    Fixes: 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store")
    Reported-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Tested-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Chris Wilson <chris.p.wilson@intel.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
    Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
    Reviewed-by: Andi Shyti <andi.shyti@intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org>
    Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c b/drivers/gpu/drm/i915/gt/intel_gt.c
index 8da3314bb6bf..aaadd0b02043 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -952,7 +952,23 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 	mutex_lock(&gt->tlb_invalidate_lock);
 	intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);
 
+	spin_lock_irq(&uncore->lock); /* serialise invalidate with GT reset */
+
+	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
+		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
+		if (!i915_mmio_reg_offset(rb.reg))
+			continue;
+
+		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
+	}
+
+	spin_unlock_irq(&uncore->lock);
+
 	for_each_engine(engine, gt, id) {
+		struct reg_and_bit rb;
+
 		/*
 		 * HW architecture suggest typical invalidation time at 40us,
 		 * with pessimistic cases up to 100us and a recommendation to
@@ -960,13 +976,11 @@ void intel_gt_invalidate_tlbs(struct intel_gt *gt)
 		 */
 		const unsigned int timeout_us = 100;
 		const unsigned int timeout_ms = 4;
-		struct reg_and_bit rb;
 
 		rb = get_reg_and_bit(engine, regs == gen8_regs, regs, num);
 		if (!i915_mmio_reg_offset(rb.reg))
 			continue;
 
-		intel_uncore_write_fw(uncore, rb.reg, rb.bit);
 		if (__intel_wait_for_register_fw(uncore,
 						 rb.reg, rb.bit, 0,
 						 timeout_us, timeout_ms,

If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.
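
The restructuring in the proposal above comes down to two passes over the engines: every invalidation request is written while uncore->lock is held, and the waits happen only after the lock has been dropped. A minimal user-space sketch of that shape follows. It assumes a hypothetical `struct toy_uncore` whose registers clear themselves after two polls; none of the `toy_*` names exist in i915, and a boolean stands in for the spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NUM_ENGINES 3

/* Hypothetical model of the uncore: one invalidate register per engine. */
struct toy_uncore {
	bool lock_held;                           /* models uncore->lock */
	unsigned int reg[TOY_NUM_ENGINES];
	unsigned int polls_left[TOY_NUM_ENGINES]; /* polls until the bit clears */
	unsigned int writes_under_lock;
	unsigned int waits_under_lock;
};

/* Request an invalidation; the toy hardware finishes after two polls. */
static void toy_write(struct toy_uncore *u, size_t id, unsigned int bit)
{
	if (u->lock_held)
		u->writes_under_lock++;
	u->reg[id] |= bit;
	u->polls_left[id] = 2;
}

/* Poll until the bit clears or max_polls is exhausted. */
static bool toy_wait(struct toy_uncore *u, size_t id, unsigned int bit,
		     unsigned int max_polls)
{
	if (u->lock_held)
		u->waits_under_lock++;

	while (max_polls--) {
		if (u->polls_left[id] && --u->polls_left[id] == 0)
			u->reg[id] &= ~bit;
		if (!(u->reg[id] & bit))
			return true;
	}
	return false;
}

/* Returns the number of engines whose invalidation timed out. */
static unsigned int toy_invalidate_tlbs(struct toy_uncore *u)
{
	const unsigned int bit = 1u;
	unsigned int timeouts = 0;
	size_t id;

	/* Pass 1: all requests go out with the lock held. */
	u->lock_held = true;
	for (id = 0; id < TOY_NUM_ENGINES; id++)
		toy_write(u, id, bit);
	u->lock_held = false;

	/* Pass 2: the waits run with the lock released. */
	for (id = 0; id < TOY_NUM_ENGINES; id++)
		if (!toy_wait(u, id, bit, 10))
			timeouts++;

	return timeouts;
}
```

The two counters make the property under discussion checkable in the model: all writes are recorded as made under the lock, and no wait is.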

> I still think that other TLB patches are needed/desired upstream, but I'll submit them on a separate series. Let's fix the regression first ;-)

Yep, that's exactly right.

Regards,

Tvrtko

Mauro Carvalho Chehab

30 Jun 2022

7:32 a.m.

On Wed, 29 Jun 2022 17:02:59 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> > On Tue, 28 Jun 2022 16:49:23 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > ... which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.
> > >
> > > Then we have the current patch 2 which is open/unknown (to me at least).
> > >
> > > And the rest seem like optimisations which shouldn't be tagged as fixes.
> > >
> > > Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> > >
> > > Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?
> >
> > Tested, and porting just those 3 patches is enough to fix the Broadwell bug.
> >
> > So, I submitted a v2 of this series with just those. They all need to be backported to stable.
>
> I would really like to give even a smaller fix a try. Something like, although not even compile tested:


This won't work, as it is not serializing TLB cache invalidation with i915 resets. Besides that, this is more or less merging patches 1 and 3, placing patches with different rationales together. The upstream rule is to have one logical change per patch.

> If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.

From a backport PoV, it wouldn't make any difference applying one patch or two. See, the intel_gt_invalidate_tlbs() function doesn't exist before changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"), so it shouldn't have merge conflicts while backporting it, except maybe if some functions it calls (or parameters) have changed. In such a case, the backport fix should be trivial, and the end result of backporting one folded patch or two would be the same.

If any conflict happens, I can help doing the backports.

> > I still think that other TLB patches are needed/desired upstream, but I'll submit them on a separate series. Let's fix the regression first ;-)
>
> Yep, that's exactly right.
>
> Regards,
>
> Tvrtko

Tvrtko Ursulin

8:12 a.m.

On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:
> On Wed, 29 Jun 2022 17:02:59 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:
> > > On Tue, 28 Jun 2022 16:49:23 +0100 Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> wrote:
> > > > ... which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.
> > > >
> > > > Then we have the current patch 2 which is open/unknown (to me at least).
> > > >
> > > > And the rest seem like optimisations which shouldn't be tagged as fixes.
> > > >
> > > > Apart from patch 5 which should be cc: stable, but no fixes as agreed.
> > > >
> > > > Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?
> > >
> > > Tested, and porting just those 3 patches is enough to fix the Broadwell bug.
> > >
> > > So, I submitted a v2 of this series with just those. They all need to be backported to stable.
> >
> > I would really like to give even a smaller fix a try. Something like, although not even compile tested:

This won't work, as it is not serializing TLB cache invalidation with i915 resets. Besides that, this is more or less merging patches 1 and 3,

Could you explain why you think it is not doing exactly that? In both versions end result is TLB flush requests are under the uncore lock and waits are outside it.
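
For readers following along, the shape being described (requests posted under the lock, waits done after dropping it) can be modelled in plain userspace C. This is only a sketch under stated assumptions: `fake_gt`, `fake_wait_for_ack()` and `fake_invalidate_tlbs()` are hypothetical stand-ins rather than i915 code, a pthread mutex stands in for `uncore->lock`, and the "hardware" acknowledges every request immediately.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_ENGINES    4
#define INVALIDATE_BIT 0x1u

/* Hypothetical model of the GT; none of these names exist in i915. */
struct fake_gt {
	pthread_mutex_t lock;        /* stands in for uncore->lock */
	bool lock_held;              /* bookkeeping so the model can check itself */
	bool has_reg[NUM_ENGINES];   /* false models an engine without an invalidate register */
	uint32_t reg[NUM_ENGINES];   /* one pretend invalidate register per engine */
	int writes_under_lock;
	int waits_outside_lock;
};

static void fake_gt_init(struct fake_gt *gt)
{
	int id;

	pthread_mutex_init(&gt->lock, NULL);
	gt->lock_held = false;
	gt->writes_under_lock = 0;
	gt->waits_outside_lock = 0;
	for (id = 0; id < NUM_ENGINES; id++) {
		gt->has_reg[id] = true;
		gt->reg[id] = 0;
	}
}

/* Pretend hardware: acknowledges the request by clearing the bit. */
static bool fake_wait_for_ack(struct fake_gt *gt, int id)
{
	gt->reg[id] &= ~INVALIDATE_BIT;
	return (gt->reg[id] & INVALIDATE_BIT) == 0;
}

/* Returns the number of engines whose invalidation did not complete. */
static int fake_invalidate_tlbs(struct fake_gt *gt)
{
	int id, timeouts = 0;

	/* Loop 1: post every request while holding the lock. */
	pthread_mutex_lock(&gt->lock);
	gt->lock_held = true;
	for (id = 0; id < NUM_ENGINES; id++) {
		if (!gt->has_reg[id])
			continue;
		gt->reg[id] |= INVALIDATE_BIT;
		if (gt->lock_held)
			gt->writes_under_lock++;
	}
	gt->lock_held = false;
	pthread_mutex_unlock(&gt->lock);

	/* Loop 2: wait for each acknowledgement with the lock dropped. */
	for (id = 0; id < NUM_ENGINES; id++) {
		if (!gt->has_reg[id])
			continue;
		if (!gt->lock_held)
			gt->waits_outside_lock++;
		if (!fake_wait_for_ack(gt, id))
			timeouts++;
	}

	return timeouts;
}
```

Calling `fake_gt_init()` followed by `fake_invalidate_tlbs()` on a `struct fake_gt` exercises both loops. The point of the split is that the lock is held only for the short burst of register writes, never across the comparatively long waits.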

...

placing patches with different rationales altogether. Upstream rule is to have one logical change per patch.

I don't think it applies in this case. It is simply splitting into two loops so lock can be held across all mmio writes. I think of it this way - what is the rationale for sending only the first patch to stable? What does it _fix_ on its own?

...

...
If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.

From backport PoV, it wouldn't make any difference applying one patch or two. See, the intel_gt_invalidate_tlbs() function doesn't exist before changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"), so backporting it shouldn't cause merge conflicts, except maybe if some functions it calls (or their parameters) have changed. In that case, the backport fix should be trivial, and the end result of backporting one folded patch or two would be the same.

Yes a lot of things changed. Not least engine and GT pm code. Note that TLB flushing was backported all the way to 4.4 so any hunk you don't strictly need can and will bite you. I have attached a tarball of patches for you to explore. :) Regards,

Tvrtko

...

If any conflict happens, I can help doing the backports.

...
...
I still think that other TLB patches are needed/desired upstream, but I'll submit them on a separate series. Let's fix the regression first ;-)

Yep, that's exactly right.

Regards,

Tvrtko

Mauro Carvalho Chehab

4:01 p.m.

On Thu, 30 Jun 2022 09:12:41 +0100 Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:

...

On 30/06/2022 08:32, Mauro Carvalho Chehab wrote:

...
On Wed, 29 Jun 2022 17:02:59 +0100 Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:

...
On 29/06/2022 16:30, Mauro Carvalho Chehab wrote:

...
On Tue, 28 Jun 2022 16:49:23 +0100 Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:

...
.. which for me means a different patch 1, followed by patch 6 (moved to be patch 2) would be ideal stable material.

Then we have the current patch 2 which is open/unknown (to me at least).

And the rest seem like optimisations which shouldn't be tagged as fixes.

Apart from patch 5 which should be cc: stable, but no fixes as agreed.

Could you please double check if what I am suggesting here is feasible to implement and, if it is, just send those minimal patches out alone?

Tested, and porting just those 3 patches is enough to fix the Broadwell bug.

So, I submitted a v2 of this series with just those. They all need to be backported to stable.

I would really like to give even a smaller fix a try. Something like, although not even compile tested:

This won't work, as it is not serializing TLB cache invalidation with i915 resets. Besides that, this is more or less merging patches 1 and 3,
Could you explain why you think it is not doing exactly that? In both versions end result is TLB flush requests are under the uncore lock and waits are outside it.

Sure, but patch 2/3 (see v2) serializes i915 reset with TLB cache changes. This is needed in order to fix the regression.

...

...
placing patches with different rationales altogether. Upstream rule is to have one logical change per patch.

I don't think it applies in this case. It is simply splitting into two loops so lock can be held across all mmio writes. I think of it this way - what is the rationale for sending only the first patch to stable? What does it _fix_ on its own?

There's no -stable rule saying that only one patch is allowed, nor one saying that patches should be folded together, making multiple changes in a single patch, just because of a "Fixes" tag.

So, while several -stable fixes can be done in a single patch, there are fixes that will require multiple patches. There's nothing wrong with that.

The only rule is that backports should follow what's merged upstream. So, if, in order to fix a regression, multiple patches are needed upstream, in principle, all of those can be backported if they fit the -stable rules.

As an example, we once backported a media patch series of ~20 patches addressing security issues in the media compat32 logic (media ioctls usually pass structs, some of them with pointers). As the issue was discovered several years after compat32 was introduced, those 22 patches (some containing compat32 redesigns) had to be backported to all maintained LTS kernels.

In this specific case, fixing the regression requires 3 logical changes:

1) Split the loop;
2) Add serialization logic to i915 reset;
3) Use the same i915 reset spinlock to serialize TLB cache invalidation.

None of those logical changes alone would solve the issue. That's why I originally added the same Fixes: to the entire series: basically, any Kernel that has the TLB patch backported will require those three logical changes to be backported too.
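
To make changes 2 and 3 concrete, here is a minimal userspace model of a reset path and an invalidate path sharing one lock. It is a sketch under stated assumptions, not driver code: `fake_uncore`, `reset_engines_locked()`, `reset_engines()` and `invalidate_tlb()` are hypothetical names, a pthread mutex stands in for `uncore->lock`, and the model is single-threaded, so it shows the structure (an unlocked inner helper plus a locking wrapper) rather than proving anything about real concurrency.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model; the names below are illustrative, not i915 symbols. */
struct fake_uncore {
	pthread_mutex_t lock;  /* stands in for uncore->lock */
	int depth;             /* how many paths are inside the locked region */
	int max_depth;         /* stays at 1 while every path takes the lock */
	uint32_t reset_reg;    /* pretend shared reset request/ack register */
	uint32_t tlb_reg;      /* pretend TLB invalidate register */
	int resets_done;
};

static void fake_uncore_init(struct fake_uncore *u)
{
	pthread_mutex_init(&u->lock, NULL);
	u->depth = 0;
	u->max_depth = 0;
	u->reset_reg = 0;
	u->tlb_reg = 0;
	u->resets_done = 0;
}

static void fake_lock(struct fake_uncore *u)
{
	pthread_mutex_lock(&u->lock);
	u->depth++;
	if (u->depth > u->max_depth)
		u->max_depth = u->depth;
}

static void fake_unlock(struct fake_uncore *u)
{
	u->depth--;
	pthread_mutex_unlock(&u->lock);
}

/* Change 2, inner half: the caller must already hold the lock. */
static int reset_engines_locked(struct fake_uncore *u, uint32_t engine_mask)
{
	assert(u->depth == 1);
	u->reset_reg |= engine_mask;   /* request the reset */
	u->reset_reg &= ~engine_mask;  /* pretend the hardware acknowledged it */
	u->resets_done++;
	return 0;
}

/* Change 2, outer half: a wrapper that takes the lock around the reset. */
static int reset_engines(struct fake_uncore *u, uint32_t engine_mask)
{
	int ret;

	fake_lock(u);
	ret = reset_engines_locked(u, engine_mask);
	fake_unlock(u);

	return ret;
}

/* Change 3: the invalidate request goes through the same lock. */
static void invalidate_tlb(struct fake_uncore *u, uint32_t bit)
{
	fake_lock(u);
	u->tlb_reg |= bit;
	fake_unlock(u);
}
```

Because both paths funnel through `fake_lock()`, a reset request and an invalidate request can never be in flight inside the locked region at the same time in this model; `max_depth` records that invariant.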

That basically follows what's in the kernel process docs:

"If your patch fixes a bug in a specific commit, e.g. you found an issue using ``git bisect``, please use the 'Fixes:' tag with the first 12 characters of the SHA-1 ID, and the one line summary."

Documentation/process/submitting-patches.rst

See, Fixes was originally introduced as a hint to help stable and distro maintainers identify how far they need to backport a patch. That's mainly why I added Fixes: to the entire series. Yet the same will also happen, in practice, if we add:

Cc: stable@vger.kernel.org # Up to version 4.4

Greg, Sasha and other -stable/distro maintainers will also have a (much less precise) hint about how far the backport is needed.

...

...
If this works it would be least painful to backport. The other improvements can then be devoid of the fixes tag.

From backport PoV, it wouldn't make any difference applying one patch or two. See, the intel_gt_invalidate_tlbs() function doesn't exist before changeset 7938d61591d3 ("drm/i915: Flush TLBs before releasing backing store"), so backporting it shouldn't cause merge conflicts, except maybe if some functions it calls (or their parameters) have changed. In that case, the backport fix should be trivial, and the end result of backporting one folded patch or two would be the same.

Yes a lot of things changed. Not least engine and GT pm code. Note that TLB flushing was backported all the way to 4.4 so any hunk you don't strictly need can and will bite you. I have attached a tarball of patches for you to explore. :) Regards,

Thanks! That's very helpful to check the amount of work. It makes it easy to use interdiff and (k)diff3 to check what changed.

From it, the differences between 5.4 and 5.16 in intel_gt_invalidate_tlbs() are really trivial.

On 4.14, the function was added in a different file (intel_gem), and there were a few more API differences, as only gen8 code is there, but again, the changes are trivial: mostly macros/functions were renamed and some function parameters changed.

From 4.9 to 4.14 there were also some changes, but they also look trivial.

Kernel 4.4 has some other differences - the loop logic is different, and there's a ring initialization function, but, as version 4.4 is not listed anymore as LTS at kernel.org, we probably need to backport only up to 4.9.

All of the above should affect patch v2 1/3. Patches v2 2/3 and 3/3 just add spin lock/unlock calls for the gt uncore spinlock. Those will very likely require some work on 4.x kernels, but folding (or not) the patches won't really help.

Regards, Mauro

Tvrtko Ursulin

1 Jul

7:56 a.m.

On 30/06/2022 17:01, Mauro Carvalho Chehab wrote:

This won't work, as it is not serializing TLB cache invalidation with i915 resets. Besides that, this is more or less merging patches 1 and 3,
Could you explain why you think it is not doing exactly that? In both versions end result is TLB flush requests are under the uncore lock and waits are outside it.
Sure, but patch 2/3 (see v2) serializes i915 reset with TLB cache changes. This is needed in order to fix the regression.

Not "the" regression, and not even _a_ *regression*. 2/3 fixes a pre-existing and unrelated problem. Or only tangentially related if you want. 2/3 fixes a hang if two engine resets would happen to coincide. Nothing about TLB flushing.

...

...
...
placing patches with different rationales altogether. Upstream rule is to have one logical change per patch.

I don't think it applies in this case. It is simply splitting into two loops so lock can be held across all mmio writes. I think of it this way - what is the rationale for sending only the first patch to stable? What does it _fix_ on its own?

There's no -stable rule saying that only one patch is allowed, nor one saying that patches should be folded together, making multiple changes in a single patch, just because of a "Fixes" tag.

Well, if we want to be pedantic, what do stable rules say about adding new features - is skipping idle engines (which is a software concept) a fix or a new optimisation?

...

So, while several -stable fixes can be done in a single patch, there are fixes that will require multiple patches. There's nothing wrong with that.

Agreed. But the point of my argument is that the 1st patch a) does not fix anything on its own (in relation to the regression), and b) adds improvements which will just be extra work to backport to old kernels.

All of the above should affect patch v2 1/3. Patches v2 2/3 and 3/3 just add spin lock/unlock calls for the gt uncore spinlock. Those will very likely require some work on 4.x kernels, but folding (or not) the patches won't really help.

What about intel_engine_pm_is_awake, what will you do with that one?

Regards,

Tvrtko

Mauro Carvalho Chehab

4 Jul

8:42 a.m.

On Fri, 1 Jul 2022 08:56:53 +0100 Tvrtko Ursulin tvrtko.ursulin@linux.intel.com wrote:

What about intel_engine_pm_is_awake, what will you do with that one?

Ok, let's keep this series plain and simple. I'm dropping the PM awake logic in v3, as you suggested, keeping just the bare minimum required to fix the selftest breakage.

That actually means that, for such backports, we're not taking into account that TLB cache invalidation adds performance penalties and might cause apps to break.

I suspect that we'll also need to backport at least some of the other patches, like the PM awake logic and the one that avoids TLB cache invalidation when the memory was not touched by userspace, but let's focus first on fixing the regression pointed out by the selftest.

Regards, Mauro

linux-stable-mirror@lists.linaro.org

participants (4)

Andi Shyti
Mauro Carvalho Chehab
Mauro Carvalho Chehab
Tvrtko Ursulin