TLB invalidation is a slow operation. It should not be doing lightly, as it
causes performance regressions, like this:
[178.821002] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
This series contain
1) some patches that makes TLB invalidation to happen only on
active, non-wedged engines, doing cache invalidation in batch
and only when GT objects are exposed to userspace:
drm/i915/gt: Ignore TLB invalidations on idle engines
drm/i915/gt: Only invalidate TLBs exposed to user manipulation
drm/i915/gt: Skip TLB invalidations once wedged
drm/i915/gt: Batch TLB invalidations
drm/i915/gt: Move TLB invalidation to its own file
2) It fixes two bugs, being the first a workaround:
drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
drm/i915: Invalidate the TLBs on each GT
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
3) It adds GuC support. Besides providing TLB invalidation on some
additional hardware, this should also help serializing GuC operations
with TLB invalidation:
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
drm/i915/guc: Define CTB based TLB invalidation routines
drm/i915: Add platform macro for selective tlb flush
drm/i915: Define GuC Based TLB invalidation routines
drm/i915: Add generic interface for tlb invalidation for XeHP
drm/i915: Use selective tlb invalidations where supported
4) It adds the corresponding kernel-doc markups for the kAPI
used for TLB invalidation.
While I could have split this into smaller pieces, I'm opting to send
them altogether, in order for CI trybot to better verify what issues
will be closed with this series.
---
v2:
- no changes. Just rebased on the top of drm-tip: 2022y-07m-14d-08h-35m-36s,
as CI trybot was having troubles applying it. Hopefully, it will now work.
Chris Wilson (7):
drm/i915/gt: Ignore TLB invalidations on idle engines
drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
drm/i915/gt: Only invalidate TLBs exposed to user manipulation
drm/i915/gt: Skip TLB invalidations once wedged
drm/i915/gt: Batch TLB invalidations
drm/i915/gt: Move TLB invalidation to its own file
drm/i915: Invalidate the TLBs on each GT
Mauro Carvalho Chehab (8):
drm/i915/gt: document with_intel_gt_pm_if_awake()
drm/i915/gt: describe the new tlb parameter at i915_vma_resource
drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode
drm/i915/guc: document the TLB invalidation struct members
drm/i915: document tlb field at struct drm_i915_gem_object
drm/i915/gt: document TLB cache invalidation functions
drm/i915/guc: describe enum intel_guc_tlb_invalidation_type
drm/i915/guc: document TLB cache invalidation functions
Piotr Piórkowski (1):
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
Prathap Kumar Valsan (5):
drm/i915/guc: Define CTB based TLB invalidation routines
drm/i915: Add platform macro for selective tlb flush
drm/i915: Define GuC Based TLB invalidation routines
drm/i915: Add generic interface for tlb invalidation for XeHP
drm/i915: Use selective tlb invalidations where supported
drivers/gpu/drm/i915/Makefile | 1 +
.../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +-
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 28 +-
drivers/gpu/drm/i915/gt/intel_engine.h | 1 +
drivers/gpu/drm/i915/gt/intel_gt.c | 125 +-------
drivers/gpu/drm/i915/gt/intel_gt.h | 2 -
.../gpu/drm/i915/gt/intel_gt_buffer_pool.h | 3 +-
drivers/gpu/drm/i915/gt/intel_gt_defines.h | 11 +
drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 8 +
drivers/gpu/drm/i915/gt/intel_gt_types.h | 22 +-
drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 +-
drivers/gpu/drm/i915/gt/intel_tlb.c | 295 ++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 30 ++
.../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 54 ++++
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 232 ++++++++++++++
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 36 +++
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 24 +-
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 +++++-
drivers/gpu/drm/i915/i915_drv.h | 4 +-
drivers/gpu/drm/i915/i915_pci.c | 1 +
drivers/gpu/drm/i915/i915_vma.c | 46 ++-
drivers/gpu/drm/i915/i915_vma.h | 2 +
drivers/gpu/drm/i915/i915_vma_resource.c | 9 +-
drivers/gpu/drm/i915/i915_vma_resource.h | 6 +-
drivers/gpu/drm/i915/intel_device_info.h | 1 +
27 files changed, 910 insertions(+), 155 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_defines.h
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
--
2.36.1
On 20/07/2022 08:13, Mauro Carvalho Chehab wrote:
> On Mon, 18 Jul 2022 14:52:05 +0100
> Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com> wrote:
>
>>
>> On 14/07/2022 13:06, Mauro Carvalho Chehab wrote:
>>> From: Chris Wilson <chris.p.wilson(a)intel.com>
>>>
>>> Invalidate TLB in patch, in order to reduce performance regressions.
>>
>> "in batches"?
>
> Yeah. Will fix it.
>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> index d8b94d638559..2da6c82a8bd2 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
>>> @@ -206,8 +206,12 @@ void ppgtt_bind_vma(struct i915_address_space *vm,
>>> void ppgtt_unbind_vma(struct i915_address_space *vm,
>>> struct i915_vma_resource *vma_res)
>>> {
>>> - if (vma_res->allocated)
>>> - vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>> + if (!vma_res->allocated)
>>> + return;
>>> +
>>> + vm->clear_range(vm, vma_res->start, vma_res->vma_size);
>>> + if (vma_res->tlb)
>>> + vma_invalidate_tlb(vm, *vma_res->tlb);
>>
>> The patch is about more than batching? If there is a security hole in
>> this area (unbind) with the current code?
>
> No, I don't think there's a security hole. The rationale for this is
> not due to it.
In this case obvious question is why are these changes in the patch
which declares itself to be about batching invalidations? Because...
> Since commit 2f6b90da9192 ("drm/i915: Use vma resources for async unbinding"),
> VMA unbind can happen either sync or async.
>
> So, the logic needs to do TLB invalidate on two places. After this
> patch, the code at __i915_vma_evict is:
>
> struct dma_fence *__i915_vma_evict(struct i915_vma *vma, bool async)
> {
> ...
> if (async)
> unbind_fence = i915_vma_resource_unbind(vma_res,
> &vma->obj->mm.tlb);
> else
> unbind_fence = i915_vma_resource_unbind(vma_res, NULL);
>
> vma->resource = NULL;
>
> atomic_and(~(I915_VMA_BIND_MASK | I915_VMA_ERROR | I915_VMA_GGTT_WRITE),
> &vma->flags);
>
> i915_vma_detach(vma);
>
> if (!async) {
> if (unbind_fence) {
> dma_fence_wait(unbind_fence, false);
> dma_fence_put(unbind_fence);
> unbind_fence = NULL;
> }
> vma_invalidate_tlb(vma->vm, vma->obj->mm.tlb);
> }
> ...
>
> So, basically, if !async, __i915_vma_evict() will do TLB cache invalidation.
>
> However, when async is used, the actual page release will happen later,
> at this function:
>
> void ppgtt_unbind_vma(struct i915_address_space *vm,
> struct i915_vma_resource *vma_res)
> {
> if (!vma_res->allocated)
> return;
>
> vm->clear_range(vm, vma_res->start, vma_res->vma_size);
> if (vma_res->tlb)
> vma_invalidate_tlb(vm, *vma_res->tlb);
> }
.. frankly I don't follow since I don't see any page release happening
in here. Just PTE clearing.
I am explaining why it looks to me that the patch is doing two things.
Implementing batching _and_ adding invalidation points at VMA unbind
sites, while so far we had it at backing store release only. Maybe I am
wrong and perhaps I am too slow to pick up on the explanation here.
So if the patch is doing two things please split it up.
I am further confused by the invalidation call site in evict and in
unbind - why there can't be one logical site since the logical sequence
is evict -> unbind.
Regards,
Tvrtko
Hello,
This series moves all drivers to a dynamic dma-buf locking specification.
From now on all dma-buf importers are made responsible for holding
dma-buf's reservation lock around all operations performed over dma-bufs.
This common locking convention allows us to utilize reservation lock more
broadly around kernel without fearing of potential dead locks.
This patchset passes all i915 selftests. It was also tested using VirtIO,
Panfrost, Lima and Tegra drivers. I tested cases of display+GPU,
display+V4L and GPU+V4L dma-buf sharing, which covers majority of kernel
drivers since rest of the drivers share same or similar code paths.
This is a continuation of [1] where Christian König asked to factor out
the dma-buf locking changes into separate series.
[1] https://lore.kernel.org/dri-devel/20220526235040.678984-1-dmitry.osipenko@c…
Dmitry Osipenko (6):
dma-buf: Add _unlocked postfix to function names
drm/gem: Take reservation lock for vmap/vunmap operations
dma-buf: Move all dma-bufs to dynamic locking specification
dma-buf: Acquire wait-wound context on attachment
media: videobuf2: Stop using internal dma-buf lock
dma-buf: Remove internal lock
drivers/dma-buf/dma-buf.c | 198 +++++++++++-------
drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +-
drivers/gpu/drm/armada/armada_gem.c | 14 +-
drivers/gpu/drm/drm_client.c | 4 +-
drivers/gpu/drm/drm_gem.c | 28 +++
drivers/gpu/drm/drm_gem_cma_helper.c | 6 +-
drivers/gpu/drm/drm_gem_framebuffer_helper.c | 6 +-
drivers/gpu/drm/drm_gem_shmem_helper.c | 6 +-
drivers/gpu/drm/drm_prime.c | 12 +-
drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c | 6 +-
drivers/gpu/drm/exynos/exynos_drm_gem.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c | 20 +-
.../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +-
drivers/gpu/drm/i915/gem/i915_gem_object.h | 6 +-
.../drm/i915/gem/selftests/i915_gem_dmabuf.c | 20 +-
drivers/gpu/drm/i915/i915_gem_evict.c | 2 +-
drivers/gpu/drm/i915/i915_gem_ww.c | 26 ++-
drivers/gpu/drm/i915/i915_gem_ww.h | 15 +-
drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c | 8 +-
drivers/gpu/drm/qxl/qxl_object.c | 17 +-
drivers/gpu/drm/qxl/qxl_prime.c | 4 +-
drivers/gpu/drm/tegra/gem.c | 27 +--
drivers/infiniband/core/umem_dmabuf.c | 11 +-
.../common/videobuf2/videobuf2-dma-contig.c | 26 +--
.../media/common/videobuf2/videobuf2-dma-sg.c | 23 +-
.../common/videobuf2/videobuf2-vmalloc.c | 17 +-
.../platform/nvidia/tegra-vde/dmabuf-cache.c | 12 +-
drivers/misc/fastrpc.c | 12 +-
drivers/xen/gntdev-dmabuf.c | 14 +-
include/drm/drm_gem.h | 3 +
include/linux/dma-buf.h | 49 ++---
32 files changed, 347 insertions(+), 257 deletions(-)
--
2.36.1
Hello,
I found a bug in the most usb driver.
When the driver fails at
mdev->conf = kcalloc(num_endpoints, sizeof(*mdev->conf), GFP_KERNEL);
I got the following warning message:
[ 15.406256] kobject: '(null)' (ffff8881068f8000): is not
initialized, yet kobject_put() is being called.
[ 15.406986] WARNING: CPU: 3 PID: 396 at lib/kobject.c:720
kobject_put+0x6e/0x1c0
[ 15.410120] RIP: 0010:kobject_put+0x6e/0x1c0
[ 15.410470] Code: 01 75 29 4c 89 f8 48 c1 e8 03 80 3c 28 00 74 08
4c 89 ff e8 14 2e 73 ff 49 8b 37 48 c7 c7 c0 fc de 85 4c 89 fa e8 e2
61 21 ff <0f> 0b 49 8d 5f 38 48 89 df be 04 00 00 00 e8 df 2e 73 ff b8
ff ff
[ 15.416529] Call Trace:
[ 15.416896] hdm_probe+0xf3d/0x1090 [most_usb]
Since I'm not familiar with the driver, I ask for your help to solve
the warning.
regards,
Zheyu Ma
to provid --> to provide
Signed-off-by: Flavio Suligoi <f.suligoi(a)asem.it>
---
v2:
- fix typo in subject
drivers/i2c/busses/i2c-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index e9e2db68b9fb..78fb1a4274a6 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -66,7 +66,7 @@
/* IMX I2C registers:
* the I2C register offset is different between SoCs,
- * to provid support for all these chips, split the
+ * to provide support for all these chips, split the
* register offset into a fixed base address and a
* variable shift value, then the full register offset
* will be calculated by
--
2.25.1
to provid --> to provide
Signed-off-by: Flavio Suligoi <f.suligoi(a)asem.it>
---
drivers/i2c/busses/i2c-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-imx.c b/drivers/i2c/busses/i2c-imx.c
index e9e2db68b9fb..78fb1a4274a6 100644
--- a/drivers/i2c/busses/i2c-imx.c
+++ b/drivers/i2c/busses/i2c-imx.c
@@ -66,7 +66,7 @@
/* IMX I2C registers:
* the I2C register offset is different between SoCs,
- * to provid support for all these chips, split the
+ * to provide support for all these chips, split the
* register offset into a fixed base address and a
* variable shift value, then the full register offset
* will be calculated by
--
2.25.1
TLB invalidation is a slow operation. It should not be doing lightly, as it
causes performance regressions, like this:
[178.821002] i915 0000:00:02.0: [drm] *ERROR* rcs0 TLB invalidation did not complete in 4ms!
This series contain
1) some patches that makes TLB invalidation to happen only on
active, non-wedged engines, doing cache invalidation in batch
and only when GT objects are exposed to userspace:
drm/i915/gt: Ignore TLB invalidations on idle engines
drm/i915/gt: Only invalidate TLBs exposed to user manipulation
drm/i915/gt: Skip TLB invalidations once wedged
drm/i915/gt: Batch TLB invalidations
drm/i915/gt: Move TLB invalidation to its own file
2) It fixes two bugs, being the first a workaround:
drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
drm/i915: Invalidate the TLBs on each GT
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
3) It adds GuC support. Besides providing TLB invalidation on some
additional hardware, this should also help serializing GuC operations
with TLB invalidation:
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
drm/i915/guc: Define CTB based TLB invalidation routines
drm/i915: Add platform macro for selective tlb flush
drm/i915: Define GuC Based TLB invalidation routines
drm/i915: Add generic interface for tlb invalidation for XeHP
drm/i915: Use selective tlb invalidations where supported
4) It adds the corresponding kernel-doc markups for the kAPI
used for TLB invalidation.
While I could have split this into smaller pieces, I'm opting to send
them altogether, in order for CI trybot to better verify what issues
will be closed with this series.
---
Chris Wilson (7):
drm/i915/gt: Ignore TLB invalidations on idle engines
drm/i915/gt: Invalidate TLB of the OA unit at TLB invalidations
drm/i915/gt: Only invalidate TLBs exposed to user manipulation
drm/i915/gt: Skip TLB invalidations once wedged
drm/i915/gt: Batch TLB invalidations
drm/i915/gt: Move TLB invalidation to its own file
drm/i915: Invalidate the TLBs on each GT
Mauro Carvalho Chehab (8):
drm/i915/gt: document with_intel_gt_pm_if_awake()
drm/i915/gt: describe the new tlb parameter at i915_vma_resource
drm/i915/guc: use kernel-doc for enum intel_guc_tlb_inval_mode
drm/i915/guc: document the TLB invalidation struct members
drm/i915: document tlb field at struct drm_i915_gem_object
drm/i915/gt: document TLB cache invalidation functions
drm/i915/guc: describe enum intel_guc_tlb_invalidation_type
drm/i915/guc: document TLB cache invalidation functions
Piotr Piórkowski (1):
drm/i915/guc: Introduce TLB_INVALIDATION_ALL action
Prathap Kumar Valsan (5):
drm/i915/guc: Define CTB based TLB invalidation routines
drm/i915: Add platform macro for selective tlb flush
drm/i915: Define GuC Based TLB invalidation routines
drm/i915: Add generic interface for tlb invalidation for XeHP
drm/i915: Use selective tlb invalidations where supported
drivers/gpu/drm/i915/Makefile | 1 +
.../gpu/drm/i915/gem/i915_gem_object_types.h | 6 +-
drivers/gpu/drm/i915/gem/i915_gem_pages.c | 28 +-
drivers/gpu/drm/i915/gt/intel_engine.h | 1 +
drivers/gpu/drm/i915/gt/intel_gt.c | 125 +-------
drivers/gpu/drm/i915/gt/intel_gt.h | 2 -
.../gpu/drm/i915/gt/intel_gt_buffer_pool.h | 3 +-
drivers/gpu/drm/i915/gt/intel_gt_defines.h | 11 +
drivers/gpu/drm/i915/gt/intel_gt_pm.h | 10 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 8 +
drivers/gpu/drm/i915/gt/intel_gt_types.h | 22 +-
drivers/gpu/drm/i915/gt/intel_ppgtt.c | 8 +-
drivers/gpu/drm/i915/gt/intel_tlb.c | 295 ++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_tlb.h | 30 ++
.../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 54 ++++
drivers/gpu/drm/i915/gt/uc/intel_guc.c | 232 ++++++++++++++
drivers/gpu/drm/i915/gt/uc/intel_guc.h | 36 +++
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 24 +-
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 9 +
.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 91 +++++-
drivers/gpu/drm/i915/i915_drv.h | 4 +-
drivers/gpu/drm/i915/i915_pci.c | 1 +
drivers/gpu/drm/i915/i915_vma.c | 46 ++-
drivers/gpu/drm/i915/i915_vma.h | 2 +
drivers/gpu/drm/i915/i915_vma_resource.c | 9 +-
drivers/gpu/drm/i915/i915_vma_resource.h | 6 +-
drivers/gpu/drm/i915/intel_device_info.h | 1 +
27 files changed, 910 insertions(+), 155 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_defines.h
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_tlb.h
--
2.36.1