The content of this message was lost. It was probably cross-posted to multiple lists and previously handled on another list.
On 6/29/22 00:26, Thomas Hellström (Intel) wrote:
On 5/30/22 15:57, Dmitry Osipenko wrote:
On 5/30/22 16:41, Christian König wrote:
Hi Dmitry,
Am 30.05.22 um 15:26 schrieb Dmitry Osipenko:
Hello Christian,
On 5/30/22 09:50, Christian König wrote:
Hi Dmitry,
First of all please separate out this patch from the rest of the series, since this is a complex separate structural change.
I assume all the patches will go via the DRM tree in the end since the rest of the DRM patches in this series depend on this dma-buf change. But I see that separation may ease reviewing of the dma-buf changes, so let's try it.
That sounds like you are underestimating a bit how much trouble this will be.
I have tried this before and failed because catching all the locks in the right code paths are very tricky. So expect some fallout from this and make sure the kernel test robot and CI systems are clean.
Sure, I'll fix up all the reported things in the next iteration.
BTW, have you ever posted yours version of the patch? Will be great if we could compare the changed code paths.
No, I never even finished creating it after realizing how much work it would be.
This patch introduces new locking convention for dma-buf users. From now on all dma-buf importers are responsible for holding dma-buf reservation lock around operations performed over dma-bufs.
This patch implements the new dma-buf locking convention by:
1. Making dma-buf API functions to take the reservation lock.
2. Adding new locked variants of the dma-buf API functions for drivers that need to manage imported dma-bufs under the held lock.
Instead of adding new locked variants please mark all variants which expect to be called without a lock with an _unlocked postfix.
This should make it easier to remove those in a follow up patch set and then fully move the locking into the importer.
Do we really want to move all the locks to the importers? Seems the majority of drivers should be happy with the dma-buf helpers handling the locking for them.
Yes, I clearly think so.
3. Converting all drivers to the new locking scheme.
I have strong doubts that you got all of them. At least radeon and nouveau should grab the reservation lock in their ->attach callbacks somehow.
Radeon and Nouveau use gem_prime_import_sg_table() and they take resv lock already, seems they should be okay (?)
You are looking at the wrong side. You need to fix the export code path, not the import ones.
See for example attach on radeon works like this drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock.
Yeah, I was looking at the both sides, but missed this one.
Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest.
Seems it indeed it should deadlock. But i915 selftests apparently should've caught it and they didn't, I'll re-check what happened.
Perhaps it's worthwile to take a step back and figure out, if the importer is required to lock, which callbacks might need a ww acquire context?
I'll take this into account, thanks.
(And off-topic, Since we do a lot of fancy stuff under dma-resv locks including waiting for fences and other locks, IMO taking these locks uninterruptible should ring a warning bell)
I had the same thought and had a version that used the interruptible locking variant, but then decided to fall back to the uninterruptible, don't remember why. I'll revisit this.
On 7/1/22 13:43, Dmitry Osipenko wrote:
On 6/29/22 00:26, Thomas Hellström (Intel) wrote:
On 5/30/22 15:57, Dmitry Osipenko wrote:
On 5/30/22 16:41, Christian König wrote:
Hi Dmitry,
Am 30.05.22 um 15:26 schrieb Dmitry Osipenko:
Hello Christian,
On 5/30/22 09:50, Christian König wrote:
Hi Dmitry,
First of all please separate out this patch from the rest of the series, since this is a complex separate structural change.
I assume all the patches will go via the DRM tree in the end since the rest of the DRM patches in this series depend on this dma-buf change. But I see that separation may ease reviewing of the dma-buf changes, so let's try it.
That sounds like you are underestimating a bit how much trouble this will be.
I have tried this before and failed because catching all the locks in the right code paths are very tricky. So expect some fallout from this and make sure the kernel test robot and CI systems are clean.
Sure, I'll fix up all the reported things in the next iteration.
BTW, have you ever posted yours version of the patch? Will be great if we could compare the changed code paths.
No, I never even finished creating it after realizing how much work it would be.
> This patch introduces new locking convention for dma-buf users. From > now > on all dma-buf importers are responsible for holding dma-buf > reservation > lock around operations performed over dma-bufs. > > This patch implements the new dma-buf locking convention by: > > 1. Making dma-buf API functions to take the reservation lock. > > 2. Adding new locked variants of the dma-buf API functions for > drivers > that need to manage imported dma-bufs under the held lock. Instead of adding new locked variants please mark all variants which expect to be called without a lock with an _unlocked postfix.
This should make it easier to remove those in a follow up patch set and then fully move the locking into the importer.
Do we really want to move all the locks to the importers? Seems the majority of drivers should be happy with the dma-buf helpers handling the locking for them.
Yes, I clearly think so.
> 3. Converting all drivers to the new locking scheme. I have strong doubts that you got all of them. At least radeon and nouveau should grab the reservation lock in their ->attach callbacks somehow.
Radeon and Nouveau use gem_prime_import_sg_table() and they take resv lock already, seems they should be okay (?)
You are looking at the wrong side. You need to fix the export code path, not the import ones.
See for example attach on radeon works like this drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock.
Yeah, I was looking at the both sides, but missed this one.
Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest.
Seems it indeed it should deadlock. But i915 selftests apparently should've caught it and they didn't, I'll re-check what happened.
The i915 selftests use a separate mock_dmabuf_ops. That's why it works for the selftests, i.e. there is no deadlock.
Thomas, would i915 CI run a different set of tests or will it be the default i915 selftests ran by IGT?
On 7/5/22 01:38, Dmitry Osipenko wrote: ...
Also i915 will run into trouble with attach. In particular since i915 starts a full ww transaction in its attach callback to be able to lock other objects if migration is needed. I think i915 CI would catch this in a selftest.
Seems it indeed it should deadlock. But i915 selftests apparently should've caught it and they didn't, I'll re-check what happened.
The i915 selftests use a separate mock_dmabuf_ops. That's why it works for the selftests, i.e. there is no deadlock.
Thomas, would i915 CI run a different set of tests or will it be the default i915 selftests ran by IGT?
Nevermind, I had a local kernel change that was forgotten about.. it prevented the i915 live tests from running.
linaro-mm-sig@lists.linaro.org