Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2

26 Mar 2018

      On Thu, Mar 22, 2018 at 10:58:55AM +0100, Christian König wrote:
...
Am 22.03.2018 um 08:18 schrieb Daniel Vetter:
...
On Wed, Mar 21, 2018 at 12:54:20PM +0100, Christian König wrote:
...
Am 21.03.2018 um 09:28 schrieb Daniel Vetter:
...
On Tue, Mar 20, 2018 at 06:47:57PM +0100, Christian König wrote:
...
Am 20.03.2018 um 15:08 schrieb Daniel Vetter:
...
[SNIP]
For the in-driver reservation path (CS) having a slow-path that grabs a
temporary reference, drops the vram lock and then locks the reservation
normally (using the acquire context used already for the entire CS) is a
bit tricky, but totally feasible. Ttm doesn't do that though.
That is exactly what we do in amdgpu as well, it's just not very efficient
nor reliable to retry getting the right pages for a submission over and over
again.
Out of curiosity, where's that code? I did read the ttm eviction code way
back, and that one definitely didn't do that. Would be interesting to
update my understanding.
That is in amdgpu_cs.c. amdgpu_cs_parser_bos() does a horrible dance with
grabbing, releasing and regrabbing locks in a loop.
Then in amdgpu_cs_submit() we grab an lock preventing page table updates and
check if all pages are still the one we want to have:
...
amdgpu_mn_lock(p->mn);
         if (p->bo_list) {
                 for (i = p->bo_list->first_userptr;
                      i < p->bo_list->num_entries; ++i) {
                         struct amdgpu_bo *bo = p->bo_list->array[i].robj;
if
(amdgpu_ttm_tt_userptr_needs_pages(bo->tbo.ttm)) {
                                 amdgpu_mn_unlock(p->mn);
                                 return -ERESTARTSYS;
                         }
                 }
         }
If anything changed on the page tables we restart the whole IOCTL using
-ERESTARTSYS and try again.
I'm not talking about userptr here, but general bo eviction. Sorry for the
confusion.
The reason I'm dragging all the general bo management into this
discussions is because we do seem to have fairly fundamental difference in
how that's done, with resulting consequences for the locking hierarchy.
And if this invalidate_mapping stuff should work, together with userptr
and everything else, I think we're required to agree on how this is all
supposed to nest, and how exactly we should back off for the other side
that needs to break the locking circle.
That aside, I don't entirely understand why you need to restart so much. I
figured that get_user_pages is ordered correctly against mmu
invalidations, but I get the impression you think that's not the case. How
does that happen?
Correct. I've had the same assumption, but both Jerome as well as our
internal tests proved me wrong on that.
Key take away from that was that you can't take any locks from neither the
MMU notifier nor the shrinker you also take while calling kmalloc (or
simpler speaking get_user_pages()).
Additional to that in the MMU or shrinker callback all different kinds of
locks might be held, so you basically can't assume that you do thinks like
recursive page table walks or call dma_unmap_anything.
That sounds like a design bug in mmu_notifiers, since it would render them
useless for KVM. And they were developed for that originally. I think I'll
chat with Jerome to understand this, since it's all rather confusing.
...
Thinking a moment about that it actually seems to make perfect sense.
So it doesn't matter what order you got between the mmap_sem and your buffer
or allocation lock, it will simply be incorrect with other locks in the
system anyway.
Hm, doesn't make sense to me, at least from a locking  inversion pov. I
thought the only locks that are definitely part of the mmu_nofitier
callback contexts are the mm locks. We're definitely fine with those, and
kmalloc didn't bite us yet. Direct reclaim is really limited in what it
can accomplish for good reasons.
So the only thing I'm worried here is that we have a race between the
invalidate and the gup calls, and I think fixing that race will make the
locking more fun. But again, if that means you can'd dma_unmap stuff then
that feels like mmu notifiers are broken by design.
-Daniel
...
The only solution to that problem we have found is the dance we have in
amdgpu now:

Inside invalidate_range_start you grab a recursive read side lock.
Wait for all GPU operation on that pages to finish.
Then mark all pages as dirty and accessed.
Inside invalidate_range_end you release your recursive read side lock

again.
Then during command submission you do the following:

Take the locks Figure out all the userptr BOs which needs new pages.
Drop the locks again call get_user_pages().
Install the new page arrays and reiterate with #1
As soon as everybody has pages update your page tables and prepare all

command submission.
5. Take the write side of our recursive lock.
6. Check if all pages are still valid, if not restart the whole IOCTL.
7. Submit the job to the hardware.
8. Drop the write side of our recursive lock.
Regards,
Christian.
...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [Linaro-mm-sig] [PATCH 1/5] dma-buf: add optional invalidate_mappings callback v2