[PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

List overview All Threads
Download

newer

older

Changing vma->vm_file in...

Re: [Linaro-mm-sig] [PATCH]...

Daniel Vetter

28 Jul 2020 28 Jul '20

1:58 p.m.

GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: Thomas Hellström (Intel) thomas_os@shipmail.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jason Gunthorpe jgg@mellanox.com Cc: linux-mm@kvack.org Cc: linux-rdma@vger.kernel.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com --- drivers/dma-buf/dma-resv.c | 5 +++++ 1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 0e6675ec1d11..9678162a4ac5 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -104,12 +104,14 @@ static int __init dma_resv_lockdep(void) struct mm_struct *mm = mm_alloc(); struct ww_acquire_ctx ctx; struct dma_resv obj; + struct address_space mapping; int ret;

if (!mm) return -ENOMEM;

dma_resv_init(&obj); + address_space_init_once(&mapping);

mmap_read_lock(mm); ww_acquire_init(&ctx, &reservation_ww_class); @@ -117,6 +119,9 @@ static int __init dma_resv_lockdep(void) if (ret == -EDEADLK) dma_resv_lock_slow(&obj, &ctx); fs_reclaim_acquire(GFP_KERNEL); + /* for unmap_mapping_range on trylocked buffer objects in shrinkers */ + i_mmap_lock_write(&mapping); + i_mmap_unlock_write(&mapping); #ifdef CONFIG_MMU_NOTIFIER lock_map_acquire(&__mmu_notifier_invalidate_range_start_map); __dma_fence_might_wait();

-- 2.27.0

Show replies by date

Christian König

30 Jul 30 Jul

12:03 p.m.

New subject: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

Am 28.07.20 um 15:58 schrieb Daniel Vetter:

...

GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Signed-off-by: Daniel Vetter daniel.vetter@intel.com Cc: Sumit Semwal sumit.semwal@linaro.org Cc: "Christian König" christian.koenig@amd.com Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Cc: Dave Chinner david@fromorbit.com Cc: Qian Cai cai@lca.pw Cc: linux-xfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: Thomas Hellström (Intel) thomas_os@shipmail.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jason Gunthorpe jgg@mellanox.com Cc: linux-mm@kvack.org Cc: linux-rdma@vger.kernel.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com

Reviewed-by: Christian König christian.koenig@amd.com

...

drivers/dma-buf/dma-resv.c | 5 +++++ 1 file changed, 5 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c index 0e6675ec1d11..9678162a4ac5 100644 --- a/drivers/dma-buf/dma-resv.c +++ b/drivers/dma-buf/dma-resv.c @@ -104,12 +104,14 @@ static int __init dma_resv_lockdep(void) struct mm_struct *mm = mm_alloc(); struct ww_acquire_ctx ctx; struct dma_resv obj;

struct address_space mapping; int ret;

if (!mm) return -ENOMEM; dma_resv_init(&obj);

address_space_init_once(&mapping);

mmap_read_lock(mm); ww_acquire_init(&ctx, &reservation_ww_class); @@ -117,6 +119,9 @@ static int __init dma_resv_lockdep(void) if (ret == -EDEADLK) dma_resv_lock_slow(&obj, &ctx); fs_reclaim_acquire(GFP_KERNEL);

/* for unmap_mapping_range on trylocked buffer objects in shrinkers */

i_mmap_lock_write(&mapping);

i_mmap_unlock_write(&mapping); #ifdef CONFIG_MMU_NOTIFIER lock_map_acquire(&__mmu_notifier_invalidate_range_start_map); __dma_fence_might_wait();

Thomas Hellström (Intel)

12:17 p.m.

New subject: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

On 7/28/20 3:58 PM, Daniel Vetter wrote:

...

GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock from unmap_mapping_range() it might be worth figuring out how new page directory pages are allocated, though.

/Thomas

Daniel Vetter

1:17 p.m.

On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel) thomas_os@shipmail.org wrote:

...

On 7/28/20 3:58 PM, Daniel Vetter wrote:

...
GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock from unmap_mapping_range() it might be worth figuring out how new page directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all i_mmap_lock_write/read callers. Found the following:

- kernel/events/uprobes.c in build_map_info:

/* * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through * reclaim. This is optimistic, no harm done if it fails. */

- I got lost in the hugetlb.c code and couldn't convince myself it's not allocating page directories at various levels with something else than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the hugepage code is too complex and flying over my head. -Daniel

...

/Thomas

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

Thomas Hellström (Intel)

4:45 p.m.

New subject: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

On 7/30/20 3:17 PM, Daniel Vetter wrote:

...

On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel) thomas_os@shipmail.org wrote:

...
On 7/28/20 3:58 PM, Daniel Vetter wrote:

...
GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock from unmap_mapping_range() it might be worth figuring out how new page directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all i_mmap_lock_write/read callers. Found the following:
kernel/events/uprobes.c in build_map_info:
       /*
        * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
        * reclaim. This is optimistic, no harm done if it fails.
        */
I got lost in the hugetlb.c code and couldn't convince myself it's
not allocating page directories at various levels with something else than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the hugepage code is too complex and flying over my head. -Daniel

OK, so I inverted your annotation and ran a memory hog, and got the below splat. So clearly your proposed reclaim->i_mmap_lock locking order is an already established one.

Reviewed-by: Thomas Hellström thomas.hellstrom@intel.com

8<---------------------------------------------------------------------------------------------

[ 308.324654] WARNING: possible circular locking dependency detected [ 308.324655] 5.8.0-rc2+ #16 Not tainted [ 308.324656] ------------------------------------------------------ [ 308.324657] kswapd0/98 is trying to acquire lock: [ 308.324658] ffff92a16f758428 (&mapping->i_mmap_rwsem){++++}-{3:3}, at: rmap_walk_file+0x1c0/0x2f0 [ 308.324663] but task is already holding lock: [ 308.324664] ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 [ 308.324666] which lock already depends on the new lock.

[ 308.324667] the existing dependency chain (in reverse order) is: [ 308.324667] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 308.324670] fs_reclaim_acquire+0x34/0x40 [ 308.324672] dma_resv_lockdep+0x186/0x224 [ 308.324675] do_one_initcall+0x5d/0x2c0 [ 308.324676] kernel_init_freeable+0x222/0x288 [ 308.324678] kernel_init+0xa/0x107 [ 308.324679] ret_from_fork+0x1f/0x30 [ 308.324680] -> #0 (&mapping->i_mmap_rwsem){++++}-{3:3}: [ 308.324682] __lock_acquire+0x119f/0x1fc0 [ 308.324683] lock_acquire+0xa4/0x3b0 [ 308.324685] down_read+0x2d/0x110 [ 308.324686] rmap_walk_file+0x1c0/0x2f0 [ 308.324687] page_referenced+0x133/0x150 [ 308.324689] shrink_active_list+0x142/0x610 [ 308.324690] balance_pgdat+0x229/0x620 [ 308.324691] kswapd+0x200/0x470 [ 308.324693] kthread+0x11f/0x140 [ 308.324694] ret_from_fork+0x1f/0x30 [ 308.324694] other info that might help us debug this:

[ 308.324695] Possible unsafe locking scenario:

[ 308.324695] CPU0 CPU1 [ 308.324696] ---- ---- [ 308.324696] lock(fs_reclaim); [ 308.324697] lock(&mapping->i_mmap_rwsem); [ 308.324698] lock(fs_reclaim); [ 308.324699] lock(&mapping->i_mmap_rwsem); [ 308.324699] *** DEADLOCK ***

[ 308.324700] 1 lock held by kswapd0/98: [ 308.324701] #0: ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 [ 308.324702] stack backtrace: [ 308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16 [ 308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [ 308.324706] Call Trace: [ 308.324710] dump_stack+0x92/0xc8 [ 308.324711] check_noncircular+0x12d/0x150 [ 308.324713] __lock_acquire+0x119f/0x1fc0 [ 308.324715] lock_acquire+0xa4/0x3b0 [ 308.324716] ? rmap_walk_file+0x1c0/0x2f0 [ 308.324717] ? __lock_acquire+0x394/0x1fc0 [ 308.324719] down_read+0x2d/0x110 [ 308.324720] ? rmap_walk_file+0x1c0/0x2f0 [ 308.324721] rmap_walk_file+0x1c0/0x2f0 [ 308.324722] page_referenced+0x133/0x150 [ 308.324724] ? __page_set_anon_rmap+0x70/0x70 [ 308.324725] ? page_get_anon_vma+0x190/0x190 [ 308.324726] shrink_active_list+0x142/0x610 [ 308.324728] balance_pgdat+0x229/0x620 [ 308.324730] kswapd+0x200/0x470 [ 308.324731] ? lockdep_hardirqs_on_prepare+0xf5/0x170 [ 308.324733] ? finish_wait+0x80/0x80 [ 308.324734] ? balance_pgdat+0x620/0x620 [ 308.324736] kthread+0x11f/0x140 [ 308.324737] ? kthread_create_worker_on_cpu+0x40/0x40 [ 308.324739] ret_from_fork+0x1f/0x30

...

...
/Thomas

Daniel Vetter

17 Sep 17 Sep

1:19 p.m.

New subject: [PATCH] dma-resv: lockdep-prime address_space->i_mmap_rwsem for dma-resv

On Thu, Jul 30, 2020 at 06:45:14PM +0200, Thomas Hellström (Intel) wrote:

...

On 7/30/20 3:17 PM, Daniel Vetter wrote:

...
On Thu, Jul 30, 2020 at 2:17 PM Thomas Hellström (Intel) thomas_os@shipmail.org wrote:

...
On 7/28/20 3:58 PM, Daniel Vetter wrote:

...
GPU drivers need this in their shrinkers, to be able to throw out mmap'ed buffers. Note that we also need dma_resv_lock in shrinkers, but that loop is resolved by trylocking in shrinkers.

So full hierarchy is now (ignore some of the other branches we already have primed):

mmap_read_lock -> dma_resv -> shrinkers -> i_mmap_lock_write

I hope that's not inconsistent with anything mm or fs does, adding relevant people.

Looks OK to me. The mapping_dirty_helpers run under the i_mmap_lock, but don't allocate any memory AFAICT.

Since huge page-table-entry splitting may happen under the i_mmap_lock from unmap_mapping_range() it might be worth figuring out how new page directory pages are allocated, though.

ofc I'm not an mm expert at all, but I did try to scroll through all i_mmap_lock_write/read callers. Found the following:
kernel/events/uprobes.c in build_map_info:
       /*
        * Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion through
        * reclaim. This is optimistic, no harm done if it fails.
        */
I got lost in the hugetlb.c code and couldn't convince myself it's
not allocating page directories at various levels with something else than GFP_KERNEL.

So looks like the recursion is clearly there and known, but the hugepage code is too complex and flying over my head. -Daniel
OK, so I inverted your annotation and ran a memory hog, and got the below splat. So clearly your proposed reclaim->i_mmap_lock locking order is an already established one.

So

Reviewed-by: Thomas Hellström thomas.hellstrom@intel.com

No one complaining that this is a terrible idea and two reviews from people who know stuff, so I went ahead and pushed this to drm-misc-next.

Thanks for taking a look at this. -Daniel

...

8<---------------------------------------------------------------------------------------------

[ 308.324654] WARNING: possible circular locking dependency detected [ 308.324655] 5.8.0-rc2+ #16 Not tainted [ 308.324656] ------------------------------------------------------ [ 308.324657] kswapd0/98 is trying to acquire lock: [ 308.324658] ffff92a16f758428 (&mapping->i_mmap_rwsem){++++}-{3:3}, at: rmap_walk_file+0x1c0/0x2f0 [ 308.324663]                but task is already holding lock: [ 308.324664] ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 [ 308.324666]                which lock already depends on the new lock.

[ 308.324667]                the existing dependency chain (in reverse order) is: [ 308.324667]                -> #1 (fs_reclaim){+.+.}-{0:0}: [ 308.324670]        fs_reclaim_acquire+0x34/0x40 [ 308.324672]        dma_resv_lockdep+0x186/0x224 [ 308.324675]        do_one_initcall+0x5d/0x2c0 [ 308.324676]        kernel_init_freeable+0x222/0x288 [ 308.324678]        kernel_init+0xa/0x107 [ 308.324679]        ret_from_fork+0x1f/0x30 [ 308.324680]                -> #0 (&mapping->i_mmap_rwsem){++++}-{3:3}: [ 308.324682]        __lock_acquire+0x119f/0x1fc0 [ 308.324683]        lock_acquire+0xa4/0x3b0 [ 308.324685]        down_read+0x2d/0x110 [ 308.324686]        rmap_walk_file+0x1c0/0x2f0 [ 308.324687]        page_referenced+0x133/0x150 [ 308.324689]        shrink_active_list+0x142/0x610 [ 308.324690]        balance_pgdat+0x229/0x620 [ 308.324691]        kswapd+0x200/0x470 [ 308.324693]        kthread+0x11f/0x140 [ 308.324694]        ret_from_fork+0x1f/0x30 [ 308.324694]                other info that might help us debug this:

[ 308.324695] Possible unsafe locking scenario:

[ 308.324695]        CPU0                    CPU1 [ 308.324696]        ----                    ---- [ 308.324696]   lock(fs_reclaim); [ 308.324697] lock(&mapping->i_mmap_rwsem); [ 308.324698]                                lock(fs_reclaim); [ 308.324699]   lock(&mapping->i_mmap_rwsem); [ 308.324699]                 *** DEADLOCK ***

[ 308.324700] 1 lock held by kswapd0/98: [ 308.324701] #0: ffffffffb0960240 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30 [ 308.324702]                stack backtrace: [ 308.324704] CPU: 1 PID: 98 Comm: kswapd0 Not tainted 5.8.0-rc2+ #16 [ 308.324705] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019 [ 308.324706] Call Trace: [ 308.324710] dump_stack+0x92/0xc8 [ 308.324711] check_noncircular+0x12d/0x150 [ 308.324713] __lock_acquire+0x119f/0x1fc0 [ 308.324715] lock_acquire+0xa4/0x3b0 [ 308.324716] ? rmap_walk_file+0x1c0/0x2f0 [ 308.324717] ? __lock_acquire+0x394/0x1fc0 [ 308.324719] down_read+0x2d/0x110 [ 308.324720] ? rmap_walk_file+0x1c0/0x2f0 [ 308.324721] rmap_walk_file+0x1c0/0x2f0 [ 308.324722] page_referenced+0x133/0x150 [ 308.324724] ? __page_set_anon_rmap+0x70/0x70 [ 308.324725] ? page_get_anon_vma+0x190/0x190 [ 308.324726] shrink_active_list+0x142/0x610 [ 308.324728] balance_pgdat+0x229/0x620 [ 308.324730] kswapd+0x200/0x470 [ 308.324731] ? lockdep_hardirqs_on_prepare+0xf5/0x170 [ 308.324733] ? finish_wait+0x80/0x80 [ 308.324734] ? balance_pgdat+0x620/0x620 [ 308.324736] kthread+0x11f/0x140 [ 308.324737] ? kthread_create_worker_on_cpu+0x40/0x40 [ 308.324739] ret_from_fork+0x1f/0x30

...
...
/Thomas

-- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch

2055

days inactive

2106

days old

linaro-mm-sig@lists.linaro.org

5 comments

participants

tags (0)

participants (4)

Christian König
Daniel Vetter
Daniel Vetter
Thomas Hellström (Intel)