On Fri, Aug 20, 2021 at 9:23 PM Thomas Zimmermann <tzimmermann(a)suse.de> wrote:
> Hi
>
> Am 20.08.21 um 17:45 schrieb syzbot:
> > syzbot has bisected this issue to:
>
> Good bot!
>
> >
> > commit ea40d7857d5250e5400f38c69ef9e17321e9c4a2
> > Author: Daniel Vetter <daniel.vetter(a)ffwll.ch>
> > Date: Fri Oct 9 23:21:56 2020 +0000
> >
> > drm/vkms: fbdev emulation support
>
> Here's a guess.
>
> GEM SHMEM + fbdev emulation requires that
> (drm_mode_config.prefer_shadow_fbdev = true). Otherwise, deferred I/O
> and SHMEM conflict over the use of page flags IIRC.
But we should only set up defio if fb->dirty is set, which vkms
doesn't do. So I think there must be something else funny going on
here... No idea what's going on, really.
-Daniel
> From a quick grep, vkms doesn't set prefer_shadow_fbdev, and an alarming
> number of SHMEM-based drivers don't either.
>
> Best regards
> Thomas
>
> >
> > bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=11c31d55300000
> > start commit: 614cb2751d31 Merge tag 'trace-v5.14-rc6' of git://git.kern..
> > git tree: upstream
> > final oops: https://syzkaller.appspot.com/x/report.txt?x=13c31d55300000
> > console output: https://syzkaller.appspot.com/x/log.txt?x=15c31d55300000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=96f0602203250753
> > dashboard link: https://syzkaller.appspot.com/bug?extid=91525b2bd4b5dff71619
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=122bce0e300000
> >
> > Reported-by: syzbot+91525b2bd4b5dff71619(a)syzkaller.appspotmail.com
> > Fixes: ea40d7857d52 ("drm/vkms: fbdev emulation support")
> >
> > For information about bisection process see: https://goo.gl/tpsmEJ#bisection
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Thu, Aug 19, 2021 at 12:53 PM Desmond Cheong Zhi Xi
<desmondcheongzx(a)gmail.com> wrote:
>
> On 18/8/21 7:02 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> >> In a future patch, a read lock on drm_device.master_rwsem is
> >> held in the ioctl handler before the check for ioctl
> >> permissions. However, this produces the following lockdep splat:
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> >> ------------------------------------------------------
> >> kms_lease/1752 is trying to acquire lock:
> >> ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
> >>
> >> but task is already holding lock:
> >> ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> >> drm_ioctl_kernel+0xfb/0x1a0
> >>
> >> which lock already depends on the new lock.
> >>
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #2 (&dev->master_rwsem){++++}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> down_read+0x3b/0x140
> >> drm_master_internal_acquire+0x1d/0x60
> >> drm_client_modeset_commit+0x10/0x40
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> >> drm_fb_helper_set_par+0x34/0x40
> >> intel_fbdev_set_par+0x11/0x40 [i915]
> >> fbcon_init+0x270/0x4f0
> >> visual_init+0xc6/0x130
> >> do_bind_con_driver+0x1de/0x2c0
> >> do_take_over_console+0x10e/0x180
> >> do_fbcon_takeover+0x53/0xb0
> >> register_framebuffer+0x22d/0x310
> >> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> >> intel_fbdev_initial_config+0xf/0x20 [i915]
> >> async_run_entry_fn+0x28/0x130
> >> process_one_work+0x26d/0x5c0
> >> worker_thread+0x37/0x390
> >> kthread+0x13b/0x170
> >> ret_from_fork+0x1f/0x30
> >>
> >> -> #1 (&helper->lock){+.+.}-{3:3}:
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> >> intel_fbdev_restore_mode+0x2b/0x50 [i915]
> >> drm_lastclose+0x27/0x50
> >> drm_release_noglobal+0x42/0x60
> >> __fput+0x9e/0x250
> >> task_work_run+0x6b/0xb0
> >> exit_to_user_mode_prepare+0x1c5/0x1d0
> >> syscall_exit_to_user_mode+0x19/0x50
> >> do_syscall_64+0x46/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> >> validate_chain+0xb39/0x1e70
> >> __lock_acquire+0x5a1/0xb70
> >> lock_acquire+0xd3/0x310
> >> __mutex_lock+0xa8/0x930
> >> drm_open+0x64/0x280
> >> drm_stub_open+0x9f/0x100
> >> chrdev_open+0x9f/0x1d0
> >> do_dentry_open+0x14a/0x3a0
> >> dentry_open+0x53/0x70
> >> drm_mode_create_lease_ioctl+0x3cb/0x970
> >> drm_ioctl_kernel+0xc9/0x1a0
> >> drm_ioctl+0x201/0x3d0
> >> __x64_sys_ioctl+0x6a/0xa0
> >> do_syscall_64+0x37/0xb0
> >> entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>
> >> other info that might help us debug this:
> >> Chain exists of:
> >> drm_global_mutex --> &helper->lock --> &dev->master_rwsem
> >> Possible unsafe locking scenario:
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(&dev->master_rwsem);
> >>                                lock(&helper->lock);
> >>                                lock(&dev->master_rwsem);
> >>   lock(drm_global_mutex);
> >>
> >> *** DEADLOCK ***
> >>
> >> The lock hierarchy inversion happens because we grab the
> >> drm_global_mutex while already holding on to master_rwsem. To avoid
> >> this, we do some prep work to grab the drm_global_mutex before
> >> checking for ioctl permissions.
> >>
> >> At the same time, we update the check for the global mutex to use the
> >> drm_dev_needs_global_mutex helper function.
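The inversion lockdep reports can be illustrated with a toy dependency graph. The sketch below is a heavily simplified userspace model that treats every lock as exclusive: it records an edge whenever a lock is taken while another is held, and refuses any edge that would close a cycle. The enum names are hypothetical labels for drm_global_mutex, &helper->lock and &dev->master_rwsem; real lockdep also tracks read/write modes, IRQ state and much more.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the three locks in the splat. */
enum lock_id { GLOBAL_MUTEX, HELPER_LOCK, MASTER_RWSEM, NR_LOCKS };

/* depends[a][b] is true once lock b has been taken while a was held. */
static bool depends[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is `to` reachable from `from`? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (depends[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/* Record "acquire `next` while holding `held`". Returns false if the new
 * edge would close a cycle, i.e. a possible circular-locking deadlock. */
static bool record_acquire(int held, int next)
{
	bool seen[NR_LOCKS];

	memset(seen, 0, sizeof(seen));
	if (reachable(next, held, seen))
		return false;
	depends[held][next] = true;
	return true;
}
```

Replaying the chain from the splat (global mutex, then helper lock, then master_rwsem), the final attempt to take the global mutex under master_rwsem is rejected, while taking the global mutex first, as the patch proposes, is accepted.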
> >
> > This is intentional: essentially we force all non-legacy drivers to have
> > unlocked ioctls (otherwise everyone forgets to set that flag).
> >
> > For non-legacy drivers the global lock only ensures ordering between
> > drm_open and lastclose (I think at least), and between
> > drm_dev_register/unregister and the backwards ->load/unload callbacks
> > (which are called in the wrong place, but we cannot fix that for legacy
> > drivers).
> >
> > ->load/unload should be completely unused (maybe radeon still uses it),
> > and ->lastclose is also on the decline.
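A rough userspace model of the distinction drawn here, based on our reading of drm_dev_needs_global_mutex() and drm_ioctl_kernel() around v5.14. The struct and function names are hypothetical: the open/close paths also consider ->load/->unload and ->lastclose, while the ioctl path only looks at DRIVER_LEGACY and the DRM_UNLOCKED flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened view of a driver, for illustration only. */
struct toy_driver {
	bool legacy;        /* DRIVER_LEGACY feature set */
	bool has_load;      /* deprecated ->load callback */
	bool has_unload;    /* deprecated ->unload callback */
	bool has_lastclose; /* ->lastclose callback */
};

/* Models the checks in drm_dev_needs_global_mutex(), used on the
 * open/close paths. */
static bool open_needs_global_mutex(const struct toy_driver *drv)
{
	if (drv->legacy)
		return true;
	if (drv->has_load || drv->has_unload)
		return true;
	return drv->has_lastclose;
}

/* Models the ioctl-path check in drm_ioctl_kernel(): only legacy
 * drivers take the lock, and only for ioctls without DRM_UNLOCKED. */
static bool ioctl_takes_global_mutex(const struct toy_driver *drv,
				     bool drm_unlocked_flag)
{
	return drv->legacy && !drm_unlocked_flag;
}
```

As the splat suggests, i915 went through ->lastclose at the time, so under this model it needs the lock in drm_open() but not in its ioctls; switching the ioctl check to drm_dev_needs_global_mutex() would change that.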
> >
>
> Ah ok got it, I'll change the check back to
> drm_core_check_feature(dev, DRIVER_LEGACY) then.
>
> > Maybe we should update the comment of drm_global_mutex to explain what it
> > protects and why.
> >
>
> The comments in drm_dev_needs_global_mutex make sense I think, I just
> didn't read the code closely enough.
>
> > I'm also confused how this patch connects to the splat, since for i915 we
>
> Right, my bad, this is a separate instance of circular locking. I was
> too hasty when I saw that for legacy drivers we might grab master_rwsem
> then drm_global_mutex in the ioctl handler.
>
> > shouldn't be taking the drm_global_lock here at all. The problem seems to
> > be the drm_open_helper when we create a new lease, which is an entirely
> > different can of worms.
> >
> > I'm honestly not sure how to best do that, but we should be able to create
> > a file and then call drm_open_helper directly, or well a version of that
> > which never takes the drm_global_mutex. Because that is not needed for
> > nested drm_file opening:
> > - legacy drivers never go down this path because leases are only supported
> > with modesetting, and modesetting is only supported for non-legacy
> > drivers
> > - the races against dev->open_count due to ->lastclose or ->load callbacks
> > don't matter, because for the entire ioctl we already have an open
> > drm_file and that won't disappear.
> >
> > So this should work, but I'm not entirely sure how to make it work.
> > -Daniel
> >
>
> One idea that comes to mind is to change the outcome of
> drm_dev_needs_global_mutex while we're in the ioctl, but that requires
> more locking which sounds like a bad idea.
>
> Another idea, which is quite messy, but just for thoughts, uses the idea
> of pushing the master_rwsem read lock down:
Yeah I think that's cleaner, and I think that also should work a lot
better for the other ioctls:
- We don't have a need to flush readers anymore since we'll just take
the rwsem in write mode
- There are far fewer inversions, and maybe we could even get rid of the
spinlock since at that point all readers should at least have the
rwsem read-locked.
>
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 7f523e1c5650..5d05e744b728 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -712,7 +712,7 @@ static const struct drm_ioctl_desc drm_ioctls[] = {
> DRM_RENDER_ALLOW),
> DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0),
> DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0),
> - DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER),
> + DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, 0),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_LIST_LESSEES, drm_mode_list_lessees_ioctl, DRM_MASTER),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_GET_LEASE, drm_mode_get_lease_ioctl, DRM_MASTER),
> DRM_IOCTL_DEF(DRM_IOCTL_MODE_REVOKE_LEASE, drm_mode_revoke_lease_ioctl, DRM_MASTER),
> diff --git a/drivers/gpu/drm/drm_lease.c b/drivers/gpu/drm/drm_lease.c
> index 983701198ffd..a25bc69522b4 100644
> --- a/drivers/gpu/drm/drm_lease.c
> +++ b/drivers/gpu/drm/drm_lease.c
> @@ -500,6 +500,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> return -EINVAL;
> }
>
> + /* Clone the lessor file to create a new file for us */
> + DRM_DEBUG_LEASE("Allocating lease file\n");
> + lessee_file = file_clone_open(lessor_file);
> + if (IS_ERR(lessee_file))
> + return PTR_ERR(lessee_file);
> +
> + down_read(&dev->master_rwsem);
> +
> + if (!drm_is_current_master(lessor_priv)) {
> + ret = -EACCES;
> + goto out_file;
> + }
> +
> lessor = drm_file_get_master(lessor_priv);
> /* Do not allow sub-leases */
> if (lessor->lessor) {
> @@ -547,14 +560,6 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> goto out_leases;
> }
>
> - /* Clone the lessor file to create a new file for us */
> - DRM_DEBUG_LEASE("Allocating lease file\n");
> - lessee_file = file_clone_open(lessor_file);
> - if (IS_ERR(lessee_file)) {
> - ret = PTR_ERR(lessee_file);
> - goto out_lessee;
> - }
> -
> lessee_priv = lessee_file->private_data;
> /* Change the file to a master one */
> drm_master_put(&lessee_priv->master);
> @@ -571,17 +576,19 @@ int drm_mode_create_lease_ioctl(struct drm_device *dev,
> fd_install(fd, lessee_file);
>
> drm_master_put(&lessor);
> + up_read(&dev->master_rwsem);
> DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl succeeded\n");
> return 0;
>
> -out_lessee:
> - drm_master_put(&lessee);
> -
> out_leases:
> put_unused_fd(fd);
>
> out_lessor:
> drm_master_put(&lessor);
> +
> +out_file:
> + up_read(&dev->master_rwsem);
> + fput(lessee_file);
> DRM_DEBUG_LEASE("drm_mode_create_lease_ioctl failed: %d\n", ret);
> return ret;
> }
>
>
> Something like this would also address the other deadlock we'd hit in
> drm_mode_create_lease_ioctl():
>
> drm_ioctl_kernel():
> down_read(&master_rwsem); <--- down_read()
> drm_mode_create_lease_ioctl():
> drm_lease_create():
> file_clone_open():
> ...
> drm_open():
> drm_open_helper():
> drm_master_open():
> down_write(&master_rwsem); <--- down_write()
>
> Overall, I think the suggestion to push master_rwsem write locks down
> into ioctls would solve the nesting problem for those ioctls.
Yup, my gut feeling agrees. And the above is a nice solution without
having to dig out all the code for creating a file directly (it's
doable I think at least, we do it for dma-buf).
> Although I'm still a little concerned that, just like here, there might
> be deeply embedded nested locking, so locking becomes prone to breaking.
> It does smell a bit to me.
Yeah, that's pretty much the bane of locking cleanup/rework. You have
to do it to figure out what goes boom :-/ Even with the most careful
audit there are surprises left.
-Daniel
> >> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
> >> ---
> >> drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
> >> 1 file changed, 9 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> >> index 880fc565d599..2cb57378a787 100644
> >> --- a/drivers/gpu/drm/drm_ioctl.c
> >> +++ b/drivers/gpu/drm/drm_ioctl.c
> >> @@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
> >> if (drm_dev_is_unplugged(dev))
> >> return -ENODEV;
> >>
> >> + /* Enforce sane locking for modern driver ioctls. */
> >> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> >> + mutex_lock(&drm_global_mutex);
> >> +
> >> retcode = drm_ioctl_permit(flags, file_priv);
> >> if (unlikely(retcode))
> >> - return retcode;
> >> + goto out;
> >>
> >> - /* Enforce sane locking for modern driver ioctls. */
> >> - if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
> >> - (flags & DRM_UNLOCKED))
> >> - retcode = func(dev, kdata, file_priv);
> >> - else {
> >> - mutex_lock(&drm_global_mutex);
> >> - retcode = func(dev, kdata, file_priv);
> >> + retcode = func(dev, kdata, file_priv);
> >> +
> >> +out:
> >> + if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> >> mutex_unlock(&drm_global_mutex);
> >> - }
> >> return retcode;
> >> }
> >> EXPORT_SYMBOL(drm_ioctl_kernel);
> >> --
> >> 2.25.1
> >>
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Wed, Aug 18, 2021 at 03:38:23PM +0800, Desmond Cheong Zhi Xi wrote:
> +EXPORT_SYMBOL(task_work_add);
EXPORT_SYMBOL_GPL for this kind of functionality, please.
On Wed, Aug 18, 2021 at 5:37 PM Desmond Cheong Zhi Xi
<desmondcheongzx(a)gmail.com> wrote:
>
> On 18/8/21 6:11 pm, Daniel Vetter wrote:
> > On Wed, Aug 18, 2021 at 03:38:19PM +0800, Desmond Cheong Zhi Xi wrote:
> >> There are three areas where we dereference struct drm_master without
> >> checking if the pointer is non-NULL.
> >>
> >> 1. drm_getmagic is called from the ioctl_handler. Since
> >> DRM_IOCTL_GET_MAGIC has no ioctl flags, drm_getmagic is run without
> >> any check that drm_file.master has been set.
> >>
> >> 2. Similarly, drm_getunique is called from the ioctl_handler, but
> >> DRM_IOCTL_GET_UNIQUE has no ioctl flags. So there is no guarantee that
> >> drm_file.master has been set.
> >
> > I think the above two are impossible, due to the refcounting rules for
> > struct file.
> >
>
> Right, will drop those two parts from the patch.
>
> >> 3. drm_master_release can also be called without having a
> >> drm_file.master set. Here is one error path:
> >> drm_open():
> >> drm_open_helper():
> >> drm_master_open():
> >> drm_new_set_master(); <--- returns -ENOMEM,
> >> drm_file.master not set
> >> drm_file_free():
> >> drm_master_release(); <--- NULL ptr dereference
> >> (file_priv->master->magic_map)
> >>
> >> Fix these by checking if the master pointers are NULL before use.
> >>
> >> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
> >> ---
> >> drivers/gpu/drm/drm_auth.c | 16 ++++++++++++++--
> >> drivers/gpu/drm/drm_ioctl.c | 5 +++++
> >> 2 files changed, 19 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c
> >> index f9267b21556e..b7230604496b 100644
> >> --- a/drivers/gpu/drm/drm_auth.c
> >> +++ b/drivers/gpu/drm/drm_auth.c
> >> @@ -95,11 +95,18 @@ EXPORT_SYMBOL(drm_is_current_master);
> >> int drm_getmagic(struct drm_device *dev, void *data, struct drm_file *file_priv)
> >> {
> >> struct drm_auth *auth = data;
> >> + struct drm_master *master;
> >> int ret = 0;
> >>
> >> mutex_lock(&dev->master_mutex);
> >> + master = file_priv->master;
> >> + if (!master) {
> >> + mutex_unlock(&dev->master_mutex);
> >> + return -EINVAL;
> >> + }
> >> +
> >> if (!file_priv->magic) {
> >> - ret = idr_alloc(&file_priv->master->magic_map, file_priv,
> >> + ret = idr_alloc(&master->magic_map, file_priv,
> >> 1, 0, GFP_KERNEL);
> >> if (ret >= 0)
> >> file_priv->magic = ret;
> >> @@ -355,8 +362,12 @@ void drm_master_release(struct drm_file *file_priv)
> >>
> >> mutex_lock(&dev->master_mutex);
> >> master = file_priv->master;
> >> +
> >> + if (!master)
> >> + goto unlock;
> >
> > This is a bit convoluted, since we're in the single-threaded release path
> > we don't need any locking for file_priv related things. Therefore we can
> > pull the master check out and just directly return.
> >
> > But since it's a bit surprising maybe a comment that this can happen when
> > drm_master_open in drm_open_helper fails?
> >
>
> Sounds good. This can actually also happen in the failure path of
> mock_drm_getfile if anon_inode_getfile fails. I'll leave a short note
> about both of them.
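The suggested shape, checking for a missing master up front and returning before taking any lock because release is single-threaded for the file, might look roughly like this. The structs are hypothetical reductions of drm_master and drm_file, and the real locking is only indicated in comments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for drm_master / drm_file. */
struct toy_master {
	int magic_entries; /* stands in for the magic_map idr */
};

struct toy_file {
	struct toy_master *master; /* NULL if master setup failed at open */
	int magic;
};

/* Release runs single-threaded on the file, so the NULL check needs no
 * lock and can simply return early. */
static void toy_master_release(struct toy_file *file)
{
	/* Can be NULL if drm_master_open() failed in drm_open_helper(),
	 * or if anon_inode_getfile() failed in mock_drm_getfile(). */
	if (!file->master)
		return;

	/* mutex_lock(&dev->master_mutex) would go here. */
	if (file->magic)
		file->master->magic_entries--;
	/* ... rest of the release path, then mutex_unlock() ... */
}
```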
>
> > Another option, and maybe cleaner, would be to move the drm_master_release
> > from drm_file_free into drm_close_helper. That would be fully symmetrical
> > and should also fix the bug here?
> > -Daniel
> >
> Hmmm maybe the first option to move the check out of the lock might be
> better. If I'm not wrong, we would otherwise also need to move
> drm_master_release into drm_client_close.
Do we have to?
If I haven't missed anything, the drm_client stuff only calls
drm_file_alloc and doesn't set up a master. So this should work?
-Daniel
>
> >
> >> +
> >> if (file_priv->magic)
> >> - idr_remove(&file_priv->master->magic_map, file_priv->magic);
> >> + idr_remove(&master->magic_map, file_priv->magic);
> >>
> >> if (!drm_is_current_master_locked(file_priv))
> >> goto out;
> >> @@ -379,6 +390,7 @@ void drm_master_release(struct drm_file *file_priv)
> >> drm_master_put(&file_priv->master);
> >> spin_unlock(&dev->master_lookup_lock);
> >> }
> >> +unlock:
> >> mutex_unlock(&dev->master_mutex);
> >> }
> >>
> >> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> >> index 26f3a9ede8fe..4d029d3061d9 100644
> >> --- a/drivers/gpu/drm/drm_ioctl.c
> >> +++ b/drivers/gpu/drm/drm_ioctl.c
> >> @@ -121,6 +121,11 @@ int drm_getunique(struct drm_device *dev, void *data,
> >>
> >> mutex_lock(&dev->master_mutex);
> >> master = file_priv->master;
> >> + if (!master) {
> >> + mutex_unlock(&dev->master_mutex);
> >> + return -EINVAL;
> >> + }
> >> +
> >> if (u->unique_len >= master->unique_len) {
> >> if (copy_to_user(u->unique, master->unique, master->unique_len)) {
> >> mutex_unlock(&dev->master_mutex);
> >> --
> >> 2.25.1
> >>
> >
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Am 18.08.21 um 15:02 schrieb Wentao_Liang:
> In line 317 (#1), drm_gem_prime_import() is called, which calls
> drm_gem_prime_import_dev(). At the end of the function
> drm_gem_prime_import_dev() (line 956, #2), "dma_buf_put(dma_buf);" puts
> dma_buf->file and may cause it to be released. However, after
> drm_gem_prime_import() returns, the dma_buf may be put again by the
> same put function in lines 342, 351 and 358 (#3, #4, #5). Putting the
> dma_buf improperly more than once can lead to an incorrect
> dma_buf->file put.
> We believe that the put of the dma_buf in the function
> drm_gem_prime_import_dev() is unnecessary (#2). We can fix the above bug by
> removing the redundant "dma_buf_put(dma_buf);" in line 956.
Guys, I'm getting tired of NAKing these incorrect reference count analyses.
The dma_buf_put() in the error handling of drm_gem_prime_import_dev()
function is balanced with the get_dma_buf() in the same function
directly above.
This is for the use case of creating a GEM object for a DMA-buf imported
from another device, and it is certainly correct.
The various dma_buf_put() calls in drm_gem_prime_fd_to_handle() are balanced
with the dma_buf_get(prime_fd) at the beginning of the function.
This is for extracting the DMA-buf from the file descriptor and keeping
a reference to it while we are busy importing it (e.g. to prevent a race
when somebody changes the fd at the same time).
As far as I can see this is correct as well.
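To make the balance explicit, here is a userspace model of the reference flow described above. The toy_* names are hypothetical stand-ins for dma_buf_get(), get_dma_buf() and dma_buf_put(); attach, map and detach are omitted, and a single flag simulates the mapping failure that sends drm_gem_prime_import_dev() to its error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct dma_buf's file reference count. */
struct toy_dma_buf {
	int refcount;
};

static void toy_get(struct toy_dma_buf *buf) { buf->refcount++; }
static void toy_put(struct toy_dma_buf *buf) { buf->refcount--; }

/* Models drm_gem_prime_import_dev(): takes its own reference for the new
 * GEM object and drops it only on its own error path. */
static int toy_import_dev(struct toy_dma_buf *buf, bool map_fails)
{
	toy_get(buf);           /* get_dma_buf(): owned by the GEM object */
	if (map_fails) {
		toy_put(buf);       /* balances the get just above */
		return -1;
	}
	return 0;               /* reference now held by the GEM object */
}

/* Models drm_gem_prime_fd_to_handle(): holds a temporary reference from
 * the fd lookup for the duration of the import. */
static int toy_fd_to_handle(struct toy_dma_buf *buf, bool map_fails)
{
	int ret;

	toy_get(buf);           /* dma_buf_get(prime_fd) */
	ret = toy_import_dev(buf, map_fails);
	toy_put(buf);           /* balances dma_buf_get() on every path */
	return ret;
}
```

In this model the failure path returns the count to the fd's own reference, and the success path leaves exactly one extra reference owned by the GEM object. Dropping the put at line 956, as the patch proposes, would leave the failure path one reference too high, i.e. a leak.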
Regards,
Christian.
>
> 314 if (dev->driver->gem_prime_import)
> 315 obj = dev->driver->gem_prime_import(dev, dma_buf);
> 316 else
> 317 obj = drm_gem_prime_import(dev, dma_buf);
> //#1 call to drm_gem_prime_import
> // ->drm_gem_prime_import_dev
> // ->dma_buf_put
> ...
>
> 336 ret = drm_prime_add_buf_handle(&file_priv->prime,
> 337 dma_buf, *handle);
>
> ...
>
> 342 dma_buf_put(dma_buf); //#3 put again
> 343
> 344 return 0;
> 345
> 346 fail:
>
> 351 dma_buf_put(dma_buf); //#4 put again
> 352 return ret;
>
> 356 out_put:
> 357 mutex_unlock(&file_priv->prime.lock);
> 358 dma_buf_put(dma_buf); //#5 put again
> 359 return ret;
> 360 }
>
> 905 struct drm_gem_object *drm_gem_prime_import_dev
> (struct drm_device *dev,
> 906 struct dma_buf *dma_buf,
> 907 struct device *attach_dev)
> 908 {
>
> ...
>
> 952 fail_unmap:
> 953 dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
> 954 fail_detach:
> 955 dma_buf_detach(dma_buf, attach);
> 956 dma_buf_put(dma_buf); //#2 the first put of dma_buf
> // (unnecessary)
> 957
> 958 return ERR_PTR(ret);
> 959 }
>
> Signed-off-by: Wentao_Liang <Wentao_Liang_g(a)163.com>
> ---
> drivers/gpu/drm/drm_prime.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index 2a54f86856af..cef03ad0d5cd 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -953,7 +953,6 @@ struct drm_gem_object *drm_gem_prime_import_dev(struct drm_device *dev,
> dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
> fail_detach:
> dma_buf_detach(dma_buf, attach);
> - dma_buf_put(dma_buf);
>
> return ERR_PTR(ret);
> }
Am 18.08.21 um 15:13 schrieb Sa, Nuno:
>> From: Christian König <christian.koenig(a)amd.com>
>> Sent: Wednesday, August 18, 2021 2:58 PM
>> To: Daniel Vetter <daniel(a)ffwll.ch>
>> Cc: Sa, Nuno <Nuno.Sa(a)analog.com>; linaro-mm-sig(a)lists.linaro.org;
>> dri-devel(a)lists.freedesktop.org; linux-media(a)vger.kernel.org; Rob
>> Clark <rob(a)ti.com>
>> Subject: Re: [Linaro-mm-sig] [PATCH] dma-buf: return -EINVAL if
>> dmabuf object is NULL
>>
>> [External]
>>
>> Am 18.08.21 um 14:46 schrieb Daniel Vetter:
>>> On Wed, Aug 18, 2021 at 02:31:34PM +0200, Christian König wrote:
>>>> Am 18.08.21 um 14:17 schrieb Sa, Nuno:
>>>>>> From: Christian König <christian.koenig(a)amd.com>
>>>>>> Sent: Wednesday, August 18, 2021 2:10 PM
>>>>>> To: Sa, Nuno <Nuno.Sa(a)analog.com>; linaro-mm-sig(a)lists.linaro.org;
>>>>>> dri-devel(a)lists.freedesktop.org; linux-media(a)vger.kernel.org
>>>>>> Cc: Rob Clark <rob(a)ti.com>; Sumit Semwal
>>>>>> <sumit.semwal(a)linaro.org>
>>>>>> Subject: Re: [PATCH] dma-buf: return -EINVAL if dmabuf object is
>>>>>> NULL
>>>>>>
>>>>>> [External]
>>>>>>
>>>>>> To be honest I think the if(WARN_ON(!dmabuf)) return -EINVAL
>>>>>> handling here is misleading in the first place.
>>>>>>
>>>>>> Returning -EINVAL on a hard coding error is not good practice and
>>>>>> should probably be removed from the DMA-buf subsystem in general.
>>>>> Would you say to just return 0 then? I don't think that having the
>>>>> dereference is also good..
>>>> No, just run into the dereference.
>>>>
>>>> Passing NULL as the core object you are working on is a hard coding
>>>> error and not something we should bubble up as recoverable error.
>>>>
>>>>> I used -EINVAL to be coherent with the rest of the code.
>>>> I rather suggest to remove the check elsewhere as well.
>>> It's a lot more complicated, and WARN_ON + bail out is a rather
>>> well-established code pattern. There have been plenty of discussions in
>>> the past that a BUG_ON is harmful since it makes debugging a major
>>> pain, e.g.
>>>
>>> https://lore.kernel.org/lkml/CA+55aFwyNTLuZgOWMTRuabWobF27ygskuxvFd-P0n-3UNT=0Og@mail.gmail.com/
>>> There's also a checkpatch check for this.
>>>
>>> commit 9d3e3c705eb395528fd8f17208c87581b134da48
>>> Author: Joe Perches <joe(a)perches.com>
>>> Date: Wed Sep 9 15:37:27 2015 -0700
>>>
>>> checkpatch: add warning on BUG/BUG_ON use
>>>
>>> Anyone who is paranoid about security crashes their machine on any
>>> WARNING anyway (like syzkaller does).
>>>
>>> My rule of thumb is that if the WARN_ON + bail-out code is just an
>>> if (WARN_ON()) return; then it's fine; if it's more, then BUG_ON is
>>> perhaps the better choice.
>>>
>>> I think the worst choice is just removing all these checks, because a
>>> few code reorgs later you might not Oops immediately afterwards
>>> anymore, and then we'll merge potentially very busted new code. Which
>>> is no good.
>>
>> Well, BUG_ON(some_condition) is a different problem, and there I agree
>> with Linus that it is problematic.
>>
>> But "if (WARN_ON(!dmabuf)) return -EINVAL;" is really bad coding
>> style as well since it hides real problems which are hard errors behind
>> warnings.
> I agree that doing these kinds of checks on the core object of an API is
> abusing parameter "validation". I guess a good pattern is having the
> warning and letting the code flow. But since these checks are already all
> over the place, I'm not sure which way to go. I'm very new to dma-buf;
> I was just checking the code and saw this was not coherent with
> the rest of the API, so I thought it would be a straightforward patch... Well,
> I could not have been more wrong :)
Well, the fact that existing code might actually depend on this is a really
good argument to keep it for now, or at least until we have consensus on
what to do.
> Anyway, depending on what's decided, I can send another patch trying
> to make this stuff more coherent. At this point, my feeling is that this
> patch should be dropped...
At least for now I would hold it back.
Thanks,
Christian.
>
> - Nuno Sá
>
>> Returning -EINVAL indicates a recoverable error which is usually caused
>> by userspace giving invalid parameters and should never be abused to
>> indicate a driver coding error.
>>
>> Functions are either intended to take NULL as a valid parameter, e.g.
>> kfree(NULL), or they are intended to work on an object which is
>> mandatory to provide.
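The two kinds of function described here, plus the WARN_ON-and-bail pattern the patch adds, can be contrasted in a small userspace sketch. TOY_WARN_ON() is a hypothetical stand-in for the kernel's WARN_ON(), which likewise evaluates to its condition so it can be used inside if (); struct toy_buf is likewise hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON(): logs and evaluates to the condition. */
static bool toy_warn_on(bool cond, const char *expr, int line)
{
	if (cond)
		fprintf(stderr, "WARNING: line %d: %s\n", line, expr);
	return cond;
}
#define TOY_WARN_ON(c) toy_warn_on(!!(c), #c, __LINE__)

struct toy_buf {
	int cpu_access;
};

/* Kind 1: NULL is a documented no-op, like kfree(NULL). */
static void toy_buf_free(struct toy_buf *buf)
{
	if (!buf)
		return;
	/* ... release resources ... */
}

/* Kind 2: the object is mandatory. This shows the WARN_ON-and-bail style
 * from the patch; the alternative argued for above is to drop the check
 * and simply dereference, so a NULL caller oopses immediately. */
static int toy_end_cpu_access(struct toy_buf *buf)
{
	if (TOY_WARN_ON(!buf))
		return -EINVAL;
	buf->cpu_access = 0;
	return 0;
}
```

The objection in this thread is to the second function's return value: a caller sees -EINVAL, which normally signals bad userspace input, for what is really a kernel coding error.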
>>
>> Christian.
>>
>>> -Daniel
>>>
>>>
>>>
>>>> Christian.
>>>>
>>>>> - Nuno Sá
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>> Am 18.08.21 um 13:58 schrieb Nuno Sá:
>>>>>>> On top of warning about a NULL object, we also want to return with a
>>>>>>> proper error code (as done in 'dma_buf_begin_cpu_access()'). Otherwise,
>>>>>>> we will get a NULL pointer dereference.
>>>>>>>
>>>>>>> Fixes: fc13020e086b ("dma-buf: add support for kernel cpu access")
>>>>>>> Signed-off-by: Nuno Sá <nuno.sa(a)analog.com>
>>>>>>> ---
>>>>>>> drivers/dma-buf/dma-buf.c | 3 ++-
>>>>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>>>> index 63d32261b63f..8ec7876dd523 100644
>>>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>>>> @@ -1231,7 +1231,8 @@ int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
>>>>>>> {
>>>>>>> int ret = 0;
>>>>>>>
>>>>>>> - WARN_ON(!dmabuf);
>>>>>>> + if (WARN_ON(!dmabuf))
>>>>>>> + return -EINVAL;
>>>>>>>
>>>>>>> might_lock(&dmabuf->resv->lock.base);
>>>>>>>
>>>> _______________________________________________
>>>> Linaro-mm-sig mailing list
>>>> Linaro-mm-sig(a)lists.linaro.org
>>>>
>>>> https://lists.linaro.org/mailman/listinfo/linaro-mm-sig
Am 18.08.21 um 14:17 schrieb Sa, Nuno:
>> From: Christian König <christian.koenig(a)amd.com>
>> Sent: Wednesday, August 18, 2021 2:10 PM
>> To: Sa, Nuno <Nuno.Sa(a)analog.com>; linaro-mm-sig(a)lists.linaro.org;
>> dri-devel(a)lists.freedesktop.org; linux-media(a)vger.kernel.org
>> Cc: Rob Clark <rob(a)ti.com>; Sumit Semwal
>> <sumit.semwal(a)linaro.org>
>> Subject: Re: [PATCH] dma-buf: return -EINVAL if dmabuf object is
>> NULL
>>
>> [External]
>>
>> To be honest I think the if(WARN_ON(!dmabuf)) return -EINVAL
>> handling here is misleading in the first place.
>>
>> Returning -EINVAL on a hard coding error is not good practice and
>> should probably be removed from the DMA-buf subsystem in general.
> Would you say to just return 0 then? I don't think that having the
> dereference is also good..
No, just run into the dereference.
Passing NULL as the core object you are working on is a hard coding
error and not something we should bubble up as recoverable error.
> I used -EINVAL to be coherent with the rest of the code.
I rather suggest to remove the check elsewhere as well.
Christian.
>
> - Nuno Sá
>
>> Christian.
>>
>> Am 18.08.21 um 13:58 schrieb Nuno Sá:
>>> On top of warning about a NULL object, we also want to return with a
>>> proper error code (as done in 'dma_buf_begin_cpu_access()'). Otherwise,
>>> we will get a NULL pointer dereference.
>>>
>>> Fixes: fc13020e086b ("dma-buf: add support for kernel cpu access")
>>> Signed-off-by: Nuno Sá <nuno.sa(a)analog.com>
>>> ---
>>> drivers/dma-buf/dma-buf.c | 3 ++-
>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 63d32261b63f..8ec7876dd523 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -1231,7 +1231,8 @@ int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
>>> {
>>> int ret = 0;
>>>
>>> - WARN_ON(!dmabuf);
>>> + if (WARN_ON(!dmabuf))
>>> + return -EINVAL;
>>>
>>> might_lock(&dmabuf->resv->lock.base);
>>>
To be honest I think the if(WARN_ON(!dmabuf)) return -EINVAL handling
here is misleading in the first place.
Returning -EINVAL on a hard coding error is not good practice and should
probably be removed from the DMA-buf subsystem in general.
Christian.
Am 18.08.21 um 13:58 schrieb Nuno Sá:
> On top of warning about a NULL object, we also want to return with a
> proper error code (as done in 'dma_buf_begin_cpu_access()'). Otherwise,
> we will get a NULL pointer dereference.
>
> Fixes: fc13020e086b ("dma-buf: add support for kernel cpu access")
> Signed-off-by: Nuno Sá <nuno.sa(a)analog.com>
> ---
> drivers/dma-buf/dma-buf.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 63d32261b63f..8ec7876dd523 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -1231,7 +1231,8 @@ int dma_buf_end_cpu_access(struct dma_buf *dmabuf,
> {
> int ret = 0;
>
> - WARN_ON(!dmabuf);
> + if (WARN_ON(!dmabuf))
> + return -EINVAL;
>
> might_lock(&dmabuf->resv->lock.base);
>
On Wed, Aug 18, 2021 at 03:38:23PM +0800, Desmond Cheong Zhi Xi wrote:
> The task_work_add function is needed to prevent userspace races with
> DRM modesetting rights.
>
> Some DRM ioctls can change modesetting permissions while other
> concurrent users are performing modesetting. To prevent races with
> userspace, such functions should flush readers of old permissions
> before returning to user mode. As the function that changes
> permissions might itself be a reader of the old permissions, we intend
> to schedule this flush using task_work_add.
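A minimal single-threaded model of the deferral described here: work is queued on a per-task list and only runs when the task is about to return to user mode, so a caller that is itself a reader of the old permissions is not asked to wait for itself inline. The struct callback_head layout matches the kernel's; everything else (the global list, toy_return_to_user(), the counting callback) is hypothetical, and the real task_work_add() also takes the target task and a notification mode and updates the list atomically.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's struct callback_head. */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
};

/* Hypothetical per-task pending-work list (single-threaded model). */
static struct callback_head *pending;

static void toy_task_work_add(struct callback_head *work)
{
	work->next = pending;
	pending = work;
}

/* Stand-in for the point where the task returns to user mode:
 * run and drain everything queued. */
static void toy_return_to_user(void)
{
	while (pending) {
		struct callback_head *work = pending;

		pending = work->next;
		work->func(work);
	}
}

static int flushes;

/* Hypothetical flush callback: in the patch series this would wait for
 * readers of the old permissions; here it just counts invocations. */
static void toy_flush_old_readers(struct callback_head *head)
{
	(void)head;
	flushes++;
}
```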
>
> However, when DRM is compiled as a loadable kernel module without
> exporting task_work_add, we get the following compilation error:
>
> ERROR: modpost: "task_work_add" [drivers/gpu/drm/drm.ko] undefined!
>
> Reported-by: kernel test robot <lkp(a)intel.com>
> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
Just realized another benefit of pushing the dev->master_rwsem write
locks down into the ioctls that need them: we wouldn't need to export this
function for use in drm. But I'm also not sure that works any better than
the design in your current patch set ...
-Daniel
> ---
> kernel/task_work.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/task_work.c b/kernel/task_work.c
> index 1698fbe6f0e1..90000404af2b 100644
> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -60,6 +60,7 @@ int task_work_add(struct task_struct *task, struct callback_head *work,
>
> return 0;
> }
> +EXPORT_SYMBOL(task_work_add);
>
> /**
> * task_work_cancel_match - cancel a pending work added by task_work_add()
> --
> 2.25.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
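The deferral described in the quoted patch can be modelled in userspace. This is a toy under our own assumptions, not the kernel API: fake_task_work_add(), fake_return_to_user_mode(), struct fake_task and struct flush_work are hypothetical names, and only the shape of struct cb_head mirrors the kernel's struct callback_head (a next pointer plus a func callback). It shows why queueing helps: the flush is recorded while the ioctl may still be a reader of the old permissions, and it only runs once the task is on its way back to user mode.

```c
#include <stddef.h>

/* Local container_of(), as in the kernel: recover the enclosing struct
 * from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical model of struct callback_head: intrusive list node plus
 * the function to run later. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *head);
};

/* Hypothetical model of a task: just its list of pending work. */
struct fake_task {
	struct cb_head *works;
};

/* Queue work to run later, in the spirit of task_work_add(). */
static void fake_task_work_add(struct fake_task *task, struct cb_head *work)
{
	work->next = task->works;
	task->works = work;
}

/* Run everything queued; models the point where the kernel processes
 * pending task work on the way back to user mode. The order in which
 * callbacks run here is an artifact of this sketch. */
static void fake_return_to_user_mode(struct fake_task *task)
{
	struct cb_head *work = task->works;

	task->works = NULL;
	while (work) {
		struct cb_head *next = work->next;

		work->func(work);
		work = next;
	}
}

/* Example payload: the deferred "flush readers of old permissions". */
struct flush_work {
	struct cb_head work;
	int flushed;
};

static void flush_old_readers(struct cb_head *head)
{
	struct flush_work *fw = container_of(head, struct flush_work, work);

	fw->flushed = 1;
}
```

In use, the ioctl would queue a struct flush_work with fake_task_work_add() and return; flushed stays 0 until fake_return_to_user_mode() runs. The real entry point is task_work_add(), whose export this patch adds; its notify-mode argument is not modelled here.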
On Wed, Aug 18, 2021 at 03:38:22PM +0800, Desmond Cheong Zhi Xi wrote:
> In a future patch, a read lock on drm_device.master_rwsem is
> held in the ioctl handler before the check for ioctl
> permissions. However, this produces the following lockdep splat:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 5.14.0-rc6-CI-Patchwork_20831+ #1 Tainted: G U
> ------------------------------------------------------
> kms_lease/1752 is trying to acquire lock:
> ffffffff827bad88 (drm_global_mutex){+.+.}-{3:3}, at: drm_open+0x64/0x280
>
> but task is already holding lock:
> ffff88812e350108 (&dev->master_rwsem){++++}-{3:3}, at:
> drm_ioctl_kernel+0xfb/0x1a0
>
> which lock already depends on the new lock.
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (&dev->master_rwsem){++++}-{3:3}:
> lock_acquire+0xd3/0x310
> down_read+0x3b/0x140
> drm_master_internal_acquire+0x1d/0x60
> drm_client_modeset_commit+0x10/0x40
> __drm_fb_helper_restore_fbdev_mode_unlocked+0x88/0xb0
> drm_fb_helper_set_par+0x34/0x40
> intel_fbdev_set_par+0x11/0x40 [i915]
> fbcon_init+0x270/0x4f0
> visual_init+0xc6/0x130
> do_bind_con_driver+0x1de/0x2c0
> do_take_over_console+0x10e/0x180
> do_fbcon_takeover+0x53/0xb0
> register_framebuffer+0x22d/0x310
> __drm_fb_helper_initial_config_and_unlock+0x36c/0x540
> intel_fbdev_initial_config+0xf/0x20 [i915]
> async_run_entry_fn+0x28/0x130
> process_one_work+0x26d/0x5c0
> worker_thread+0x37/0x390
> kthread+0x13b/0x170
> ret_from_fork+0x1f/0x30
>
> -> #1 (&helper->lock){+.+.}-{3:3}:
> lock_acquire+0xd3/0x310
> __mutex_lock+0xa8/0x930
> __drm_fb_helper_restore_fbdev_mode_unlocked+0x44/0xb0
> intel_fbdev_restore_mode+0x2b/0x50 [i915]
> drm_lastclose+0x27/0x50
> drm_release_noglobal+0x42/0x60
> __fput+0x9e/0x250
> task_work_run+0x6b/0xb0
> exit_to_user_mode_prepare+0x1c5/0x1d0
> syscall_exit_to_user_mode+0x19/0x50
> do_syscall_64+0x46/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> -> #0 (drm_global_mutex){+.+.}-{3:3}:
> validate_chain+0xb39/0x1e70
> __lock_acquire+0x5a1/0xb70
> lock_acquire+0xd3/0x310
> __mutex_lock+0xa8/0x930
> drm_open+0x64/0x280
> drm_stub_open+0x9f/0x100
> chrdev_open+0x9f/0x1d0
> do_dentry_open+0x14a/0x3a0
> dentry_open+0x53/0x70
> drm_mode_create_lease_ioctl+0x3cb/0x970
> drm_ioctl_kernel+0xc9/0x1a0
> drm_ioctl+0x201/0x3d0
> __x64_sys_ioctl+0x6a/0xa0
> do_syscall_64+0x37/0xb0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> other info that might help us debug this:
> Chain exists of:
>   drm_global_mutex --> &helper->lock --> &dev->master_rwsem
>
>  Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&dev->master_rwsem);
>                                lock(&helper->lock);
>                                lock(&dev->master_rwsem);
>   lock(drm_global_mutex);
>
> *** DEADLOCK ***
>
> The lock hierarchy inversion happens because we grab the
> drm_global_mutex while already holding on to master_rwsem. To avoid
> this, we do some prep work to grab the drm_global_mutex before
> checking for ioctl permissions.
>
> At the same time, we update the check for the global mutex to use the
> drm_dev_needs_global_mutex helper function.
This is intentional: essentially, we force all non-legacy drivers to have
unlocked ioctls (otherwise everyone forgets to set the DRM_UNLOCKED flag).
For non-legacy drivers, the global lock only ensures ordering between
drm_open and lastclose (at least I think so), and between
drm_dev_register/unregister and the backwards ->load/unload callbacks
(which are called in the wrong place, but we cannot fix that for legacy
drivers).
->load/unload should be completely unused (maybe radeon still uses them),
and ->lastclose is also on the decline.
Maybe we should update the comment of drm_global_mutex to explain what it
protects and why.
I'm also confused about how this patch connects to the splat, since for
i915 we shouldn't be taking the drm_global_mutex here at all. The problem
seems to be the drm_open_helper when we create a new lease, which is an
entirely different can of worms.
I'm honestly not sure how best to do that, but we should be able to create
a file and then call drm_open_helper directly, or rather a version of it
that never takes the drm_global_mutex, because that is not needed for
nested drm_file opening:
- legacy drivers never go down this path because leases are only supported
with modesetting, and modesetting is only supported for non-legacy
drivers
- the races against dev->open_count due to ->lastclose or ->load callbacks
don't matter, because for the entire ioctl we already have an open
drm_file and that won't disappear.
So this should work, but I'm not entirely sure how to make it work.
-Daniel
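To see why the drm_open() acquisition completes a deadlock, it may help to model what lockdep does with the splat above: each time a lock class is taken while another is held, it records a dependency edge, and it complains when a new edge would close a cycle. This is a heavily simplified sketch under our own assumptions (hypothetical names record_dependency() and reaches(); no read/write distinction for the rwsem, no IRQ states, one global graph), fed with the three edges from the splat.

```c
#include <stdbool.h>
#include <string.h>

/* The three lock classes involved in the splat. */
enum lock_class { GLOBAL_MUTEX, HELPER_LOCK, MASTER_RWSEM, NR_CLASSES };

/* deps[a][b] is true once class b has been acquired while holding class a. */
static bool deps[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'from' reach 'to' through recorded edges? */
static bool reaches(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (deps[from][next] && reaches(next, to, seen))
			return true;
	return false;
}

/* Record "acquire 'want' while holding 'held'". Returns false, without
 * recording the edge, if 'want' can already reach 'held', i.e. if the new
 * edge would close a circular dependency. */
static bool record_dependency(int held, int want)
{
	bool seen[NR_CLASSES];

	memset(seen, 0, sizeof(seen));
	if (reaches(want, held, seen))
		return false;
	deps[held][want] = true;
	return true;
}
```

Feeding the edges in the splat's order, the first two are accepted and the third (master_rwsem held while drm_open() takes drm_global_mutex) is rejected, because drm_global_mutex already reaches master_rwsem through &helper->lock.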
> Signed-off-by: Desmond Cheong Zhi Xi <desmondcheongzx(a)gmail.com>
> ---
> drivers/gpu/drm/drm_ioctl.c | 18 +++++++++---------
> 1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
> index 880fc565d599..2cb57378a787 100644
> --- a/drivers/gpu/drm/drm_ioctl.c
> +++ b/drivers/gpu/drm/drm_ioctl.c
> @@ -779,19 +779,19 @@ long drm_ioctl_kernel(struct file *file, drm_ioctl_t *func, void *kdata,
>  	if (drm_dev_is_unplugged(dev))
>  		return -ENODEV;
>
> +	/* Enforce sane locking for modern driver ioctls. */
> +	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
> +		mutex_lock(&drm_global_mutex);
> +
>  	retcode = drm_ioctl_permit(flags, file_priv);
>  	if (unlikely(retcode))
> -		return retcode;
> +		goto out;
>
> -	/* Enforce sane locking for modern driver ioctls. */
> -	if (likely(!drm_core_check_feature(dev, DRIVER_LEGACY)) ||
> -	    (flags & DRM_UNLOCKED))
> -		retcode = func(dev, kdata, file_priv);
> -	else {
> -		mutex_lock(&drm_global_mutex);
> -		retcode = func(dev, kdata, file_priv);
> +	retcode = func(dev, kdata, file_priv);
> +
> +out:
> +	if (unlikely(drm_dev_needs_global_mutex(dev)) && !(flags & DRM_UNLOCKED))
>  		mutex_unlock(&drm_global_mutex);
> -	}
>  	return retcode;
>  }
>  EXPORT_SYMBOL(drm_ioctl_kernel);
> --
> 2.25.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
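Daniel's first point is about which drivers end up taking drm_global_mutex around an ioctl. The old condition takes it only for DRIVER_LEGACY drivers whose ioctl lacks DRM_UNLOCKED; the patch switches to drm_dev_needs_global_mutex(), which (our assumption, worth checking against drivers/gpu/drm/drm_file.c in v5.14) also returns true for non-legacy drivers that have ->load/->unload or ->lastclose callbacks. The sketch below reduces both predicates to booleans over a hypothetical struct fake_drm_driver to make the behavioural change visible.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of what the two predicates look at. */
struct fake_drm_driver {
	bool legacy;        /* DRIVER_LEGACY feature set */
	bool has_load;      /* ->load or ->unload callback present */
	bool has_lastclose; /* ->lastclose callback present */
};

/* Old check in drm_ioctl_kernel(): only legacy drivers, for ioctls
 * without DRM_UNLOCKED, take drm_global_mutex around func(). */
static bool old_takes_global_mutex(const struct fake_drm_driver *drv,
				   bool unlocked_flag)
{
	return drv->legacy && !unlocked_flag;
}

/* Assumed model of drm_dev_needs_global_mutex(). */
static bool fake_needs_global_mutex(const struct fake_drm_driver *drv)
{
	return drv->legacy || drv->has_load || drv->has_lastclose;
}

/* New check from the patch. */
static bool new_takes_global_mutex(const struct fake_drm_driver *drv,
				   bool unlocked_flag)
{
	return fake_needs_global_mutex(drv) && !unlocked_flag;
}
```

Under this model, a non-legacy driver with a ->lastclose callback runs an ioctl without DRM_UNLOCKED unlocked under the old check but takes drm_global_mutex under the new one, which is the behaviour change Daniel's "This is intentional" comment pushes back on.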