On Mon, 21 Jun 2021 at 15:20, LinMa linma@zju.edu.cn wrote: [SNIP]
However, it seems that this patch breaks the rule and we have to figure out a better one. T^T (I just hope this patch won't introduce any security impacts but just this warning BUG, at least it will help with the previous UAF one)
Out of curiosity (since I'm seeing this warning a lot), I turned on some of the various lock debugging options, and get the following: [ 171.250942] ====================================================== [ 171.250945] WARNING: possible circular locking dependency detected [ 171.250949] 5.10.45-lockdep #72 Tainted: G W [ 171.250952] ------------------------------------------------------ [ 171.250955] kworker/u4:30/3998 is trying to acquire lock: [ 171.250958] ffff892194614130 (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}, at: hci_sock_dev_event+0x160/0x1f6 [bluetooth] [ 171.250974] but task is already holding lock: [ 171.250977] ffffffffc08a4808 (hci_sk_list.lock){++++}-{2:2}, at: hci_sock_dev_event+0x134/0x1f6 [bluetooth] [ 171.250993] which lock already depends on the new lock.
[ 171.250996] the existing dependency chain (in reverse order) is: [ 171.250999] -> #1 (hci_sk_list.lock){++++}-{2:2}: [ 171.251008] _raw_read_lock+0x3e/0x7c [ 171.251020] hci_send_to_channel+0x27/0x4b [bluetooth] [ 171.251032] hci_sock_sendmsg+0x8fe/0x92a [bluetooth] [ 171.251037] sock_sendmsg+0x72/0x76 [ 171.251041] ____sys_sendmsg+0x16c/0x1e5 [ 171.251045] ___sys_sendmsg+0x95/0xd1 [ 171.251048] __sys_sendmsg+0x86/0xc0 [ 171.251052] do_syscall_64+0x43/0x55 [ 171.251056] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 171.251059] -> #0 (sk_lock-AF_BLUETOOTH-BTPROTO_HCI){+.+.}-{0:0}: [ 171.251068] __lock_acquire+0x1519/0x2a2f [ 171.251072] lock_acquire+0x191/0x265 [ 171.251076] lock_sock_nested+0x7b/0x8a [ 171.251087] hci_sock_dev_event+0x160/0x1f6 [bluetooth] [ 171.251099] hci_unregister_dev+0x16a/0x313 [bluetooth] [ 171.251103] btusb_disconnect+0x77/0x127 [btusb] [ 171.251107] usb_unbind_interface+0xa9/0x231 [ 171.251111] device_release_driver_internal+0x100/0x1a2 [ 171.251115] unbind_marked_interfaces+0x4e/0x69 [ 171.251118] usb_resume+0x59/0x66 [ 171.251122] dpm_run_callback+0x48/0x95 [ 171.251125] device_resume+0x1f3/0x25d [ 171.251128] async_resume+0x1d/0x42 [ 171.251132] async_run_entry_fn+0x3d/0xd1 [ 171.251137] process_one_work+0x2a1/0x51c [ 171.251141] worker_thread+0x215/0x376 [ 171.251145] kthread+0x159/0x168 [ 171.251149] ret_from_fork+0x22/0x30 [ 171.251152] other info that might help us debug this:
[ 171.251155] Possible unsafe locking scenario:
[ 171.251158] CPU0 CPU1 [ 171.251161] ---- ---- [ 171.251163] lock(hci_sk_list.lock); [ 171.251168] lock(sk_lock-AF_BLUETOOTH-BTPROTO_HCI); [ 171.251172] lock(hci_sk_list.lock); [ 171.251176] lock(sk_lock-AF_BLUETOOTH-BTPROTO_HCI); [ 171.251181] *** DEADLOCK ***
It looks like there's a potential deadlock between hci_sock_sendmsg() and hci_sock_dev_event(). In particular: hci_sock_sendmsg(channel == HCI_CHANNEL_LOGGING) acquires lock_sock(sk); -> hci_logging_frame() -> hci_send_to_channel() acquires read_lock(&hci_sk_list.lock);
and: hci_sock_dev_event() acquires read_lock(&hci_sk_list.lock); then lock_sock(sk);
Granted, this is likely very rare. But I think it is still a concern.
My direct idea is to replace the hci_sk_list.lock to another sleep-able lock too. Or we have to craft the logic to allow the HCI_DEV_UNREG event to signal other functions to abandon the lock. I'm going to working on this, and hope to get some suggestions just like before.
And Greg, really sorry to submit this not properly tested patch. Please pardon me for this unintended mistake. :(
Regards Lin Ma