Hi,
I got the following issue while I was testing loopack protocol on Zephyr: [76617.493324] greybus 1-2.2: Interface added (greybus) [76617.493328] greybus 1-2.2: GMP VID=0x00000126, PID=0x00000126 [76617.493330] greybus 1-2.2: DDBL1 Manufacturer=0x00000126, Product=0x00000126 [76618.515124] greybus greybus1: 3/2:0: synchronous operation id 0x0002 of type 0x01 failed: -110 [76618.515136] greybus 1-2.2: failed to get control-protocol version: -110 [76619.539015] greybus 1-2.2.ctrl: failed to send disconnecting: -110 [76619.539029] greybus greybus1: 3/2:0: failed to send disconnecting: -110 [76620.562921] greybus greybus1: 3/2:0: failed to send cport shutdown (phase 1): -110 [76621.586890] greybus greybus1: 3/2:0: failed to send cport shutdown (phase 2): -110 [76621.587036] greybus 1-2: failed to enable interface 2: -110 [81851.515405] greybus greybus1: 0/0:0: synchronous operation id 0x0c41 of type 0x19 failed: -110 [81852.539233] greybus greybus1: 0/0:0: synchronous operation id 0x0c43 of type 0x0e failed: -110 [81852.795204] greybus greybus1: 0/0:0: synchronous operation id 0x0c42 of type 0x13 failed: -110 [81852.795210] greybus 1-svc: SVC ping has returned -110, something is wrong!!! [81852.795212] greybus 1-svc: Resetting the greybus network, watch out!!! [81852.795356] greybus 1-2.2: Interface removed [81852.795389] loopback 1-1.1.1: failed to resume bundle: -22 [81852.795392] loopback 1-1.1.1: pm_runtime_get_sync failed: -22 [81852.795421] general protection fault: 0000 [#1] PREEMPT SMP [81852.795449] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables cdc_subset cdc_ether usbnet mii hid_logitech_hidpp hid_logitech_dj usbhid hid snd_seq_dummy snd_seq snd_seq_device vboxnetadp(O) pci_stub vboxpci(O) vboxnetflt(O) vboxdrv(O) pl2303 usbserial gb_netlink(C) gb_loopback(C) greybus(C) fuse ctr ccm bnep nls_utf8 nls_cp437 vfat fat joydev intel_rapl x86_pkg_temp_thermal uvcvideo intel_powerclamp videobuf2_vmalloc coretemp [81852.795794] videobuf2_memops arc4 kvm_intel videobuf2_v4l2 kvm videobuf2_core videodev irqbypass crct10dif_pclmul media btusb crc32_pclmul btrtl ghash_clmulni_intel iwlmvm btbcm aesni_intel btintel aes_x86_64 bluetooth lrw mac80211 gf128mul iTCO_wdt glue_helper crc16 ablk_helper iTCO_vendor_support snd_hda_codec_realtek cryptd efi_pstore snd_hda_codec_hdmi snd_hda_codec_generic psmouse efivars pcspkr iwlwifi rtsx_pci_ms cfg80211 memstick snd_hda_intel thermal snd_hda_codec wmi thinkpad_acpi snd_hda_core e1000e nvram snd_hwdep rfkill snd_pcm battery ac mei_me tpm_tis snd_timer ptp evdev i2c_i801 tpm_tis_core mei snd tpm pps_core lpc_ich i2c_smbus shpchp soundcore nfsd auth_rpcgss oid_registry nfs_acl lockd grace sunrpc sch_fq_codel sd_mod mmc_block rtsx_pci_sdmmc mmc_core crc32c_intel ahci libahci [81852.796155] serio_raw i915 ehci_pci libata ehci_hcd scsi_mod usbcore rtsx_pci mfd_core video button efivarfs [81852.796205] CPU: 0 PID: 16298 Comm: gbridge Tainted: G WC O 4.9.0+ #8 [81852.796231] Hardware name: LENOVO 20AQCTO1WW/20AQCTO1WW, BIOS GJET77WW (2.27 ) 05/20/2014 [81852.796259] task: ffff8801656601c0 task.stack: ffffc90002660000 [81852.796282] RIP: 0010:[<ffffffff81098a45>] [<ffffffff81098a45>] kthread_stop+0x25/0x140 [81852.796317] RSP: 0018:ffffc90002663988 EFLAGS: 00010202 [81852.796336] RAX: 1da0ffffffffffc8 RBX: 1da0ffffffffffc8 RCX: 1da1000000000000 [81852.796361] RDX: 0000000000000005 RSI: 0000000000000005 RDI: ffff880251aa1080 [81852.796386] RBP: ffff880251aa1080 R08: ffff880251aa1a08 R09: 0000000000ffff0a [81852.796411] R10: 0000000000000000 R11: 0000000000000ffc R12: ffff880241cc96f8 [81852.796436] R13: ffffffffa09fa020 R14: ffff8802caa1cee8 R15: ffff88023ecf5200 [81852.796462] FS: 00007fb7e17917c0(0000) GS:ffff88031e200000(0000) knlGS:0000000000000000 [81852.796490] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [81852.796517] CR2: 00007fae1131ca80 CR3: 0000000251a26000 CR4: 00000000001406f0 [81852.796533] Stack: [81852.796540] ffff8802ce703e00 ffff880241cc9400 ffff880241cc96f8 ffffffffa09f8606 [81852.796561] ffffffff816f217a ffffffff816f1dcd ffffffffa09d67a6 ffff8800080e8300 [81852.796582] ffff880241cc9400 00000000aa6146f5 ffff880241cc96c8 ffff880241cc9400 [81852.796602] Call Trace: [81852.796613] [<ffffffffa09f8606>] ? gb_loopback_disconnect+0x56/0x1a0 [gb_loopback] [81852.796632] [<ffffffff816f217a>] ? _raw_spin_lock_irq+0x1a/0x40 [81852.796647] [<ffffffff816f1dcd>] ? _raw_spin_unlock_irq+0x1d/0x30 [81852.796668] [<ffffffffa09d67a6>] ? gb_connection_disable_rx+0x36/0x140 [greybus] [81852.796688] [<ffffffffa09d20f4>] ? greybus_remove+0x84/0x190 [greybus] [81852.796705] [<ffffffff81555715>] ? __device_release_driver+0x95/0x140 [81852.796721] [<ffffffff815557de>] ? device_release_driver+0x1e/0x30 [81852.796737] [<ffffffff81554465>] ? bus_remove_device+0xf5/0x170 [81852.796753] [<ffffffff815508f7>] ? device_del+0x127/0x260 [81852.796769] [<ffffffffa09d626d>] ? gb_bundle_destroy+0x1d/0x90 [greybus] [81852.796787] [<ffffffffa09d4e17>] ? gb_interface_disable.part.12+0x57/0x1a0 [greybus] [81852.796807] [<ffffffffa09d3f1b>] ? gb_module_del+0x5b/0xf0 [greybus] [81852.796825] [<ffffffffa09da8ee>] ? gb_svc_del+0x9e/0xe0 [greybus] [81852.796841] [<ffffffffa09d2d8b>] ? gb_hd_del+0x1b/0x80 [greybus] [81852.796857] [<ffffffffa090302a>] ? _gb_netlink_exit+0x1a/0x30 [gb_netlink] [81852.796874] [<ffffffffa09030b5>] ? gb_netlink_hd_reset+0x15/0x30 [gb_netlink] [81852.796892] [<ffffffff8162414f>] ? genl_family_rcv_msg+0x1bf/0x360 [81852.796907] [<ffffffff816242f0>] ? genl_family_rcv_msg+0x360/0x360 [81852.796922] [<ffffffff81624360>] ? genl_rcv_msg+0x70/0xb0 [81852.797918] [<ffffffff81623931>] ? netlink_rcv_skb+0xa1/0xc0 [81852.798909] [<ffffffff81623f74>] ? genl_rcv+0x24/0x40 [81852.799975] [<ffffffff8162331c>] ? netlink_unicast+0x16c/0x200 [81852.801015] [<ffffffff816236ed>] ? netlink_sendmsg+0x33d/0x3b0 [81852.801985] [<ffffffff815cf380>] ? sock_sendmsg+0x30/0x40 [81852.802956] [<ffffffff815d01df>] ? ___sys_sendmsg+0x27f/0x290 [81852.803950] [<ffffffff811eff86>] ? mem_cgroup_event_ratelimit.isra.34+0x36/0xa0 [81852.804926] [<ffffffff811f2e1a>] ? memcg_check_events+0x2a/0x1b0 [81852.805900] [<ffffffff8118d8c5>] ? __lru_cache_add+0x75/0xa0 [81852.806880] [<ffffffff816f1d56>] ? _raw_spin_unlock+0x16/0x30 [81852.807881] [<ffffffff811b48ec>] ? handle_mm_fault+0x3dc/0x13e0 [81852.808861] [<ffffffff8108734f>] ? do_send_specific+0x7f/0x90 [81852.809843] [<ffffffff8121db60>] ? __fget+0x70/0xc0 [81852.810819] [<ffffffff815d0b01>] ? __sys_sendmsg+0x51/0x90 [81852.811817] [<ffffffff816f247b>] ? entry_SYSCALL_64_fastpath+0x1e/0xad [81852.812794] Code: 1f 80 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 fd 53 0f 1f 44 00 00 f0 ff 45 18 48 89 ef e8 e3 f2 ff ff 48 85 c0 48 89 c3 74 28 <f0> 80 08 02 48 89 c6 48 89 ef e8 4c ff ff ff 48 89 ef e8 64 ae [81852.813896] RIP [<ffffffff81098a45>] kthread_stop+0x25/0x140 [81852.814952] RSP <ffffc90002663988> [81852.822091] ---[ end trace 73645b448dac6603 ]--- [81853.307086] svc_watchdog: calling "/system/bin/start unipro_reset" to reset greybus network!
The issue happens when I rmmod gb_loopback after some control operations failed (because of firmware issues). Actually, the thread in loopback exits when pm_runtime fail. But when we unload the loopback module, we call kthread_stop(), whereas the thread has already exited.
The following workaround unblocked me: diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index ada00cb..156b09d 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -994,8 +994,11 @@ static int gb_loopback_fn(void *data) wait_event_interruptible(gb->wq, gb->type || kthread_should_stop()); ret = gb_pm_runtime_get_sync(bundle); - if (ret) + if (ret) { + wait_event_interruptible(gb->wq, + kthread_should_stop()); return ret; + } }
if (kthread_should_stop())
I guess that's not a good way to fix the issue. I think the best thing to do will be to just trigger a disconnect. Exiting the thread let loopback in incorrect state: nothing will work because the thread stopped but sysfs / debugfs files are still there.
I'm volunteer to fix the issue but I'm not sure how to do it, and may be there is other better solutions I might have missed. So, all suggestion are welcome.
Thanks, Alexandre