Following kernel warnings and crash notices on arm64 Rpi4 device while running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Please refer to test log links for detailed test plan and kernel crash logs. It is reproducible on arm64 juno-r2, Rpi4 and Qualcomm dragonboard 410c and qemu-arm64.
Test log: ----------- kselftest: Running tests in gpio TAP version 13 1..2 # selftests: gpio: gpio-mockup.sh # 1. Module load tests [ 61.176149] ============================================================================= [ 61.176802] [ 61.176807] ====================================================== [ 61.176809] WARNING: possible circular locking dependency detected [ 61.176811] 6.3.0-rc1-next-20230307 #1 Not tainted [ 61.176814] ------------------------------------------------------ [ 61.176816] modprobe/510 is trying to acquire lock: [ 61.176818] ffff80000b2284e8 (console_owner){..-.}-{0:0}, at: console_flush_all (kernel/printk/printk.c:2879 kernel/printk/printk.c:2942) [ 61.176846] [ 61.176846] but task is already holding lock: [ 61.176848] ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271) [ 61.176861] [ 61.176861] which lock already depends on the new lock. [ 61.176861] [ 61.176863] [ 61.176863] the existing dependency chain (in reverse order) is: [ 61.176864] [ 61.176864] -> #2 (&n->list_lock){-.-.}-{2:2}: [ 61.176871] lock_acquire (kernel/locking/lockdep.c:5673) [ 61.176879] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 61.176885] get_partial_node.part.0 (mm/slub.c:2271) [ 61.176890] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188) [ 61.176894] __slab_alloc.constprop.0 (mm/slub.c:3292) [ 61.176899] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491) [ 61.176903] __kmalloc (mm/slab_common.c:968 mm/slab_common.c:980) [ 61.176908] tty_buffer_alloc (drivers/tty/tty_buffer.c:182) [ 61.176914] __tty_buffer_request_room (drivers/tty/tty_buffer.c:279) [ 61.176919] __tty_insert_flip_char (drivers/tty/tty_buffer.c:398) [ 61.176924] uart_insert_char (drivers/tty/serial/serial_core.c:3341) [ 61.176929] pl011_fifo_to_tty.isra.0 (drivers/tty/serial/amba-pl011.c:314) [ 61.176934] pl011_int (include/linux/spinlock.h:390 drivers/tty/serial/amba-pl011.c:1396 drivers/tty/serial/amba-pl011.c:1571) [ 61.176937] __handle_irq_event_percpu (kernel/irq/handle.c:158) [ 61.176941] handle_irq_event (kernel/irq/handle.c:193 kernel/irq/handle.c:210) [ 61.176944] handle_fasteoi_irq (kernel/irq/chip.c:716) [ 61.176950] generic_handle_domain_irq (kernel/irq/irqdesc.c:652 kernel/irq/irqdesc.c:707) [ 61.176953] gic_handle_irq (arch/arm64/include/asm/io.h:75 include/asm-generic/io.h:335 drivers/irqchip/irq-gic.c:344) [ 61.176958] call_on_irq_stack (arch/arm64/kernel/entry.S:905) [ 61.176962] do_interrupt_handler (arch/arm64/kernel/entry-common.c:274) [ 61.176968] el1_interrupt (arch/arm64/kernel/entry-common.c:472 arch/arm64/kernel/entry-common.c:486) [ 61.176971] el1h_64_irq_handler (arch/arm64/kernel/entry-common.c:492) [ 61.176975] el1h_64_irq (arch/arm64/kernel/entry.S:587) [ 61.176978] __kmem_cache_alloc_node (mm/slub.c:3490) [ 61.176983] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4)) [ 61.176986] inet6_dump_fib (net/ipv6/ip6_fib.c:657) [ 61.176991] rtnl_dump_all (net/core/rtnetlink.c:3964) [ 61.176997] netlink_dump (net/netlink/af_netlink.c:2296) [ 61.177004] netlink_recvmsg (net/netlink/af_netlink.c:2024) [ 61.177009] ____sys_recvmsg (net/socket.c:1015 net/socket.c:1036 net/socket.c:2723) [ 61.177014] ___sys_recvmsg (net/socket.c:2765) [ 61.177019] __sys_recvmsg (include/linux/file.h:31 net/socket.c:2797) [ 61.177025] __arm64_sys_recvmsg (net/socket.c:2802) [ 61.177030] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 61.177037] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 61.177043] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 61.177049] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 61.177052] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 61.177055] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 61.177058] [ 61.177058] -> #1 (&port_lock_key){-.-.}-{2:2}: [ 61.177065] lock_acquire (kernel/locking/lockdep.c:5673) [ 61.177071] _raw_spin_lock_irqsave (include/linux/spinlock_api_smp.h:111 kernel/locking/spinlock.c:162) [ 61.177074] serial8250_console_write (drivers/tty/serial/8250/8250_port.c:3394) [ 61.177082] univ8250_console_write (drivers/tty/serial/8250/8250_core.c:585) [ 61.177087] console_flush_all (kernel/printk/printk.c:2888 kernel/printk/printk.c:2942) [ 61.177093] console_unlock.part.0 (kernel/printk/printk.c:3017) [ 61.177098] vprintk_emit (kernel/printk/printk.c:2317) [ 61.177104] vprintk_default (kernel/printk/printk.c:2328) [ 61.177110] vprintk (kernel/printk/printk_safe.c:50) [ 61.177116] _printk (kernel/printk/printk.c:2341) [ 61.177121] register_console (kernel/printk/printk.c:3468) [ 61.177126] uart_add_one_port (drivers/tty/serial/serial_core.c:2579 drivers/tty/serial/serial_core.c:3100) [ 61.177130] serial8250_register_8250_port (drivers/tty/serial/8250/8250_core.c:1093) [ 61.177135] bcm2835aux_serial_probe (drivers/tty/serial/8250/8250_bcm2835aux.c:184) [ 61.177141] platform_probe (drivers/base/platform.c:1405) [ 61.177148] really_probe (drivers/base/dd.c:552 drivers/base/dd.c:631) [ 61.177152] __driver_probe_device (drivers/base/dd.c:768) [ 61.177157] driver_probe_device (drivers/base/dd.c:798) [ 61.177161] __driver_attach (drivers/base/dd.c:1185) [ 61.177166] bus_for_each_dev (drivers/base/bus.c:368) [ 61.177170] driver_attach (drivers/base/dd.c:1202) [ 61.177173] bus_add_driver (drivers/base/bus.c:673) [ 61.177177] driver_register (drivers/base/driver.c:246) [ 61.177182] __platform_driver_register (drivers/base/platform.c:868) [ 61.177188] bcm2835aux_serial_driver_init (drivers/tty/serial/8250/8250_bcm2835aux.c:233) [ 61.177195] do_one_initcall (init/main.c:1306) [ 61.177199] kernel_init_freeable (init/main.c:1378 init/main.c:1395 init/main.c:1414 init/main.c:1634) [ 61.177207] kernel_init (init/main.c:1524) [ 61.177212] ret_from_fork (arch/arm64/kernel/entry.S:871) [ 61.177216] [ 61.177216] -> #0 (console_owner){..-.}-{0:0}: [ 61.177222] __lock_acquire (kernel/locking/lockdep.c:3099 kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832 kernel/locking/lockdep.c:5056) [ 61.177228] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671) [ 61.177233] lock_acquire (kernel/locking/lockdep.c:5673) [ 61.177238] console_flush_all (kernel/printk/printk.c:2883 kernel/printk/printk.c:2942) [ 61.177244] console_unlock.part.0 (kernel/printk/printk.c:3017) [ 61.177250] vprintk_emit (kernel/printk/printk.c:2317) [ 61.177255] vprintk_default (kernel/printk/printk.c:2328) [ 61.177261] vprintk (kernel/printk/printk_safe.c:50) [ 61.177267] _printk (kernel/printk/printk.c:2341) [ 61.177271] slab_bug (mm/slub.c:892) [ 61.177274] check_bytes_and_report (mm/slub.c:1054) [ 61.177279] check_object (mm/slub.c:1196 (discriminator 2)) [ 61.177283] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425) [ 61.177287] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279) [ 61.177291] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188) [ 61.177295] __slab_alloc.constprop.0 (mm/slub.c:3292) [ 61.177300] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491) [ 61.177304] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4)) [ 61.177308] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486) [ 61.177311] platform_device_add (drivers/base/platform.c:717) [ 61.177317] platform_device_register_full (drivers/base/platform.c:844) [ 61.177323] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup [ 61.177342] gpio_mockup_init+0xf0/0xd40 gpio_mockup [ 61.177352] do_one_initcall (init/main.c:1306) [ 61.177356] do_init_module (kernel/module/main.c:2457) [ 61.177363] load_module (kernel/module/main.c:2859) [ 61.177369] __do_sys_finit_module (kernel/module/main.c:2961) [ 61.177375] __arm64_sys_finit_module (kernel/module/main.c:2928) [ 61.177381] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 61.177387] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 61.177393] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 61.177398] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 61.177402] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 61.177405] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 61.177408] [ 61.177408] other info that might help us debug this: [ 61.177408] [ 61.177410] Chain exists of: [ 61.177410] console_owner --> &port_lock_key --> &n->list_lock [ 61.177410] [ 61.177417] Possible unsafe locking scenario: [ 61.177417] [ 61.177418] CPU0 CPU1 [ 61.177419] ---- ---- [ 61.177420] lock(&n->list_lock); [ 61.177423] lock(&port_lock_key); [ 61.177426] lock(&n->list_lock); [ 61.177429] lock(console_owner); [ 61.177432] [ 61.177432] *** DEADLOCK *** [ 61.177432] [ 61.177434] 3 locks held by modprobe/510: [ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271) [ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at: vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315) [ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at: console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290 kernel/printk/printk.c:2934) [ 61.177471] [ 61.177471] stack backtrace: [ 61.177474] CPU: 3 PID: 510 Comm: modprobe Not tainted 6.3.0-rc1-next-20230307 #1 [ 61.177479] Hardware name: Raspberry Pi 4 Model B (DT) [ 61.177482] Call trace: [ 61.177483] dump_backtrace (arch/arm64/kernel/stacktrace.c:160) [ 61.177487] show_stack (arch/arm64/kernel/stacktrace.c:167) [ 61.177490] dump_stack_lvl (lib/dump_stack.c:107) [ 61.177498] dump_stack (lib/dump_stack.c:114) [ 61.177504] print_circular_bug (kernel/locking/lockdep.c:2057) [ 61.177509] check_noncircular (kernel/locking/lockdep.c:2181) [ 61.177514] __lock_acquire (kernel/locking/lockdep.c:3099 kernel/locking/lockdep.c:3217 kernel/locking/lockdep.c:3832 kernel/locking/lockdep.c:5056) [ 61.177520] lock_acquire.part.0 (arch/arm64/include/asm/percpu.h:40 kernel/locking/lockdep.c:467 kernel/locking/lockdep.c:5671) [ 61.177525] lock_acquire (kernel/locking/lockdep.c:5673) [ 61.177530] console_flush_all (kernel/printk/printk.c:2883 kernel/printk/printk.c:2942) [ 61.177536] console_unlock.part.0 (kernel/printk/printk.c:3017) [ 61.177542] vprintk_emit (kernel/printk/printk.c:2317) [ 61.177547] vprintk_default (kernel/printk/printk.c:2328) [ 61.177553] vprintk (kernel/printk/printk_safe.c:50) [ 61.177559] _printk (kernel/printk/printk.c:2341) [ 61.177564] slab_bug (mm/slub.c:892) [ 61.177567] check_bytes_and_report (mm/slub.c:1054) [ 61.177571] check_object (mm/slub.c:1196 (discriminator 2)) [ 61.177575] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425) [ 61.177579] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279) [ 61.177583] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188) [ 61.177587] __slab_alloc.constprop.0 (mm/slub.c:3292) [ 61.177592] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491) [ 61.177596] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4)) [ 61.177600] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486) [ 61.177603] platform_device_add (drivers/base/platform.c:717) [ 61.177609] platform_device_register_full (drivers/base/platform.c:844) [ 61.177615] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup [ 61.177625] gpio_mockup_init+0xf0/0xd40 gpio_mockup [ 61.177634] do_one_initcall (init/main.c:1306) [ 61.177638] do_init_module (kernel/module/main.c:2457) [ 61.177644] load_module (kernel/module/main.c:2859) [ 61.177650] __do_sys_finit_module (kernel/module/main.c:2961) [ 61.177656] __arm64_sys_finit_module (kernel/module/main.c:2928) [ 61.177662] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 61.177668] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 61.177674] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 61.177680] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 61.177683] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 61.177686] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 62.011685] BUG kmalloc-512 (Not tainted): Poison overwritten [ 62.017513] ----------------------------------------------------------------------------- [ 62.017513] [ 62.027300] 0xffff00004ecb7a38-0xffff00004ecb7a47 @offset=31288. First byte 0x6a instead of 0x6b [ 62.036210] Allocated in swnode_register+0x40/0x218 age=808 cpu=3 pid=386 [ 62.043101] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491) [ 62.047784] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4)) [ 62.051406] swnode_register (drivers/base/swnode.c:776) [ 62.055293] fwnode_create_software_node (drivers/base/swnode.c:934 (discriminator 4)) [ 62.060238] gpio_mockup_register_chip+0x1c4/0x2b8 gpio_mockup [ 62.066337] gpio_mockup_init+0xf0/0xd40 gpio_mockup [ 62.071551] do_one_initcall (init/main.c:1306) [ 62.075437] do_init_module (kernel/module/main.c:2457) [ 62.079238] load_module (kernel/module/main.c:2859) [ 62.083037] __do_sys_finit_module (kernel/module/main.c:2961) [ 62.087455] __arm64_sys_finit_module (kernel/module/main.c:2928) [ 62.092048] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 62.095848] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 62.100793] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 62.104151] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 62.107244] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 62.111570] Freed in software_node_release+0xdc/0x108 age=632 cpu=0 pid=428 [ 62.118633] __kmem_cache_free (mm/slub.c:3732 mm/slub.c:3788 mm/slub.c:3800) [ 62.122784] kfree (mm/slab_common.c:1020) [ 62.125788] software_node_release (drivers/base/swnode.c:761) [ 62.130204] kobject_put (lib/kobject.c:685 lib/kobject.c:712 include/linux/kref.h:65 lib/kobject.c:729) [ 62.133739] software_node_notify_remove (drivers/base/swnode.c:1093) [ 62.138597] device_del (drivers/base/core.c:2265 drivers/base/core.c:3778) [ 62.142134] platform_device_del.part.0 (drivers/base/platform.c:753) [ 62.146903] platform_device_unregister (drivers/base/platform.c:551 drivers/base/platform.c:794) [ 62.151672] gpio_mockup_exit+0x54/0x280 gpio_mockup [ 62.156888] __arm64_sys_delete_module (kernel/module/main.c:756 kernel/module/main.c:698 kernel/module/main.c:698) [ 62.161745] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 62.165545] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 62.170490] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 62.173850] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 62.176941] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 62.181267] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 62.184975] Slab 0xfffffc00013b2c00 objects=21 used=7 fp=0xffff00004ecb7400 flags=0x7fffc0000010200(slab|head|node=0|zone=1|lastcpupid=0xffff) [ 62.197943] Object 0xffff00004ecb7a00 @offset=31232 fp=0xffff00004ecb7400 [ 62.197943] [ 62.206325] Redzone ffff00004ecb7800:
...
[ 63.089597] CPU: 3 PID: 510 Comm: modprobe Not tainted 6.3.0-rc1-next-20230307 #1 [ 63.097186] Hardware name: Raspberry Pi 4 Model B (DT) [ 63.102392] Call trace: [ 63.104865] dump_backtrace (arch/arm64/kernel/stacktrace.c:160) [ 63.108665] show_stack (arch/arm64/kernel/stacktrace.c:167) [ 63.112021] dump_stack_lvl (lib/dump_stack.c:107) [ 63.115734] dump_stack (lib/dump_stack.c:114) [ 63.119093] print_trailer (mm/slub.c:953) [ 63.122892] check_bytes_and_report (mm/slub.c:1058) [ 63.127395] check_object (mm/slub.c:1196 (discriminator 2)) [ 63.131104] alloc_debug_processing (mm/slub.c:1415 mm/slub.c:1425) [ 63.135606] get_partial_node.part.0 (mm/slub.c:2146 mm/slub.c:2279) [ 63.140286] ___slab_alloc (mm/slub.c:2268 mm/slub.c:2386 mm/slub.c:3188) [ 63.144084] __slab_alloc.constprop.0 (mm/slub.c:3292) [ 63.148674] __kmem_cache_alloc_node (mm/slub.c:3345 mm/slub.c:3442 mm/slub.c:3491) [ 63.153354] kmalloc_trace (mm/slab_common.c:1064 (discriminator 4)) [ 63.156974] device_add (drivers/base/core.c:3436 drivers/base/core.c:3486) [ 63.160508] platform_device_add (drivers/base/platform.c:717) [ 63.164837] platform_device_register_full (drivers/base/platform.c:844) [ 63.169959] gpio_mockup_register_chip+0x1ec/0x2b8 gpio_mockup [ 63.176057] gpio_mockup_init+0xf0/0xd40 gpio_mockup [ 63.181269] do_one_initcall (init/main.c:1306) [ 63.185155] do_init_module (kernel/module/main.c:2457) [ 63.188956] load_module (kernel/module/main.c:2859) [ 63.192755] __do_sys_finit_module (kernel/module/main.c:2961) [ 63.197171] __arm64_sys_finit_module (kernel/module/main.c:2928) [ 63.201765] invoke_syscall (arch/arm64/include/asm/current.h:19 arch/arm64/kernel/syscall.c:57) [ 63.205565] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:149) [ 63.210510] do_el0_svc (arch/arm64/kernel/syscall.c:194) [ 63.213869] el0_svc (arch/arm64/include/asm/daifflags.h:28 arch/arm64/kernel/entry-common.c:133 arch/arm64/kernel/entry-common.c:142 arch/arm64/kernel/entry-common.c:638) [ 63.216961] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:656) [ 63.221287] el0t_64_sync (arch/arm64/kernel/entry.S:591) [ 63.224998] FIX kmalloc-512: Restoring Poison 0xffff00004ecb7a38-0xffff00004ecb7a47=0x6b [ 63.233202] FIX kmalloc-512: Marking all objects used [ 63.399213] =============================================================================
links to the crash: - https://lkft.validation.linaro.org/scheduler/job/6224830#L1291 - https://lkft.validation.linaro.org/scheduler/job/6224742#L1202 - https://lkft.validation.linaro.org/scheduler/job/6224784#L3415 - https://lkft.validation.linaro.org/scheduler/job/6224810#L2029
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 709c6adf19dc558e44ab5c01659b09a16a2d3c82 git_describe: next-20230307 kernel_version: 6.3.0-rc1 kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8d... build-url: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next/-/pipelines/798095907 artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2MfXESbRAbSUj9oic6d8d... toolchain: gcc-11
-- Linaro LKFT https://lkft.linaro.org
On Thu, Mar 09, 2023 at 09:33:29PM +0530, Naresh Kamboju wrote:
Following kernel warnings and crash notices on arm64 Rpi4 device while running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
...
[ 61.177432] *** DEADLOCK *** [ 61.177432] [ 61.177434] 3 locks held by modprobe/510: [ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271) [ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at: vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315) [ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at: console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290 kernel/printk/printk.c:2934)
How is it related to the GPIO? Can you bisect if it's a regression somewhere?
On Thu, 9 Mar 2023 at 23:20, Andy Shevchenko andriy.shevchenko@linux.intel.com wrote:
On Thu, Mar 09, 2023 at 09:33:29PM +0530, Naresh Kamboju wrote:
Following kernel warnings and crash notices on arm64 Rpi4 device while running selftests: gpio on Linux mainline 6.3.0-rc1 kernel and Linux next.
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
...
[ 61.177432] *** DEADLOCK *** [ 61.177432] [ 61.177434] 3 locks held by modprobe/510: [ 61.177436] #0: ffff000040000698 (&n->list_lock){-.-.}-{2:2}, at: get_partial_node.part.0 (mm/slub.c:2271) [ 61.177448] #1: ffff80000b227f18 (console_lock){+.+.}-{0:0}, at: vprintk_emit (kernel/printk/printk.c:1936 kernel/printk/printk.c:2315) [ 61.177460] #2: ffff80000b228388 (console_srcu){....}-{0:0}, at: console_flush_all (include/linux/srcu.h:200 kernel/printk/printk.c:290 kernel/printk/printk.c:2934)
How is it related to the GPIO? Can you bisect if it's a regression somewhere?
The following crash on Linux mainline and next was noticed while running selftest gpio on arm64. And this is a kselftest-merge config build. Which means it has a large set of test configs enabled.
Anders performed bisection on this problem. The bisection have been poing to this commit log, first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7] gpiolib: ensure that fwnode is properly set
Reported-by: Linux Kernel Functional Testing lkft@linaro.org Reported-by: Anders Roxell anders.roxell@linaro.org
Crash log: # 2. Module load error tests # 2.1 gpio overflow [ 88.796755] ============================================================================= [ 88.806704] BUG kmalloc-512 (Not tainted): Poison overwritten [ 88.812482] ----------------------------------------------------------------------------- [ 88.812482] [ 88.822170] 0xe38-0xe47 @offset=28216. First byte 0x6a instead of 0x6b [ 88.830994] Allocated in swnode_register+0x40/0x218 age=828 cpu=2 pid=565 [ 88.837830] __kmem_cache_alloc_node+0x11c/0x320 [ 88.842480] kmalloc_trace+0x54/0xa8 [ 88.846083] swnode_register+0x40/0x218 [ 88.849947] fwnode_create_software_node+0xcc/0x1a0 [ 88.854856] 0xffff8000018c71cc [ 88.858029] 0xffff8000018c73b0 [ 88.861197] do_one_initcall+0x80/0x320 [ 88.865062] do_init_module+0x50/0x210 [ 88.868843] load_module+0x201c/0x2260 [ 88.872622] __do_sys_finit_module+0xb0/0x138 [ 88.877010] __arm64_sys_finit_module+0x2c/0x48 [ 88.881572] invoke_syscall+0x8c/0x120 [ 88.885352] el0_svc_common.constprop.0+0x104/0x130 [ 88.890263] do_el0_svc+0x44/0xb8 [ 88.893606] el0_svc+0x48/0xb8 [ 88.896686] el0t_64_sync_handler+0xbc/0x138 [ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683 [ 88.907899] __kmem_cache_free+0x2a4/0x2e0 [ 88.912024] kfree+0xc0/0x1a0 [ 88.915015] software_node_release+0xdc/0x108 [ 88.919402] kobject_put+0xb0/0x220 [ 88.922919] software_node_notify_remove+0x98/0xe8 [ 88.927741] device_del+0x184/0x380 [ 88.931259] platform_device_del.part.0+0x24/0xa8 [ 88.935995] platform_device_unregister+0x30/0x50 [ 88.940730] 0xffff8000018a1dd4 [ 88.943898] __arm64_sys_delete_module+0x184/0x328 [ 88.948722] invoke_syscall+0x8c/0x120 [ 88.952502] el0_svc_common.constprop.0+0x104/0x130 [ 88.957413] do_el0_svc+0x44/0xb8 [ 88.960757] el0_svc+0x48/0xb8 [ 88.963834] el0t_64_sync_handler+0xbc/0x138 [ 88.968132] el0t_64_sync+0x190/0x198 [ 88.971820] Slab 0xfffffc00209d0800 objects=21 used=7 fp=0xffff000827423200 flags=0xbfffc0000010200(slab|head|node=0|zone=2|lastcpupid=0xffff) [ 88.984653] Object 0xe00 @offset=28160 fp=0xffff000827423200 [ 88.984653] [ 88.992953] Redzone <Trim> [ 89.002470] Redzone <Trim> [ 89.011986] Redzone <Trim> ... [ 89.278435] Redzone <Trim> [ 89.287951] Redzone <Trim> [ 89.297468] Object <Trim> [ 89.306984] Object <Trim> [ 89.316500] Object <Trim> ... [ 89.592465] Object [ 89.601981] Redzone [ 89.610801] Padding [ 89.620317] Padding
Details test and crash log: - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/test... - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/test... - https://qa-reports.linaro.org/lkft/linux-mainline-master/build/v6.3-rc6/test...
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline git_sha: 09a9639e56c01c7a00d6c0ca63f4c7c41abe075d git_describe: v6.3-rc6 kernel_version: 6.3.0-rc6 kernel-config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2OCXLSEmqxrg13KKFGzyp... build-url: https://gitlab.com/Linaro/lkft/mirrors/torvalds/linux-mainline/-/pipelines/8... artifact-location: https://storage.tuxsuite.com/public/linaro/lkft/builds/2OCXLSEmqxrg13KKFGzyp... toolchain: gcc-11 email-notification: '' build_name: gcc-11-lkftconfig-kselftest
-- Linaro LKFT https://lkft.linaro.org
-- With Best Regards, Andy Shevchenko
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote: (...)
Anders performed bisection on this problem. The bisection have been poing to this commit log, first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7] gpiolib: ensure that fwnode is properly set
I don't think this is the real issue.
(...)
# 2. Module load error tests # 2.1 gpio overflow
(...)
[ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683 [ 88.907899] __kmem_cache_free+0x2a4/0x2e0 [ 88.912024] kfree+0xc0/0x1a0 [ 88.915015] software_node_release+0xdc/0x108 [ 88.919402] kobject_put+0xb0/0x220 [ 88.922919] software_node_notify_remove+0x98/0xe8 [ 88.927741] device_del+0x184/0x380 [ 88.931259] platform_device_del.part.0+0x24/0xa8 [ 88.935995] platform_device_unregister+0x30/0x50
I think the refcount is wrong on the fwnode.
The chip is allocated with devm_gpiochip_add_data() which will not call gpiochip_remove() until all references are removed by calling devm_gpio_chip_release().
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c index fe9ce6b19f15..30a0622210d7 100644 --- a/drivers/gpio/gpiolib-devres.c +++ b/drivers/gpio/gpiolib-devres.c @@ -394,6 +394,7 @@ static void devm_gpio_chip_release(void *data) { struct gpio_chip *gc = data;
+ pr_info("GPIOCHIP %s WAS REMOVED BY DEVRES\n", gc->label); gpiochip_remove(gc); }
If this isn't working we need to figure out what is holding a reference to the gpiochip.
I don't know how the references to the gpiochip fwnode is supposed to drop to zero though? I didn't work with mockup much ...
What I could think of is that maybe the mockup driver need a .shutdown() callback to forcibly call gpiochip_remove(), and in that case it should be wrapped in a non-existining devm_gpiochip_remove() since devres is used to register it.
Bartosz will know better though! I am pretty sure he has this working flawlessly so the tests must be doing something weird which is leaving references around.
Yours, Linus Walleij
On Tue, Apr 11, 2023 at 10:57 AM Linus Walleij linus.walleij@linaro.org wrote:
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote: (...)
Anders performed bisection on this problem. The bisection have been poing to this commit log, first bad commit: [24c94060fc9b4e0f19e6e018869db46db21d6bc7] gpiolib: ensure that fwnode is properly set
I don't think this is the real issue.
(...)
# 2. Module load error tests # 2.1 gpio overflow
(...)
[ 88.900984] Freed in software_node_release+0xdc/0x108 age=34 cpu=1 pid=683 [ 88.907899] __kmem_cache_free+0x2a4/0x2e0 [ 88.912024] kfree+0xc0/0x1a0 [ 88.915015] software_node_release+0xdc/0x108 [ 88.919402] kobject_put+0xb0/0x220 [ 88.922919] software_node_notify_remove+0x98/0xe8 [ 88.927741] device_del+0x184/0x380 [ 88.931259] platform_device_del.part.0+0x24/0xa8 [ 88.935995] platform_device_unregister+0x30/0x50
I think the refcount is wrong on the fwnode.
The chip is allocated with devm_gpiochip_add_data() which will not call gpiochip_remove() until all references are removed by calling devm_gpio_chip_release().
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
diff --git a/drivers/gpio/gpiolib-devres.c b/drivers/gpio/gpiolib-devres.c index fe9ce6b19f15..30a0622210d7 100644 --- a/drivers/gpio/gpiolib-devres.c +++ b/drivers/gpio/gpiolib-devres.c @@ -394,6 +394,7 @@ static void devm_gpio_chip_release(void *data) { struct gpio_chip *gc = data;
pr_info("GPIOCHIP %s WAS REMOVED BY DEVRES\n", gc->label); gpiochip_remove(gc);
}
If this isn't working we need to figure out what is holding a reference to the gpiochip.
I don't know how the references to the gpiochip fwnode is supposed to drop to zero though? I didn't work with mockup much ...
What I could think of is that maybe the mockup driver need a .shutdown() callback to forcibly call gpiochip_remove(), and in that case it should be wrapped in a non-existining devm_gpiochip_remove() since devres is used to register it.
Bartosz will know better though! I am pretty sure he has this working flawlessly so the tests must be doing something weird which is leaving references around.
Yours, Linus Walleij
Interestingly I'm not seeing this neither with gpio-sim selftests nor with any of the libgpiod tests which suggests it's the gpio-mockup module that's doing something wrong (or very right in which case it uncovers some otherwise hidden bug). Anyway, I'll try to spend some time on it and figure it out, although I'd like to be done with gpio-mockup altogether already.
Bart
On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
...
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
Side note: Since we have devres tracepoints, your patch seems an overkill :-) Just enable devres tracepoints and filter out by the function name. I believe that should work.
Hi Linus and Bartosz,
On Tue, 20 Jun 2023 at 22:11, Andy Shevchenko andriy.shevchenko@linux.intel.com wrote:
On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
...
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
Side note: Since we have devres tracepoints, your patch seems an overkill :-) Just enable devres tracepoints and filter out by the function name. I believe that should work.
Since I have been tracking open issues on the stable-rc kernel, The reported problem on stable-rc linux.6.3.y has been solved on the stable-rc linux.6.6.y branch.
Thanks for fixing this reported issue.
Upstream links about this fix and discussion,
Commit daecca4b8433 gpiolib: Do not alter GPIO chip fwnode member
[1] https://lore.kernel.org/linux-gpio/20230703142308.5772-4-andriy.shevchenko@l... [2] https://lore.kernel.org/linux-gpio/CAMRc=MfFEBSeJ78NO7XeuzAMJ0KezEPAYWsWnFXX... [3] https://lore.kernel.org/linux-gpio/CA+G9fYv94gx8+-JMzbmQaue3q3y6QdBmsGUCdD-2...
-- With Best Regards, Andy Shevchenko
Tue, Nov 07, 2023 at 07:21:32PM +0530, Naresh Kamboju kirjoitti:
On Tue, 20 Jun 2023 at 22:11, Andy Shevchenko andriy.shevchenko@linux.intel.com wrote:
On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
...
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
Side note: Since we have devres tracepoints, your patch seems an overkill :-) Just enable devres tracepoints and filter out by the function name. I believe that should work.
Since I have been tracking open issues on the stable-rc kernel, The reported problem on stable-rc linux.6.3.y has been solved on the stable-rc linux.6.6.y branch.
Thanks for fixing this reported issue.
Upstream links about this fix and discussion,
I'm a bit lost. Is the [3] fixed? Is the fix the below mentioned commit?
Commit daecca4b8433 gpiolib: Do not alter GPIO chip fwnode member
[1] https://lore.kernel.org/linux-gpio/20230703142308.5772-4-andriy.shevchenko@l... [2] https://lore.kernel.org/linux-gpio/CAMRc=MfFEBSeJ78NO7XeuzAMJ0KezEPAYWsWnFXX... [3] https://lore.kernel.org/linux-gpio/CA+G9fYv94gx8+-JMzbmQaue3q3y6QdBmsGUCdD-2...
On Wed, 15 Nov 2023 at 07:12, andy.shevchenko@gmail.com wrote:
Tue, Nov 07, 2023 at 07:21:32PM +0530, Naresh Kamboju kirjoitti:
On Tue, 20 Jun 2023 at 22:11, Andy Shevchenko andriy.shevchenko@linux.intel.com wrote:
On Tue, Apr 11, 2023 at 10:57:28AM +0200, Linus Walleij wrote:
On Mon, Apr 10, 2023 at 11:16 AM Naresh Kamboju naresh.kamboju@linaro.org wrote:
...
Add a pr_info() devm_gpio_chip_release() in drivers/gpio/gpiolib-devres.c and see if the callback is even called. I think this could be the problem: if that isn't cleaned up, there will be dangling references.
Side note: Since we have devres tracepoints, your patch seems an overkill :-) Just enable devres tracepoints and filter out by the function name. I believe that should work.
Since I have been tracking open issues on the stable-rc kernel, The reported problem on stable-rc linux.6.3.y has been solved on the stable-rc linux.6.6.y branch.
Thanks for fixing this reported issue.
Upstream links about this fix and discussion,
I'm a bit lost. Is the [3] fixed? Is the fix the below mentioned commit?
As per my understanding on this, The reported issue has been fixed with the below commit.
Commit daecca4b8433 gpiolib: Do not alter GPIO chip fwnode member
[1] https://lore.kernel.org/linux-gpio/20230703142308.5772-4-andriy.shevchenko@l... [2] https://lore.kernel.org/linux-gpio/CAMRc=MfFEBSeJ78NO7XeuzAMJ0KezEPAYWsWnFXX... [3] https://lore.kernel.org/linux-gpio/CA+G9fYv94gx8+-JMzbmQaue3q3y6QdBmsGUCdD-2...
-- With Best Regards, Andy Shevchenko
- Naresh