While running libhugetlbfs fallocate_stress.sh on the stable-rc 5.4 branch kernel on an arm64 HiKey device, the following kernel crash dump (Internal error: Oops) was noticed.
fallocate_stress.sh (2M: 64):
[ 129.706506] Unable to handle kernel paging request at virtual address ffff00006772f000
[ 129.714638] Mem abort info:
[ 129.717553] ESR = 0x96000047
[ 129.720726] EC = 0x25: DABT (current EL), IL = 32 bits
[ 129.726188] SET = 0, FnV = 0
[ 129.729338] EA = 0, S1PTW = 0
[ 129.732573] Data abort info:
[ 129.735546] ISV = 0, ISS = 0x00000047
[ 129.739493] CM = 0, WnR = 1
[ 129.742534] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000013ad000
[ 129.749409] [ffff00006772f000] pgd=0000000077ff7003, pud=0000000077e0d003, pmd=0000000077cd1003, pte=006800006772f713
[ 129.760294] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 129.765988] Modules linked in: wl18xx wlcore mac80211 cfg80211 hci_uart snd_soc_audio_graph_card adv7511 crct10dif_ce wlcore_sdio btbcm snd_soc_simple_card_utils cec kirin_drm bluetooth drm_kms_helper dw_drm_dsi rfkill drm fuse
[ 129.786626] CPU: 1 PID: 1263 Comm: fallocate_stres Not tainted 5.4.41-rc1-00091-g132220af41e6 #1
[ 129.795601] Hardware name: HiKey Development Board (DT)
[ 129.800940] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 129.805847] pc : clear_page+0x10/0x24
[ 129.809594] lr : __cpu_clear_user_page+0xc/0x18
[ 129.814225] sp : ffff800012a1bbe0
[ 129.817609] x29: ffff800012a1bbe0 x28: fffffe00017d8000
[ 129.823039] x27: ffff000073070268 x26: ffff800011adf000
[ 129.828466] x25: ffff800011ae06c8 x24: 0000000000001000
[ 129.833893] x23: 0000000000000000 x22: fffffe00017d8000
[ 129.839320] x21: 0000000000000000 x20: 0000000006a00000
[ 129.844747] x19: ffff000037945400 x18: 0000000000000000
[ 129.850174] x17: 0000000000000000 x16: 0000000000000000
[ 129.855602] x15: 0000000000000000 x14: 0000000000000000
[ 129.861031] x13: 0000000000000000 x12: 0000000000000000
[ 129.866458] x11: 0000000000000000 x10: ffff800012a1bbd0
[ 129.871886] x9 : 0000000000000200 x8 : 0ffff00000010000
[ 129.877314] x7 : 0000000000000000 x6 : 0000000000000080
[ 129.882741] x5 : 0000000000000036 x4 : 0000020000200000
[ 129.888170] x3 : 0000000000004bc0 x2 : 0000000000000004
[ 129.893597] x1 : 0000000000000040 x0 : ffff00006772f000
[ 129.899025] Call trace:
[ 129.901530] clear_page+0x10/0x24
[ 129.904926] clear_subpage+0x54/0x90
[ 129.908580] clear_huge_page+0x6c/0x208
[ 129.912503] hugetlbfs_fallocate+0x2e0/0x4a0
[ 129.916869] vfs_fallocate+0x1b8/0x2e0
[ 129.920699] ksys_fallocate+0x44/0x90
[ 129.924446] __arm64_sys_fallocate+0x1c/0x28
[ 129.928811] el0_svc_common.constprop.0+0x68/0x160
[ 129.933708] el0_svc_handler+0x20/0x80
[ 129.937539] el0_svc+0x8/0xc
[ 129.940488] Code: d53b00e1 12000c21 d2800082 9ac12041 (d50b7420)
[ 129.946719] ---[ end trace df98e92a449be749 ]---
[ 129.959274] note: fallocate_stres[1263] exited with preempt_count 1
ref: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.4-oe/build/v5.4.40-91-g...
kernel config: https://builds.tuxbuild.com/SqvcoklXmvQsC70j6rfcgA/kernel.config
On Wed 13-05-20 23:11:40, Naresh Kamboju wrote:
> While running libhugetlbfs fallocate_stress.sh on the stable-rc 5.4 branch kernel on an arm64 HiKey device, the following kernel crash dump (Internal error: Oops) was noticed.
Is the same problem reproducible on vanilla 5.4 without any stable patches?
-- Linaro LKFT https://lkft.linaro.org
On 5/13/20 11:40 PM, Michal Hocko wrote:
> On Wed 13-05-20 23:11:40, Naresh Kamboju wrote:
>> While running libhugetlbfs fallocate_stress.sh on the stable-rc 5.4 branch kernel on an arm64 HiKey device, the following kernel crash dump (Internal error: Oops) was noticed.
> Is the same problem reproducible on vanilla 5.4 without any stable patches?
Or an earlier version of 5.4-stable? Nothing in the changelog for 5.4.41 looks related to this issue. There was an arm-specific hugetlb change, "arm64: hugetlb: avoid potential NULL dereference", but that is pretty straightforward.
I'm guessing this may not reproduce easily. To help reproduce, you could change the #define FALLOCATE_ITERATIONS 100000 in .../libhugetlbfs/tests/fallocate_stress.c to a larger number to force the stress test to run longer.
On Thu, 14 May 2020 at 22:01, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> On 5/13/20 11:40 PM, Michal Hocko wrote:
>> On Wed 13-05-20 23:11:40, Naresh Kamboju wrote:
>>> While running libhugetlbfs fallocate_stress.sh on the stable-rc 5.4 branch kernel on an arm64 HiKey device, the following kernel crash dump (Internal error: Oops) was noticed.
>> Is the same problem reproducible on vanilla 5.4 without any stable patches?
> Or an earlier version of 5.4-stable? Nothing in the changelog for 5.4.41 looks related to this issue. There was an arm-specific hugetlb change, "arm64: hugetlb: avoid potential NULL dereference", but that is pretty straightforward.
> I'm guessing this may not reproduce easily. To help reproduce, you could change the #define FALLOCATE_ITERATIONS 100000 in .../libhugetlbfs/tests/fallocate_stress.c to a larger number to force the stress test to run longer.
Sorry, I did not get a chance to run the test as you suggested. However, this issue is reproducible on stable-rc 5.4.46-rc1 on the arm64 HiKey device.
./runltp -p -q -f hugetlb
<>
ksm05.c:78: PASS: still alive.
ksm05.c:78: PASS: still alive.
ksm05.c:78: PASS: still alive.
ksm05.c:78: PASS: still alive.
[ 383.751513] oom01 invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 384.715831] EC = 0x25: DABT (current EL), IL = 32 bits
[ 384.725478] CPU: 0 PID: 10948 Comm: oom01 Not tainted 5.4.46-rc1-00035-g12a5ce113626 #1
[ 384.730887] SET = 0, FnV = 0
[ 384.739060] Hardware name: HiKey Development Board (DT)
[ 384.739066] Call trace:
[ 384.739081] dump_backtrace+0x0/0x140
[ 384.739090] show_stack+0x14/0x20
[ 384.742209] EA = 0, S1PTW = 0
[ 384.746701] dwmmc_k3 f723d000.dwmmc0: Unexpected interrupt latency
[ 384.747550] dump_stack+0xb4/0xf8
[ 384.747559] dump_header+0x44/0x1ec
[ 384.747565] oom_kill_process+0x1d4/0x1d8
[ 384.747572] out_of_memory+0x170/0x4e0
[ 384.750070] Data abort info:
[ 384.753813] __alloc_pages_slowpath+0x954/0x9f8
[ 384.753819] __alloc_pages_nodemask+0x21c/0x280
[ 384.753826] alloc_pages_vma+0x88/0x210
[ 384.753836] __handle_mm_fault+0x638/0x1080
[ 384.757236] ISV = 0, ISS = 0x00000047
[ 384.760428] handle_mm_fault+0xdc/0x1a8
[ 384.760436] do_page_fault+0x130/0x460
[ 384.760442] do_translation_fault+0x5c/0x78
[ 384.760450] do_mem_abort+0x3c/0x98
[ 384.766776] CM = 0, WnR = 1
[ 384.770154] el0_da+0x1c/0x20
[ 384.773735] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000013c5000
[ 384.777949] Mem-Info:
[ 384.781679] [ffff0000641ff000] pgd=0000000077ff7003, pud=0000000077e0d003, pmd=0000000077cec003, pte=0000000000000000
[ 384.781694] Internal error: Oops: 96000047 [#1] PREEMPT SMP
[ 384.781698] Modules linked in: wl18xx wlcore mac80211 cfg80211 hci_uart snd_soc_audio_graph_card btbcm snd_soc_simple_card_utils crct10dif_ce wlcore_sdio adv7511 bluetooth kirin_drm cec dw_drm_dsi rfkill drm_kms_helper drm fuse
[ 384.784854] active_anon:472313 inactive_anon:2168 isolated_anon:0
[ 384.784854] active_file:63 inactive_file:0 isolated_file:0
[ 384.784854] unevictable:0 dirty:0 writeback:0 unstable:0
[ 384.784854] slab_reclaimable:2625 slab_unreclaimable:7426
[ 384.784854] mapped:202 shmem:2175 pagetables:1188 bounce:0
[ 384.784854] free:5469 free_pcp:1684 free_cma:14
[ 384.789304] CPU: 5 PID: 10945 Comm: oom01 Not tainted 5.4.46-rc1-00035-g12a5ce113626 #1
[ 384.789309] Hardware name: HiKey Development Board (DT)
[ 384.789315] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 384.789328] pc : clear_page+0x10/0x24
[ 384.789339] lr : __cpu_clear_user_page+0xc/0x18
[ 384.794000] Node 0 active_anon:1889252kB inactive_anon:8672kB active_file:412kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:800kB dirty:0kB writeback:0kB shmem:8700kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 1478656kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
[ 384.797884] sp : ffff8000190ebc10
[ 384.797888] x29: ffff8000190ebc10 x28: ffff000066cf2a00
[ 384.797895] x27: 0000000000000002 x26: fffffe00019f2400
[ 384.797901] x25: ffff00006e51df00 x24: 0000000000001000
[ 384.802205] Node 0 DMA32 free:21876kB min:22528kB low:28160kB high:33792kB active_anon:1889252kB inactive_anon:8672kB active_file:0kB inactive_file:484kB unevictable:0kB writepending:0kB present:2061364kB managed:1995396kB mlocked:0kB kernel_stack:2800kB pagetables:4752kB bounce:0kB free_pcp:6864kB local_pcp:1320kB free_cma:56kB
[ 384.806099] x23: 0000000000000000 x22: fffffe0001700000
[ 384.806106] x21: 0000000000000000 x20: 0000fffef3800000
[ 384.806112] x19: ffff000066cf2a00 x18: 0000000000000000
[ 384.806117] x17: 0000000000000000 x16: 0000000000000000
[ 384.810066] lowmem_reserve[]: 0 0 0
[ 384.813873] x15: 0000000000000000 x14: 0000000000000000
[ 384.813879] x13: 0000000000000000 x12: 0000000000000000
[ 384.813885] x11: 0000000000000000 x10: 0000000000000000
[ 384.813891] x9 : ffff800066671000 x8 : 0000000000000200
[ 384.818192] Node 0 DMA32: 899*4kB (UME) 471*8kB (UMEC) 205*16kB (UEC) 153*32kB (UMEC) 61*64kB (UME) 15*128kB (UE) 2*256kB (ME) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 21876kB
[ 384.821730] x7 : ffff800066671000 x6 : 0000000000000000
[ 384.821737] x5 : 0000000000000000 x4 : 0000020000200000
[ 384.821743] x3 : 0000000000007fc0 x2 : 0000000000000004
[ 384.821748] x1 : 0000000000000040 x0 : ffff0000641ff000
[ 384.821754] Call trace:
[ 384.821762] clear_page+0x10/0x24
[ 384.821771] clear_subpage+0x54/0x90
[ 384.821780] clear_huge_page+0x6c/0x208
[ 384.824842] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[ 384.827843] do_huge_pmd_anonymous_page+0x1a4/0x7a0
[ 384.827851] __handle_mm_fault+0x83c/0x1080
[ 384.827857] handle_mm_fault+0xdc/0x1a8
[ 384.827863] do_page_fault+0x130/0x460
[ 384.827872] do_translation_fault+0x5c/0x78
[ 384.834786] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=32768kB
[ 384.837038] do_mem_abort+0x3c/0x98
[ 384.837044] el0_da+0x1c/0x20
[ 384.837056] Code: d53b00e1 12000c21 d2800082 9ac12041 (d50b7420)
[ 384.847921] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 384.853582] ---[ end trace 298eea3ec03b10c2 ]---
[ 384.853619] note: oom01[10945] exited with preempt_count 1
[ 384.874237] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=64kB
[ 385.070710] dwmmc_k3 f723d000.dwmmc0: Unexpected interrupt latency
[ 385.551002] 2384 total pagecache pages
[ 385.563572] 0 pages in swap cache
[ 385.575536] Swap cache stats: add 0, delete 0, find 0/0
[ 385.589403] Free swap = 0kB
[ 385.600885] Total swap = 0kB
[ 385.612403] 515341 pages RAM
[ 385.623860] 0 pages HighMem/MovableOnly
[ 385.636339] 16492 pages reserved
[ 385.648140] 32768 pages cma reserved
[ 385.660271] 0 pages hwpoisoned
[ 385.671865] Tasks state (memory values in pages):
[ 385.685192] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 385.702711] [ 377] 0 377 3387 315 61440 0 0 systemd-journal
[ 385.720672] [ 413] 0 413 3520 310 49152 0 -1000 systemd-udevd
[ 385.738467] [ 435] 993 435 1526 79 53248 0 0 systemd-network
[ 385.756400] [ 459] 992 459 1665 99 49152 0 0 systemd-resolve
[ 385.774283] [ 463] 0 463 553 21 40960 0 0 tee-supplicant
[ 385.792002] [ 464] 0 464 1479 111 45056 0 0 systemd-logind
[ 385.809585] [ 472] 995 472 1197 105 45056 0 0 avahi-daemon
[ 385.826656] [ 473] 995 473 1166 66 45056 0 0 avahi-daemon
[ 385.843371] [ 474] 0 474 771 19 40960 0 0 syslogd
[ 385.859600] [ 475] 0 475 771 18 45056 0 0 klogd
[ 385.875602] [ 476] 0 476 1382 62 49152 0 0 bluetoothd
[ 385.892028] [ 479] 996 479 1151 187 45056 0 -900 dbus-daemon
[ 385.908563] [ 481] 0 481 78394 563 106496 0 0 NetworkManager
[ 385.925332] [ 482] 0 482 698 133 40960 0 0 crond
[ 385.941236] [ 527] 65534 527 629 44 40960 0 0 dnsmasq
[ 385.957226] [ 529] 0 529 578 32 40960 0 0 agetty
[ 385.973067] [ 530] 0 530 1173 107 49152 0 0 login
[ 385.988832] [ 531] 0 531 578 32 40960 0 0 agetty
[ 386.004670] [ 536] 0 536 2385 148 49152 0 0 wpa_supplicant
[ 386.020865] [ 537] 998 537 115916 1319 131072 0 0 polkitd
[ 386.036060] [ 563] 0 563 24661 430 69632 0 0 dhclient
[ 386.051325] [ 602] 0 602 1899 214 57344 0 0 systemd
[ 386.066521] [ 603] 0 603 2569 477 61440 0 0 (sd-pam)
[ 386.081811] [ 607] 0 607 910 102 40960 0 0 sh
[ 386.096558] [ 611] 0 611 1039 81 45056 0 0 su
[ 386.110911] [ 612] 0 612 910 97 40960 0 0 sh
[ 386.124866] [ 615] 0 615 756 55 40960 0 0 lava-test-runne
[ 386.139888] [ 1327] 0 1327 756 50 40960 0 0 lava-test-shell
[ 386.154903] [ 1328] 0 1328 756 52 36864 0 0 sh
[ 386.168797] [ 1330] 0 1330 822 133 40960 0 0 ltp.sh
[ 386.183055] [ 1348] 0 1348 822 133 40960 0 0 ltp.sh
[ 386.197278] [ 1349] 0 1349 822 133 40960 0 0 ltp.sh
[ 386.211413] [ 1350] 0 1350 822 133 40960 0 0 ltp.sh
[ 386.225573] [ 1351] 0 1351 921 230 45056 0 0 runltp
[ 386.239776] [ 1352] 0 1352 452 15 40960 0 0 tee
[ 386.253702] [ 1426] 0 1426 451 28 40960 0 0 ltp-pan
[ 386.267740] [ 10933] 0 10933 494 18 32768 0 0 oom01
[ 386.281352] [ 10934] 0 10934 527 31 36864 0 0 oom01
[ 386.294984] [ 10944] 0 10944 5519894 467709 3833856 0 0 oom01
[ 386.308646] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=oom01,pid=10944,uid=0
[ 386.326954] Out of memory: Killed process 10944 (oom01) total-vm:22079576kB, anon-rss:1870836kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:3744kB oom_score_adj:0
ksm05.c:78: PASS: still alive.
ksm05.c:78: PASS: still alive.
-- Mike Kravetz