The following kernel crash was noticed on qemu-arm64 while running Linux next-20241210 through next-20241218 kernels built with
- CONFIG_ARM64_64K_PAGES=y
- CONFIG_ARM64_16K_PAGES=y
and running the LTP smoke tests.
First seen on Linux next-20241210.
Good: next-20241209
Bad: next-20241210 and next-20241218
qemu-arm64: 9.1.2
Has anyone else noticed this?
Test log:
---------
tst_test.c:1799: TINFO: === Testing on btrfs ===
tst_test.c:1158: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts=''
<6>[ 71.880167] BTRFS: device fsid d492b571-012c-40a9-b8e1-efc97408d3bc devid 1 transid 6 /dev/loop0 (7:0) scanned by chdir01 (476)
tst_test.c:1170: TINFO: Mounting /dev/loop0 to /tmp/LTP_chdJeywxF/mntpoint fstyp=btrfs flags=0
<6>[ 71.960245] BTRFS info (device loop0): first mount of filesystem d492b571-012c-40a9-b8e1-efc97408d3bc
<6>[ 71.970667] BTRFS info (device loop0): using crc32c (crc32c-arm64) checksum algorithm
<2>[ 71.993486] BTRFS critical (device loop0): corrupt superblock syschunk array: chunk_start=22020096, invalid chunk sectorsize, have 65536 expect 4096
<3>[ 71.995802] BTRFS error (device loop0): superblock contains fatal errors
<3>[ 72.014538] BTRFS error (device loop0): open_ctree failed: -22
tst_test.c:1170: TBROK: mount(/dev/loop0, mntpoint, btrfs, 0, (nil)) failed: EINVAL (22)
Summary: passed 48 failed 0 broken 1 skipped 0 warnings 0
Duration: 7.002s
===== symlink01 =====
command: symlink01
<12>[ 72.494428] /usr/local/bin/kirk[253]: starting test symlink01 (symlink01)
symlink01 0 TINFO : Using /tmp/LTP_symmsYXet as tmpdir (tmpfs filesystem)
symlink01 1 TPASS : Creation of symbolic link file to no object file is ok
symlink01 2 TPASS : Creation of symbolic link file to no object file is ok
symlink01 3 TPASS : Creation of symbolic link file and object file via symbolic link is ok
symlink01 4 TPASS : Creating an existing symbolic link file error is caught
symlink01 5 TPASS : Creating a symbolic link which exceeds maximum pathname error is caught
Summary: passed 5 failed 0 broken 0 skipped 0 warnings 0
Duration: 0.052s
===== stat04 =====
command: stat04
<12>[ 72.966706] /usr/local/bin/kirk[253]: starting test stat04 (stat04)
tst_buffers.c:57: TINFO: Test is using guarded buffers
tst_tmpdir.c:316: TINFO: Using /tmp/LTP_staEABwgV as tmpdir (tmpfs filesystem)
<6>[ 73.447708] loop0: detected capacity change from 0 to 614400
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1860: TINFO: LTP version: 20240930
tst_test.c:1864: TINFO: Tested kernel: 6.13.0-rc3-next-20241218 #1 SMP PREEMPT @1734498806 aarch64
tst_test.c:1703: TINFO: Timeout per run is 0h 05m 24s
stat04.c:60: TINFO: Formatting /dev/loop0 with ext2 opts='-b 4096' extra opts=''
mke2fs 1.47.1 (20-May-2024)
<3>[ 73.859753] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10000800 phys_seg 0 prio class 0
stat04.c:61: TINFO: Mounting /dev/loop0 to /tmp/LTP_staEABwgV/mntpoint fstyp=ext2 flags=0
<6>[ 73.939263] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
<1>[ 73.946378] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[ 73.947878] Mem abort info:
<1>[ 73.949153] ESR = 0x0000000096000005
<1>[ 73.959105] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 73.960031] SET = 0, FnV = 0
<1>[ 73.960349] EA = 0, S1PTW = 0
<1>[ 73.960638] FSC = 0x05: level 1 translation fault
<1>[ 73.961005] Data abort info:
<1>[ 73.961293] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[ 73.963739] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 73.964980] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 73.967132] [a8fff00000c0c224] address between user and kernel address ranges
<0>[ 73.968923] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
<4>[ 73.970516] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[ 73.974237] CPU: 1 UID: 0 PID: 529 Comm: stat04 Not tainted 6.13.0-rc3-next-20241218 #1
<4>[ 73.975359] Hardware name: linux,dummy-virt (DT)
<4>[ 73.977170] pstate: 62402009 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[ 73.978295] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 73.980200] lr : alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[ 73.981466] sp : ffff80008258f950
<4>[ 73.982228] x29: ffff80008258f970 x28: ffffa93389398000 x27: 0000000000000001
<4>[ 73.983875] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[ 73.985649] x23: 0000000000000cc0 x22: ffffa93387f51d0c x21: 00000000ffffffff
<4>[ 73.986188] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[ 73.988686] x17: fff056cd748b0000 x16: ffff800080020000 x15: 0000000000000000
<4>[ 73.990276] x14: 0000000000002a66 x13: 0000000000004000 x12: 0000000000000001
<4>[ 73.992401] x11: 0000000000000002 x10: 0000000000004001 x9 : ffffa93387f51d0c
<4>[ 73.993108] x8 : fff00000c2c99240 x7 : 0000000000000001 x6 : 0000000000000001
<4>[ 73.993886] x5 : fff00000c4879800 x4 : 0000000000000000 x3 : 000000000033a401
<4>[ 73.995550] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[ 73.997017] Call trace:
<4>[ 73.998266] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 73.999716] alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[ 74.000942] alloc_workqueue_attrs (kernel/workqueue.c:4624 (discriminator 1))
<4>[ 74.001327] apply_wqattrs_prepare (kernel/workqueue.c:5263)
<4>[ 74.003095] apply_workqueue_attrs_locked (kernel/workqueue.c:5351)
<4>[ 74.003855] alloc_workqueue (kernel/workqueue.c:5722 (discriminator 1) kernel/workqueue.c:5772 (discriminator 1))
<4>[ 74.005398] ext4_fill_super (fs/ext4/super.c:5484 fs/ext4/super.c:5722)
<4>[ 74.006132] get_tree_bdev_flags (fs/super.c:1636)
<4>[ 74.007624] get_tree_bdev (fs/super.c:1660)
<4>[ 74.008664] ext4_get_tree (fs/ext4/super.c:5755)
<4>[ 74.009423] vfs_get_tree (fs/super.c:1814)
<4>[ 74.009703] path_mount (fs/namespace.c:3556 fs/namespace.c:3883)
<4>[ 74.010608] __arm64_sys_mount (fs/namespace.c:3896 fs/namespace.c:4107 fs/namespace.c:4084 fs/namespace.c:4084)
<4>[ 74.011527] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 74.012798] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 74.014042] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 74.014942] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 74.015917] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 74.017042] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
All code
========
0: 12800019 mov w25, #0xffffffff // #-1
4: b9402a82 ldr w2, [x20, #40]
8: aa1803e1 mov x1, x24
c: aa1403e0 mov x0, x20
10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: f8626b1a ldr x26, [x24, x2]
<4>[ 74.019014] ---[ end trace 0000000000000000 ]---
tst_test.c:1763: TBROK: Test killed by SIGSEGV!
Summary: passed 0 failed 0 broken 1 skipped 0 warnings 0
tst_device.c:269: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO for too long
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
Tainted kernel: ['kernel died recently, i.e. there was an OOPS or BUG']
Restarting SUT: host
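For readers decoding the "Mem abort info" block by hand, here is a minimal Python sketch that splits the ESR value from the oops into the fields the kernel prints. The function name decode_esr is made up for illustration; the bit positions follow the Arm ESR_ELx layout for data aborts. It is a reading aid, not kernel code.

```python
def decode_esr(esr: int) -> dict:
    """Split an ESR_ELx value into the fields shown in the oops (data-abort layout)."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits [24:0]
    return {
        "ec": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction trapped
        "iss": iss,
        "fsc": iss & 0x3F,             # fault status code, ISS bits [5:0]
        "wnr": (iss >> 6) & 0x1,       # 0 = read access, 1 = write access
    }


# ESR value copied from the oops above.
fields = decode_esr(0x96000005)
for name, value in fields.items():
    print(f"{name} = {value:#x}")
```

For 0x96000005 this gives EC 0x25 (data abort from the current EL), FSC 0x05 (level 1 translation fault) and WnR 0, i.e. a read, which agrees with the trapping `ldr x26, [x24, x2]`.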
===== df01_sh =====
command: df01.sh
<12>[ 76.370093] /usr/local/bin/kirk[253]: starting test df01_sh (df01.sh)
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
<1>[ 76.603065] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[ 76.603922] Mem abort info:
<1>[ 76.604197] ESR = 0x0000000096000005
<1>[ 76.604638] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 76.605128] SET = 0, FnV = 0
<1>[ 76.606996] EA = 0, S1PTW = 0
<1>[ 76.607274] FSC = 0x05: level 1 translation fault
<1>[ 76.607611] Data abort info:
<1>[ 76.607897] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[ 76.609765] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 76.610958] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 76.611652] [a8fff00000c0c224] address between user and kernel address ranges
<0>[ 76.612130] Internal error: Oops: 0000000096000005 [#2] PREEMPT SMP
<4>[ 76.613305] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[ 76.617688] CPU: 1 UID: 0 PID: 553 Comm: df01.sh Tainted: G D 6.13.0-rc3-next-20241218 #1
<4>[ 76.620869] Tainted: [D]=DIE
<4>[ 76.621184] Hardware name: linux,dummy-virt (DT)
<4>[ 76.622671] pstate: 63402009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 76.623693] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 76.624180] lr : __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[ 76.625290] sp : ffff80008258fa90
<4>[ 76.626275] x29: ffff80008258fab0 x28: fff00000c2c98e80 x27: fff00000c48fd100
<4>[ 76.626966] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[ 76.627599] x23: 0000000000000dc0 x22: ffffa93386d87390 x21: 00000000ffffffff
<4>[ 76.628603] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[ 76.629618] x17: 0000000000000000 x16: ffff800082180000 x15: ffff800080000000
<4>[ 76.630999] x14: fff00000c00203f0 x13: 00000ffff8000821 x12: 0000000000000000
<4>[ 76.632089] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa93386d87390
<4>[ 76.634293] x8 : ffff80008258f908 x7 : fff00000c2c98e80 x6 : 0000000000010000
<4>[ 76.634816] x5 : ffffa93389379000 x4 : 0000000000000000 x3 : 000000000033b801
<4>[ 76.636355] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[ 76.638309] Call trace:
<4>[ 76.639031] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 76.640890] __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[ 76.641267] copy_process (kernel/fork.c:314 (discriminator 1) kernel/fork.c:1061 (discriminator 1) kernel/fork.c:2176 (discriminator 1))
<4>[ 76.641795] kernel_clone (kernel/fork.c:2758)
<4>[ 76.643003] __do_sys_clone (kernel/fork.c:2902)
<4>[ 76.644078] __arm64_sys_clone (kernel/fork.c:2869)
<4>[ 76.645306] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 76.646337] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 76.646974] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 76.647709] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 76.649032] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 76.649724] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
<trim>
All code
========
0: 12800019 mov w25, #0xffffffff // #-1
4: b9402a82 ldr w2, [x20, #40]
8: aa1803e1 mov x1, x24
c: aa1403e0 mov x0, x20
10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
Code starting with the faulting instruction
===========================================
0: f8626b1a ldr x26, [x24, x2]
<4>[ 79.647693] ---[ end trace 0000000000000000 ]---
<0>[ 79.649260] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<2>[ 79.650229] SMP: stopping secondary CPUs
<0>[ 79.651558] Kernel Offset: 0x293306a00000 from 0xffff800080000000
<0>[ 79.652015] PHYS_OFFSET: 0x40000000
<0>[ 79.652461] CPU features: 0x000,000000d0,60bef2d8,cb7e7f3f
<0>[ 79.653039] Memory Limit: none
<0>[ 79.653854] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
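The "Kernel Offset" line gives the KASLR displacement, so kernel-image addresses in the register dumps can be mapped back to link-time addresses before looking them up in vmlinux or System.map. A minimal Python sketch follows. The helper name unslide is hypothetical, and it assumes the offset applies only to kernel-image addresses (such as x22 in the first oops), not to linear-map addresses like fff00000c0010400.

```python
# "Kernel Offset: 0x293306a00000 from 0xffff800080000000" in the panic output.
KERNEL_OFFSET = 0x293306A00000


def unslide(addr: int, offset: int = KERNEL_OFFSET) -> int:
    """Remove the KASLR displacement from a kernel-image address."""
    return addr - offset


# x22 from the first oops; assumed here to be a kernel-image address.
x22_runtime = 0xFFFFA93387F51D0C
x22_linktime = unslide(x22_runtime)
print(hex(x22_linktime))
```

The resulting value can then be passed to a symbolizer such as addr2line against the matching vmlinux, which is effectively what the already-decoded file:line annotations in the trace provide.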
Links:
-------
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241218/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241212/tes...
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2qNMDhPFtR8...
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241211/tes...
metadata:
----
git describe: next-20241210..next-20241218
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
toolchain: gcc-13
config: CONFIG_ARM64_64K_PAGES=y, CONFIG_ARM64_16K_PAGES=y
arch: arm64
qemu: qemu-arm64 version 9.1.2
--
Linaro LKFT
https://lkft.linaro.org
On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju naresh.kamboju@linaro.org wrote:
The following kernel crash was noticed on qemu-arm64 while running Linux next-20241210 through next-20241218 kernels built with
- CONFIG_ARM64_64K_PAGES=y
- CONFIG_ARM64_16K_PAGES=y
and running LTP smoke tests.
First seen on Linux next-20241210. Good: next-20241209 Bad: next-20241210 and next-20241218
qemu-arm64: 9.1.2
Has anyone else noticed this?
Anders bisected this reported regression and found:
# first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
On Wed, Dec 18, 2024 at 09:22:26PM +0530, Naresh Kamboju wrote:
Anders bisected this reported regression and found:
# first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
Thanks, I'll drop the patch from linux-next for now.
On 2024/12/19 02:22, Naresh Kamboju wrote:
Anders bisected this reported regression and found:
# first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
Weird, I run fstests daily on an aarch64 VM with 64K pages and have never hit a crash like this.
Also, the original crash call trace only points back to ext4, not btrfs.
Would you mind testing it with KASAN enabled?
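One way to see why the call trace alone does not identify the culprit: both oopses fault at the same instruction in __kmalloc_node_noprof with identical x20 and x24 values, and only the callers differ. The Python sketch below compares values copied from the two register dumps in this thread. The dictionary layout and names are illustrative only. It also shows that the bad address, shifted left by one byte, has the same top bits as the valid-looking pointer in x20. That is an observation from the dump, not a diagnosis.

```python
MASK64 = (1 << 64) - 1

# Values copied from the two oops reports in this thread.
oops1 = {
    "pc": "__kmalloc_node_noprof+0x100/0x4a0",
    "caller": "alloc_cpumask_var_node",       # reached from ext4_fill_super
    "x20": 0xFFF00000C0010400,
    "x24": 0xA8FFF00000C0C224,                # faulting address
}
oops2 = {
    "pc": "__kmalloc_node_noprof+0x100/0x4a0",
    "caller": "__vmalloc_node_range_noprof",  # reached from copy_process
    "x20": 0xFFF00000C0010400,
    "x24": 0xA8FFF00000C0C224,                # same faulting address
}

# Fields that are identical in both reports.
shared = sorted(k for k in oops1 if oops1[k] == oops2[k])
print("identical in both oopses:", shared)

# Shift the bad address left by one byte and compare its top 12 bits
# with those of x20, which looks like an ordinary kernel pointer.
shifted = (oops1["x24"] << 8) & MASK64
print(hex(shifted), hex(shifted >> 52), hex(oops1["x20"] >> 52))
```

Since unrelated callers (an ext4 mount and a fork) trip over the same bad value, the allocation sites look like victims of earlier memory corruption, which is the kind of problem a KASAN-enabled run is meant to pinpoint.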
Thanks, Qu
Test log:
tst_test.c:1799: TINFO: === Testing on btrfs === tst_test.c:1158: TINFO: Formatting /dev/loop0 with btrfs opts='' extra opts='' <6>[ 71.880167] BTRFS: device fsid d492b571-012c-40a9-b8e1-efc97408d3bc devid 1 transid 6 /dev/loop0 (7:0) scanned by chdir01 (476) tst_test.c:1170: TINFO: Mounting /dev/loop0 to /tmp/LTP_chdJeywxF/mntpoint fstyp=btrfs flags=0 <6>[ 71.960245] BTRFS info (device loop0): first mount of filesystem d492b571-012c-40a9-b8e1-efc97408d3bc <6>[ 71.970667] BTRFS info (device loop0): using crc32c (crc32c-arm64) checksum algorithm <2>[ 71.993486] BTRFS critical (device loop0): corrupt superblock syschunk array: chunk_start=22020096, invalid chunk sectorsize, have 65536 expect 4096 <3>[ 71.995802] BTRFS error (device loop0): superblock contains fatal errors <3>[ 72.014538] BTRFS error (device loop0): open_ctree failed: -22 tst_test.c:1170: TBROK: mount(/dev/loop0, mntpoint, btrfs, 0, (nil)) failed: EINVAL (22)
Summary: passed 48 failed 0 broken 1 skipped 0 warnings 0
Duration: 7.002s
===== symlink01 ===== command: symlink01 <12>[ 72.494428] /usr/local/bin/kirk[253]: starting test symlink01 (symlink01) symlink01 0 TINFO : Using /tmp/LTP_symmsYXet as tmpdir (tmpfs filesystem) symlink01 1 TPASS : Creation of symbolic link file to no object file is ok symlink01 2 TPASS : Creation of symbolic link file to no object file is ok symlink01 3 TPASS : Creation of symbolic link file and object file via symbolic link is ok symlink01 4 TPASS : Creating an existing symbolic link file error is caught symlink01 5 TPASS : Creating a symbolic link which exceeds maximum pathname error is caught
Summary: passed 5 failed 0 broken 0 skipped 0 warnings 0
Duration: 0.052s
===== stat04 =====
command: stat04
<12>[ 72.966706] /usr/local/bin/kirk[253]: starting test stat04 (stat04)
tst_buffers.c:57: TINFO: Test is using guarded buffers
tst_tmpdir.c:316: TINFO: Using /tmp/LTP_staEABwgV as tmpdir (tmpfs filesystem)
<6>[ 73.447708] loop0: detected capacity change from 0 to 614400
tst_device.c:96: TINFO: Found free device 0 '/dev/loop0'
tst_test.c:1860: TINFO: LTP version: 20240930
tst_test.c:1864: TINFO: Tested kernel: 6.13.0-rc3-next-20241218 #1 SMP PREEMPT @1734498806 aarch64
tst_test.c:1703: TINFO: Timeout per run is 0h 05m 24s
stat04.c:60: TINFO: Formatting /dev/loop0 with ext2 opts='-b 4096' extra opts=''
mke2fs 1.47.1 (20-May-2024)
<3>[ 73.859753] operation not supported error, dev loop0, sector 614272 op 0x9:(WRITE_ZEROES) flags 0x10000800 phys_seg 0 prio class 0
stat04.c:61: TINFO: Mounting /dev/loop0 to /tmp/LTP_staEABwgV/mntpoint fstyp=ext2 flags=0
<6>[ 73.939263] EXT4-fs (loop0): mounting ext2 file system using the ext4 subsystem
<1>[ 73.946378] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[ 73.947878] Mem abort info:
<1>[ 73.949153] ESR = 0x0000000096000005
<1>[ 73.959105] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 73.960031] SET = 0, FnV = 0
<1>[ 73.960349] EA = 0, S1PTW = 0
<1>[ 73.960638] FSC = 0x05: level 1 translation fault
<1>[ 73.961005] Data abort info:
<1>[ 73.961293] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[ 73.963739] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 73.964980] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 73.967132] [a8fff00000c0c224] address between user and kernel address ranges
<0>[ 73.968923] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
<4>[ 73.970516] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[ 73.974237] CPU: 1 UID: 0 PID: 529 Comm: stat04 Not tainted 6.13.0-rc3-next-20241218 #1
<4>[ 73.975359] Hardware name: linux,dummy-virt (DT)
<4>[ 73.977170] pstate: 62402009 (nZCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
<4>[ 73.978295] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 73.980200] lr : alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[ 73.981466] sp : ffff80008258f950
<4>[ 73.982228] x29: ffff80008258f970 x28: ffffa93389398000 x27: 0000000000000001
<4>[ 73.983875] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[ 73.985649] x23: 0000000000000cc0 x22: ffffa93387f51d0c x21: 00000000ffffffff
<4>[ 73.986188] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[ 73.988686] x17: fff056cd748b0000 x16: ffff800080020000 x15: 0000000000000000
<4>[ 73.990276] x14: 0000000000002a66 x13: 0000000000004000 x12: 0000000000000001
<4>[ 73.992401] x11: 0000000000000002 x10: 0000000000004001 x9 : ffffa93387f51d0c
<4>[ 73.993108] x8 : fff00000c2c99240 x7 : 0000000000000001 x6 : 0000000000000001
<4>[ 73.993886] x5 : fff00000c4879800 x4 : 0000000000000000 x3 : 000000000033a401
<4>[ 73.995550] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[ 73.997017] Call trace:
<4>[ 73.998266] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 73.999716] alloc_cpumask_var_node (lib/cpumask.c:62 (discriminator 2))
<4>[ 74.000942] alloc_workqueue_attrs (kernel/workqueue.c:4624 (discriminator 1))
<4>[ 74.001327] apply_wqattrs_prepare (kernel/workqueue.c:5263)
<4>[ 74.003095] apply_workqueue_attrs_locked (kernel/workqueue.c:5351)
<4>[ 74.003855] alloc_workqueue (kernel/workqueue.c:5722 (discriminator 1) kernel/workqueue.c:5772 (discriminator 1))
<4>[ 74.005398] ext4_fill_super (fs/ext4/super.c:5484 fs/ext4/super.c:5722)
<4>[ 74.006132] get_tree_bdev_flags (fs/super.c:1636)
<4>[ 74.007624] get_tree_bdev (fs/super.c:1660)
<4>[ 74.008664] ext4_get_tree (fs/ext4/super.c:5755)
<4>[ 74.009423] vfs_get_tree (fs/super.c:1814)
<4>[ 74.009703] path_mount (fs/namespace.c:3556 fs/namespace.c:3883)
<4>[ 74.010608] __arm64_sys_mount (fs/namespace.c:3896 fs/namespace.c:4107 fs/namespace.c:4084 fs/namespace.c:4084)
<4>[ 74.011527] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 74.012798] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 74.014042] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 74.014942] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 74.015917] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 74.017042] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
All code
========
   0: 12800019 mov w25, #0xffffffff // #-1
   4: b9402a82 ldr w2, [x20, #40]
   8: aa1803e1 mov x1, x24
   c: aa1403e0 mov x0, x20
  10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
Code starting with the faulting instruction
   0: f8626b1a ldr x26, [x24, x2]
<4>[ 74.019014] ---[ end trace 0000000000000000 ]---
tst_test.c:1763: TBROK: Test killed by SIGSEGV!
Summary: passed 0 failed 0 broken 1 skipped 0 warnings 0
tst_device.c:269: TWARN: ioctl(/dev/loop0, LOOP_CLR_FD, 0) no ENXIO for too long
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
Tainted kernel: ['kernel died recently, i.e. there was an OOPS or BUG']
Restarting SUT: host
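For readers unfamiliar with the "Mem abort info" block in the oops, the fields printed there can be recomputed from the raw ESR value. The following is only a minimal userspace sketch, assuming the usual ESR_ELx layout (EC in bits [31:26], IL in bit 25, ISS in bits [24:0], and for data aborts FSC in ISS bits [5:0] and WnR in ISS bit 6). The struct and function names are made up for illustration and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative container for the ESR fields shown in the oops header. */
struct esr_fields {
	unsigned int ec;   /* exception class, bits [31:26] */
	unsigned int il;   /* instruction length flag, bit 25 */
	uint32_t iss;      /* instruction specific syndrome, bits [24:0] */
	unsigned int fsc;  /* fault status code, ISS bits [5:0] (data aborts) */
	unsigned int wnr;  /* write-not-read, ISS bit 6 (data aborts) */
};

/* Split a raw ESR value into the fields above (hypothetical helper). */
struct esr_fields decode_esr(uint64_t esr)
{
	struct esr_fields f;

	f.ec = (unsigned int)((esr >> 26) & 0x3f);
	f.il = (unsigned int)((esr >> 25) & 0x1);
	f.iss = (uint32_t)(esr & 0x1ffffff);
	f.fsc = f.iss & 0x3f;
	f.wnr = (f.iss >> 6) & 0x1;
	return f;
}

/* Decode the value from the log and compare with what the kernel printed. */
void check_logged_esr(void)
{
	struct esr_fields f = decode_esr(0x0000000096000005ULL);

	assert(f.ec == 0x25);  /* "EC = 0x25: DABT (current EL)" */
	assert(f.il == 1);     /* "IL = 32 bits" */
	assert(f.iss == 0x5);  /* "ISS = 0x00000005" */
	assert(f.fsc == 0x05); /* "FSC = 0x05: level 1 translation fault" */
	assert(f.wnr == 0);    /* "WnR = 0", i.e. the fault was a read */
}
```

Decoded this way, the oops describes a read from the current exception level that failed translation, which fits the trapping `ldr x26, [x24, x2]` with x24 holding the bogus address a8fff00000c0c224.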
===== df01_sh =====
command: df01.sh
<12>[ 76.370093] /usr/local/bin/kirk[253]: starting test df01_sh (df01.sh)
Tainted kernel: kernel died recently, i.e. there was an OOPS or BUG
<1>[ 76.603065] Unable to handle kernel paging request at virtual address a8fff00000c0c224
<1>[ 76.603922] Mem abort info:
<1>[ 76.604197] ESR = 0x0000000096000005
<1>[ 76.604638] EC = 0x25: DABT (current EL), IL = 32 bits
<1>[ 76.605128] SET = 0, FnV = 0
<1>[ 76.606996] EA = 0, S1PTW = 0
<1>[ 76.607274] FSC = 0x05: level 1 translation fault
<1>[ 76.607611] Data abort info:
<1>[ 76.607897] ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
<1>[ 76.609765] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[ 76.610958] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[ 76.611652] [a8fff00000c0c224] address between user and kernel address ranges
<0>[ 76.612130] Internal error: Oops: 0000000096000005 [#2] PREEMPT SMP
<4>[ 76.613305] Modules linked in: btrfs blake2b_generic xor xor_neon raid6_pq zstd_compress sm3_ce sm3 sha3_ce sha512_ce sha512_arm64 fuse drm backlight ip_tables x_tables
<4>[ 76.617688] CPU: 1 UID: 0 PID: 553 Comm: df01.sh Tainted: G D 6.13.0-rc3-next-20241218 #1
<4>[ 76.620869] Tainted: [D]=DIE
<4>[ 76.621184] Hardware name: linux,dummy-virt (DT)
<4>[ 76.622671] pstate: 63402009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
<4>[ 76.623693] pc : __kmalloc_node_noprof (mm/slub.c:492 mm/slub.c:505 mm/slub.c:532 mm/slub.c:3993 mm/slub.c:4152 mm/slub.c:4293 mm/slub.c:4300)
<4>[ 76.624180] lr : __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[ 76.625290] sp : ffff80008258fa90
<4>[ 76.626275] x29: ffff80008258fab0 x28: fff00000c2c98e80 x27: fff00000c48fd100
<4>[ 76.626966] x26: fffffc1fc0303080 x25: 00000000ffffffff x24: a8fff00000c0c224
<4>[ 76.627599] x23: 0000000000000dc0 x22: ffffa93386d87390 x21: 00000000ffffffff
<4>[ 76.628603] x20: fff00000c0010400 x19: 0000000000000008 x18: 0000000000000000
<4>[ 76.629618] x17: 0000000000000000 x16: ffff800082180000 x15: ffff800080000000
<4>[ 76.630999] x14: fff00000c00203f0 x13: 00000ffff8000821 x12: 0000000000000000
<4>[ 76.632089] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa93386d87390
<4>[ 76.634293] x8 : ffff80008258f908 x7 : fff00000c2c98e80 x6 : 0000000000010000
<4>[ 76.634816] x5 : ffffa93389379000 x4 : 0000000000000000 x3 : 000000000033b801
<4>[ 76.636355] x2 : 0000000000000000 x1 : a8fff00000c0c224 x0 : fff00000c0010400
<4>[ 76.638309] Call trace:
<4>[ 76.639031] __kmalloc_node_noprof+0x100/0x4a0 P
<4>[ 76.640890] __vmalloc_node_range_noprof (include/linux/slab.h:922 mm/vmalloc.c:3647 mm/vmalloc.c:3846)
<4>[ 76.641267] copy_process (kernel/fork.c:314 (discriminator 1) kernel/fork.c:1061 (discriminator 1) kernel/fork.c:2176 (discriminator 1))
<4>[ 76.641795] kernel_clone (kernel/fork.c:2758)
<4>[ 76.643003] __do_sys_clone (kernel/fork.c:2902)
<4>[ 76.644078] __arm64_sys_clone (kernel/fork.c:2869)
<4>[ 76.645306] invoke_syscall.constprop.0 (arch/arm64/include/asm/syscall.h:61 arch/arm64/kernel/syscall.c:54)
<4>[ 76.646337] do_el0_svc (include/linux/thread_info.h:135 (discriminator 2) arch/arm64/kernel/syscall.c:140 (discriminator 2) arch/arm64/kernel/syscall.c:151 (discriminator 2))
<4>[ 76.646974] el0_svc (arch/arm64/include/asm/irqflags.h:82 (discriminator 1) arch/arm64/include/asm/irqflags.h:123 (discriminator 1) arch/arm64/include/asm/irqflags.h:136 (discriminator 1) arch/arm64/kernel/entry-common.c:165 (discriminator 1) arch/arm64/kernel/entry-common.c:178 (discriminator 1) arch/arm64/kernel/entry-common.c:745 (discriminator 1))
<4>[ 76.647709] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:763)
<4>[ 76.649032] el0t_64_sync (arch/arm64/kernel/entry.S:600)
<0>[ 76.649724] Code: 12800019 b9402a82 aa1803e1 aa1403e0 (f8626b1a)
<trim>
All code
   0: 12800019 mov w25, #0xffffffff // #-1
   4: b9402a82 ldr w2, [x20, #40]
   8: aa1803e1 mov x1, x24
   c: aa1403e0 mov x0, x20
  10:* f8626b1a ldr x26, [x24, x2] <-- trapping instruction
Code starting with the faulting instruction
   0: f8626b1a ldr x26, [x24, x2]
<4>[ 79.647693] ---[ end trace 0000000000000000 ]---
<0>[ 79.649260] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
<2>[ 79.650229] SMP: stopping secondary CPUs
<0>[ 79.651558] Kernel Offset: 0x293306a00000 from 0xffff800080000000
<0>[ 79.652015] PHYS_OFFSET: 0x40000000
<0>[ 79.652461] CPU features: 0x000,000000d0,60bef2d8,cb7e7f3f
<0>[ 79.653039] Memory Limit: none
<0>[ 79.653854] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
Links:
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241218/tes...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241212/tes...
- https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2qNMDhPFtR8...
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
- https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20241211/tes...
metadata:
git describe: next-20241210..next-20241218
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
kernel config: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
build url: https://storage.tuxsuite.com/public/linaro/lkft/builds/2qNMCQazNJteQLGCw7MnM...
toolchain: gcc-13
config: CONFIG_ARM64_64K_PAGES=y, CONFIG_ARM64_16K_PAGES=y
arch: arm64
qemu: qemu-arm64 version 9.1.2
-- Linaro LKFT https://lkft.linaro.org
On 2024/12/19 06:37, Qu Wenruo wrote:
On 2024/12/19 02:22, Naresh Kamboju wrote:
On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju naresh.kamboju@linaro.org wrote:
The following kernel crash noticed on qemu-arm64 while running the Linux next-20241210 tag (to next-20241218) kernel built with - CONFIG_ARM64_64K_PAGES=y - CONFIG_ARM64_16K_PAGES=y and running LTP smoke tests.
First seen on Linux next-20241210. Good: next-20241209 Bad: next-20241210 and next-20241218
qemu-arm64: 9.1.2
Anyone noticed this ?
Anders bisected this reported regression and found, # first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
Weird, I run daily fstests with 64K page sized aarch64 VM.
But never hit a crash on this.
And the original crash call trace only points back to ext4, not btrfs.
Mind to test it with KASAN enabled?
Another thing is, how do you enable both 16K and 64K page size at the same time?
The Kconfig should only select one page size IIRC.
And for the bisection, does it focus on the test failure or the crash?
For the test failure it looks like some older btrfs-progs, causing invalid system chunk items, which got caught by the newer and more strict sanity checks.
For the crash, unfortunately I'm not able to reproduce using fstests. Will try LTP soon.
Thanks, Qu
On Thu, Dec 19, 2024 at 10:44:12AM +1030, Qu Wenruo wrote:
On 2024/12/19 06:37, Qu Wenruo wrote:
On 2024/12/19 02:22, Naresh Kamboju wrote:
On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju naresh.kamboju@linaro.org wrote:
The following kernel crash noticed on qemu-arm64 while running the Linux next-20241210 tag (to next-20241218) kernel built with - CONFIG_ARM64_64K_PAGES=y - CONFIG_ARM64_16K_PAGES=y and running LTP smoke tests.
First seen on Linux next-20241210. Good: next-20241209 Bad: next-20241210 and next-20241218
qemu-arm64: 9.1.2
Anyone noticed this ?
Anders bisected this reported regression and found, # first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
Weird, I run daily fstests with 64K page sized aarch64 VM.
But never hit a crash on this.
And the original crash call trace only points back to ext4, not btrfs.
Yeah. But it's in the memory allocator so it looks like memory corruption. After the ext4 crash then random other stuff starts crashing as well when it allocates memory.
Mind to test it with KASAN enabled?
Anders is going to try that later and report back.
Another thing is, how do you enable both 16K and 64K page size at the same time?
The Kconfig should only select one page size IIRC.
Right. We tested 4k, 16k and 64k. 4k pages worked.
And for the bisection, does it focus on the test failure or the crash?
The crash.
regards, dan carpenter
On Thu, Dec 19, 2024 at 06:10:56PM +0300, Dan Carpenter wrote:
Mind to test it with KASAN enabled?
Anders is going to try that later and report back.
Anders ran it and emailed me. I was going to tell him to respond to the thread but I decided to steal the credit. #GreatArtists
BTRFS info (device loop0): using crc32c (crc32c-arm64) checksum algorithm
==================================================================
BUG: KASAN: slab-out-of-bounds in __bitmap_set+0xf8/0x100
Read of size 8 at addr fff0000020e4a3c8 by task chdir01/479

CPU: 1 UID: 0 PID: 479 Comm: chdir01 Not tainted 6.13.0-rc3-next-20241218 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x20/0x38 (C)
 dump_stack_lvl+0x8c/0xd0
 print_report+0x118/0x5e0
 kasan_report+0xb4/0x100
 __asan_report_load8_noabort+0x20/0x30
 __bitmap_set+0xf8/0x100
 btrfs_subpage_set_uptodate+0xd8/0x1d0 [btrfs]
 set_extent_buffer_uptodate+0x1ac/0x288 [btrfs]
 __alloc_dummy_extent_buffer+0x2cc/0x488 [btrfs]
 alloc_dummy_extent_buffer+0x4c/0x78 [btrfs]
 btrfs_check_system_chunk_array+0x30/0x308 [btrfs]
 btrfs_validate_super+0x7e8/0xd40 [btrfs]
 open_ctree+0x958/0x3c98 [btrfs]
 btrfs_get_tree+0xce4/0x13d8 [btrfs]
 vfs_get_tree+0x7c/0x290
 fc_mount+0x20/0xa8
 btrfs_get_tree+0x72c/0x13d8 [btrfs]
 vfs_get_tree+0x7c/0x290
 path_mount+0x748/0x1518
 __arm64_sys_mount+0x234/0x4f8
 invoke_syscall.constprop.0+0x78/0x1f0
 do_el0_svc+0xcc/0x1d8
 el0_svc+0x38/0xa8
 el0t_64_sync_handler+0x10c/0x138
 el0t_64_sync+0x198/0x1a0
Here are the full logs. https://people.linaro.org/~anders.roxell/next-20241218-issue-arm64-64k+kasan...
regards, dan carpenter
On 2024/12/20 02:07, Dan Carpenter wrote:
On Thu, Dec 19, 2024 at 06:10:56PM +0300, Dan Carpenter wrote:
Mind to test it with KASAN enabled?
Anders is going to try that later and report back.
Anders ran it and emailed me. I was going to tell him to respond to the thread but I decided to steal the credit. #GreatArtists
BTRFS info (device loop0): using crc32c (crc32c-arm64) checksum algorithm
BUG: KASAN: slab-out-of-bounds in __bitmap_set+0xf8/0x100
Read of size 8 at addr fff0000020e4a3c8 by task chdir01/479
CPU: 1 UID: 0 PID: 479 Comm: chdir01 Not tainted 6.13.0-rc3-next-20241218 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 show_stack+0x20/0x38 (C)
 dump_stack_lvl+0x8c/0xd0
 print_report+0x118/0x5e0
 kasan_report+0xb4/0x100
 __asan_report_load8_noabort+0x20/0x30
 __bitmap_set+0xf8/0x100
 btrfs_subpage_set_uptodate+0xd8/0x1d0 [btrfs]
 set_extent_buffer_uptodate+0x1ac/0x288 [btrfs]
 __alloc_dummy_extent_buffer+0x2cc/0x488 [btrfs]
Thanks a lot. The problem is still inside the dummy extent buffer allocation.
This time it's again related to the uninitialized fs_info->* variables.
In this case it is fs_info->sectorsize_bits that is not initialized, so it still holds its default value of 0.
Then in btrfs_subpage_set_uptodate(), we use (len >> sectorsize_bits) to calculate how many bits must be set.
But since sectorsize_bits is 0, the shift changes nothing and the raw @len (4K) is used as the bit count, resulting in the out-of-bounds access.
The root cause is the same as for the initial failure: many of the existing functions cannot be used for such early verification. So I'll create a dedicated helper for the sanity check that needs neither a memory allocation nor early access to fs_info.
Thanks everyone involved in exposing this bug! Qu
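The arithmetic behind that explanation can be shown with a small userspace sketch. It is not btrfs code: the function names are invented, and the 16-bit bitmap size is an assumption standing in for a 64K page tracked in 4K sectors. It only illustrates how a zero sectorsize_bits turns a 4K length into a 4096-bit range.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed capacity for illustration only: 64K page / 4K sectors = 16 bits. */
#define HYPOTHETICAL_BITMAP_BITS 16u

/* Bits a byte range covers in a per-sector bitmap: len >> sectorsize_bits. */
uint32_t range_to_nbits(uint32_t len, uint32_t sectorsize_bits)
{
	assert(sectorsize_bits < 32); /* shifting by the full width is undefined in C */
	return len >> sectorsize_bits;
}

/* Nonzero if [start_bit, start_bit + nbits) lies inside a bitmap of bitmap_bits bits. */
int fits_in_bitmap(uint32_t start_bit, uint32_t nbits, uint32_t bitmap_bits)
{
	return start_bit <= bitmap_bits && nbits <= bitmap_bits - start_bit;
}
```

With an initialized value, range_to_nbits(4096, 12) is 1 and fits_in_bitmap(0, 1, HYPOTHETICAL_BITMAP_BITS) holds. With the uninitialized value, range_to_nbits(4096, 0) is 4096, which cannot fit in a 16-bit bitmap, so a bitmap_set() over that range runs far past the allocation, consistent with the KASAN slab-out-of-bounds report in __bitmap_set().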
On 2024/12/20 01:40, Dan Carpenter wrote:
On Thu, Dec 19, 2024 at 10:44:12AM +1030, Qu Wenruo wrote:
On 2024/12/19 06:37, Qu Wenruo wrote:
On 2024/12/19 02:22, Naresh Kamboju wrote:
On Wed, 18 Dec 2024 at 17:33, Naresh Kamboju naresh.kamboju@linaro.org wrote:
The following kernel crash noticed on qemu-arm64 while running the Linux next-20241210 tag (to next-20241218) kernel built with - CONFIG_ARM64_64K_PAGES=y - CONFIG_ARM64_16K_PAGES=y and running LTP smoke tests.
First seen on Linux next-20241210. Good: next-20241209 Bad: next-20241210 and next-20241218
qemu-arm64: 9.1.2
Anyone noticed this ?
Anders bisected this reported regression and found, # first bad commit: [9c1d66793b6faa00106ae4c866359578bfc012d2] btrfs: validate system chunk array at btrfs_validate_super()
Weird, I run daily fstests with 64K page sized aarch64 VM.
But never hit a crash on this.
And the original crash call trace only points back to ext4, not btrfs.
Yeah. But it's in the memory allocator so it looks like memory corruption. After the ext4 crash then random other stuff starts crashing as well when it allocates memory.
Mind to test it with KASAN enabled?
Anders is going to try that later and report back.
Another thing is, how do you enable both 16K and 64K page size at the same time?
The Kconfig should only select one page size IIRC.
Right. We tested 4k, 16k and 64k. 4k pages worked.
And for the bisection, does it focus on the test failure or the crash?
The crash.
For the test failure, I found the reason: it is indeed the patch. We call btrfs_check_chunk_valid() while fs_info->sectorsize still holds the default value (4096), not the real one from the superblock.
Thus it always reports a false alert if the on-disk superblock does not use a 4K sectorsize.
I'll fix it soon.
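The false alert can be reproduced in miniature. The sketch below is a standalone illustration, not the btrfs implementation: check_chunk_sectorsize() is a hypothetical helper, and the only values taken from the thread are the 4096 default and the 65536 reported in the "have 65536 expect 4096" message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Value fs_info->sectorsize holds before the superblock is applied, per the thread. */
#define DEFAULT_SECTORSIZE 4096u

/*
 * Hypothetical stand-in for the sectorsize part of the chunk check:
 * returns 0 if the chunk matches the expected sector size, -EINVAL otherwise.
 */
int check_chunk_sectorsize(uint32_t chunk_sectorsize, uint32_t expected)
{
	assert(expected != 0); /* an expected size of 0 would be meaningless */
	return chunk_sectorsize == expected ? 0 : -EINVAL;
}
```

Comparing against the stale default, check_chunk_sectorsize(65536, DEFAULT_SECTORSIZE) returns -EINVAL even though the filesystem is valid. Comparing against the sectorsize read from the superblock, check_chunk_sectorsize(65536, 65536) returns 0.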
But sorry, I don't see how the false alert is related to the crash; the only new memory allocation done in that patch is for a dummy extent buffer, which should always be freed.
Anyway in the next version I'll get rid of the memory allocation completely.
Thanks, Qu
regards, dan carpenter