Hi Ted,
> On Thu, Dec 12, 2024 at 09:31:05PM +0300, Nikolai Zhubr wrote:
> > This is to report that after jumping from generic kernel 5.15.167 to 5.15.170 I apparently observe ext4 damage.
> Hi Nick,
> In general this is not something that upstream kernel developers will pay a lot of attention to try to root cause. If you can come up with [...]
Thanks for the quick and detailed reply; it's really appreciated. Let me clarify. I'm not a hardcore kernel developer at all, I only touch the kernel a little, occasionally, for random reasons. Debugging this situation thoroughly enough to find and prove the root cause is far beyond my capability, and not really my personal or professional interest either. I also don't need any sort of support (i.e. as a customer): I've already repaired and validated/restored from backups almost everything, and I can just stick with 5.15.167 for basically as long as I like.
On the other hand, having buggy kernels (buggy to the point of ext4 corruption) published as suitable for wide general use is not a good thing in my book. So I believe that in a case of reasonable suspicion I should at least raise a warning, and if I can somehow contribute to tracking the problem down, I'll do what I can.
I'm not going to argue, but if 5.15 is of no interest any more, why keep patching it? And as long as it keeps receiving patches, presumably they are backported and applied to stabilize it, not to damage it? Ok, never mind :-)
> People will also pay more attention if you give more detail in your message. Not just some vague "ext4 damage" (where 99% of the time these sorts of things happen due to hardware-induced corruption), but the exact message when mount failed.
Yes. That is why I spent two days solely on testing the hardware: booting from separate media, stressing everything, and making plenty of copies. As I mentioned in my initial post, this revealed no hardware issues. And I've been happily using md raid-1 since around 2003 (not on this device, though). I can post all my SMART values as-is, but I can assure you they are perfectly fine for both raid-1 members. I routinely encounter faulty HDDs elsewhere, so that is not something I've never seen either.
#smartctl -a /dev/nvme0n1 | grep Spare
Available Spare:                    100%
Available Spare Threshold:          10%
#smartctl -a /dev/sda | grep Sector
Sector Sizes:     512 bytes logical, 4096 bytes physical
  5 Reallocated_Sector_Ct   0x0033   100   100   050   Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000   Old_age   Always       -       0
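For completeness, here is roughly how one can double-check the overall health of both raid-1 members (a sketch; device names are from my setup, and exact output of course varies by drive):

#smartctl -H /dev/nvme0n1
#smartctl -H /dev/sda
#smartctl -l error /dev/sda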
I have a copy of the entire ext4 partition taken immediately after mount first failed. It is ~800 GB and may contain some sensitive data, so I cannot just hand it to someone else or publish it for examination. But I can now easily replay the mount failure and fsck processing as many times as needed. So far it appears that file/directory contents were not damaged, only some filesystem metadata. I have not encountered any file with a wrong checksum or that otherwise appeared definitely damaged; overall roughly 95% is verified and definitely fine, and the remaining 5% is hard to verify reliably, but those are less important files.
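For the replay, a minimal sketch of what I do (paths are hypothetical; the image copy is attached read-only so it can be reused indefinitely):

#losetup --find --show --read-only /backup/sdb1-copy.img
#mount -o ro /dev/loop0 /mnt/test

The mount then fails (or succeeds) exactly as on the real array, and "fsck.ext4 -fn" can be pointed at the same loop device.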
> Also, when reporting ext4 issues, it's helpful to include information about the file system configuration using "dumpe2fs -h /dev/XXX".
This is a dumpe2fs run on a standalone copy taken before the repair (after a successful raid re-check):
#dumpe2fs -h /dev/sdb1
Filesystem volume name:   DATA
Last mounted on:          /opt
Filesystem UUID:          ea823c6c-500f-4bf0-a4a7-a872ed740af3
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              51634176
Block count:              206513920
Reserved block count:     10325696
Overhead clusters:        3292742
Free blocks:              48135978
Free inodes:              50216050
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      1024
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Tue Jul  9 01:51:16 2024
Last mount time:          Mon Dec  9 10:08:27 2024
Last write time:          Tue Dec 10 04:08:17 2024
Mount count:              273
Maximum mount count:      -1
Last checked:             Tue Jul  9 01:51:16 2024
Check interval:           0 (<none>)
Lifetime writes:          913 GB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      60bfa28b-cdd2-4ba6-8261-87961db4ecea
Journal backup:           inode blocks
FS Error count:           293
First error time:         Tue Dec 10 06:17:23 2024
First error function:     ext4_lookup
First error line #:       1437
First error inode #:      20709377
Last error time:          Tue Dec 10 21:12:30 2024
Last error function:      ext4_lookup
Last error line #:        1437
Last error inode #:       20709377
Journal features:         journal_incompat_revoke journal_64bit
Total journal size:       128M
Total journal blocks:     32768
Max transaction length:   32768
Fast commit length:       0
Journal sequence:         0x00064c6e
Journal start:            0
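As an aside, my understanding is that the "First error" function/line above maps directly onto the source of the running kernel. Assuming a stable git tree is available (ext4_lookup lives in fs/ext4/namei.c), something like:

#git show v5.15.170:fs/ext4/namei.c | sed -n '1430,1440p'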
/dev/XXX". Extracting kernel log messages that include the string "EXT4-fs", via commands like "sudo dmesg | grep EXT4-fs", or "sudo journalctl | grep EXT4-fs", or "grep EXT4-fs /var/log/messages" are also helpful, as is getting a report from fsck via a command like
#grep EXT4-fs messages-20241212 | grep md126
2024-12-06T11:53:09.471317+03:00 lenovo-zh kernel: [    7.649474][ T1124] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-06T11:53:09.471351+03:00 lenovo-zh kernel: [    7.899321][ T1124] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
2024-12-07T12:03:18.518047+03:00 lenovo-zh kernel: [    7.633150][ T1106] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-07T12:03:18.518054+03:00 lenovo-zh kernel: [    7.951716][ T1106] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
2024-12-08T12:41:33.686145+03:00 lenovo-zh kernel: [    7.588405][ T1118] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-08T12:41:33.686148+03:00 lenovo-zh kernel: [    7.679963][ T1118] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
(* normal boot failed and subsequently fsck was run on real data here *)
2024-12-10T18:21:40.356656+03:00 lenovo-zh kernel: [  483.522025][ T1740] EXT4-fs (md126): failed to initialize system zone (-117)
2024-12-10T18:21:40.356685+03:00 lenovo-zh kernel: [  483.522050][ T1740] EXT4-fs (md126): mount failed
2024-12-11T02:00:18.382301+03:00 lenovo-zh kernel: [  490.551080][ T1809] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
2024-12-11T12:00:53.249626+03:00 lenovo-zh kernel: [    7.550823][ T1056] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-11T12:00:53.249629+03:00 lenovo-zh kernel: [    7.662317][ T1056] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
#grep md126 messages-20241212
2024-12-07T12:03:18.518038+03:00 lenovo-zh kernel: [    7.154448][  T992] md126: detected capacity change from 0 to 1652111360
2024-12-07T12:03:18.518047+03:00 lenovo-zh kernel: [    7.633150][ T1106] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-07T12:03:18.518054+03:00 lenovo-zh kernel: [    7.951716][ T1106] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
2024-12-08T12:41:33.685280+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126..
2024-12-08T12:41:33.685325+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully.
2024-12-08T12:41:33.685327+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126..
2024-12-08T12:41:33.686136+03:00 lenovo-zh kernel: [    7.346744][ T1107] md/raid1:md126: active with 2 out of 2 mirrors
2024-12-08T12:41:33.686137+03:00 lenovo-zh kernel: [    7.357218][ T1107] md126: detected capacity change from 0 to 1652111360
2024-12-08T12:41:33.686145+03:00 lenovo-zh kernel: [    7.588405][ T1118] EXT4-fs (md126): Mount option "noacl" will be removed by 3.5
2024-12-08T12:41:33.686148+03:00 lenovo-zh kernel: [    7.679963][ T1118] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: noacl. Quota mode: none.
(* on 2024-12-09 system refused to boot and no normal log was written *)
2024-12-10T18:13:44.862091+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126..
2024-12-10T18:13:45.164589+03:00 lenovo-zh kernel: [    8.332616][ T1248] md/raid1:md126: active with 2 out of 2 mirrors
2024-12-10T18:13:45.196580+03:00 lenovo-zh kernel: [    8.363066][ T1248] md126: detected capacity change from 0 to 1652111360
2024-12-10T18:13:45.469396+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully.
2024-12-10T18:13:45.469584+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126..
2024-12-10T18:18:51.652575+03:00 lenovo-zh kernel: [  314.821429][ T1657] md: data-check of RAID array md126
2024-12-10T18:21:40.356656+03:00 lenovo-zh kernel: [  483.522025][ T1740] EXT4-fs (md126): failed to initialize system zone (-117)
2024-12-10T18:21:40.356685+03:00 lenovo-zh kernel: [  483.522050][ T1740] EXT4-fs (md126): mount failed
2024-12-10T20:07:29.116652+03:00 lenovo-zh kernel: [ 6832.284366][ T1657] md: md126: data-check done.
(fsck was run on real data here)
2024-12-11T01:52:15.839052+03:00 lenovo-zh systemd[1]: Started Timer to wait for more drives before activating degraded array md126..
2024-12-11T01:52:15.840396+03:00 lenovo-zh kernel: [    7.832271][ T1170] md/raid1:md126: active with 2 out of 2 mirrors
2024-12-11T01:52:15.840397+03:00 lenovo-zh kernel: [    7.845385][ T1170] md126: detected capacity change from 0 to 1652111360
2024-12-11T01:52:16.255454+03:00 lenovo-zh systemd[1]: mdadm-last-resort@md126.timer: Deactivated successfully.
2024-12-11T01:52:16.255573+03:00 lenovo-zh systemd[1]: Stopped Timer to wait for more drives before activating degraded array md126..
2024-12-11T02:00:18.382301+03:00 lenovo-zh kernel: [  490.551080][ T1809] EXT4-fs (md126): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
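For reference, after a data-check like the one above completes, the mirror inconsistency counter can be read from sysfs; as far as I know it should be 0 on a healthy raid-1 array:

#cat /sys/block/md126/md/mismatch_cnt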
"fsck.ext4 -fn /dev/XXX >& /tmp/fsck.out"
This is an fsck run on a standalone copy taken before the repair (after a successful raid re-check):
#fsck.ext4 -fn /dev/sdb1
ext2fs_check_desc: Corrupt group descriptor: bad block for block bitmap
fsck.ext4: Group descriptors look bad... trying backup blocks...
Pass 1: Checking inodes, blocks, and sizes
Inode 9185447 extent tree (at level 1) could be narrower.  Optimize? no
Inode 9189969 extent tree (at level 1) could be narrower.  Optimize? no
Inode 22054610 extent tree (at level 1) could be shorter.  Optimize? no
Inode 22959998 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23351116 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23354700 extent tree (at level 1) could be shorter.  Optimize? no
Inode 23363083 extent tree (at level 1) could be shorter.  Optimize? no
Inode 25197205 extent tree (at level 1) could be narrower.  Optimize? no
Inode 25197271 extent tree (at level 1) could be narrower.  Optimize? no
Inode 47710225 extent tree (at level 1) could be narrower.  Optimize? no
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (23414, counted=22437).  Fix? no
Free blocks count wrong for group #1 (31644, counted=7).  Fix? no
Free blocks count wrong for group #2 (32768, counted=0).  Fix? no
Free blocks count wrong for group #3 (31644, counted=4).  Fix? no
[repeated tons of times]
Free inodes count wrong for group #4895 (8192, counted=8044).  Fix? no
Directories count wrong for group #4895 (0, counted=148).  Fix? no
Free inodes count wrong for group #4896 (8192, counted=8114).  Fix? no
Directories count wrong for group #4896 (0, counted=13).  Fix? no
Free inodes count wrong for group #5824 (8192, counted=8008).  Fix? no
Directories count wrong for group #5824 (0, counted=31).  Fix? no
Free inodes count wrong (51634165, counted=50157635).  Fix? no

DATA: ********** WARNING: Filesystem still has errors **********

DATA: 11/51634176 files (73845.5% non-contiguous), 3292748/206513920 blocks
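Since fsck fell back to the backup group descriptors on its own, I understand the same check can also be pointed at a backup superblock explicitly. A sketch ("mke2fs -n" is a dry run that only prints what it would do, including where the backup superblocks live; 32768 and a 4096 block size match this filesystem's geometry):

#mke2fs -n /dev/sdb1
#e2fsck -fn -b 32768 -B 4096 /dev/sdb1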
And because there are apparently zero ext4 commits in the 5.15 stable tree since 5.15.168 at the moment, I thought I'd report it here.
> Did you check for any changes to the md/dm code, or the block layer?
No. In general it could be almost anything, so I see no point in even starting without good background knowledge. That is why I'm trying to draw the attention of those who are more familiar with this code instead. :-)
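For reference, my understanding is that the candidate changes could be enumerated with something like the following in a stable git checkout (the paths are just my guess at the relevant areas):

#git log --oneline v5.15.167..v5.15.170 -- fs/ext4 fs/jbd2 drivers/md block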
> Also, if you checked for I/O errors in the system logs, or ran "smartctl" on the block devices, please say so. (And if there are indications of I/O errors or storage device issues, please do immediate backups and make plans to replace your hardware before you suffer more serious data loss.)
I have not found any indication of hardware errors at this point.
#grep -i err messages-20241212 | grep sda
(nothing)
#grep -i err messages-20241212 | grep nvme
(nothing)
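For completeness, one could also grep for other typical kernel I/O error strings; a sketch, with the patterns being just the usual suspects rather than an exhaustive list:

#grep -Ei 'I/O error|blk_update_request|ata[0-9]+\.[0-9]+: (error|failed)' messages-20241212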
Some "smart" values are posted above. Nothing suspicious whatsoever.
> Finally, if you want more support than what volunteers in the upstream linux kernel community can provide, that is something paid support from companies like SuSE, or Red Hat, can provide.
>
> Cheers,
>
> - Ted

Thank you!

Regards,
Nick