On Mon, Dec 16, 2024 at 03:16:00PM +0000, David Laight wrote:
> ....
> > > > The location of block allocation bitmaps never gets changed, so this sort of thing only happens due to hardware-induced corruption.
> > >
> > > Well, unless e.g. some modified sectors start being flushed to random wrong offsets, like in [1] above, or something similar.
> >
> > Well, in the bug that you referenced in [1], what was happening was that data could get written to the wrong offset in the file under certain race conditions. That is not a case of a data block getting written over a metadata block such as the block group descriptors.
> >
> > Sectors getting written to the wrong LBAs do happen; there's a reason why enterprise databases include a checksum in every 4k database block. But the root cause of that generally tends to be a bit getting flipped in the LBA number as it is being sent from the CPU to the controller to the storage device. It's rare, but when it does happen, it is more often than not hardware-induced, and, again, it is one of those things where RAID won't necessarily save you.
>
> Or cutting the power in the middle of SSD 'wear levelling'.
> I've seen a completely trashed disk (sectors in completely the wrong places) after an unexpected power cut.
Sure, but that falls in the category of hardware-induced corruption. There have been SSDs without power-fail certification whose flash translation metadata got so badly corrupted that you lost everything (there's a reason why professional photographers use dual SD card slots, and some may use duct tape to make sure the battery access door won't fly open if their camera gets dropped).
- Ted