On Wed, Aug 27, 2025 at 12:34:44AM -0700, Christoph Hellwig wrote:
On Mon, Aug 25, 2025 at 08:34:14AM -0700, Darrick J. Wong wrote:
	case BLK_STS_NOSPC:
		return -ENOSPC;
	case BLK_STS_OFFLINE:
		return -ENODEV;
	default:
		return -EIO;
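For readers following along, here is a rough userspace sketch of the mapping quoted above. It is not the patch itself: the enum and the function name are hypothetical stand-ins, the enum values do not match the kernel's blk_status_t constants, and the "OK" case is an assumption (the quoted hunk only shows the tail of the switch). The point it illustrates is that a medium error, which the generic blk_status_to_errno() reports as -ENODATA, falls into the default branch and comes back as -EIO.

```c
#include <errno.h>

/*
 * Hypothetical stand-ins for the kernel's blk_status_t values.  The numeric
 * values are illustrative only and do not match include/linux/blk_types.h.
 */
enum sketch_blk_status {
	SKETCH_STS_OK = 0,
	SKETCH_STS_NOSPC,
	SKETCH_STS_OFFLINE,
	SKETCH_STS_MEDIUM,	/* medium error; generic mapping gives -ENODATA */
	SKETCH_STS_IOERR,
};

/*
 * Userspace model of the quoted switch: only "out of space" and "device
 * offline" keep a distinct errno, everything else collapses to -EIO.
 * The SKETCH_STS_OK case is an assumption added so the function is total.
 */
static int sketch_bio_status_to_errno(enum sketch_blk_status status)
{
	switch (status) {
	case SKETCH_STS_OK:
		return 0;
	case SKETCH_STS_NOSPC:
		return -ENOSPC;
	case SKETCH_STS_OFFLINE:
		return -ENODEV;
	default:
		return -EIO;
	}
}
```

With this shape, callers that used to special-case -ENODATA would no longer see it, which is the behavior change being discussed.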
Well, as I pointed out earlier, one interesting "quality" of the current behavior is that online fsck captures the ENODATA and turns that into a metadata corruption report. I'd like to keep that behavior.
-EIO is just as much of a metadata corruption, so if you only catch ENODATA you're missing most of them.
Hrmm, well an EIO (or an ENODATA) coming from the block layer causes the scrub code to return to userspace with EIO, and xfs_scrub will complain about the IO error and exit.
It doesn't explicitly mark the data structure as corrupt, but scrub failing should be enough to conclude that the fs is corrupt.
I could patch the kernel to set the CORRUPT flag on the data structure and keep going, since the likelihood of random bit errors causing media errors is pretty high now that we have disks that store more than 1e15 bits.
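A minimal sketch of what "set the CORRUPT flag and keep going" could look like, written as standalone C rather than kernel code. The struct and the function are hypothetical and only model the policy; the real scrub code has its own context structure and error-processing helpers, which are not reproduced here. The assumption is that both -EIO and -ENODATA from a metadata read get treated the same way.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-structure scrub state; not the kernel's scrub context. */
struct sketch_scrub {
	bool corrupt;	/* stand-in for the CORRUPT output flag */
};

/*
 * Hypothetical policy: if a metadata read failed with -EIO or -ENODATA,
 * record the structure as corrupt and clear the error so the caller can
 * continue scrubbing.  Returns true if the error was absorbed this way;
 * any other error is left in *error for the caller to propagate.
 */
static bool sketch_scrub_absorb_io_error(struct sketch_scrub *sc, int *error)
{
	switch (*error) {
	case -EIO:
	case -ENODATA:
		sc->corrupt = true;
		*error = 0;
		return true;
	default:
		return false;
	}
}
```

Treating both codes identically means the result no longer depends on which errno the buffer layer hands back for a media error.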
 	if (bio->bi_status)
-		xfs_buf_ioerror(bp, blk_status_to_errno(bio->bi_status));
+		xfs_buf_ioerror(bp, xfs_buf_bio_status(bio));
I think you'd also want to wrap all the submit_bio_wait here too, right?
Hrm, only discard bios, log writes, and zonegc use that function. Maybe not? I think a failed log write takes down the system no matter what error code, nobody cares about failing discard, and I think zonegc write failures just lead to the gc ... aborting?
Yes. In Linux -EIO means an unrecoverable I/O error that the lower layers gave up retrying. Not much we can do about that.
<nod>
--D