On Tue, 21 May 2019 at 15:08, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, May 21, 2019 at 02:58:58PM +0530, Naresh Kamboju wrote:
On Tue, 21 May 2019 at 14:30, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Mon, May 20, 2019 at 05:23:42PM -0500, Dan Rue wrote:
On Mon, May 20, 2019 at 02:13:06PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.45 release. There are 105 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:49 AM UTC. Anything received after that time might be too late.
We're seeing an ext4 issue previously reported at https://lore.kernel.org/lkml/20190514092054.GA6949@osiris.
[ 1916.032087] EXT4-fs error (device sda): ext4_find_extent:909: inode #8: comm jbd2/sda-8: pblk 121667583 bad header/extent: invalid extent entries - magic f30a, entries 8, max 340(340), depth 0(0) [ 1916.073840] jbd2_journal_bmap: journal block not found at offset 4455 on sda-8 [ 1916.081071] Aborting journal on device sda-8. [ 1916.348652] EXT4-fs error (device sda): ext4_journal_check_start:61: Detected aborted journal [ 1916.357222] EXT4-fs (sda): Remounting filesystem read-only
This is seen on 4.19-rc, 5.0-rc, mainline, and next. We don't have data for 5.1-rc yet, which is presumably also affected in this RC round.
We only see this on x86_64 and i386 devices - though our hardware setups vary so it could be coincidence.
I have to run out now, but I'll come back and work on a reproducer and bisection later tonight and tomorrow.
Here is an example test run; link goes to the spot in the ltp syscalls test where the disk goes into read-only mode. https://lkft.validation.linaro.org/scheduler/job/735468#L8081
Odd, I keep hearing rumors of ext4 issues right now, but nothing actually solid that I can point to. Any help you can provide here would be great.
git bisect helped me to land on this commit,
# git bisect bad e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 is the first bad commit commit e8fd3c9a5415f9199e3fc5279e0f1dfcc0a80ab2 Author: Theodore Ts'o tytso@mit.edu Date: Tue Apr 9 23:37:08 2019 -0400
ext4: protect journal inode's blocks using block_validity commit 345c0dbf3a30872d9b204db96b5857cd00808cae upstream. Add the blocks which belong to the journal inode to block_validity's system zone so attempts to deallocate or overwrite the journal due a corrupted file system where the journal blocks are also claimed by another inode. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202879 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
:040000 040000 b8b6ce2577d60c65021e5cc1c3a38b32e0cbb2ff 747c67b159b33e4e1da414b1d33567a5da9ae125 M fs
Ah, many thanks for this bisection.
Ted, any ideas here? Should I drop this from the stable trees, and you revert it from Linus's? Or something else?
Note, I do also have 170417c8c7bb ("ext4: fix block validity checks for journal inodes using indirect blocks") in the trees, which was supposed to fix the problem with this patch, am I missing another one as well?
FYI, I have applied fix patch 170417c8c7bb ("ext4: fix block validity checks for journal inodes using indirect blocks") but did not fix this problem.
(side note, it was mean not to mark 170417c8c7bb for stable, when the patch it was fixing was marked for stable, I'm lucky I caught it...)
This problem occurring on stable rc 4.19, 5.0, 5.1 branches and master branch of mainline and -next trees also.
- Naresh