Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap

6 Nov 2022

First of all, you replied to this patch a completely different patch,
"ext4: fix BUG_ON() when directory entry has invalid rec_len".  This
very much confuses b4, so please don't do that.  If you send a patch
series, where the message-id are related, e.g.:
20221011155623.14840-1-lhenriques@suse.de
    20221011155623.14840-2-lhenriques@suse.de
etc., b4 will figure out what is going on.  But when the message id's
are unrelated, e.g:
20221011155623.14840-1-lhenriques@suse.de
vs    
    20221012131330.32456-1-lhenriques@suse.de
... b4 will assume that 20221012131330.32456-1-lhenriques@suse.de is a
newer version of 20221011155623.14840-1-lhenriques@suse.de and there
is apparently no way to tell it to not try to use the "newer" version
of the patch.
On Tue, Oct 11, 2022 at 04:56:24PM +0100, Luís Henriques wrote:
...
It's possible to hit a NULL pointer exception while accessing the
sb->s_group_info in ext4_validate_inode_bitmap(), when calling
ext4_get_group_info().
...
...
This issue can be fixed by returning NULL in ext4_get_group_info() when
->s_group_info is NULL.  This also requires checking the return code from
ext4_get_group_info() when appropriate.
I don't believe this is a correct diagnosis of what is going on.  Did
you actually confirm the line numbers associated with the call stack?
What makes you believe that?  Look at how s_group_info is initialized
in ext4_mb_alloc_groupinfo() in fs/ext4/mballoc.c.  It's pretty
careful to make sure this is not the case.
...
EXT4-fs (loop0): warning: mounting unchecked fs, running e2fsck is recommended
 EXT4-fs error (device loop0): ext4_clear_blocks:866: inode #32: comm mount: attempt to clear invalid blocks 16777450 len 1
 EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 1258291200 (level 1)
 EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 7379847 (level 2)
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:ext4_read_inode_bitmap+0x21b/0x5a0
 ...
 Call Trace:
  <TASK>
  ext4_free_inode+0x172/0x5c0
  ext4_evict_inode+0x4a5/0x730
  evict+0xc1/0x1c0
  ext4_setup_system_zone+0x2ea/0x380
  ext4_fill_super+0x249f/0x3910
  ? ext4_reconfigure+0x880/0x880
  ? snprintf+0x49/0x60
  ? ext4_reconfigure+0x880/0x880
  get_tree_bdev+0x169/0x260
  vfs_get_tree+0x16/0x70
  path_mount+0x77d/0xa30
  __x64_sys_mount+0x101/0x140
  do_syscall_64+0x3c/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
So we're evicting an inode while in the middle of calling
ext4_setup_system_zone() in fs/ext4/block_validity.c.  That can only
happen if we are calling iput() on an an inode, and the only place
that we do that in block_validity.c is in the function
ext4_protect_reserved_inode() --- which we call on the journal inode.
Given the error messages, I suspect this was a fuzzed file system
where the journal inode was not in the standard reserved ino, but
rather in a the normal inode number, in s_journal_inum (which is a
leftover relic from the very early ext3 days), and that inode number
was then explicitly/maliciously placed on the orphan list, and then
hilarity ensued from there.
We need to add some better error checking to protect against this case
in ext4_orphan_get().
Do you have the file system image which triggered this failure?  Was
it the same syzkaller report, or perhaps was it some other syzkaller
report?
...

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 860fc5119009..e5ac5c2363df 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -148,6 +148,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
   struct super_block *sb = inode->i_sb;
   Indirect *p = chain;
   struct buffer_head *bh;

unsigned int key;
int ret = -EIO;

*err = 0;
@@ -156,9 +157,18 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
   if (!p->key)
   	goto no_block;
   while (--depth) {

bh = sb_getblk(sb, le32_to_cpu(p->key));




key = le32_to_cpu(p->key);


bh = sb_getblk(sb, key);

if (unlikely(!bh)) {


	ret = -ENOMEM;




	/*


	 * sb_getblk() masks different errors by always


	 * returning NULL.  Let's distinguish at least the case


	 * where the block is out of range.


	 */


	if (key > ext4_blocks_count(EXT4_SB(sb)->s_es))


		ret = -EFSCORRUPTED;


	else


		ret = -ENOMEM;
goto failure;

}

And this is fixing a completely different problem and should go in a
different patch.  It's also not the best way of fixing it.  What we
should do is check whether key is out of bounds *before* calling
sb_getblkf(), and then call ext4_error() to mark the file system is
corrupted, and then return -EFSCORRUPTED.
Cheers,
- Ted

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap