Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap

8 Nov 2022

On Sat, Nov 05, 2022 at 08:32:08PM -0400, Theodore Ts'o wrote:
...
First of all, you replied to this patch a completely different patch,
"ext4: fix BUG_ON() when directory entry has invalid rec_len".  This
very much confuses b4, so please don't do that.  If you send a patch
series, where the message-id are related, e.g.:
20221011155623.14840-1-lhenriques@suse.de
20221011155623.14840-2-lhenriques@suse.de


etc., b4 will figure out what is going on.  But when the message id's
are unrelated, e.g:
20221011155623.14840-1-lhenriques@suse.de

vs    
    20221012131330.32456-1-lhenriques@suse.de
... b4 will assume that 20221012131330.32456-1-lhenriques@suse.de is a
newer version of 20221011155623.14840-1-lhenriques@suse.de and there
is apparently no way to tell it to not try to use the "newer" version
of the patch.
Yeah, I'm really sorry for this.  As I mentioned in a reply to that email,
I messed it up by running my scripts from shell history, without cleaning
the extra parameters.  Lesson learned -- *never* use shell history for
sending patches! :-(
...
On Tue, Oct 11, 2022 at 04:56:24PM +0100, Luís Henriques wrote:
...
It's possible to hit a NULL pointer exception while accessing the
sb->s_group_info in ext4_validate_inode_bitmap(), when calling
ext4_get_group_info().
...
...
This issue can be fixed by returning NULL in ext4_get_group_info() when
->s_group_info is NULL.  This also requires checking the return code from
ext4_get_group_info() when appropriate.
I don't believe this is a correct diagnosis of what is going on.  Did
you actually confirm the line numbers associated with the call stack?
Here's the line numbers:
$ ./scripts/faddr2line fs/ext4/ialloc.o ext4_read_inode_bitmap+0x21b/0x5a0
ext4_read_inode_bitmap+0x21b/0x5a0:
ext4_get_group_info at /home/miguel/kernel/linux/fs/ext4/ext4.h:3332
(inlined by) ext4_validate_inode_bitmap at /home/miguel/kernel/linux/fs/ext4/ialloc.c:90
(inlined by) ext4_read_inode_bitmap at /home/miguel/kernel/linux/fs/ext4/ialloc.c:210
This is on a 6.1.0-rc4 kernel, where I got:
RIP: 0010:ext4_read_inode_bitmap+0x21b/0x5a0
So, the issue is happening in ext4_read_inode_bitmap(), when
jumping to the 'verify' label from here:
184         if (buffer_uptodate(bh)) {
    185                 /*
    186                  * if not uninit if bh is uptodate,
    187                  * bitmap is also uptodate
    188                  */
    189                 set_bitmap_uptodate(bh);
    190                 unlock_buffer(bh);
    191                 goto verify;
    192         }
    ...
    209 verify:
==> 210         err = ext4_validate_inode_bitmap(sb, desc, block_group, bh);
    211         if (err)
    212                 goto out;
    213         return bh;
    214 out:
    215         put_bh(bh);
    216         return ERR_PTR(err);
    217 }
...
What makes you believe that?  Look at how s_group_info is initialized
in ext4_mb_alloc_groupinfo() in fs/ext4/mballoc.c.  It's pretty
careful to make sure this is not the case.
Right.  I may be missing something, but I don't think we get that far.
__ext4_fill_super() will first call ext4_setup_system_zone() (which is
where this bug occurs) and only after that ext4_mb_init() will be invoked
(which is where ext4_mb_alloc_groupinfo() will eventually be called).
...
...
EXT4-fs (loop0): warning: mounting unchecked fs, running e2fsck is recommended
 EXT4-fs error (device loop0): ext4_clear_blocks:866: inode #32: comm mount: attempt to clear invalid blocks 16777450 len 1
 EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 1258291200 (level 1)
 EXT4-fs error (device loop0): ext4_free_branches:1012: inode #32: comm mount: invalid indirect mapped block 7379847 (level 2)
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 ...
 RIP: 0010:ext4_read_inode_bitmap+0x21b/0x5a0
 ...
 Call Trace:
  <TASK>
  ext4_free_inode+0x172/0x5c0
  ext4_evict_inode+0x4a5/0x730
  evict+0xc1/0x1c0
  ext4_setup_system_zone+0x2ea/0x380
  ext4_fill_super+0x249f/0x3910
  ? ext4_reconfigure+0x880/0x880
  ? snprintf+0x49/0x60
  ? ext4_reconfigure+0x880/0x880
  get_tree_bdev+0x169/0x260
  vfs_get_tree+0x16/0x70
  path_mount+0x77d/0xa30
  __x64_sys_mount+0x101/0x140
  do_syscall_64+0x3c/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
So we're evicting an inode while in the middle of calling
ext4_setup_system_zone() in fs/ext4/block_validity.c.  That can only
happen if we are calling iput() on an an inode, and the only place
that we do that in block_validity.c is in the function
ext4_protect_reserved_inode() --- which we call on the journal inode.
Given the error messages, I suspect this was a fuzzed file system
where the journal inode was not in the standard reserved ino, but
rather in a the normal inode number, in s_journal_inum (which is a
leftover relic from the very early ext3 days), and that inode number
was then explicitly/maliciously placed on the orphan list, and then
hilarity ensued from there.
Correct, the images do indeed have the wrong inode number (32) in
s_journal_inum.
...
We need to add some better error checking to protect against this case
in ext4_orphan_get().
Unfortunately, after some debug, I don't see ext4_orphan_get() ever being
invoked anywhere.
...
Do you have the file system image which triggered this failure?  Was
it the same syzkaller report, or perhaps was it some other syzkaller
report?
Yes, these were generated with a fuzzer, and the 2 images I've used as
reproducers were picked from the bugzillas in the commit 'Link' tags:
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216541
  Link: https://bugzilla.kernel.org/show_bug.cgi?id=216539
To reproduce the issue you simply need to mount those images.
...
...

diff --git a/fs/ext4/indirect.c b/fs/ext4/indirect.c
index 860fc5119009..e5ac5c2363df 100644
--- a/fs/ext4/indirect.c
+++ b/fs/ext4/indirect.c
@@ -148,6 +148,7 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
   struct super_block *sb = inode->i_sb;
   Indirect *p = chain;
   struct buffer_head *bh;

unsigned int key;
int ret = -EIO;

*err = 0;
@@ -156,9 +157,18 @@ static Indirect *ext4_get_branch(struct inode *inode, int depth,
   if (!p->key)
   	goto no_block;
   while (--depth) {

bh = sb_getblk(sb, le32_to_cpu(p->key));




key = le32_to_cpu(p->key);


bh = sb_getblk(sb, key);

if (unlikely(!bh)) {


	ret = -ENOMEM;




	/*


	 * sb_getblk() masks different errors by always


	 * returning NULL.  Let's distinguish at least the case


	 * where the block is out of range.


	 */


	if (key > ext4_blocks_count(EXT4_SB(sb)->s_es))


		ret = -EFSCORRUPTED;


	else


		ret = -ENOMEM;
goto failure;

}

And this is fixing a completely different problem and should go in a
different patch.  It's also not the best way of fixing it.  What we
should do is check whether key is out of bounds *before* calling
sb_getblkf(), and then call ext4_error() to mark the file system is
corrupted, and then return -EFSCORRUPTED.
OK, makes sense.  I'll send out a separate patch for this.  Thanks a lot
for your review, Ted.
Cheers,
--
Luís

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v2] ext4: fix a NULL pointer when validating an inode bitmap