DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading and directory inodes with EXT4_INDEX_FL when DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
2) When we find out htree structure is corrupted (but the filesystem and the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more.
CC: stable@vger.kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Signed-off-by: Jan Kara jack@suse.cz --- fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 13 +++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 9f00fc0bf21d..cb9ea593b544 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -129,12 +129,14 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx) if (err != ERR_BAD_DX_DIR) { return err; } - /* - * We don't set the inode dirty flag since it's not - * critical that it get flushed back to the disk. - */ - ext4_clear_inode_flag(file_inode(file), - EXT4_INODE_INDEX); + /* Can we just clear INDEX flag to ignore htree information? */ + if (!ext4_has_metadata_csum(sb)) { + /* + * We don't set the inode dirty flag since it's not + * critical that it gets flushed back to the disk. + */ + ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); + } }
if (ext4_has_inline_data(inode)) { diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index f8578caba40d..1fd6c1e2ce2a 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *inode, struct ext4_filename *fname); static inline void ext4_update_dx_flag(struct inode *inode) { - if (!ext4_has_feature_dir_index(inode->i_sb)) + if (!ext4_has_feature_dir_index(inode->i_sb)) { + /* ext4_iget() should have caught this... */ + WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb)); ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); + } } static const unsigned char ext4_filetype_table[] = { DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 629a25d999f0..d33135308c1b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4615,6 +4615,19 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ret = -EFSCORRUPTED; goto bad_inode; } + /* + * If dir_index is not enabled but there's dir with INDEX flag set, + * we'd normally treat htree data as empty space. But with metadata + * checksumming that corrupts checksums so forbid that. + */ + if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) && + ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) { + ext4_error_inode(inode, function, line, 0, + "iget: Dir with htree data on filesystem " + "without dir_index feature."); + ret = -EFSCORRUPTED; + goto bad_inode; + } ei->i_disksize = inode->i_size; #ifdef CONFIG_QUOTA ei->i_reserved_quota = 0; diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 1cb42d940784..deb9f7a02976 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2207,6 +2207,13 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry, retval = ext4_dx_add_entry(handle, &fname, dir, inode); if (!retval || (retval != ERR_BAD_DX_DIR)) goto out; + /* Can we just ignore htree data? */ + if (ext4_has_metadata_csum(sb)) { + EXT4_ERROR_INODE(dir, + "Directory has corrupted htree index."); + retval = -EFSCORRUPTED; + goto out; + } ext4_clear_inode_flag(dir, EXT4_INODE_INDEX); dx_fallback++; ext4_mark_inode_dirty(handle, dir);
On Feb 5, 2020, at 10:30 AM, Jan Kara jack@suse.cz wrote:
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
- We just disallow loading and directory inodes with EXT4_INDEX_FL when
s/and/any/ ?
DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
Wouldn't it be better to continue allowing the directory to be read, but not modified? Otherwise, essentially, metadata_csum is making the filesystem _less_ robust rather than making it more robust. We don't _need_ the htree index to do a lookup in the directory.
- When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more.
CC: stable@vger.kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Signed-off-by: Jan Kara jack@suse.cz
fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 13 +++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 629a25d999f0..d33135308c1b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4615,6 +4615,19 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ret = -EFSCORRUPTED; goto bad_inode; }
- /*
* If dir_index is not enabled but there's dir with INDEX flag set,
* we'd normally treat htree data as empty space. But with metadata
* checksumming that corrupts checksums so forbid that.
*/
- if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) &&
ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) {
ext4_error_inode(inode, function, line, 0,
"iget: Dir with htree data on filesystem "
"without dir_index feature.");
Kernel style suggests error strings should not be line wrapped at 80 columns.
Cheers, Andreas
On Wed 05-02-20 11:04:23, Andreas Dilger wrote:
On Feb 5, 2020, at 10:30 AM, Jan Kara jack@suse.cz wrote:
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
- We just disallow loading and directory inodes with EXT4_INDEX_FL when
s/and/any/ ?
Yes, will fix.
DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
Wouldn't it be better to continue allowing the directory to be read, but not modified? Otherwise, essentially, metadata_csum is making the filesystem _less_ robust rather than making it more robust. We don't _need_ the htree index to do a lookup in the directory.
Hum, I was somewhat afraid it may be a bit fragile but thinking about it now, there aren't that many places that need to check. OK, I will try to do this and see how it looks.
- When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more.
CC: stable@vger.kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Signed-off-by: Jan Kara jack@suse.cz
fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 13 +++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 629a25d999f0..d33135308c1b 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4615,6 +4615,19 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ret = -EFSCORRUPTED; goto bad_inode; }
- /*
* If dir_index is not enabled but there's dir with INDEX flag set,
* we'd normally treat htree data as empty space. But with metadata
* checksumming that corrupts checksums so forbid that.
*/
- if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) &&
ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) {
ext4_error_inode(inode, function, line, 0,
"iget: Dir with htree data on filesystem "
"without dir_index feature.");
Kernel style suggests error strings should not be line wrapped at 80 columns.
OK. Will change. Thanks for review!
Honza
On Thu 06-02-20 08:49:44, Jan Kara wrote:
On Wed 05-02-20 11:04:23, Andreas Dilger wrote:
On Feb 5, 2020, at 10:30 AM, Jan Kara jack@suse.cz wrote:
DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
Wouldn't it be better to continue allowing the directory to be read, but not modified? Otherwise, essentially, metadata_csum is making the filesystem _less_ robust rather than making it more robust. We don't _need_ the htree index to do a lookup in the directory.
Hum, I was somewhat afraid it may be a bit fragile but thinking about it now, there aren't that many places that need to check. OK, I will try to do this and see how it looks.
When I actually implemented this and started testing, I found out why I didn't want to do it this way :) - the directories are not readable anyway because checksums are failing for block 0 (which used to be htree root node). And I'd rather not pile up further hacks in the kernel trying to work around this. As I wrote in the changelog, chances of anyone hitting this in practice are rather low and e2fsck can do the right thing...
Honza
Hi Jan,
On Wed, Feb 05, 2020 at 06:30:25PM +0100, Jan Kara wrote:
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
- We just disallow loading and directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
- When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more.
CC: stable@vger.kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Signed-off-by: Jan Kara jack@suse.cz
fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 13 +++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c index 9f00fc0bf21d..cb9ea593b544 100644 --- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -129,12 +129,14 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx) if (err != ERR_BAD_DX_DIR) { return err; }
/*
* We don't set the inode dirty flag since it's not
* critical that it get flushed back to the disk.
*/
ext4_clear_inode_flag(file_inode(file),
EXT4_INODE_INDEX);
/* Can we just clear INDEX flag to ignore htree information? */
if (!ext4_has_metadata_csum(sb)) {
/*
* We don't set the inode dirty flag since it's not
* critical that it gets flushed back to the disk.
*/
ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
}}
if (ext4_has_inline_data(inode)) { diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index f8578caba40d..1fd6c1e2ce2a 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *inode, struct ext4_filename *fname); static inline void ext4_update_dx_flag(struct inode *inode) {
- if (!ext4_has_feature_dir_index(inode->i_sb))
- if (!ext4_has_feature_dir_index(inode->i_sb)) {
/* ext4_iget() should have caught this... */
ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb));
- }
}
This new WARN_ON_ONCE() gets triggered by the following commands:
mkfs.ext4 -O ^dir_index /dev/vdc mount /dev/vdc /vdc mkdir /vdc/dir
WARNING: CPU: 1 PID: 305 at fs/ext4/ext4.h:2700 add_dirent_to_buf+0x1d0/0x1e0 fs/ext4/namei.c:2039 CPU: 1 PID: 305 Comm: mkdir Not tainted 5.10.0-rc4 #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 RIP: 0010:ext4_update_dx_flag fs/ext4/ext4.h:2700 [inline] RIP: 0010:add_dirent_to_buf+0x1d0/0x1e0 fs/ext4/namei.c:2038 [...] Call Trace: ext4_add_entry+0x179/0x4d0 fs/ext4/namei.c:2248 ext4_mkdir+0x1c0/0x320 fs/ext4/namei.c:2814 vfs_mkdir+0xcc/0x130 fs/namei.c:3650 do_mkdirat+0x81/0x120 fs/namei.c:3673 __do_sys_mkdir fs/namei.c:3689 [inline]
What is intended here? metadata_csum && ^dir_index is a weird combination, but it's technically valid, right?
- Eric
Hi Eric!
On Mon 16-11-20 20:28:32, Eric Biggers wrote:
On Wed, Feb 05, 2020 at 06:30:25PM +0100, Jan Kara wrote:
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
- We just disallow loading and directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it means someone disabled DIR_INDEX on existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"
- When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree information for reading but we refuse to add new entries to the directory to avoid corrupting it more.
CC: stable@vger.kernel.org Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Signed-off-by: Jan Kara jack@suse.cz
fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 13 +++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 32 insertions(+), 7 deletions(-)
...
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index f8578caba40d..1fd6c1e2ce2a 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *inode, struct ext4_filename *fname); static inline void ext4_update_dx_flag(struct inode *inode) {
- if (!ext4_has_feature_dir_index(inode->i_sb))
- if (!ext4_has_feature_dir_index(inode->i_sb)) {
/* ext4_iget() should have caught this... */
ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb));
- }
}
This new WARN_ON_ONCE() gets triggered by the following commands:
mkfs.ext4 -O ^dir_index /dev/vdc mount /dev/vdc /vdc mkdir /vdc/dir
WARNING: CPU: 1 PID: 305 at fs/ext4/ext4.h:2700 add_dirent_to_buf+0x1d0/0x1e0 fs/ext4/namei.c:2039 CPU: 1 PID: 305 Comm: mkdir Not tainted 5.10.0-rc4 #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.14.0-1 04/01/2014 RIP: 0010:ext4_update_dx_flag fs/ext4/ext4.h:2700 [inline] RIP: 0010:add_dirent_to_buf+0x1d0/0x1e0 fs/ext4/namei.c:2038 [...] Call Trace: ext4_add_entry+0x179/0x4d0 fs/ext4/namei.c:2248 ext4_mkdir+0x1c0/0x320 fs/ext4/namei.c:2814 vfs_mkdir+0xcc/0x130 fs/namei.c:3650 do_mkdirat+0x81/0x120 fs/namei.c:3673 __do_sys_mkdir fs/namei.c:3689 [inline]
What is intended here? metadata_csum && ^dir_index is a weird combination, but it's technically valid, right?
Indeed the WARN_ON_ONCE() is wrong. It should also check that EXT4_INODE_INDEX is set. The idea of the warning is that when we just clear EXT4_INODE_INDEX flag, checksums will become invalid so generally that's not desirable... I'll send a fix. Thanks for report!
Honza
linux-stable-mirror@lists.linaro.org