Commit 91b587ba79e1 ("f2fs: Introduce linear search for dentries") This is a follow up to a patch that previously merged in stable to restore access to files created with emojis under a different hashing scheme.
I believe it's relevant for all currently supported stable branches.
-Daniel
On Fri, Jan 31, 2025 at 01:49:16PM -0800, Daniel Rosenberg wrote:
Commit 91b587ba79e1 ("f2fs: Introduce linear search for dentries") This is a follow up to a patch that previously merged in stable to restore access to files created with emojis under a different hashing scheme.
I believe it's relevant for all currently supported stable branches.
As the original commit that this says it fixes was reverted, should that also be brought back everywhere also? Or did that happen already and I missed that?
thanks,
greg k-h
On Sat, Feb 1, 2025 at 12:29 AM Greg KH gregkh@linuxfoundation.org wrote:
On Fri, Jan 31, 2025 at 01:49:16PM -0800, Daniel Rosenberg wrote:
Commit 91b587ba79e1 ("f2fs: Introduce linear search for dentries") This is a follow up to a patch that previously merged in stable to restore access to files created with emojis under a different hashing scheme.
I believe it's relevant for all currently supported stable branches.
As the original commit that this says it fixes was reverted, should that also be brought back everywhere also? Or did that happen already and I missed that?
Before we can bring back the reverted patch, we need the same fix for ext4. Daniel, is there progress on that?
thanks,
greg k-h
To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
On Sat, Feb 1, 2025 at 12:29 AM Greg KH gregkh@linuxfoundation.org wrote:
As the original commit that this says it fixes was reverted, should that also be brought back everywhere also? Or did that happen already and I missed that?
thanks,
greg k-h
The revert of the unicode patch is in all of the stable branches already. That f2fs patch is technically a fix for the revert as well, since the existence of either of those is a problem for the same reason :/
On Sat, Feb 1, 2025 at 9:06 AM Todd Kjos tkjos@google.com wrote:
Before we can bring back the reverted patch, we need the same fix for ext4. Daniel, is there progress on that?
Last I knew, Ted had a prototype patch for that, not sure what the current status of it is. I'm also not sure whether the unicode patch is being relanded, or if there's a different fix in the works there.
-Daniel
On Mon, Feb 03, 2025 at 03:07:10PM -0800, Daniel Rosenberg wrote:
On Sat, Feb 1, 2025 at 12:29 AM Greg KH gregkh@linuxfoundation.org wrote:
As the original commit that this says it fixes was reverted, should that also be brought back everywhere also? Or did that happen already and I missed that?
thanks,
greg k-h
The revert of the unicode patch is in all of the stable branches already. That f2fs patch is technically a fix for the revert as well, since the existence of either of those is a problem for the same reason :/
Ok, so that means that the original issue is still in mainline but now it is safe to bring the unicode patch back there and add it to the stable branches if we also take this one?
thanks,
greg k-h
On Tue, Feb 04, 2025 at 12:32:22PM +0100, Greg KH wrote:
On Mon, Feb 03, 2025 at 03:07:10PM -0800, Daniel Rosenberg wrote:
On Sat, Feb 1, 2025 at 12:29 AM Greg KH gregkh@linuxfoundation.org wrote:
As the original commit that this says it fixes was reverted, should that also be brought back everywhere also? Or did that happen already and I missed that?
thanks,
greg k-h
The revert of the unicode patch is in all of the stable branches already. That f2fs patch is technically a fix for the revert as well, since the existence of either of those is a problem for the same reason :/
Ok, so that means that the original issue is still in mainline but now it is safe to bring the unicode patch back there and add it to the stable branches if we also take this one?
It also does not apply cleanly to 5.4.y and 5.10.y so can we get backports for that sent here?
thanks,
greg k-h
On Tue, Feb 4, 2025 at 3:33 AM Greg KH gregkh@linuxfoundation.org wrote:
It also does not apply cleanly to 5.4.y and 5.10.y so can we get backports for that sent here?
thanks,
greg k-h
So, looking at that branch, the conflict involves the patch that introduces casefolding and encryption, which is the main case that breaks this. Without encrypted casefolding, fsck is able to correct the issue. So it looks like that patch is not needed on 5.4.y and 5.10.y
On Mon, Feb 03, 2025 at 03:07:10PM -0800, Daniel Rosenberg wrote:
On Sat, Feb 1, 2025 at 9:06 AM Todd Kjos tkjos@google.com wrote:
Before we can bring back the reverted patch, we need the same fix for ext4. Daniel, is there progress on that?
Last I knew, Ted had a prototype patch for that, not sure what the current status of it is. I'm also not sure whether the unicode patch is being relanded, or if there's a different fix in the works there.
Between travel and an emergency at work, I haven't had time to create the script to create a test file system to verify the prototype patch. It turns out this is quite diffisult!
I finally managed to create a script which demonstrates why the revert was necessary, but it wasn't enough to demonstrate why a further patch is needed. I think I know what I need to do, but it's a mess and I've already wasted hours and hours in this.
Do you have a relible script to generate a test file system. This is what I have so far, but as I said, it's not quite good enough...
- Ted
On Mon, Feb 03, 2025 at 03:07:10PM -0800, Daniel Rosenberg wrote:
The revert of the unicode patch is in all of the stable branches already. That f2fs patch is technically a fix for the revert as well, since the existence of either of those is a problem for the same reason :/
On Sat, Feb 1, 2025 at 9:06 AM Todd Kjos tkjos@google.com wrote:
Before we can bring back the reverted patch, we need the same fix for ext4. Daniel, is there progress on that?
So I have a working fix for ext4, now but it's going to be a lot more complicated if we want to bring back the reverted patch. That's because both e2fsprogs and f2fs-tools needs to be able to calculate the hash used by the directories, and so fsck.ext4 and fsck.f2fs will get confused if they run across file systems with file names which were inserted while the reverted patch was in force.
I confirmed this was applicable for both ext4 and f2fs by modifying my unicode-hijinks script to generate an f2fs image, and then running fsck.f2fs on the image:
% /sbin/fsck.f2fs /tmp/foo.img Info: MKFS version "Linux version 6.14.0-rc1-xfstests-00013-g30a8509ae0bb-dirty (tytso@cwcc) (gcc (Debian 14.2.0-8) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.50.20241210) #456 SMP PREEMPT_DYNAMIC Fri Feb 7 01:18:48 EST 2025" Info: FSCK version ... [FIX] (f2fs_check_hash_code:1471) --> Mismatch hash_code for "❤️" [9a2ea068:19dd7132] [FIX] (f2fs_check_hash_code:1471) --> Mismatch hash_code for "❤️" [9a2ea068:19dd7132]
And of course, this happens with ext4 as well:
Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Problem in HTREE directory inode 4048: block #18 has bad max hash Invalid HTREE directory inode 4048 (/I/no-E/H/red). Clear HTree index? yes
Now, I'm not sure how much it's important to bring back the reverted patch. Yes, I know it's claimed that it fixes a "security issue", but in my opinion, it's pretty bullshit worry. First, almost no one uses the case folded feature other than Android, and second, do you *really* think someone will really be trying to run git under Termux on their Pixel 9 Pro Fold? I mean.... I guess; I do have Termux installed on my P9PF, but even I'm not crazy enough to try install git, emacs, gcc, etc., on an Android phone and expect to get aything useful done. Using ssh, or mosh, with Termux, sure. But git? Not convinced....
Anyway, if we *do* want bring back the reverted patch, it would need to be reworked so that there is a bit in the encoding flags which indicates how we are treating Unicode "ignorable" characters, so that e2fsprogs and f2fs-tools can do the right thing. Once the kernel can handle things with and without ignorable characters, on a switchable basis based on a bit in the superblock, then we wouldn't need to use the linear fallback hack, with the attendant performance penalty.
But honestly, I'm not sure it worth it. But if someone sends me a patch which handles the switchable unicode casefold, I'm willing to spend time to get this integrated into e2fsprogs.
Cheers,
- Ted
P.S. This has only been tested using my a file system image created using my unicode-hijinks script, but it hasn't gone through a full set of regression tests yet. But this it is doing the right thing at least as far as the Unicode case folding is concerned.
From 78499980bfa243a129a2a72f037d35147d3cf363 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o tytso@mit.edu Date: Fri, 7 Feb 2025 23:08:02 -0500 Subject: [PATCH] ext4: introduce linear search for dentries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points").
There are good reasons why this change should be made; it's actually quite stupid that Unicode seems to think that the characters ❤ and ❤️ should be casefolded. Unfortimately because of the backwards compatibility issue, this commit was reverted in 231825b2e1ff.
This problem is addressed by instituting a brute-force linear fallback if a lookup fails on case-folded directory, which does result in a performance hit when looking up files affected by the changing how thekernel treats ignorable Uniode characters, or when attempting to look up non-existent file names. So this fallback can be disabled by setting an encoding flag if in the future, the system administrator or the manufacturer of a mobile handset or tablet can be sure that there was no opportunity for a kernel to insert file names with incompatible encodings.
Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Signed-off-by: Theodore Ts'o tytso@mit.edu --- fs/ext4/namei.c | 14 ++++++++++---- include/linux/fs.h | 6 +++++- 2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 536d56d15072..820e7ab7f3a3 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent, * sure cf_name was properly initialized before * considering the calculated hash. */ - if (IS_ENCRYPTED(parent) && fname->cf_name.name && + if (sb_no_casefold_compat_fallback(parent->i_sb) && + IS_ENCRYPTED(parent) && fname->cf_name.name && (fname->hinfo.hash != EXT4_DIRENT_HASH(de) || fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de))) return false; @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, * return. Otherwise, fall back to doing a search the * old fashioned way. */ - if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR) + if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " + "falling back\n")); + else if (!sb_no_casefold_compat_fallback(dir->i_sb) && + *res_dir == NULL && IS_CASEFOLDED(dir)) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold " + "failed, falling back\n")); + else goto cleanup_and_exit; - dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " - "falling back\n")); ret = NULL; } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); diff --git a/include/linux/fs.h b/include/linux/fs.h index be3ad155ec9f..b50ba230f1d4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1246,11 +1246,15 @@ extern int send_sigurg(struct file *file); #define SB_NOUSER BIT(31)
/* These flags relate to encoding and casefolding */ -#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)
#define sb_has_strict_encoding(sb) \ (sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
+#define sb_no_casefold_compat_fallback(sb) \ + (sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL) + /* * Umount options */
"Theodore Ts'o" tytso@mit.edu writes:
Now, I'm not sure how much it's important to bring back the reverted patch. Yes, I know it's claimed that it fixes a "security issue", but in my opinion, it's pretty bullshit worry. First, almost no one uses the case folded feature other than Android, and second, do you *really* think someone will really be trying to run git under Termux on their Pixel 9 Pro Fold? I mean.... I guess; I do have Termux installed on my P9PF, but even I'm not crazy enough to try install git, emacs, gcc, etc., on an Android phone and expect to get aything useful done. Using ssh, or mosh, with Termux, sure. But git? Not convinced....
Anyway, if we *do* want bring back the reverted patch, it would need to be reworked so that there is a bit in the encoding flags which indicates how we are treating Unicode "ignorable" characters, so that e2fsprogs and f2fs-tools can do the right thing. Once the kernel can handle things with and without ignorable characters, on a switchable basis based on a bit in the superblock, then we wouldn't need to use the linear fallback hack, with the attendant performance penalty.
But honestly, I'm not sure it worth it. But if someone sends me a patch which handles the switchable unicode casefold, I'm willing to spend time to get this integrated into e2fsprogs.
What I think would be a correct approach for commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") is to fold *some* code points: zero-length characters like ZWSP are folded as they should be, but we limit the list to not normalize those characters that make some sense, like the Variant Selectors. This would be similar to what APFS seems to do. This would be complex, but the user-visible semantics would be slightly more sane. It should be done with caution, with a bit marking this change and preserving the current unicode database, to prevent further breakage. But given the damage this apparent simple patch has caused already, I myself won't pursue that without a real security motivation.
Thanks for the linear search patches. Not great, but it solves the current situation. For your ext4 patch:
Reviewed-by: Gabriel Krisman Bertazi krisman@suse.de
linux-stable-mirror@lists.linaro.org