On Mon, Feb 03, 2025 at 03:07:10PM -0800, Daniel Rosenberg wrote:
The revert of the unicode patch is in all of the stable branches already. That f2fs patch is technically a fix for the revert as well, since the existence of either of those is a problem for the same reason :/
On Sat, Feb 1, 2025 at 9:06 AM Todd Kjos tkjos@google.com wrote:
Before we can bring back the reverted patch, we need the same fix for ext4. Daniel, is there progress on that?
So I have a working fix for ext4, now but it's going to be a lot more complicated if we want to bring back the reverted patch. That's because both e2fsprogs and f2fs-tools needs to be able to calculate the hash used by the directories, and so fsck.ext4 and fsck.f2fs will get confused if they run across file systems with file names which were inserted while the reverted patch was in force.
I confirmed this was applicable for both ext4 and f2fs by modifying my unicode-hijinks script to generate an f2fs image, and then running fsck.f2fs on the image:
% /sbin/fsck.f2fs /tmp/foo.img Info: MKFS version "Linux version 6.14.0-rc1-xfstests-00013-g30a8509ae0bb-dirty (tytso@cwcc) (gcc (Debian 14.2.0-8) 14.2.0, GNU ld (GNU Binutils for Debian) 2.43.50.20241210) #456 SMP PREEMPT_DYNAMIC Fri Feb 7 01:18:48 EST 2025" Info: FSCK version ... [FIX] (f2fs_check_hash_code:1471) --> Mismatch hash_code for "❤️" [9a2ea068:19dd7132] [FIX] (f2fs_check_hash_code:1471) --> Mismatch hash_code for "❤️" [9a2ea068:19dd7132]
And of course, this happens with ext4 as well:
Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Problem in HTREE directory inode 4048: block #18 has bad max hash Invalid HTREE directory inode 4048 (/I/no-E/H/red). Clear HTree index? yes
Now, I'm not sure how much it's important to bring back the reverted patch. Yes, I know it's claimed that it fixes a "security issue", but in my opinion, it's pretty bullshit worry. First, almost no one uses the case folded feature other than Android, and second, do you *really* think someone will really be trying to run git under Termux on their Pixel 9 Pro Fold? I mean.... I guess; I do have Termux installed on my P9PF, but even I'm not crazy enough to try install git, emacs, gcc, etc., on an Android phone and expect to get aything useful done. Using ssh, or mosh, with Termux, sure. But git? Not convinced....
Anyway, if we *do* want bring back the reverted patch, it would need to be reworked so that there is a bit in the encoding flags which indicates how we are treating Unicode "ignorable" characters, so that e2fsprogs and f2fs-tools can do the right thing. Once the kernel can handle things with and without ignorable characters, on a switchable basis based on a bit in the superblock, then we wouldn't need to use the linear fallback hack, with the attendant performance penalty.
But honestly, I'm not sure it worth it. But if someone sends me a patch which handles the switchable unicode casefold, I'm willing to spend time to get this integrated into e2fsprogs.
Cheers,
- Ted
P.S. This has only been tested using my a file system image created using my unicode-hijinks script, but it hasn't gone through a full set of regression tests yet. But this it is doing the right thing at least as far as the Unicode case folding is concerned.
From 78499980bfa243a129a2a72f037d35147d3cf363 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o tytso@mit.edu Date: Fri, 7 Feb 2025 23:08:02 -0500 Subject: [PATCH] ext4: introduce linear search for dentries MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit
This patch addresses an issue where some files in case-insensitive directories become inaccessible due to changes in how the kernel function, utf8_casefold(), generates case-folded strings from the commit 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points").
There are good reasons why this change should be made; it's actually quite stupid that Unicode seems to think that the characters ❤ and ❤️ should be casefolded. Unfortimately because of the backwards compatibility issue, this commit was reverted in 231825b2e1ff.
This problem is addressed by instituting a brute-force linear fallback if a lookup fails on case-folded directory, which does result in a performance hit when looking up files affected by the changing how thekernel treats ignorable Uniode characters, or when attempting to look up non-existent file names. So this fallback can be disabled by setting an encoding flag if in the future, the system administrator or the manufacturer of a mobile handset or tablet can be sure that there was no opportunity for a kernel to insert file names with incompatible encodings.
Fixes: 5c26d2f1d3f5 ("unicode: Don't special case ignorable code points") Signed-off-by: Theodore Ts'o tytso@mit.edu --- fs/ext4/namei.c | 14 ++++++++++---- include/linux/fs.h | 6 +++++- 2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index 536d56d15072..820e7ab7f3a3 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1462,7 +1462,8 @@ static bool ext4_match(struct inode *parent, * sure cf_name was properly initialized before * considering the calculated hash. */ - if (IS_ENCRYPTED(parent) && fname->cf_name.name && + if (sb_no_casefold_compat_fallback(parent->i_sb) && + IS_ENCRYPTED(parent) && fname->cf_name.name && (fname->hinfo.hash != EXT4_DIRENT_HASH(de) || fname->hinfo.minor_hash != EXT4_DIRENT_MINOR_HASH(de))) return false; @@ -1595,10 +1596,15 @@ static struct buffer_head *__ext4_find_entry(struct inode *dir, * return. Otherwise, fall back to doing a search the * old fashioned way. */ - if (!IS_ERR(ret) || PTR_ERR(ret) != ERR_BAD_DX_DIR) + if (IS_ERR(ret) && PTR_ERR(ret) == ERR_BAD_DX_DIR) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " + "falling back\n")); + else if (!sb_no_casefold_compat_fallback(dir->i_sb) && + *res_dir == NULL && IS_CASEFOLDED(dir)) + dxtrace(printk(KERN_DEBUG "ext4_find_entry: casefold " + "failed, falling back\n")); + else goto cleanup_and_exit; - dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, " - "falling back\n")); ret = NULL; } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); diff --git a/include/linux/fs.h b/include/linux/fs.h index be3ad155ec9f..b50ba230f1d4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1246,11 +1246,15 @@ extern int send_sigurg(struct file *file); #define SB_NOUSER BIT(31)
/* These flags relate to encoding and casefolding */ -#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_STRICT_MODE_FL (1 << 0) +#define SB_ENC_NO_COMPAT_FALLBACK_FL (1 << 1)
#define sb_has_strict_encoding(sb) \ (sb->s_encoding_flags & SB_ENC_STRICT_MODE_FL)
+#define sb_no_casefold_compat_fallback(sb) \ + (sb->s_encoding_flags & SB_ENC_NO_COMPAT_FALLBACK_FL) + /* * Umount options */