When manipulating xattr blocks, we can deadlock, infinitely looping inside ext4_xattr_block_set(): we keep finding an xattr block for reuse in mbcache but are unable to reuse it because its reference count is too big. This happens because the cache entry for the xattr block is marked as reusable (e_reusable set) even though its reference count is too big. Once this inconsistency occurs it persists indefinitely, and so ext4_xattr_block_set() keeps retrying forever.
The inconsistent state is caused by a non-atomic update of the e_reusable bit: e_reusable is part of a bitfield, so an update to it can race with an update of the e_referenced bit in the same word, and one of the two updates is lost. Fix the problem by using atomic bitops instead.
CC: stable@vger.kernel.org
Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
Reported-and-tested-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
Reported-by: Thilo Fromm <t-lo@linux.microsoft.com>
Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microso...
Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/xattr.c         |  4 ++--
 fs/mbcache.c            | 14 ++++++++------
 include/linux/mbcache.h |  9 +++++++--
 3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 800ce5cdb9d2..08043aa72cf1 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1281,7 +1281,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
 			ce = mb_cache_entry_get(ea_block_cache, hash,
 						bh->b_blocknr);
 			if (ce) {
-				ce->e_reusable = 1;
+				set_bit(MBE_REUSABLE_B, &ce->e_flags);
 				mb_cache_entry_put(ea_block_cache, ce);
 			}
 		}
@@ -2042,7 +2042,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
 				}
 				BHDR(new_bh)->h_refcount = cpu_to_le32(ref);
 				if (ref == EXT4_XATTR_REFCOUNT_MAX)
-					ce->e_reusable = 0;
+					clear_bit(MBE_REUSABLE_B, &ce->e_flags);
 				ea_bdebug(new_bh, "reusing; refcount now=%d",
 					  ref);
 				ext4_xattr_block_csum_set(inode, new_bh);
diff --git a/fs/mbcache.c b/fs/mbcache.c
index e272ad738faf..2a4b8b549e93 100644
--- a/fs/mbcache.c
+++ b/fs/mbcache.c
@@ -100,8 +100,9 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key,
 	atomic_set(&entry->e_refcnt, 2);
 	entry->e_key = key;
 	entry->e_value = value;
-	entry->e_reusable = reusable;
-	entry->e_referenced = 0;
+	entry->e_flags = 0;
+	if (reusable)
+		set_bit(MBE_REUSABLE_B, &entry->e_flags);
 	head = mb_cache_entry_head(cache, key);
 	hlist_bl_lock(head);
 	hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) {
@@ -165,7 +166,8 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache,
 	while (node) {
 		entry = hlist_bl_entry(node, struct mb_cache_entry,
 				       e_hash_list);
-		if (entry->e_key == key && entry->e_reusable &&
+		if (entry->e_key == key &&
+		    test_bit(MBE_REUSABLE_B, &entry->e_flags) &&
 		    atomic_inc_not_zero(&entry->e_refcnt))
 			goto out;
 		node = node->next;
@@ -284,7 +286,7 @@ EXPORT_SYMBOL(mb_cache_entry_delete_or_get);
 void mb_cache_entry_touch(struct mb_cache *cache,
 			  struct mb_cache_entry *entry)
 {
-	entry->e_referenced = 1;
+	set_bit(MBE_REFERENCED_B, &entry->e_flags);
 }
 EXPORT_SYMBOL(mb_cache_entry_touch);
 
@@ -309,9 +311,9 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache,
 		entry = list_first_entry(&cache->c_list,
 					 struct mb_cache_entry, e_list);
 		/* Drop initial hash reference if there is no user */
-		if (entry->e_referenced ||
+		if (test_bit(MBE_REFERENCED_B, &entry->e_flags) ||
 		    atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) {
-			entry->e_referenced = 0;
+			clear_bit(MBE_REFERENCED_B, &entry->e_flags);
 			list_move_tail(&entry->e_list, &cache->c_list);
 			continue;
 		}
diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h
index 2da63fd7b98f..97e64184767d 100644
--- a/include/linux/mbcache.h
+++ b/include/linux/mbcache.h
@@ -10,6 +10,12 @@
 
 struct mb_cache;
 
+/* Cache entry flags */
+enum {
+	MBE_REFERENCED_B = 0,
+	MBE_REUSABLE_B
+};
+
 struct mb_cache_entry {
 	/* List of entries in cache - protected by cache->c_list_lock */
 	struct list_head	e_list;
@@ -26,8 +32,7 @@ struct mb_cache_entry {
 	atomic_t		e_refcnt;
 	/* Key in hash - stable during lifetime of the entry */
 	u32			e_key;
-	u32			e_referenced:1;
-	u32			e_reusable:1;
+	unsigned long		e_flags;
 	/* User provided value - stable during lifetime of the entry */
 	u64			e_value;
 };
-- 
2.35.3
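[Editorial note on the race, to make the lost update concrete: with the old layout, e_referenced:1 and e_reusable:1 live in the same machine word, so every plain assignment to either bit compiles to a non-atomic load/modify/store of the whole word, and a store can write back a stale value of the neighbouring bit. Below is a hypothetical userspace sketch of that failure; the struct, thread bodies, and iteration count are made up for illustration, and whether the race actually trips depends on compiler and CPU.]

/*
 * Hypothetical userspace sketch of the bitfield lost-update race
 * described in the patch above; illustration only, not kernel code.
 * Build with: gcc -O2 -pthread bitfield-race.c
 */
#include <pthread.h>
#include <stdio.h>

struct entry {
        unsigned int referenced:1;      /* stands in for e_referenced:1 */
        unsigned int reusable:1;        /* stands in for e_reusable:1   */
};

static volatile struct entry e;
static volatile int stop;

/*
 * Racing writer: toggles "referenced" the way mb_cache_entry_touch()
 * and mb_cache_shrink() toggle e_referenced.  Each assignment is a
 * non-atomic read-modify-write of the word shared with "reusable".
 */
static void *flip_referenced(void *arg)
{
        (void)arg;
        while (!stop) {
                e.referenced = 1;
                e.referenced = 0;
        }
        return NULL;
}

int main(void)
{
        pthread_t t;
        long i;

        pthread_create(&t, NULL, flip_referenced, NULL);
        for (i = 0; i < 100000000L; i++) {
                e.reusable = 1;
                e.reusable = 0; /* like clearing e_reusable at refcount max */
                if (e.reusable) {
                        /* a stale store from the other thread undid our clear */
                        printf("lost clear of 'reusable' at iteration %ld\n", i);
                        break;
                }
        }
        stop = 1;
        pthread_join(t, NULL);
        return 0;
}

A lost clear here is exactly the inconsistency from the commit message: the entry stays flagged reusable while its reference count says otherwise, and ext4_xattr_block_set() then spins on it.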
On Wed, Nov 23, 2022 at 08:39:50PM +0100, Jan Kara wrote:
Hi Ted,
Could it be that you didn't see this email? We have users who are hitting this and are very eager to see this bugfix get merged and backported to stable.
Thanks, Jeremi
On 01.12.22 16:10, Jeremi Piotrowski wrote:
Could it be that you didn't see this email? We have users who are hitting this and are very eager to see this bugfix get merged and backported to stable.
Andreas, Ted, or any other trusted ext4 reviewer:
Jan's patch to fix the regression is now out for 12 days and afaics didn't make any progress (or did I miss something?). Is there a reason why, or did it simply fall through the cracks? Just asking, because it would be good to finally get this resolved.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On Wed, Nov 23, 2022 at 08:39:50PM +0100, Jan Kara wrote:
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Cheers, Andreas
On Mon, Dec 05, 2022 at 04:41:49PM +0100, Thorsten Leemhuis wrote:
This patch showed up right before the Thanksgiving holiday, and it just missed the Q/A cutoff for the ext4 bugfix pull request which I sent to Linus right before I went on my Thanksgiving break.
Since Thanksgiving, I've been busy with the realities of corporate life: end of year performance evaluations, preparing for 2023 roadmap reviews with management, etc. So the next pull request I'm planning to send to Linus will go out when the merge window opens, and I'm currently processing patches and running Q/A to be ready for the opening of that merge window.
One thing which is completely unclear to me is how this relates to the claimed regression. I understand that Jeremi and Thilo have asserted that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix warning in ext4_iomap_begin as race between bmap and write") is not in their 5.15 product tree.
However, the stack traces point to a problem in the extended attribute code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5 only changes the ext4's bmap function --- which these days gets used for the FIBMAP ioctl and very little else.
Furthermore, the fix which Jan provided, and which apparently fixes the user's problem, (a) doesn't touch the ext4_bmap function, and (b) has a fixes tag for the patch:
Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
... which is a commit which dates back to 2016, and the v4.6 kernel. ?!?
So at this point, I have no idea whether or not this is a regression, but we'll get the fix to Linus soon.
Cheers,
- Ted
Hi Ted!
On Thu 08-12-22 00:55:55, Theodore Ts'o wrote:
One thing which is completely unclear to me is how this relates to the claimed regression. I understand that Jeremi and Thilo have asserted that the hang goes away if a backport commit 51ae846cff5 ("ext4: fix warning in ext4_iomap_begin as race between bmap and write") is not in their 5.15 product tree.
IMHO the bisection was flawed, and somehow the test of a revert (which the guys claimed to have done) must have been lucky and didn't trip the race. This is not that hard to imagine: firstly, commit 51ae846cff5 was included in the same stable kernel release as the ext4 xattr changes (65f8b80053 ("ext4: fix race when reusing xattr blocks") and related mbcache changes), which likely made the race easier to hit. Secondly, the mbcache state corruption is not that easy to hit, because you need slab reclaim of an mbcache entry to interact with the ext4 xattr code adding a reference to an xattr block just as it hits the reference limit.
However, the stack traces point to a problem in the extended attribute code, which has nothing to do with ext4_bmap(), and commit 51ae846cff5 only changes the ext4's bmap function --- which these days gets used for the FIBMAP ioctl and very little else.
Furthermore, the fix which Jan provided, and which apparently fixes the user's problem, (a) doesn't touch the ext4_bmap function, and (b) has a fixes tag for the patch:
Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
... which is a commit which dates back to 2016, and the v4.6 kernel. ?!?
Yes. AFAICT the bitfield race in mbcache was introduced in this commit but somehow ext4 was using mbcache in a way that wasn't tripping the race. After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race became much more likely and users started to notice...
Honza
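[Editorial note: for reference, the pattern the fix adopts, with both bits in a single unsigned long updated only through atomic bitops, can be sketched in userspace roughly as follows. The GCC __atomic builtins stand in for the kernel's set_bit()/clear_bit()/test_bit(), which are the calls actually used in the patch; the helper names and the demo program around them are made up for illustration.]

/*
 * Userspace sketch of the fixed pattern: an atomic read-modify-write of
 * one bit cannot lose a concurrent update to another bit in the word.
 * The __atomic builtins stand in for the kernel's set_bit()/clear_bit()/
 * test_bit(); everything else here is illustrative only.
 * Build with: gcc -O2 -pthread atomic-flags.c
 */
#include <pthread.h>
#include <stdio.h>

enum {
        MBE_REFERENCED_B = 0,   /* names mirror the patch above */
        MBE_REUSABLE_B
};

static unsigned long e_flags;

static void set_bit_ul(int nr, unsigned long *addr)
{
        __atomic_fetch_or(addr, 1UL << nr, __ATOMIC_RELAXED);
}

static void clear_bit_ul(int nr, unsigned long *addr)
{
        __atomic_fetch_and(addr, ~(1UL << nr), __ATOMIC_RELAXED);
}

static int test_bit_ul(int nr, unsigned long *addr)
{
        return (__atomic_load_n(addr, __ATOMIC_RELAXED) >> nr) & 1;
}

/* Concurrent toggling of one bit, as mbcache does with MBE_REFERENCED_B */
static void *flip_referenced(void *arg)
{
        long i;

        (void)arg;
        for (i = 0; i < 100000000L; i++) {
                set_bit_ul(MBE_REFERENCED_B, &e_flags);
                clear_bit_ul(MBE_REFERENCED_B, &e_flags);
        }
        return NULL;
}

int main(void)
{
        pthread_t t;

        set_bit_ul(MBE_REUSABLE_B, &e_flags);
        pthread_create(&t, NULL, flip_referenced, NULL);
        pthread_join(t, NULL);
        /* the reusable bit survives any amount of concurrent toggling */
        printf("reusable still set: %d\n",
               test_bit_ul(MBE_REUSABLE_B, &e_flags));
        return 0;
}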
On Thu, Dec 08, 2022 at 10:15:23AM +0100, Jan Kara wrote:
IMHO the bisection was flawed, and somehow the test of a revert (which the guys claimed to have done) must have been lucky and didn't trip the race. This is not that hard to imagine: firstly, commit 51ae846cff5 was included in the same stable kernel release as the ext4 xattr changes (65f8b80053 ("ext4: fix race when reusing xattr blocks") and related mbcache changes), which likely made the race easier to hit. Secondly, the mbcache state corruption is not that easy to hit, because you need slab reclaim of an mbcache entry to interact with the ext4 xattr code adding a reference to an xattr block just as it hits the reference limit.
Yeah, sorry about that: there was never a bisect that led to 51ae846cff5. It was just a guess, and at that point we were unable to reproduce the issue ourselves, so all we had was information from a user stating that the issue doesn't occur when they revert that commit in their own test build.
Once we were able to reproduce it ourselves, the actual bisect led to 65f8b80053, which, as Honza stated, made sure that the corruption/inconsistency leads to a busy loop that is harder to miss.
On Thu, Dec 08, 2022 at 10:15:23AM +0100, Jan Kara wrote:
Yes. AFAICT the bitfield race in mbcache was introduced in this commit but somehow ext4 was using mbcache in a way that wasn't tripping the race. After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race became much more likely and users started to notice...
Ah, OK. And 65f8b80053 landed in 6.0, so while the bug may have been around for much longer, this change made it much more likely that folks would notice. That's the missing piece and why Microsoft started noticing this in their "Flatcar" container kernel.
So I'll update the commit description so that this is more clear, and then I can figure out how to tell the regression-bot that the regression should be tracked using commit 65f8b80053 instead of 51ae846cff5 ("ext4: fix warning in ext4_iomap_begin as race between bmap and write").
Cheers, and thanks for the clarification,
- Ted
On 08.12.22 16:39, Theodore Ts'o wrote:
Ah, OK. And 65f8b80053 landed in 6.0, so while the bug may have been around for much longer, this change made it much more likely that folks would notice. That's the missing piece and why Microsoft started noticing this in their "Flatcar" container kernel.
Yeah, likely when 65f8b80053 was backported to 5.15.y in 1be97463696c.
So I'll update the commit description so that this is more clear,
Thx for taking care of this, I'm glad this is on track now.
Maybe I should talk to Greg again to revert backported changes like 1be97463696c until fixes for them are ready.
and then I can figure out how to tell the regression-bot that the regression should be tracked using commit 65f8b80053 instead of 51ae846cff5 ("ext4: fix warning in ext4_iomap_begin as race between bmap and write").
FWIW, there is no strong need to; nobody looks at those details once the regression is fixed. But yeah, that might change over time, so let me take care of that:
#regzbot introduced: 65f8b80053
[normally things like that have to be done as a direct or indirect reply to the report, but regzbot knows (famous last words...) how to associate this command with the report, as the patch that started this thread linked to the report using a Link: tag].
Ciao, Thorsten
On Thu, Dec 08, 2022 at 06:16:02PM +0100, Thorsten Leemhuis wrote:
Maybe I should talk to Greg again to revert backported changes like 1be97463696c until fixes for them are ready.
The fix is in the ext4 git tree, and it's ready to be pushed to Linus when the merge window opens --- presumably, on Sunday.
So it's probably not worth it to revert the backported change, only to reapply immediately afterwards.
- Ted
On 09.12.22 07:12, Theodore Ts'o wrote:
The fix is in the ext4 git tree, and it's ready to be pushed to Linus when the merge window opens --- presumably, on Sunday.
Thx!
So it's probably not worth it to revert the backported change, only to reapply immediately afterwards.
Definitely agreed, I was talking more in the general sense (sorry, should have been clearer), as it's not the first time a backport exposes existing problems that take a while to get analyzed and fixed in mainline. That is just how it is sometimes; hence a revert and a reapply of such a backport (once the fix is available) in stable/longterm sounds appropriate to me, to prevent users from running into known problems.
Ciao, Thorsten
On Fri, Dec 09, 2022 at 07:31:03AM +0100, Thorsten Leemhuis wrote:
It's a balancing act: reverting a fix would mean reintroducing to users an issue that was previously fixed. It's not always the right thing to do, and sometimes we won't.