On Thu, Dec 08, 2022 at 10:15:23AM +0100, Jan Kara wrote:
> > Furthermore, the fix which Jan provided, and which apparently fixes the user's problem, (a) doesn't touch the ext4_bmap function, and (b) has a fixes tag for the patch:
> > Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries")
> > ... which is a commit which dates back to 2016, and the v4.6 kernel. ?!?
> Yes. AFAICT the bitfield race in mbcache was introduced in this commit but somehow ext4 was using mbcache in a way that wasn't tripping the race. After 65f8b80053 ("ext4: fix race when reusing xattr blocks"), the race became much more likely and users started to notice...
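For readers unfamiliar with this class of bug, here is a minimal userspace sketch of how a race between two flags that share one word can lose an update. It is a generic illustration, not the mbcache code: the flag names, the helper functions, and the use of C11 atomics (in place of whatever primitives the actual fix uses) are all assumptions made for the example. The bad interleaving is replayed sequentially so that the result is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical flag layout: two 1-bit flags packed into one word,
 * standing in for two adjacent bitfield members of a cache entry. */
#define FLAG_REFERENCED (1u << 0)
#define FLAG_REUSABLE   (1u << 1)

/* Replays one bad interleaving of two plain read-modify-write updates.
 * Each "CPU" loads the whole word, changes its own bit, and stores the
 * whole word back, which is what a bitfield assignment compiles to. */
unsigned int racy_interleaving(unsigned int word)
{
    unsigned int a = word;      /* CPU A loads the shared word */
    unsigned int b = word;      /* CPU B loads the same, still old, word */

    a |= FLAG_REFERENCED;       /* A sets "referenced" in its private copy */
    b &= ~FLAG_REUSABLE;        /* B clears "reusable" in its private copy */

    word = a;                   /* A stores first */
    word = b;                   /* B stores last and wipes out A's update */
    return word;
}

/* The same two updates done as atomic read-modify-write operations.
 * Neither update can be lost, whatever order the two run in. */
unsigned int atomic_interleaving(unsigned int initial)
{
    atomic_uint word;

    atomic_init(&word, initial);
    atomic_fetch_or(&word, FLAG_REFERENCED);
    atomic_fetch_and(&word, ~FLAG_REUSABLE);
    return atomic_load(&word);
}

/* Small self-check: start with only "reusable" set. */
void check_demo(void)
{
    /* Racy version: the "referenced" update is lost. */
    assert(racy_interleaving(FLAG_REUSABLE) == 0u);
    /* Atomic version: both updates survive. */
    assert(atomic_interleaving(FLAG_REUSABLE) == FLAG_REFERENCED);
}
```

Such a race only bites when the two updates actually overlap in time, which fits the observation above that a latent bug can sit unnoticed for years until a change in the calling pattern makes the overlap more likely.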
Ah, OK. And 65f8b80053 landed in 6.0, so while the bug may have been around for much longer, this change made it much more likely that folks would notice. That's the missing piece and why Microsoft started noticing this in their "Flatcar" container kernel.
So I'll update the commit description to make this clearer, and then I can figure out how to tell the regression-bot that the regression should be tracked against commit 65f8b80053 instead of 51ae846cff5 ("ext4: fix warning in ext4_iomap_begin as race between bmap and write").
Cheers, and thanks for the clarification,
- Ted