On Mon, May 26, 2025 at 06:20:11PM +0200, Jan Kara wrote:
So I would *love* to do this and was thinking about this as well. But the trouble is that this causes lock inversion as well because ->mmap callback is called with mmap_lock held and so we cannot acquire inode_lock here either.
Recent changes which switch from ->mmap to ->mmap_prepare callback are actually going in a suitable direction but we'd need a rather larger rewrite to get from under mmap_lock and I'm not sure that's justified.
This came up in discussions at this week's ext4 video conference, and afterwards, I started thinking. I suspect the best solution is one where we avoid using write_begin/write_end calls entirely. Instead, we allocate a single block using ext4_mb_new_block() (see what ext4_new_meta_blocks does for an example); then we write the contents into the newly allocated blocks; and if the allocate and the write are successful, only then do we start a jbd2 handle, to zero the i_blocks[] array and then set logical block 0 to point at the newly allocated data block, and then close the handle.
Splitting the block allocation from the "update the logical->physical mapping", which is currently done in ext4_map_blocks(), into two separate operations has been one of the things I had been wanting to do for a while, since it would allow us to replace the current dioread_nolock buffered write pqth, where we call ext4_map_blocks() to allocate the blocks but mark the blocks as uninitialized, and then we write the data blocks, and then we do a second ext4_map_blocks() call to clear the uninitialized flag.
So it's a bit more involved since it requires refactoring parts of ext4_map_blocks() to create a new function, ext4_set_blocks() which updates the extent tree or indirect block map using block(s) that were allocated separately. But this would be useful for things besides just the inline file conversion operation, so it's worth the effort.
Cheers,
- Ted