On Mon, Jun 3, 2019 at 4:22 PM Jan Kara <jack@suse.cz> wrote:
Hole punching currently evicts pages from page cache and then goes on to remove blocks from the inode. This happens under both i_mmap_sem and i_rwsem held exclusively, which provides appropriate serialization with racing page faults. However, there is currently nothing that prevents an ordinary read(2) from racing with the hole punch and instantiating a page cache page after hole punching has evicted page cache but before it has removed blocks from the inode. This page cache page will be mapping a soon-to-be-freed block, and that can lead to returning stale data to userspace or even filesystem corruption.
Fix the problem by protecting reads as well as readahead requests with i_mmap_sem.
So ->write_iter() does not take i_mmap_sem, right? And therefore a mixed randrw workload is not expected to regress heavily because of this change?
Did you test the performance difference? In [1] I posted results of an fio test that did 5x worse on xfs than on ext4, but I've seen much worse cases.
Thanks, Amir.
[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxhu=Qtme9RJ7uZXYXt0UE+=xD+OC4gQ9...