On 11/1/21 7:31 PM, Jan Kara wrote:
On Thu 28-10-21 15:09:08, Joseph Qi wrote:
Hi Jan,
On 10/25/21 11:13 PM, Jan Kara wrote:
ocfs2_truncate_file() did unmap and invalidate page cache pages before zeroing the partial tail cluster and setting i_size. Thus some pages could be left (and likely have been left if the cluster zeroing happened) in the page cache beyond i_size after truncate finished, letting the user possibly see stale data once the file was extended again. Also the tail cluster
I don't quite understand the case. truncate_inode_pages() will truncate pages from new_i_size to the old i_size, and the following ocfs2_orphan_for_truncate() will zero the tail range and then update i_size for the inode as well as the dinode. So once truncate has finished, how does the stale data exposure happen? Or do you mean a race between the above two steps?
Sorry, I was not quite accurate in the above paragraph. There are several ways in which stale pages in the page cache can cause problems:
- Because i_size is reduced only after truncating the page cache, a page fault can
happen after the page cache is truncated and pages are zeroed but before i_size is reduced. This allows the user to arbitrarily modify pages that are used for writing zeroes into the cluster tail, and after a file extension this data will become visible (see the sketch below).
- The tail cluster zeroing in ocfs2_orphan_for_truncate() can actually try
to write zeroed pages above i_size (e.g. if we have 4k blocksize, 64k clustersize, and do truncate(f, 4k) on a 4k file). This will cause exactly the same problems as already described in commit 5314454ea3f ("ocfs2: fix data corruption after conversion from inline format").
Hope it is clearer now.
So the core reason is that ocfs2_zero_range_for_truncate() grabs pages and then zeroes them, right? I think an alternative way would be to use zeroout instead of zeroing pages through the page cache, which won't grab the pages again (roughly the idea sketched below). Anyway, I'm also fine with your way since it is simple.
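For reference, by zeroout I mean roughly the following (a hand-wavy sketch on top of sb_issue_zeroout(); the helper name and the details of the cluster lookup are made up, this is not actual ocfs2 code, and it assumes start/len are block aligned):

/*
 * Hypothetical sketch: zero a block-aligned range directly on disk,
 * without instantiating page cache pages.
 */
static int ocfs2_zeroout_range_sketch(struct inode *inode, u64 start, u64 len)
{
        struct super_block *sb = inode->i_sb;
        u32 cpos = ocfs2_bytes_to_clusters(sb, start);
        u32 p_cluster, num_clusters;
        u64 p_block;
        int ret;

        /* Map the logical cluster to its physical location. */
        ret = ocfs2_get_clusters(inode, cpos, &p_cluster,
                                 &num_clusters, NULL);
        if (ret)
                return ret;
        if (!p_cluster)         /* hole, nothing to zero */
                return 0;

        /* Physical start block = cluster start + offset within cluster. */
        p_block = ocfs2_clusters_to_blocks(sb, p_cluster) +
                  ((start >> sb->s_blocksize_bits) &
                   (ocfs2_clusters_to_blocks(sb, 1) - 1));

        /* Zero the blocks on disk, bypassing the page cache entirely. */
        return sb_issue_zeroout(sb, p_block, len >> sb->s_blocksize_bits,
                                GFP_NOFS);
}

That way the zeroing never creates pages beyond i_size in the first place.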
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>