On Sun, Jan 29, 2023 at 12:18:51PM +0000, Matthew Wilcox (Oracle) wrote:
Both f2fs and ext4 end up passing the ciphertext page to wbc_account_cgroup_owner(). At the moment, the ciphertext page appears to belong to no cgroup, so it is accounted to the root_mem_cgroup instead of whatever cgroup the original page was in.
It's hard to say how far back this is a bug. The crypto code shared between ext4 & f2fs was created in May 2015 with commit 0b81d0779072, but neither filesystem did anything with memcg_data before then. memcg writeback accounting was added to ext4 in July 2015 in commit 001e4a8775f6 and it wasn't added to f2fs until January 2018 (commit 578c647879f7).
I'm going with the ext4 commit since this is the first commit where there was a difference in behaviour between encrypted and unencrypted filesystems.
Fixes: 001e4a8775f6 ("ext4: implement cgroup writeback support")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
 fs/crypto/crypto.c | 3 +++
 1 file changed, 3 insertions(+)
What is the actual effect of this bug?
The bounce pages are short-lived, so surely it doesn't really matter what memory cgroup they get charged to?
I guess it's really more about the effect on cgroup writeback? And that's also the reason why this is a problem here but not e.g. in dm-crypt?
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index e78be66bbf01..a4e76f96f291 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -205,6 +205,9 @@ struct page *fscrypt_encrypt_pagecache_blocks(struct page *page,
 	}
 	SetPagePrivate(ciphertext_page);
 	set_page_private(ciphertext_page, (unsigned long)page);
+#ifdef CONFIG_MEMCG
+	ciphertext_page->memcg_data = page->memcg_data;
+#endif
 	return ciphertext_page;
 }
Nothing outside mm/ and include/linux/memcontrol.h does anything with memcg_data directly. Are you sure this is the right thing to do here?
Also, this patch causes the following:
[ 16.192276] BUG: Bad page state in process kworker/u4:2 pfn:10798a
[ 16.192919] page:00000000332f5565 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10798a
[ 16.193848] memcg:ffff88810766c000
[ 16.194186] flags: 0x200000000000000(node=0|zone=2)
[ 16.194642] raw: 0200000000000000 0000000000000000 dead000000000122 0000000000000000
[ 16.195356] raw: 0000000000000000 0000000000000000 00000000ffffffff ffff88810766c000
[ 16.196061] page dumped because: page still charged to cgroup
[ 16.196599] CPU: 0 PID: 33 Comm: kworker/u4:2 Tainted: G T 6.2.0-rc5-00001-gf84eecbf5db1 #3
[ 16.197494] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Arch Linux 1.16.1-1-1 04/01/2014
[ 16.198343] Workqueue: ext4-rsv-conversion ext4_end_io_rsv_work
[ 16.198899] Call Trace:
[ 16.199143] <TASK>
[ 16.199350] show_stack+0x47/0x56
[ 16.199670] dump_stack_lvl+0x55/0x72
[ 16.200019] dump_stack+0x14/0x18
[ 16.200345] bad_page.cold+0x5e/0x8a
[ 16.200685] free_page_is_bad_report+0x61/0x70
[ 16.201111] free_pcp_prepare+0x13f/0x290
[ 16.201486] free_unref_page+0x27/0x1f0
[ 16.201848] __free_pages+0xa0/0xc0
[ 16.202186] mempool_free_pages+0xd/0x20
[ 16.202556] mempool_free+0x28/0x90
[ 16.202889] fscrypt_free_bounce_page+0x26/0x40
[ 16.203322] ext4_finish_bio+0x1ed/0x240
[ 16.203690] ext4_release_io_end+0x4a/0x100
[ 16.204088] ext4_end_io_rsv_work+0xa8/0x1b0
[ 16.204492] process_one_work+0x27f/0x580
[ 16.204874] worker_thread+0x5a/0x3d0
[ 16.205229] ? process_one_work+0x580/0x580
[ 16.205621] kthread+0x102/0x130
[ 16.205929] ? kthread_exit+0x30/0x30
[ 16.206280] ret_from_fork+0x1f/0x30
[ 16.206620] </TASK>