On Mon, Nov 07, 2022 at 01:25:16PM -0800, Minchan Kim wrote:
> > The following bug report is trying to work around an error on ppc64le, where the zram01.sh LTP test (there is also a kernel selftest, tools/testing/selftests/zram/zram01.sh, but the LTP test has received further updates) often sees mem_used_total 0 even though zram is already filled.
> Is it happening only on ppc64le?
I have managed to replicate this on an arm64 system. I frankly don't know what is so special about it -- it's a qemu guest and I'm not sure what exactly it's running on top of.
> Is it a new regression? What kernel version did you use?
I've replicated this on 4.18.0; obviously something more recent would be useful but I'm hesitant to destroy too much state in case it is something ...
> Actually, mem_used_total indicates how much *physical memory* is currently used to keep the original data.
> However, if the test data is a repeated pattern of unsigned longs (https://github.com/torvalds/linux/blob/master/drivers/block/zram/zram_drv.c#...) zram doesn't allocate physical memory but just marks the unsigned long's value in the meta area for decompression later.
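As a rough illustration of that check -- this is not the kernel code, just a shell approximation of the condition page_same_filled() tests (every 8-byte word in the page equal to the first one):

```shell
#!/bin/sh
# Shell sketch of zram's "same filled" idea: a page whose 8-byte words
# are all identical is recorded as a single value, not stored in zsmalloc.
is_same_filled() {
	# od prints the page as 8-byte hex words; a same-filled page
	# collapses to exactly one unique word.
	[ "$(od -An -tx8 -v "$1" | tr -s ' \t' '\n' | grep . | sort -u | wc -l)" -eq 1 ]
}

# A page of zeros (what dd from /dev/zero writes) is same-filled.
dd if=/dev/zero of=/tmp/page.zero bs=4096 count=1 2>/dev/null
is_same_filled /tmp/page.zero && echo "same-filled"

# A page with a non-repeating prefix is not, and would need real allocation.
printf 'not a repeated word pattern.............' > /tmp/page.mixed
dd if=/dev/zero bs=1 count=4056 >> /tmp/page.mixed 2>/dev/null
is_same_filled /tmp/page.mixed || echo "needs real allocation"
```

Which matters here because the test fills the device from /dev/zero, i.e. with exactly the kind of data that never reaches zsmalloc.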
To recap: this test [1] creates a zram device, makes a filesystem on it, and fills it with sequential 1k writes from /dev/zero via dd. The problem is that mem_used_total for the zram device reads as zero in the sysfs stats after the writes, which causes a divide-by-zero error in the script's calculation.
An annotated extract:
  zram01 3 TINFO: /sys/block/zram1/disksize = '26214400'
  zram01 3 TPASS: test succeeded
  zram01 4 TINFO: set memory limit to zram device(s)
  zram01 4 TINFO: /sys/block/zram1/mem_limit = '25M'
  zram01 4 TPASS: test succeeded
  zram01 5 TINFO: make vfat filesystem on /dev/zram1
At this point a cat of /sys/block/zram1/mm_stat shows

  65536 527 65536 26214400 65536 0 0 0
  zram01 5 TPASS: zram_makefs succeeded
  zram01 6 TINFO: mount /dev/zram1
  zram01 6 TPASS: mount of zram device(s) succeeded
  zram01 7 TINFO: filling zram1 (it can take long time)
  zram01 7 TPASS: zram1 was filled with '25568' KB
At this point "ls -lh" shows the file:

  total 25M
  -rwxr-xr-x. 1 root root 25M Aug  4 01:06 file
However, /sys/block/zram1/mm_stat shows

  9502720 0 0 26214400 196608 145 0 0

The script reads the zero mem_used_total value and tries to calculate the compression ratio:
./zram01.sh: line 145: 100 * 1024 * 25568 / 0: division by 0 (error token is "0")
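For what it's worth, the calculation could be guarded against the zero; a minimal sketch (variable names are illustrative, not LTP's actual ones):

```shell
#!/bin/sh
# Defensive sketch of the failing compression-ratio calculation.
# filled_kb and mem_used_total mirror the values from the run above.
filled_kb=25568
mem_used_total=0    # field 3 of mm_stat, zero in the failing run

if [ "$mem_used_total" -eq 0 ]; then
	echo "mem_used_total is 0; sync and re-read mm_stat before dividing" >&2
else
	echo "compression ratio: $((100 * 1024 * filled_kb / mem_used_total))"
fi
```

That only papers over the symptom, of course; the interesting question is why the value is zero at all.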
If we do a "sync" and then redisplay mm_stat, we get

  26214400 2842 65536 26214400 196608 399 0 0
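For anyone decoding the raw numbers: the mm_stat columns are documented in Documentation/admin-guide/blockdev/zram.rst (later kernels append extra columns, so I'm only assuming the first six here). Labelling the pre-sync sample above:

```shell
#!/bin/sh
# Label the mm_stat sample taken right after the fill, before sync.
# Column names are from the kernel's zram documentation.
echo "9502720 0 0 26214400 196608 145 0 0" | awk '{
	printf "orig_data_size=%s compr_data_size=%s mem_used_total=%s\n", $1, $2, $3
	printf "mem_limit=%s mem_used_max=%s same_pages=%s\n", $4, $5, $6
}'
```

Note that in that sample compr_data_size is also 0 -- i.e. nothing has reached zsmalloc yet -- while same_pages is already 145, which is the combination any explanation has to account for.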
I have managed to instrument this, and in the following:

  static ssize_t mm_stat_show(struct device *dev,
  		struct device_attribute *attr, char *buf)
  {
  	...
  	if (init_done(zram)) {
  		mem_used = zs_get_total_pages(zram->mem_pool);
  		pr_info("mm_stat_show: init done %p %lu\n",
  			zram->mem_pool, mem_used);
  		zs_pool_stats(zram->mem_pool, &pool_stats);
zs_get_total_pages(zram->mem_pool) is definitely zero, which is why the mm_stat is returning zero. i.e. zsmalloc really doesn't seem to have any pages recorded for that mem_pool ...
This doesn't seem to make sense; how can a device that has a file system on it not even have one page assigned to it in zram->mem_pool?
I *think* this has something to do with the deduplication noted above. If I stub out page_same_filled() to always return false, we see instead
zram01 7 TPASS: zram1 was filled with '25568' KB
< immediately after >
  10223616 48516 131072 26214400 196608 0 0 0
< after sync >
  26214400 126933 327680 26214400 327680 0 0 0
So I think this test still needs a sync to be sure it's seeing the right values? It's probably expected that it takes some time for everything to be written out?
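If a sync is indeed the fix, the test could wait for a stable value rather than reading mm_stat once. A sketch -- zram_wait_mem_used is my invention, not an LTP helper, and it takes the stat file as a parameter so it can be tried without a zram device:

```shell
#!/bin/sh
# Hypothetical helper: sync, then poll an mm_stat-format file until
# mem_used_total (column 3) goes nonzero, with a bounded retry count.
zram_wait_mem_used() {
	stat_file=$1
	tries=${2:-10}
	sync
	while [ "$tries" -gt 0 ]; do
		mem_used_total=$(awk '{print $3}' "$stat_file")
		if [ "$mem_used_total" -ne 0 ]; then
			echo "$mem_used_total"
			return 0
		fi
		sleep 1
		tries=$((tries - 1))
	done
	return 1
}

# Exercised on a stand-in file rather than /sys/block/zram1/mm_stat:
echo "26214400 2842 65536 26214400 196608 399 0 0" > /tmp/mm_stat.sample
zram_wait_mem_used /tmp/mm_stat.sample
```

A bounded retry rather than a single sync also covers the case where writeback is merely slow rather than missing.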
But is it possible that mem_used_total being zero is a bug -- possibly triggered by the de-dup path and the test writing the same thing in every block? Something like the first de-duped page also being thrown out?
-i
[1] https://github.com/linux-test-project/ltp/blob/8c201e55f684965df2ae5a13ff439...