While the offline memory test obey ratio limit, the same test with error injection does not and tries to offline all the hotpluggable memory, spamming system logs with hundreds of thousands of dump_page() entries, slowing system down (to the point the test itself timeout and gets terminated) and excessive fs occupation:
... [ 9784.393354] page:c00c0000007d1b40 refcount:3 mapcount:0 mapping:c0000001fc03e950 index:0xe7b [ 9784.393355] def_blk_aops [ 9784.393356] flags: 0x3ffff800002062(referenced|active|workingset|private) [ 9784.393358] raw: 003ffff800002062 c0000001b9343a68 c0000001b9343a68 c0000001fc03e950 [ 9784.393359] raw: 0000000000000e7b c000000006607b18 00000003ffffffff c00000000490d000 [ 9784.393359] page dumped because: migration failure [ 9784.393360] page->mem_cgroup:c00000000490d000 [ 9784.393416] migrating pfn 1f46d failed ret:1 ...
$ grep "page dumped because: migration failure" /var/log/kern.log | wc -l 2405558
$ ls -la /var/log/kern.log -rw-r----- 1 syslog adm 2256109539 Jun 30 14:19 /var/log/kern.log
Signed-off-by: Paolo Pisati paolo.pisati@canonical.com --- tools/testing/selftests/memory-hotplug/mem-on-off-test.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/memory-hotplug/mem-on-off-test.sh b/tools/testing/selftests/memory-hotplug/mem-on-off-test.sh index b37585e6aa38..46a97f318f58 100755 --- a/tools/testing/selftests/memory-hotplug/mem-on-off-test.sh +++ b/tools/testing/selftests/memory-hotplug/mem-on-off-test.sh @@ -282,7 +282,9 @@ done # echo $error > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_OFFLINE/error for memory in `hotpluggable_online_memory`; do - offline_memory_expect_fail $memory + if [ $((RANDOM % 100)) -lt $ratio ]; then + offline_memory_expect_fail $memory + fi done
echo 0 > $NOTIFIER_ERR_INJECT_DIR/actions/MEM_GOING_OFFLINE/error
On 30/06/2021 16:57, Paolo Pisati wrote:
While the offline memory test obey ratio limit, the same test with error injection does not and tries to offline all the hotpluggable memory, spamming system logs with hundreds of thousands of dump_page() entries, slowing system down (to the point the test itself timeout and gets terminated) and excessive fs occupation:
... [ 9784.393354] page:c00c0000007d1b40 refcount:3 mapcount:0 mapping:c0000001fc03e950 index:0xe7b [ 9784.393355] def_blk_aops [ 9784.393356] flags: 0x3ffff800002062(referenced|active|workingset|private) [ 9784.393358] raw: 003ffff800002062 c0000001b9343a68 c0000001b9343a68 c0000001fc03e950 [ 9784.393359] raw: 0000000000000e7b c000000006607b18 00000003ffffffff c00000000490d000 [ 9784.393359] page dumped because: migration failure [ 9784.393360] page->mem_cgroup:c00000000490d000 [ 9784.393416] migrating pfn 1f46d failed ret:1 ...
$ grep "page dumped because: migration failure" /var/log/kern.log | wc -l 2405558
$ ls -la /var/log/kern.log -rw-r----- 1 syslog adm 2256109539 Jun 30 14:19 /var/log/kern.log
Makes sense to me and looks better choice than to disable the test completely (as other choice...).
Acked-by: Krzysztof Kozlowski krzysztof.kozlowski@canonical.com
Signed-off-by: Paolo Pisati paolo.pisati@canonical.com
tools/testing/selftests/memory-hotplug/mem-on-off-test.sh | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
Best regards, Krzysztof
On Wed, Jun 30, 2021 at 04:57:40PM +0200, Paolo Pisati wrote:
While the offline memory test obey ratio limit, the same test with error injection does not and tries to offline all the hotpluggable memory, spamming system logs with hundreds of thousands of dump_page() entries, slowing system down (to the point the test itself timeout and gets terminated) and excessive fs occupation:
...
Anyone with spare cycles could review this? It got one ack already.
On 7/9/21 7:00 AM, Paolo Pisati wrote:
On Wed, Jun 30, 2021 at 04:57:40PM +0200, Paolo Pisati wrote:
While the offline memory test obey ratio limit, the same test with error injection does not and tries to offline all the hotpluggable memory, spamming system logs with hundreds of thousands of dump_page() entries, slowing system down (to the point the test itself timeout and gets terminated) and excessive fs occupation:
...
Anyone with spare cycles could review this? It got one ack already.
Thanks for finding and fixing this.
Looks good to me. I will pull this in as soon as the merge window ends and 5.14-rc1 comes out.
thanks, -- Shuah
linux-kselftest-mirror@lists.linaro.org