LTP syscalls madvise08 test case failed on all devices from Linux next 20210118 to till day. strace log attached to this email and link provided below.
BAD: next-20210118 GOOD: next-20210115
This failure is easily reproducible on Linux next tag 20210118 above.
tst_test.c:1250: TINFO: Timeout per run is 0h 15m 00s madvise08.c:73: TINFO: Temporary core pattern is '/scratch/ltp-2nftQzNI1K/HclFMH/dump-%p' madvise08.c:112: TINFO: Dump file should be dump-10109 madvise08.c:196: TPASS: madvise(..., MADV_DONTDUMP) madvise08.c:112: TINFO: Dump file should be dump-10110 madvise08.c:203: TFAIL: No sequence in dump after MADV_DODUMP.
strace log, https://lkft.validation.linaro.org/scheduler/job/2184866#L1257
Full test log link, https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20210118/tes...
Test case link, https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/sysca...
metadata: git branch: master git repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git describe: next-20210118 kernel-config: https://builds.tuxbuild.com/1nERt1Y6o3JSvLbiPXYTb9PMo1O/config
On Mon, Jan 25, 2021 at 3:48 PM Naresh Kamboju naresh.kamboju@linaro.org wrote:
LTP syscalls madvise08 test case failed on all devices from Linux next 20210118 to till day. strace log attached to this email and link provided below.
BAD: next-20210118 GOOD: next-20210115
This failure is easily reproducible on Linux next tag 20210118 above.
tst_test.c:1250: TINFO: Timeout per run is 0h 15m 00s madvise08.c:73: TINFO: Temporary core pattern is '/scratch/ltp-2nftQzNI1K/HclFMH/dump-%p' madvise08.c:112: TINFO: Dump file should be dump-10109 madvise08.c:196: TPASS: madvise(..., MADV_DONTDUMP) madvise08.c:112: TINFO: Dump file should be dump-10110 madvise08.c:203: TFAIL: No sequence in dump after MADV_DODUMP.
strace log, https://lkft.validation.linaro.org/scheduler/job/2184866#L1257
Ok, so in this part of the log,
[pid 485] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=487, si_uid=0, si_status=SIGABRT, si_utime=0, si_stime=0} --- [pid 485] write(2, "madvise08.c:117: \33[1;34mTINFO: \33"..., 64madvise08.c:117: [1;34mTINFO: [0mDump file should be dump-487 ) = 64 [pid 485] access("dump-487", F_OK) = 0 [pid 485] openat(AT_FDCWD, "dump-487", O_RDONLY) = 3 [pid 485] read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\4\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 485] read(3, "\0\320\3\0\0\0\0\0\0\340\375\24\304\177\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0"..., 1024) = 292 [pid 485] read(3, "", 1024) = 0 [pid 485] close(3) = 0 [pid 485] write(2, "madvise08.c:208: \33[1;31mTFAIL: \33"..., 74madvise08.c:208: [1;31mTFAIL: [0mNo sequence in dump after MADV_DODUMP.
it seems that the data that was requested to be dumped with MADV_DODUMP is indeed completely absent.
There was exactly one commit that got merged between next-20210115 and next-20120118 related to core dumps: 8a3cc755b138 ("coredump: don't bother with do_truncate()"). Adding Al Viro to Cc for that.
Naresh, could you try reverting that patch?
Arnd
Hi Arnd,
On Mon, 25 Jan 2021 at 20:41, Arnd Bergmann arnd@kernel.org wrote:
On Mon, Jan 25, 2021 at 3:48 PM Naresh Kamboju naresh.kamboju@linaro.org wrote:
LTP syscalls madvise08 test case failed on all devices from Linux next 20210118 to till day. strace log attached to this email and link provided below.
BAD: next-20210118 GOOD: next-20210115
This failure is easily reproducible on Linux next tag 20210118 above.
tst_test.c:1250: TINFO: Timeout per run is 0h 15m 00s madvise08.c:73: TINFO: Temporary core pattern is '/scratch/ltp-2nftQzNI1K/HclFMH/dump-%p' madvise08.c:112: TINFO: Dump file should be dump-10109 madvise08.c:196: TPASS: madvise(..., MADV_DONTDUMP) madvise08.c:112: TINFO: Dump file should be dump-10110 madvise08.c:203: TFAIL: No sequence in dump after MADV_DODUMP.
strace log, https://lkft.validation.linaro.org/scheduler/job/2184866#L1257
Ok, so in this part of the log,
[pid 485] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_DUMPED, si_pid=487, si_uid=0, si_status=SIGABRT, si_utime=0, si_stime=0} --- [pid 485] write(2, "madvise08.c:117: \33[1;34mTINFO: \33"..., 64madvise08.c:117: [1;34mTINFO: [0mDump file should be dump-487 ) = 64 [pid 485] access("dump-487", F_OK) = 0 [pid 485] openat(AT_FDCWD, "dump-487", O_RDONLY) = 3 [pid 485] read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\4\0>\0\1\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024 [pid 485] read(3, "\0\320\3\0\0\0\0\0\0\340\375\24\304\177\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0"..., 1024) = 292 [pid 485] read(3, "", 1024) = 0 [pid 485] close(3) = 0 [pid 485] write(2, "madvise08.c:208: \33[1;31mTFAIL: \33"..., 74madvise08.c:208: [1;31mTFAIL: [0mNo sequence in dump after MADV_DODUMP.
it seems that the data that was requested to be dumped with MADV_DODUMP is indeed completely absent.
There was exactly one commit that got merged between next-20210115 and next-20120118 related to core dumps: 8a3cc755b138 ("coredump: don't bother with do_truncate()"). Adding Al Viro to Cc for that.
Naresh, could you try reverting that patch?
This suspecting commit reverted and tested and the test case PASS.
commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65 Author: Al Viro viro@zeniv.linux.org.uk Date: Sun Mar 8 09:16:37 2020 -0400
coredump: don't bother with do_truncate()
have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any).
Signed-off-by: Al Viro viro@zeniv.linux.org.uk
fs/binfmt_elf.c | 1 - fs/coredump.c | 56 +++++++++++++++++++++++++++--------------------- include/linux/binfmts.h | 1 + include/linux/coredump.h | 1 -
Test case output link, https://lkft.validation.linaro.org/scheduler/job/2184975#L1369 https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
- Naresh
On Mon, Jan 25, 2021 at 10:01:48PM +0530, Naresh Kamboju wrote:
commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65 Author: Al Viro viro@zeniv.linux.org.uk Date: Sun Mar 8 09:16:37 2020 -0400
coredump: don't bother with do_truncate() have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs/binfmt_elf.c | 1 - fs/coredump.c | 56 +++++++++++++++++++++++++++--------------------- include/linux/binfmts.h | 1 + include/linux/coredump.h | 1 -
Test case output link, https://lkft.validation.linaro.org/scheduler/job/2184975#L1369 https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
I think I see what's going on. Could you check if the following fixes your reproducer?
diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 60b5583e9eaf..1a587618015c 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -149,8 +149,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i, return -EIO; }
- if (!dump_skip(cprm, roundup(cprm->pos - ret + sz, 4) - cprm->pos)) - return -EIO; + dump_skip_to(cprm, roundup(cprm->pos - ret + sz, 4)); return 0; }
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 66137d5dca5f..474a3c7dd5ce 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2267,8 +2267,7 @@ static int elf_core_dump(struct coredump_params *cprm) goto end_coredump;
/* Align to page */ - if (!dump_skip(cprm, dataoff - cprm->pos)) - goto end_coredump; + dump_skip_to(cprm, dataoff);
for (i = 0; i < vma_count; i++) { struct core_vma_metadata *meta = vma_meta + i; diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index be4062b8ba75..01f043971644 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1640,8 +1640,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) goto end_coredump; }
- if (!dump_skip(cprm, dataoff - cprm->pos)) - goto end_coredump; + dump_skip_to(cprm, dataoff);
if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count)) goto end_coredump; diff --git a/fs/coredump.c b/fs/coredump.c index 27a93f724251..bfac7422cd14 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -894,7 +894,14 @@ int dump_emit(struct coredump_params *cprm, const void *addr, int nr) } EXPORT_SYMBOL(dump_emit);
-int dump_skip(struct coredump_params *cprm, size_t nr) +void dump_skip_to(struct coredump_params *cprm, unsigned long pos) +{ + cprm->to_skip = pos - cprm->pos; + return 0; +} +EXPORT_SYMBOL(dump_skip_to); + +void dump_skip(struct coredump_params *cprm, size_t nr) { cprm->to_skip += nr; return 0; @@ -925,11 +932,11 @@ int dump_user_range(struct coredump_params *cprm, unsigned long start, stop = !dump_emit(cprm, kaddr, PAGE_SIZE); kunmap(page); put_page(page); + if (stop) + return 0; } else { - stop = !dump_skip(cprm, PAGE_SIZE); + dump_skip(cprm, PAGE_SIZE); } - if (stop) - return 0; } return 1; } @@ -940,7 +947,9 @@ int dump_align(struct coredump_params *cprm, int align) unsigned mod = (cprm->pos + cprm->to_skip) & (align - 1); if (align & (align - 1)) return 0; - return mod ? dump_skip(cprm, align - mod) : 1; + if (mod) + cprm->to_skip += align - mod; + return 1; } EXPORT_SYMBOL(dump_align);
diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 806d7f87d16a..78fcd776b185 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -23,7 +23,8 @@ extern unsigned int core_pipe_limit; * functions to write out all the necessary info. */ struct coredump_params; -extern int dump_skip(struct coredump_params *cprm, size_t nr); +extern void dump_skip_to(struct coredump_params *cprm, unsigned long to); +extern void dump_skip(struct coredump_params *cprm, size_t nr); extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); extern int dump_align(struct coredump_params *cprm, int align); int dump_user_range(struct coredump_params *cprm, unsigned long start,
Hi Al Viro,
On Mon, 15 Feb 2021 at 01:25, Al Viro viro@zeniv.linux.org.uk wrote:
On Mon, Jan 25, 2021 at 10:01:48PM +0530, Naresh Kamboju wrote:
commit 8a3cc755b13808eba74846dfd1033fcbc21f9a65 Author: Al Viro viro@zeniv.linux.org.uk Date: Sun Mar 8 09:16:37 2020 -0400
coredump: don't bother with do_truncate() have dump_skip() just remember how much needs to be skipped, leave actual seeks/writing zeroes to the next dump_emit() or the end of coredump output, whichever comes first. And instead of playing with do_truncate() in the end, just write one NUL at the end of the last gap (if any). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
fs/binfmt_elf.c | 1 - fs/coredump.c | 56 +++++++++++++++++++++++++++--------------------- include/linux/binfmts.h | 1 + include/linux/coredump.h | 1 -
Test case output link, https://lkft.validation.linaro.org/scheduler/job/2184975#L1369 https://lkft.validation.linaro.org/scheduler/job/2184972#L1358
I think I see what's going on. Could you check if the following fixes your reproducer?
diff --git a/arch/powerpc/platforms/cell/spufs/coredump.c b/arch/powerpc/platforms/cell/spufs/coredump.c index 60b5583e9eaf..1a587618015c 100644 --- a/arch/powerpc/platforms/cell/spufs/coredump.c +++ b/arch/powerpc/platforms/cell/spufs/coredump.c @@ -149,8 +149,7 @@ static int spufs_arch_write_note(struct spu_context *ctx, int i, return -EIO; }
if (!dump_skip(cprm, roundup(cprm->pos - ret + sz, 4) - cprm->pos))
return -EIO;
dump_skip_to(cprm, roundup(cprm->pos - ret + sz, 4)); return 0;
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 66137d5dca5f..474a3c7dd5ce 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2267,8 +2267,7 @@ static int elf_core_dump(struct coredump_params *cprm) goto end_coredump;
/* Align to page */
if (!dump_skip(cprm, dataoff - cprm->pos))
goto end_coredump;
dump_skip_to(cprm, dataoff); for (i = 0; i < vma_count; i++) { struct core_vma_metadata *meta = vma_meta + i;
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index be4062b8ba75..01f043971644 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1640,8 +1640,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm) goto end_coredump; }
if (!dump_skip(cprm, dataoff - cprm->pos))
goto end_coredump;
dump_skip_to(cprm, dataoff); if (!elf_fdpic_dump_segments(cprm, vma_meta, vma_count)) goto end_coredump;
diff --git a/fs/coredump.c b/fs/coredump.c index 27a93f724251..bfac7422cd14 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -894,7 +894,14 @@ int dump_emit(struct coredump_params *cprm, const void *addr, int nr) } EXPORT_SYMBOL(dump_emit);
-int dump_skip(struct coredump_params *cprm, size_t nr) +void dump_skip_to(struct coredump_params *cprm, unsigned long pos) +{
cprm->to_skip = pos - cprm->pos;
return 0;
fs/coredump.c:903:9: error: 'return' with a value, in function returning void [-Werror=return-type] 903 | return 0; | ^
Build failed due to above error.
build log link, https://gitlab.com/Linaro/lkft/users/naresh.kamboju/linux/-/jobs/1029838856#...
- Naresh
On Mon, Feb 15, 2021 at 02:11:15PM +0530, Naresh Kamboju wrote:
fs/coredump.c:903:9: error: 'return' with a value, in function returning void [-Werror=return-type] 903 | return 0; | ^
Build failed due to above error.
FWIW, here the test results in Running tests....... <<<test_start>>> tag=madvise08 stime=1613398818 cmdline="madvise08" contacts="" analysis=exit <<<test_output>>> incrementing stop tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s madvise08.c:78: TINFO: Temporary core pattern is '/tmp/ltp-tgvQ3Lz1UZ/B6lwy6/dump-%p' madvise08.c:117: TINFO: Dump file should be dump-2276 madvise08.c:201: TPASS: madvise(..., MADV_DONTDUMP) madvise08.c:117: TINFO: Dump file should be dump-2277 madvise08.c:205: TPASS: madvise(..., MADV_DODUMP)
Summary: passed 2 failed 0 skipped 0 warnings 0 <<<execution_status>>> initiation_status="ok" duration=0 termination_type=exited termination_id=0 corefile=no cutime=0 cstime=0 <<<test_end>>>
(built without -Werror=return-type, so I'd missed the warnings)
Anyway, I've folded the fix (with those stray return 0 removed, of course) into #work.coredump and #for-next; works here. Could you test either branch (in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git)?
On Mon, 15 Feb 2021 at 20:05, Al Viro viro@zeniv.linux.org.uk wrote:
On Mon, Feb 15, 2021 at 02:11:15PM +0530, Naresh Kamboju wrote:
fs/coredump.c:903:9: error: 'return' with a value, in function returning void [-Werror=return-type] 903 | return 0; | ^
Build failed due to above error.
FWIW, here the test results in Running tests....... <<<test_start>>> tag=madvise08 stime=1613398818 cmdline="madvise08" contacts="" analysis=exit <<<test_output>>> incrementing stop tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s madvise08.c:78: TINFO: Temporary core pattern is '/tmp/ltp-tgvQ3Lz1UZ/B6lwy6/dump-%p' madvise08.c:117: TINFO: Dump file should be dump-2276 madvise08.c:201: TPASS: madvise(..., MADV_DONTDUMP) madvise08.c:117: TINFO: Dump file should be dump-2277 madvise08.c:205: TPASS: madvise(..., MADV_DODUMP)
Summary: passed 2 failed 0 skipped 0 warnings 0 <<<execution_status>>> initiation_status="ok" duration=0 termination_type=exited termination_id=0 corefile=no cutime=0 cstime=0 <<<test_end>>>
(built without -Werror=return-type, so I'd missed the warnings)
Anyway, I've folded the fix (with those stray return 0 removed, of course) into #work.coredump and #for-next; works here. Could you test either branch (in git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git)?
By using Linaro tuxsuite [1] I have built your tree and tested on x86_64 and the reported test PASS now.
LTP syscalls [2] and LTP fs [3] tested on x86_64 and both PASS.
Tested-by: Naresh Kamboju naresh.kamboju@linaro.org
[1] https://gitlab.com/Linaro/tuxsuite [2] https://lkft.validation.linaro.org/results/2278012 [3] https://lkft.validation.linaro.org/results/2280979
- Naresh