On the linux-next 20210617 tag, the following x86_64 builds failed with clang-10 and clang-11. Regressions found on x86_64:
- build/clang-11-tinyconfig
- build/clang-11-allnoconfig
- build/clang-10-tinyconfig
- build/clang-10-allnoconfig
- build/clang-11-x86_64_defconfig
- build/clang-10-defconfig
We are running git bisect to identify the bad commit.
Build log:
------------
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_relocate_parse_slow()+0x466: stack state mismatch: cfa1=4+120 cfa2=-1+0
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations()+0x1e0: stack state mismatch: cfa1=4+104 cfa2=-1+0
x86_64-linux-gnu-ld: mm/mremap.o: in function `move_pgt_entry':
mremap.c:(.text+0x763): undefined reference to `__compiletime_assert_342'
make[1]: *** [/builds/linux/Makefile:1252: vmlinux] Error 1
make[1]: Target '__all' not remade because of errors.
make: *** [Makefile:222: __sub-make] Error 2
make: Target '__all' not remade because of errors.
make --silent --keep-going --jobs=8 O=/home/tuxbuild/.cache/tuxmake/builds/current ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- 'HOSTCC=sccache clang' 'CC=sccache clang' headers_install INSTALL_HDR_PATH=/home/tuxbuild/.cache/tuxmake/builds/current/install_hdr/
tar caf /home/tuxbuild/.cache/tuxmake/builds/current/headers.tar.xz -C /home/tuxbuild/.cache/tuxmake/builds/current/install_hdr .
ref: https://builds.tuxbuild.com/1u4ZKFTh12vrYBVf8b1xGpaFOrE/
# TuxMake is a command line tool and Python library that provides
# portable and repeatable Linux kernel builds across a variety of
# architectures, toolchains, kernel configurations, and make targets.
#
# TuxMake supports the concept of runtimes.
# See https://docs.tuxmake.org/runtimes/, for that to work it requires
# that you install podman or docker on your system.
#
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.
tuxmake --runtime podman --target-arch x86_64 --toolchain clang-11 --kconfig x86_64_defconfig
build info:
  git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next
  git_sha: 7d9c6b8147bdd76d7eb2cf6f74f84c6918ae0939
  git_short_log: 7d9c6b8147bd ("Add linux-next specific files for 20210617")
  kconfig: x86_64_defconfig
  kernel_image:
  kernel_version: 5.13.0-rc6
  toolchain: clang-11
-- Linaro LKFT https://lkft.linaro.org
On Thu, 17 Jun 2021 at 17:41, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On the linux-next 20210617 tag, the following x86_64 builds failed with clang-10 and clang-11. Regressions found on x86_64:
- build/clang-11-tinyconfig
- build/clang-11-allnoconfig
- build/clang-10-tinyconfig
- build/clang-10-allnoconfig
- build/clang-11-x86_64_defconfig
- build/clang-10-defconfig
We are running git bisect to identify the bad commit.
Build log:
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_relocate_parse_slow()+0x466: stack state mismatch: cfa1=4+120 cfa2=-1+0
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.o: warning: objtool: eb_copy_relocations()+0x1e0: stack state mismatch: cfa1=4+104 cfa2=-1+0
x86_64-linux-gnu-ld: mm/mremap.o: in function `move_pgt_entry':
mremap.c:(.text+0x763): undefined reference to `__compiletime_assert_342'
The git bisect pointed out the first bad commit.
The first bad commit:
commit 928cf6adc7d60c96eca760c05c1000cda061604e
Author: Stephen Boyd swboyd@chromium.org
Date: Thu Jun 17 15:21:35 2021 +1000
module: add printk formats to add module build ID to stacktraces
Let's make kernel stacktraces easier to identify by including the build ID[1] of a module if the stacktrace is printing a symbol from a module. This makes it simpler for developers to locate a kernel module's full debuginfo for a particular stacktrace. Combined with scripts/decode_stacktrace.sh, a developer can download the matching debuginfo from a debuginfod[2] server and find the exact file and line number for the functions plus offsets in a stacktrace that match the module. This is especially useful for pstore crash debugging where the kernel crashes are recorded in something like console-ramoops and the recovery kernel/modules are different or the debuginfo doesn't exist on the device due to space concerns (the debuginfo can be too large for space-limited devices).
Originally, I put this on the %pS format, but that was quickly rejected given that %pS is used in other places such as ftrace where build IDs aren't meaningful. There was some discussion on the list about putting every module build ID into the "Modules linked in:" section of the stacktrace message, but that quickly becomes very hard to read once you have more than three or four modules linked in. It also provides too much information when we don't expect each module to be traversed in a stacktrace. Having the build ID for modules that aren't important just makes things messy. Splitting it into multiple lines for each module quickly explodes the number of lines printed in an oops too, possibly wrapping the warning off the console. And finally, trying to stash away each module used in a callstack to provide the ID of each symbol printed is cumbersome and would require changes to each architecture to stash away modules and return their build IDs once unwinding has completed.
Instead, we opt for the simpler approach of introducing new printk formats '%pS[R]b' for "pointer symbolic backtrace with module build ID" and '%pBb' for "pointer backtrace with module build ID" and then updating the few places in the architecture layer where the stacktrace is printed to use this new format.
Before:
Call trace:
 lkdtm_WARNING+0x28/0x30 [lkdtm]
 direct_entry+0x16c/0x1b4 [lkdtm]
 full_proxy_write+0x74/0xa4
 vfs_write+0xec/0x2e8
After:
Call trace:
 lkdtm_WARNING+0x28/0x30 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
 direct_entry+0x16c/0x1b4 [lkdtm 6c2215028606bda50de823490723dc4bc5bf46f9]
 full_proxy_write+0x74/0xa4
 vfs_write+0xec/0x2e8
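To make the new output concrete: a frame that resolves into a module gains the module name and its build ID, printed as 40 lowercase hex digits, inside the brackets. The sketch below only reproduces that text with snprintf(); format_frame(), BUILD_ID_SIZE and example_lkdtm_build_id are hypothetical names for illustration, not the kernel's kallsyms/vsprintf code, and the 20-byte build ID size is an assumption that matches the 40 hex digits in the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* assumption: 20-byte build ID, printed as 40 hex digits */

/* Build ID bytes of lkdtm, copied from the "After:" trace above. */
static const unsigned char example_lkdtm_build_id[BUILD_ID_SIZE] = {
	0x6c, 0x22, 0x15, 0x02, 0x86, 0x06, 0xbd, 0xa5, 0x0d, 0xe8,
	0x23, 0x49, 0x07, 0x23, 0xdc, 0x4b, 0xc5, 0xbf, 0x46, 0xf9,
};

/*
 * Hypothetical helper: format "sym+0xoff/0xsize", followed by
 * " [module buildid]" when the symbol lives in a module (mod != NULL).
 * Output is truncated, never overflowed, if buf is too small.
 */
static void format_frame(char *buf, size_t len, const char *sym,
			 unsigned long off, unsigned long size,
			 const char *mod, const unsigned char *build_id)
{
	assert(buf != NULL && len > 0);

	size_t n = (size_t)snprintf(buf, len, "%s+0x%lx/0x%lx", sym, off, size);
	if (mod == NULL || n >= len)
		return;

	n += (size_t)snprintf(buf + n, len - n, " [%s ", mod);
	for (size_t i = 0; i < BUILD_ID_SIZE && n < len; i++)
		n += (size_t)snprintf(buf + n, len - n, "%02x", build_id[i]);
	if (n < len)
		snprintf(buf + n, len - n, "]");
}
```

Calling format_frame() with the lkdtm values gives the first line of the "After:" trace; passing NULL for the module gives a core-kernel frame such as vfs_write+0xec/0x2e8, which is unchanged by the patch.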
Link: https://lkml.kernel.org/r/20210511003845.2429846-6-swboyd@chromium.org
Link: https://fedoraproject.org/wiki/Releases/FeatureBuildId [1]
Link: https://sourceware.org/elfutils/Debuginfod.html [2]
Signed-off-by: Stephen Boyd swboyd@chromium.org
Cc: Jiri Olsa jolsa@kernel.org
Cc: Alexei Starovoitov ast@kernel.org
Cc: Jessica Yu jeyu@kernel.org
Cc: Evan Green evgreen@chromium.org
Cc: Hsin-Yi Wang hsinyi@chromium.org
Cc: Petr Mladek pmladek@suse.com
Cc: Steven Rostedt rostedt@goodmis.org
Cc: Sergey Senozhatsky sergey.senozhatsky@gmail.com
Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com
Cc: Rasmus Villemoes linux@rasmusvillemoes.dk
Cc: Matthew Wilcox willy@infradead.org
Cc: Baoquan He bhe@redhat.com
Cc: Borislav Petkov bp@alien8.de
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Dave Young dyoung@redhat.com
Cc: Ingo Molnar mingo@redhat.com
Cc: Konstantin Khlebnikov khlebnikov@yandex-team.ru
Cc: Sasha Levin sashal@kernel.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Vivek Goyal vgoyal@redhat.com
Cc: Will Deacon will@kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Stephen Rothwell sfr@canb.auug.org.au

 Documentation/core-api/printk-formats.rst | 11 ++++
 include/linux/kallsyms.h | 20 +++++-
 include/linux/module.h | 8 ++-
 kernel/kallsyms.c | 101 ++++++++++++++++++++++++------
 kernel/module.c | 31 ++++++++-
 lib/vsprintf.c | 8 ++-
 6 files changed, 154 insertions(+), 25 deletions(-)

Previous HEAD position was b2dcc0267277 dump_stack: add vmlinux build ID to stack traces
HEAD is now at 7d9c6b8147bd Add linux-next specific files for 20210617
Reported-by: Naresh Kamboju naresh.kamboju@linaro.org
On Thu, Jun 17, 2021 at 06:15:45PM +0530, Naresh Kamboju wrote:
On Thu, 17 Jun 2021 at 17:41, Naresh Kamboju naresh.kamboju@linaro.org wrote:
x86_64-linux-gnu-ld: mm/mremap.o: in function `move_pgt_entry': mremap.c:(.text+0x763): undefined reference to `__compiletime_assert_342'
The git bisect pointed out the first bad commit.
The first bad commit:
commit 928cf6adc7d60c96eca760c05c1000cda061604e
Author: Stephen Boyd swboyd@chromium.org
Date: Thu Jun 17 15:21:35 2021 +1000
module: add printk formats to add module build ID to stacktraces
Your git bisect probably went astray. There's no way that commit caused that regression.
Hi Matthew,
On Thu, 17 Jun 2021 at 19:22, Matthew Wilcox willy@infradead.org wrote:
Your git bisect probably went astray. There's no way that commit caused that regression.
Sorry for pointing at an incorrect bad commit from git bisect.
Is there a recommended way to run git bisect on the linux-next tree?
Here is the git bisect log from gitlab pipeline, https://gitlab.com/Linaro/lkft/bisect/-/jobs/1354963448
- Naresh
On Thu, 17 Jun 2021 20:15:13 +0530 Naresh Kamboju naresh.kamboju@linaro.org wrote:
Sorry for pointing at an incorrect bad commit from git bisect.
Is there a recommended way to run git bisect on the linux-next tree?
Here is the git bisect log from gitlab pipeline, https://gitlab.com/Linaro/lkft/bisect/-/jobs/1354963448
Is it possible that it's not 100% reproducible?
Anyway, before reporting a commit found by git bisect as the buggy commit, it is best to confirm it by:
1) Checking out the tree at the bad commit.
2) Verifying that the tree at that point is bad.
3) Checking out the parent of that commit (the commit before the bad commit was applied).
4) Verifying that the tree at that point is good.
You may need to repeat the above a couple of times in case the issue is not 100% reproducible.
If the above is true, then post the patch as the bad commit. If it is not, then something went wrong with the bisect.
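The bisect-then-confirm procedure can be sketched as a toy model in C. It assumes a linear history indexed 0..n-1 (real linux-next history contains merges, which git bisect handles and this model does not), and example_history and example_build_fails() are hypothetical stand-ins for "check out this commit and build it".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Binary search for the first bad commit, given is_bad[good] == false and is_bad[bad] == true. */
static size_t bisect_first_bad(const bool *is_bad, size_t good, size_t bad)
{
	assert(good < bad && !is_bad[good] && is_bad[bad]);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;
		if (is_bad[mid])
			bad = mid;	/* first bad commit is at or before mid */
		else
			good = mid;	/* first bad commit is after mid */
	}
	return bad;
}

/*
 * The four confirmation steps: the candidate must fail and its parent must
 * build, checked `runs` times in case the failure is not 100% reproducible.
 */
static bool confirm_first_bad(bool (*build_fails)(size_t commit),
			      size_t candidate, unsigned runs)
{
	if (candidate == 0)
		return false;			/* no parent to compare against */
	for (unsigned i = 0; i < runs; i++) {
		if (!build_fails(candidate))	/* steps 1-2: must be bad */
			return false;
		if (build_fails(candidate - 1))	/* steps 3-4: parent must be good */
			return false;
	}
	return true;
}

/* Hypothetical history: the regression lands at commit 6. */
static const bool example_history[10] = {
	false, false, false, false, false, false, true, true, true, true,
};

static bool example_build_fails(size_t commit)
{
	return example_history[commit];
}
```

A wrong candidate such as commit 7 fails the check because its parent (commit 6) is already bad, which is exactly the situation the confirmation steps are meant to catch before a report goes out.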
-- Steve
On Thu, Jun 17, 2021 at 5:54 PM Naresh Kamboju naresh.kamboju@linaro.org wrote:
Sorry for pointing at an incorrect bad commit from git bisect.
Is there a recommended way to run git bisect on the linux-next tree?
linux-next is no different from any other repository that does merges. It takes origin/master (Linus' tree) as its base.
Rebuilt the CC list because most people were added based on the incorrect bisect result.
On Thu, Jun 17, 2021 at 02:51:49PM +0100, Matthew Wilcox wrote:
Your git bisect probably went astray. There's no way that commit caused that regression.
My bisect landed on commit 83f85ac75855 ("mm/mremap: convert huge PUD move to separate helper"). flush_pud_tlb_range() evaluates to BUILD_BUG() when CONFIG_TRANSPARENT_HUGEPAGE is unset, but move_huge_pud(), which calls it, is built based only on the value of CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.
$ make -skj(nproc) ARCH=x86_64 CC=clang O=build/x86_64 distclean allnoconfig mm/mremap.o
$ llvm-readelf -s build/x86_64/mm/mremap.o &| rg __compiletime_assert
21: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND __compiletime_assert_337
$ rg TRANSPARENT_ build/x86_64/.config
450:CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
451:CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
562:# CONFIG_TRANSPARENT_HUGEPAGE is not set
Not sure why this does not happen on newer clang versions, presumably something with inlining decisions? Still seems like a legitimate issue to me.
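The failure mode described above can be modeled in plain C. This is a hedged sketch, not kernel code: the config macros are hard-coded, every function is a simplified, hypothetical stand-in for its mm/mremap.c namesake, and BUILD_BUG() is replaced by a counter so the program links. In the kernel, BUILD_BUG() expands to a call to an extern __compiletime_assert_NNN() that is never defined; if the optimizer cannot drop that call, clang-10/11 leave it for the linker, which is the undefined reference in the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical config matching the failing allnoconfig build. */
#define CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD 1
/* CONFIG_TRANSPARENT_HUGEPAGE is not set */

/* Run-time stand-in for BUILD_BUG(): counts how often it is reached. */
static int build_bug_hits;
#define BUILD_BUG() ((void)build_bug_hits++)

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define flush_pud_tlb_range() ((void)0)
static bool pud_trans_huge(bool pud_is_huge) { return pud_is_huge; }
#else
#define flush_pud_tlb_range() BUILD_BUG()
/* Without THP this check is constant false. */
static bool pud_trans_huge(bool pud_is_huge) { (void)pud_is_huge; return false; }
#endif

enum pgt_entry { NORMAL_PMD, HPAGE_PMD, NORMAL_PUD, HPAGE_PUD };

#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
/* Built whenever the arch supports PUD-sized THP, even with THP off. */
static bool move_huge_pud(void)
{
	flush_pud_tlb_range();
	return true;
}
#endif

/* One shared switch; nothing inside it says HPAGE_PUD is unreachable. */
static bool move_pgt_entry(enum pgt_entry entry)
{
	switch (entry) {
	case HPAGE_PUD:
		return move_huge_pud();
	default:
		return false;
	}
}

/* Caller: requests HPAGE_PUD only for a huge PUD. */
static bool move_page_tables_step(bool pud_is_huge)
{
	if (pud_trans_huge(pud_is_huge))
		return move_pgt_entry(HPAGE_PUD);
	return false;
}
```

Through move_page_tables_step() the counter stays at 0, but move_pgt_entry(HPAGE_PUD) on its own still reaches the stand-in. Presumably that is what an out-of-line copy of move_pgt_entry() retains when the compiler does not propagate the caller's constant-false check, which would explain why the undefined reference is reported in that function.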
Cheers, Nathan
On 6/17/21 11:32 PM, Nathan Chancellor wrote:
Not sure why this does not happen on newer clang versions, presumably something with inlining decisions? Still seems like a legitimate issue to me.
gcc 10 also doesn't give a build error. I guess that is because we evaluate
if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
to if (0) with CONFIG_TRANSPARENT_HUGEPAGE disabled.
Switching that to if (1) does result in BUILD_BUG() triggering.
Should we fix this?
modified mm/mremap.c
@@ -336,7 +336,7 @@ static inline bool move_normal_pud(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+#if defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
 			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
 {
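This change moves the decision into the preprocessor: the real move_huge_pud() is only compiled when flush_pud_tlb_range() is a real flush. A hedged C sketch of that selection follows, with hard-coded config macros and a counter standing in for BUILD_BUG(). It assumes mremap.c already provides a fallback move_huge_pud() for the #else case; the sketch's fallback (a hypothetical simplification) just returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical config: PUD THP supported by the arch, THP itself disabled. */
#define CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD 1
/* CONFIG_TRANSPARENT_HUGEPAGE is not set */

static int build_bug_hits;	/* run-time stand-in for BUILD_BUG() */

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
#define flush_pud_tlb_range() ((void)0)
#else
#define flush_pud_tlb_range() ((void)build_bug_hits++)
#endif

#if defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
/* Only built when flush_pud_tlb_range() is a real flush. */
static const char move_huge_pud_variant[] = "real";
static bool move_huge_pud(void)
{
	flush_pud_tlb_range();
	return true;
}
#else
/* Assumed fallback: refuse the move without touching flush_pud_tlb_range(). */
static const char move_huge_pud_variant[] = "fallback";
static bool move_huge_pud(void)
{
	return false;
}
#endif
```

With THP off, the BUILD_BUG() expansion is never even compiled into a function body, so the result no longer depends on how well the compiler eliminates dead code.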
On Fri, Jun 18, 2021 at 10:32:42AM +0530, Aneesh Kumar K.V wrote:
gcc 10 also doesn't give a build error. I guess that is because we evaluate
if (pud_trans_huge(*old_pud) || pud_devmap(*old_pud)) {
to if (0) with CONFIG_TRANSPARENT_HUGEPAGE disabled.
Switching that to if (1) does result in BUILD_BUG() triggering.
Thanks for pointing that out. I think what happens with clang-10 and clang-11 is that move_huge_pud() gets inlined into move_pgt_entry(), but then the compiler does not figure out that the HPAGE_PUD case is dead, so the code sticks around, whereas GCC and newer clang versions can figure that out and eliminate that case.
Should we fix this?
Yes, I believe that we should.
modified mm/mremap.c
@@ -336,7 +336,7 @@ static inline bool move_normal_pud(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+#if defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
 			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
 {
That works or we could mirror what has already been done for the HPAGE_PMD case. No personal preference.
diff --git a/mm/mremap.c b/mm/mremap.c
index 9a7fbec31dc9..5989d3990020 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -460,7 +460,8 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 				      new_entry);
 		break;
 	case HPAGE_PUD:
-		moved = move_huge_pud(vma, old_addr, new_addr, old_entry,
+		moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+			move_huge_pud(vma, old_addr, new_addr, old_entry,
 				      new_entry);
 		break;
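This alternative keeps move_huge_pud() compiled but guards the call with IS_ENABLED(), which expands to a literal 1 or 0 during preprocessing. The sketch below re-implements the placeholder trick behind the kernel's IS_ENABLED() with renamed helper macros (the =m module case is left out, and an extra trailing argument keeps it strictly ISO C), and uses a call counter in place of the real move_huge_pud(). With THP off, the left operand of && is the constant 0, so the call is never evaluated and the compiler can drop it without having to inline anything.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified placeholder trick: an option #defined as 1 yields 1, an
 * undefined option yields 0. Helper names are renamed for this sketch,
 * the OPTION_MODULE (=m) case is omitted, and the extra trailing 0 keeps
 * at least one argument for "..." as ISO C requires.
 */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val
#define IS_DEFINED(x) IS_DEFINED_(x)
#define IS_DEFINED_(val) IS_DEFINED__(ARG_PLACEHOLDER_##val)
#define IS_DEFINED__(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_ENABLED(option) IS_DEFINED(option)

/* Hypothetical config for the failing build. */
#define CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD 1
/* CONFIG_TRANSPARENT_HUGEPAGE is not set */

/* IS_ENABLED() is an integer constant expression, usable at compile time. */
_Static_assert(IS_ENABLED(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) == 1, "PUD THP support expected");
_Static_assert(IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) == 0, "THP expected off");

static int move_huge_pud_calls;	/* counts calls to the guarded helper */

static bool move_huge_pud(void)
{
	move_huge_pud_calls++;
	return true;
}

/* HPAGE_PUD arm of move_pgt_entry() after the proposed change. */
static bool move_hpage_pud(void)
{
	/* Short-circuit: with THP off the left operand is 0, so the call is skipped. */
	bool moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && move_huge_pud();
	return moved;
}
```

As the reply notes, this mirrors the existing HPAGE_PMD handling; compared with the #if variant, it keeps both configurations type-checked at the cost of relying on the compiler to fold a constant condition.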
Cheers, Nathan
An additional report: https://lore.kernel.org/lkml/20210623223015.GA315292@paulmck-ThinkPad-P17-Ge... EOM
On Wed, Jun 23, 2021 at 04:39:56PM -0700, Nick Desaulniers wrote:
An additional report: https://lore.kernel.org/lkml/20210623223015.GA315292@paulmck-ThinkPad-P17-Ge... EOM
On Fri, Jun 18, 2021 at 4:05 PM Nathan Chancellor nathan@kernel.org wrote:
On Fri, Jun 18, 2021 at 10:32:42AM +0530, Aneesh Kumar K.V wrote:
modified mm/mremap.c
@@ -336,7 +336,7 @@ static inline bool move_normal_pud(struct vm_area_struct *vma,
 }
 #endif
 
-#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+#if defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static bool move_huge_pud(struct vm_area_struct *vma, unsigned long old_addr,
 			  unsigned long new_addr, pud_t *old_pud, pud_t *new_pud)
 {
Making the above change does the trick for my repeat-by, thank you!
That works or we could mirror what has already been done for the HPAGE_PMD case. No personal preference.
diff --git a/mm/mremap.c b/mm/mremap.c
index 9a7fbec31dc9..5989d3990020 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -460,7 +460,8 @@ static bool move_pgt_entry(enum pgt_entry entry, struct vm_area_struct *vma,
 				      new_entry);
 		break;
 	case HPAGE_PUD:
-		moved = move_huge_pud(vma, old_addr, new_addr, old_entry,
+		moved = IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+			move_huge_pud(vma, old_addr, new_addr, old_entry,
 				      new_entry);
 		break;
This one is already in -next, but you knew that already. I am happy to test the resulting patch, when and if.
Thanx, Paul