[TCWG CI] Regression caused by linux: Makefile: Enable -Warray-bounds:
commit d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7
Author: Kees Cook <keescook@chromium.org>

    Makefile: Enable -Warray-bounds
Results regressed to
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21319
# First few build errors in logs:
# 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:95:9: error: array subscript 0 is outside array bounds of ‘volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:95:9: error: array subscript 0 is outside array bounds of ‘volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:95:9: error: array subscript 0 is outside array bounds of ‘volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:95:9: error: array subscript 0 is outside array bounds of ‘volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
# 00:00:53 ./arch/arm/include/asm/io.h:95:9: error: array subscript 0 is outside array bounds of ‘volatile void[0]’ [-Werror=array-bounds]

from
# reset_artifacts:
-10
# build_abe binutils:
-9
# build_abe stage1:
-5
# build_abe qemu:
-2
# linux_n_obj:
21459
# First few build errors in logs:
# 00:03:34 arch/arm/kernel/ptrace.c:438:40: error: ‘arch_ctrl’ is used uninitialized [-Werror=uninitialized]
# 00:03:34 arch/arm/kernel/ptrace.c:484:40: error: ‘ctrl’ is used uninitialized [-Werror=uninitialized]
# 00:03:36 make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/ptrace.o] Error 1
# 00:03:44 arch/arm/kernel/module-plts.c:127:21: error: statement will never be executed [-Werror=switch-unreachable]
# 00:03:44 make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/module-plts.o] Error 1
# 00:04:00 sound/core/oss/mixer_oss.c:1057:21: error: ‘slot’ is used uninitialized [-Werror=uninitialized]
# 00:04:01 sound/core/oss/pcm_oss.c:108:29: error: ‘t’ is used uninitialized [-Werror=uninitialized]
# 00:04:01 sound/core/oss/pcm_oss.c:2998:51: error: ‘template’ is used uninitialized [-Werror=uninitialized]
# 00:04:01 sound/core/seq/oss/seq_oss_init.c:350:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
# 00:04:01 sound/core/seq/oss/seq_oss_init.c:370:35: error: ‘qinfo’ is used uninitialized [-Werror=uninitialized]
THIS IS THE END OF INTERESTING STUFF. BELOW ARE LINKS TO BUILDS, REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
This commit has regressed these CI configurations:
 - tcwg_kernel/gnu-master-arm-next-allmodconfig

First_bad build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc...
Last_good build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc...
Baseline build: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc...
Even more details: https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc...
Reproduce builds:
<cut>
mkdir investigate-linux-d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7
cd investigate-linux-d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /linux/ ./ ./bisect/baseline/

cd linux

# Reproduce first_bad build
git checkout --detach d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 9d210ed97e491400750e9b3c8c93f98d75845904
../artifacts/test.sh

cd ..
</cut>
Full commit (up to 1000 lines):
<cut>
commit d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7
Author: Kees Cook <keescook@chromium.org>
Date:   Fri Jun 18 23:30:07 2021 -0700

    Makefile: Enable -Warray-bounds

    With the recent fixes for flexible arrays and expanded FORTIFY_SOURCE
    coverage, it is now possible to enable -Warray-bounds. Since both
    GCC and Clang include -Warray-bounds in -Wall, adjust the Makefile
    to just stop disabling it.

    Note that this option can be conservative in its warnings (which is
    done at casting time rather than access time), but this is reasonable
    since the cast variables may be accessed out of a scope where the true
    size of the original object can't be evaluated. These handful of false
    positives (which are arguably bad casts and can be easily avoided), are
    worth dealing with because of the many places where this option has
    helped identify missed bounds checks and even accesses done against
    cases where a NULL pointer could be reached.

    https://github.com/KSPP/linux/issues/109
    https://github.com/KSPP/linux/issues/151

    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Cc: linux-kbuild@vger.kernel.org
    Co-developed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: Kees Cook <keescook@chromium.org>
---
 Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Makefile b/Makefile
index 0fb4f94a6885..71c313b90a2b 100644
--- a/Makefile
+++ b/Makefile
@@ -952,7 +952,6 @@ KBUILD_CFLAGS += $(call cc-disable-warning, stringop-truncation)
 
 # We'll want to enable this eventually, but it's not going away for 5.7 at least
 KBUILD_CFLAGS += $(call cc-disable-warning, zero-length-bounds)
-KBUILD_CFLAGS += -Wno-array-bounds
 KBUILD_CFLAGS += $(call cc-disable-warning, stringop-overflow)
 
 # Another good warning that we'll want to enable eventually
</cut>
On Sun, Jan 30, 2022 at 01:00:43AM +0000, ci_notify@linaro.org wrote:
> [TCWG CI] Regression caused by linux: Makefile: Enable -Warray-bounds:
> commit d4e0dad4a0cd00d1518f2105ccbfee17e2aa44a7
> Author: Kees Cook <keescook@chromium.org>
>
>     Makefile: Enable -Warray-bounds
>
[...]
> # 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
Thanks for the report!
Would it be possible to include the "inlined from" details in the email summaries? Just getting a header file doesn't say where a header-defined inline is being used.
For example, extracting from the build log, I can see more:
00:00:53 In file included from ./include/linux/io.h:13,
00:00:53                  from arch/arm/mach-cns3xxx/pm.c:8:
00:00:53 In function ‘__raw_readl’,
00:00:53     inlined from ‘cns3xxx_pwr_clk_en’ at arch/arm/mach-cns3xxx/pm.c:17:12:
00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
00:00:53   113 |         asm volatile("ldr %0, %1"
00:00:53       |         ^~~
Looks like something sees a "void" type... this smells like a compiler bug. I haven't been able to reproduce this warning yet.
[...]
> git clone https://git.linaro.org/toolchain/jenkins-scripts
>
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
> curl -o artifacts/manifests/build-parameters.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
> curl -o artifacts/test.sh https://ci.linaro.org/job/tcwg_kernel-gnu-bisect-gnu-master-arm-next-allmodc... --fail
> chmod +x artifacts/test.sh
>
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_kernel-build.sh @@ artifacts/manifests/build-baseline.sh
I couldn't find the compiler version anywhere in here. Could you include the compiler and linker --version output in the build logs too?
Maybe something in tcwg_kernel-build.sh near here, to get either CC or CROSS_COMPILE + HOSTCC and ld_opt's --version output:
    local opts
    opts="CC=$(pwd)/bin/${rr[target]}-cc $ld_opt SUBLEVEL=0 EXTRAVERSION=-bisect"
    if [ x"${rr[target]}" != x"$(uname -m)" ]; then
        opts="$opts ARCH=$(print_kernel_target ${rr[target]})"
        opts="$opts CROSS_COMPILE=$(print_gnu_target ${rr[target]})-"
        opts="$opts HOSTCC=gcc"
    fi
It looks like maybe this is built under Ubuntu bionic? Or maybe focal? I don't see the warning with any GCC version I've tested with: 11.2.0 (impish), 10.3.0 (hirsute), 9.3.0 (focal), nor 7.5.0 (bionic).
Do you have some further hints about this?
Thanks!
On Mon, Jan 31, 2022 at 11:30 PM Kees Cook <keescook@chromium.org> wrote:
> On Sun, Jan 30, 2022 at 01:00:43AM +0000, ci_notify@linaro.org wrote:
>
> For example, extracting from the build log, I can see more:
>
> 00:00:53 In file included from ./include/linux/io.h:13,
> 00:00:53                  from arch/arm/mach-cns3xxx/pm.c:8:
> 00:00:53 In function ‘__raw_readl’,
> 00:00:53     inlined from ‘cns3xxx_pwr_clk_en’ at arch/arm/mach-cns3xxx/pm.c:17:12:
> 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
> 00:00:53   113 |         asm volatile("ldr %0, %1"
> 00:00:53       |         ^~~
>
> Looks like something sees a "void" type... this smells like a compiler bug. I haven't been able to reproduce this warning yet.
I suspect this is a variation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
When gcc sees a pointer dereference of a literal address like *(int *)(void *)0x1234000, this is sometimes interpreted as a NULL pointer with an offset, which in turn is assumed to have zero bytes that can be dereferenced.
Arnd
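To make the pattern described above concrete, here is a minimal sketch, assuming a simplified stand-in for an MMIO read helper. The names demo_raw_readl, demo_read_object and demo_read_literal, and the address DEMO_REG_ADDR, are hypothetical and not taken from the kernel; the real __raw_readl() on arm uses inline asm rather than a plain dereference. The literal-address variant is compiled only when DEMO_LITERAL_ADDRESS is defined, because calling it on a hosted system would fault.

```c
#include <stdint.h>

/* Hypothetical register address, for illustration only. */
#define DEMO_REG_ADDR 0x1234000UL

/* Simplified stand-in for an MMIO read helper (assumption: the real
 * kernel helper uses inline asm, not a plain volatile load). */
static inline uint32_t demo_raw_readl(const volatile void *addr)
{
    return *(const volatile uint32_t *)addr;
}

/* Safe to run anywhere: the pointer refers to a real object, so the
 * compiler knows its size. */
uint32_t demo_read_object(void)
{
    static uint32_t fake_reg = 0xabcd1234u;
    return demo_raw_readl(&fake_reg);
}

#ifdef DEMO_LITERAL_ADDRESS
/* The pattern discussed in the thread: an integer literal cast to a
 * pointer and then dereferenced. Only meaningful where the address is
 * actually mapped; do not call this on a hosted system. */
uint32_t demo_read_literal(void)
{
    return demo_raw_readl((const volatile void *)DEMO_REG_ADDR);
}
#endif
```

Building this with something like `gcc -O2 -Wall -DDEMO_LITERAL_ADDRESS -c demo.c` is one way to check whether a given compiler emits the array-bounds diagnostic for the literal-address variant; whether it does depends on the GCC version.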
On Tue, Feb 01, 2022 at 08:17:50AM +0100, Arnd Bergmann wrote:
> On Mon, Jan 31, 2022 at 11:30 PM Kees Cook <keescook@chromium.org> wrote:
> > On Sun, Jan 30, 2022 at 01:00:43AM +0000, ci_notify@linaro.org wrote:
> >
> > For example, extracting from the build log, I can see more:
> >
> > 00:00:53 In file included from ./include/linux/io.h:13,
> > 00:00:53                  from arch/arm/mach-cns3xxx/pm.c:8:
> > 00:00:53 In function ‘__raw_readl’,
> > 00:00:53     inlined from ‘cns3xxx_pwr_clk_en’ at arch/arm/mach-cns3xxx/pm.c:17:12:
> > 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
> > 00:00:53   113 |         asm volatile("ldr %0, %1"
> > 00:00:53       |         ^~~
> >
> > Looks like something sees a "void" type... this smells like a compiler bug. I haven't been able to reproduce this warning yet.
>
> I suspect this is a variation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
>
> When gcc sees a pointer dereference of a literal address like *(int *)(void *)0x1234000, this is sometimes interpreted as a NULL pointer with an offset, which in turn is assumed to have zero bytes that can be dereferenced.
Eww.
I still can't reproduce this error myself. Any hints on compiler versions?
On Tue, Feb 1, 2022 at 8:52 AM Kees Cook <keescook@chromium.org> wrote:
> On Tue, Feb 01, 2022 at 08:17:50AM +0100, Arnd Bergmann wrote:
> > On Mon, Jan 31, 2022 at 11:30 PM Kees Cook <keescook@chromium.org> wrote:
> > > On Sun, Jan 30, 2022 at 01:00:43AM +0000, ci_notify@linaro.org wrote:
> > >
> > > For example, extracting from the build log, I can see more:
> > >
> > > 00:00:53 In file included from ./include/linux/io.h:13,
> > > 00:00:53                  from arch/arm/mach-cns3xxx/pm.c:8:
> > > 00:00:53 In function ‘__raw_readl’,
> > > 00:00:53     inlined from ‘cns3xxx_pwr_clk_en’ at arch/arm/mach-cns3xxx/pm.c:17:12:
> > > 00:00:53 ./arch/arm/include/asm/io.h:113:9: error: array subscript 0 is outside array bounds of ‘const volatile void[0]’ [-Werror=array-bounds]
> > > 00:00:53   113 |         asm volatile("ldr %0, %1"
> > > 00:00:53       |         ^~~
> > >
> > > Looks like something sees a "void" type... this smells like a compiler bug. I haven't been able to reproduce this warning yet.
> >
> > I suspect this is a variation of https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578
> >
> > When gcc sees a pointer dereference of a literal address like *(int *)(void *)0x1234000, this is sometimes interpreted as a NULL pointer with an offset, which in turn is assumed to have zero bytes that can be dereferenced.
>
> Eww.
>
> I still can't reproduce this error myself. Any hints on compiler versions?
The godbolt.org link in my report shows this as reproducible with any gcc-11.x version but not gcc-10 or earlier.
Arnd
linaro-kernel@lists.linaro.org