Without -O2, the generated code for testing the chacha function is awful. GCC even implements rol32() as a real function instead of just using the rotlwi instruction, and that function is 20 instructions long.
  ~# time ./vdso_test_chacha
  TAP version 13
  1..1
  ok 1 chacha: PASS
  real    0m 37.16s
  user    0m 36.89s
  sys     0m 0.26s
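To illustrate why rol32() matters here, below is a minimal sketch of the usual C rotate idiom, modelled on (not copied from) the kernel's rol32() helper, together with a ChaCha quarter round, which performs four rotates per call. The function names are illustrative only. The self-check uses the quarter-round test vector from RFC 8439, section 2.1.1. With -O2, GCC normally recognises the shift/or pattern as a rotate and emits a single instruction (rotlwi on powerpc). At -O0, a static inline function is normally emitted out of line and called, which adds per-rotate overhead of the kind described above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Rotate a 32-bit word left by 'shift' bits (illustrative rol32()).
 * Masking both shift counts with 31 keeps the expression well defined
 * for shift == 0 and matches the pattern compilers turn into a rotate.
 */
static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
	return (word << (shift & 31)) | (word >> ((-shift) & 31));
}

/* One ChaCha quarter round (RFC 8439, section 2.1): four rotates. */
static void chacha_quarter_round(uint32_t *a, uint32_t *b,
				 uint32_t *c, uint32_t *d)
{
	*a += *b; *d ^= *a; *d = rol32(*d, 16);
	*c += *d; *b ^= *c; *b = rol32(*b, 12);
	*a += *b; *d ^= *a; *d = rol32(*d, 8);
	*c += *d; *b ^= *c; *b = rol32(*b, 7);
}

/* Check the quarter round against the RFC 8439 section 2.1.1 vector. */
static void chacha_quarter_round_selftest(void)
{
	uint32_t a = 0x11111111, b = 0x01020304;
	uint32_t c = 0x9b8d6f43, d = 0x01234567;

	chacha_quarter_round(&a, &b, &c, &d);
	assert(a == 0xea2a92f4);
	assert(b == 0xcb1cf8ce);
	assert(c == 0x4581472e);
	assert(d == 0x5881c4bb);
}
```

Compiling this file with `gcc -O0 -S` and `gcc -O2 -S` and comparing the assembly is a quick way to see whether rol32() ends up as a call or as a single rotate on a given target.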
Several other selftests directories add -O2, and the kernel itself is always built with optimisation enabled. Do the same for the vDSO selftests.
With this patch, the run time drops by roughly 14% (real time goes from 37.16s to 32.09s).
  ~# time ./vdso_test_chacha
  TAP version 13
  1..1
  ok 1 chacha: PASS
  real    0m 32.09s
  user    0m 31.86s
  sys     0m 0.22s
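As a cross-check of the speed-up, the relative reduction implied by the two time(1) outputs can be computed with a throwaway helper like the one below. It is hypothetical and not part of the patch or of the selftest. The inputs are the real and user seconds quoted in this message.

```c
#include <assert.h>
#include <stdio.h>

/* Relative reduction, in percent, from 'before_s' to 'after_s' seconds. */
static double percent_reduction(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return (before_s - after_s) / before_s * 100.0;
}

/* Print the reductions for the two vdso_test_chacha runs quoted above. */
static void print_chacha_timings(void)
{
	printf("real: %.1f%%\n", percent_reduction(37.16, 32.09));
	printf("user: %.1f%%\n", percent_reduction(36.89, 31.86));
}
```

Both the real and user figures work out to about 13.6%.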
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 tools/testing/selftests/vDSO/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vDSO/Makefile b/tools/testing/selftests/vDSO/Makefile
index cfb7c281b22c..96f25aa2f84e 100644
--- a/tools/testing/selftests/vDSO/Makefile
+++ b/tools/testing/selftests/vDSO/Makefile
@@ -13,7 +13,7 @@ TEST_GEN_PROGS += vdso_test_correctness
 TEST_GEN_PROGS += vdso_test_getrandom
 TEST_GEN_PROGS += vdso_test_chacha
 
-CFLAGS := -std=gnu99
+CFLAGS := -std=gnu99 -O2
 
 ifeq ($(CONFIG_X86_32),y)
 LDLIBS += -lgcc_s
On Sun, Sep 01, 2024 at 07:24:03PM +0200, Christophe Leroy wrote:
> Without -O2, the generated code for testing chacha function is awful. GCC even implements rol32() as a function instead of just using the rotlwi instruction, that function is 20 instructions long.
>
>   ~# time ./vdso_test_chacha
>   TAP version 13
>   1..1
>   ok 1 chacha: PASS
>   real    0m 37.16s
>   user    0m 36.89s
>   sys     0m 0.26s
>
> Several other selftests directory add -O2, and the kernel is also always built with optimisation active. Do the same for vDSO selftests.
>
> With this patch the time is reduced by approx 15%.
>
>   ~# time ./vdso_test_chacha
>   TAP version 13
>   1..1
>   ok 1 chacha: PASS
>   real    0m 32.09s
>   user    0m 31.86s
>   sys     0m 0.22s
Seems reasonable. I'll queue it up.
Thanks.
Jason
linux-kselftest-mirror@lists.linaro.org