Following kernel regressions have been noticed on Linux next-20220426 (till date) on qemu_i386. This is a special build with KFENCE and KUNIT enabled but not KASAN on i386.
snippet of Kconfigs from the build: CONFIG_CC_HAS_KASAN_GENERIC=y CONFIG_KFENCE=y CONFIG_KFENCE_KUNIT_TEST=y CONFIG_KUNIT=y CONFIG_KUNIT_DEBUGFS=y CONFIG_KUNIT_TEST=y CONFIG_KUNIT_EXAMPLE_TEST=y CONFIG_KUNIT_ALL_TESTS=y
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Regressions found on qemu_i386;:
- kunit/test_free_bulk - kunit/test_memcache_typesafe_by_rcu - kunit/test_out_of_bounds_read-memcache - kunit/test_invalid_access - kunit/test_krealloc - kunit/test_invalid_addr_free - kunit/test_gfpzero - kunit/test_kmalloc_aligned_oob_write - kunit/kfence - kunit/test_out_of_bounds_read - kunit/test_out_of_bounds_write - kunit/test_invalid_addr_free-memcache - kunit/test_double_free - kunit/test_corruption-memcache - kunit/test_use_after_free_read-memcache - kunit/test_corruption - kunit/test_use_after_free_read - kunit/test_shrink_memcache - kunit/test_double_free-memcache - kunit/test_out_of_bounds_write-memcache - kunit/test_kmalloc_aligned_oob_read
full test log link [1]
snippet of test log: [ 4.168302] # test_free_bulk: test_alloc: size=271, gfp=cc0, policy=right, cache=0 [ 4.271599] # test_free_bulk: test_alloc: size=271, gfp=cc0, policy=none, cache=0 [ 4.271611] # test_free_bulk: test_alloc: size=271, gfp=cc0, policy=left, cache=0 [ 4.687568] # test_free_bulk: test_alloc: size=271, gfp=cc0, policy=none, cache=0 [ 4.687581] # test_free_bulk: test_alloc: size=271, gfp=cc0, policy=none, cache=0 [ 4.687616] # test_free_bulk: ASSERTION FAILED at mm/kfence/kfence_test.c:547 [ 4.687616] Expected report_available() to be false, but is true [ 4.687789] not ok 13 - test_free_bulk ... [ 33.294007] # test_memcache_typesafe_by_rcu: EXPECTATION FAILED at mm/kfence/kfence_test.c:687 [ 33.294007] Expected report_matches(&expect) to be true, but is false [ 33.294182] not ok 23 - test_memcache_typesafe_by_rcu [ 33.294183] kunit_try_catch (209) used greatest stack depth: 5840 bytes left [ 33.294619] # test_krealloc: test_alloc: size=32, gfp=cc0, policy=any, cache=0 [ 33.391637] # test_krealloc: ASSERTION FAILED at mm/kfence/kfence_test.c:724 [ 33.391637] Expected report_available() to be false, but is true [ 33.391658] not ok 24 - test_krealloc [ 33.391902] # test_memcache_alloc_bulk: setup_test_cache: size=32, ctor=0x0 [ 33.495801] ok 25 - test_memcache_alloc_bulk [ 33.495808] # kfence: pass:3 fail:20 skip:2 total:25 [ 33.495817] # Totals: pass:3 fail:20 skip:2 total:25 [ 33.495833] not ok 1 - kfence
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 088fb7eff3496e0f61fdf68bda89b81a4d0a4434 git_describe: next-20220426 kernel-config: https://builds.tuxbuild.com/28KafYBlDWOxI3qyNmGDwb63GuX/config
-- Linaro LKFT https://lkft.linaro.org
[1] https://lkft.validation.linaro.org/scheduler/job/4950383#L770 [2] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20220426/tes...
On Fri, Apr 29, 2022 at 08:50PM +0530, Naresh Kamboju wrote: [...]
snippet of Kconfigs from the build: CONFIG_CC_HAS_KASAN_GENERIC=y CONFIG_KFENCE=y CONFIG_KFENCE_KUNIT_TEST=y CONFIG_KUNIT=y CONFIG_KUNIT_DEBUGFS=y CONFIG_KUNIT_TEST=y CONFIG_KUNIT_EXAMPLE_TEST=y CONFIG_KUNIT_ALL_TESTS=y
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Regressions found on qemu_i386;:
[...]
[ 33.495833] not ok 1 - kfence
Thank you for reporting. I did a bisection and landed here:
| git bisect start | # good: [38d741cb70b30741c0e802cbed7bd9cf4fd15fa4] Merge tag 'drm-fixes-2022-04-29' of git://anongit.freedesktop.org/drm/drm | git bisect good 38d741cb70b30741c0e802cbed7bd9cf4fd15fa4 | # bad: [5469f0c06732a077c70a759a81f2a1f00b277694] Add linux-next specific files for 20220429 | git bisect bad 5469f0c06732a077c70a759a81f2a1f00b277694 | # bad: [8aceb3338349dbda8fa0caf194b775335106343c] Merge branch 'drm-next' of git://git.freedesktop.org/git/drm/drm.git | git bisect bad 8aceb3338349dbda8fa0caf194b775335106343c | # bad: [ee45eb58be0352e34825f74c3f0187ffd9802cf8] Merge branch 'docs-next' of git://git.lwn.net/linux.git | git bisect bad ee45eb58be0352e34825f74c3f0187ffd9802cf8 | # good: [79ff65e50b2275296c8a12e823e0b50d573b7716] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git | git bisect good 79ff65e50b2275296c8a12e823e0b50d573b7716 | # good: [040c491d3a03e71a4c8c7aa0d24921f9a0cb8057] Merge branch 'master' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt.git | git bisect good 040c491d3a03e71a4c8c7aa0d24921f9a0cb8057 | # good: [52a97e777288db02be4211911c884c3a4810bbbd] Merge branch 'master' of https://github.com/Paragon-Software-Group/linux-ntfs3.git | git bisect good 52a97e777288db02be4211911c884c3a4810bbbd | # bad: [565f64656a35d673d8a9a61e570024a5fa3d369b] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git | git bisect bad 565f64656a35d673d8a9a61e570024a5fa3d369b | # good: [513df2cd1c0bbf378fcf9d7f9a288e736ca74286] Merge branch 'locks-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux.git | git bisect good 513df2cd1c0bbf378fcf9d7f9a288e736ca74286 | # bad: [bfc1f2749c23a4748dc4993df37cf75fa2602aba] Merge branch 'rework/kthreads' into for-next | git bisect bad bfc1f2749c23a4748dc4993df37cf75fa2602aba | # good: [c5f75d490fc2dd706898c005a05a933b39e880d3] Merge branch 'for-5.18' into for-next | git bisect good c5f75d490fc2dd706898c005a05a933b39e880d3 | # good: [a699449bb13b70b8bd10dc03ad7327ea3993221e] printk: refactor and rework printing logic | git bisect good a699449bb13b70b8bd10dc03ad7327ea3993221e | # bad: [8e274732115f63c1d09136284431b3555bd5cc56] printk: extend console_lock for per-console locking | git bisect bad 8e274732115f63c1d09136284431b3555bd5cc56 | # good: [3b604ca81202eea2a917eb6491e90f610fba0ec7] printk: add pr_flush() | git bisect good 3b604ca81202eea2a917eb6491e90f610fba0ec7 | # bad: [09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7] printk: add kthread console printers | git bisect bad 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7 | # good: [2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2] printk: add functions to prefer direct printing | git bisect good 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2 | # first bad commit: [09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7] printk: add kthread console printers
And looking at your log [1], it shows that KFENCE is working just fine, but the logic that is supposed to intercept the kernel log (via tracepoint) to check that reports are being generated correctly seems to be broken.
And this is not only i386-specific, it's also broken on a x86-64 build.
At first I thought maybe with the printk changes we'd now have to call pr_flush(), but that doesn't work, so I'm missing something still:
| --- a/mm/kfence/kfence_test.c | +++ b/mm/kfence/kfence_test.c | @@ -73,11 +73,18 @@ static void probe_console(void *ignore, const char *buf, size_t len) | } | | /* Check if a report related to the test exists. */ | -static bool report_available(void) | +static bool __report_available(void) | { | return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines); | } | | +/* Check if a report related to the test exists; may sleep. */ | +static bool report_available(void) | +{ | + pr_flush(0, true); | + return __report_available(); | +} | + | /* Information we expect in a report. */ | struct expect_report { | enum kfence_error_type type; /* The type or error. */ | @@ -165,7 +172,7 @@ static bool report_matches(const struct expect_report *r) | cur += scnprintf(cur, end - cur, " 0x%p", (void *)addr); | | spin_lock_irqsave(&observed.lock, flags); | - if (!report_available()) | + if (!__report_available()) | goto out; /* A new report is being captured. */
John, Petr, any hints what's going on?
Thanks, -- Marco
metadata: git_ref: master git_repo: https://gitlab.com/Linaro/lkft/mirrors/next/linux-next git_sha: 088fb7eff3496e0f61fdf68bda89b81a4d0a4434 git_describe: next-20220426 kernel-config: https://builds.tuxbuild.com/28KafYBlDWOxI3qyNmGDwb63GuX/config
-- Linaro LKFT https://lkft.linaro.org
[1] https://lkft.validation.linaro.org/scheduler/job/4950383#L770 [2] https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20220426/tes...
Hi Marco,
On 2022-04-29, Marco Elver elver@google.com wrote:
And looking at your log [1], it shows that KFENCE is working just fine, but the logic that is supposed to intercept the kernel log (via tracepoint) to check that reports are being generated correctly seems to be broken.
And this is not only i386-specific, it's also broken on a x86-64 build.
At first I thought maybe with the printk changes we'd now have to call pr_flush(), but that doesn't work, so I'm missing something still:
| --- a/mm/kfence/kfence_test.c | +++ b/mm/kfence/kfence_test.c | @@ -73,11 +73,18 @@ static void probe_console(void *ignore, const char *buf, size_t len) | } | | /* Check if a report related to the test exists. */ | -static bool report_available(void) | +static bool __report_available(void) | { | return READ_ONCE(observed.nlines) == ARRAY_SIZE(observed.lines); | } | | +/* Check if a report related to the test exists; may sleep. */ | +static bool report_available(void) | +{ | + pr_flush(0, true); | + return __report_available(); | +} | +
I am not familiar with how this works. Is the tracepoint getting set on call_console_drivers()? Or on call_console_driver()?
If so, there are a couple problems with that. First off, the prototype for that function has changed. Second, that function is called when text is printed, but this is not when the text was created. With the kthreads, the printing can be significantly delayed.
Since printk() is now lockless and console printing is delayed, it becomes a bit tricky to parse the records in the existing code using a tracepoint.
I wonder if creating a NOP function for the kfence probe to attach to would be more appropriate. In printk_sprint() we get the text after space has been reserved, but before the text is committed to the ringbuffer. This is guaranteed to be called from within the printk() context.
Here is an example of what I am thinking...
--- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2227,6 +2227,10 @@ static u16 printk_sprint(char *text, u16 size, int facility, } }
+#ifdef CONFIG_KFENCE_KUNIT_TEST + printk_kfence_check(text, text_len); +#endif + return text_len; }
The probe_console() could attach to a NOP function printk_kfence_check().
John
On Sat, Apr 30, 2022 at 11:14PM +0206, John Ogness wrote: [...]
I am not familiar with how this works. Is the tracepoint getting set on call_console_drivers()? Or on call_console_driver()?
It's at the start of call_console_drivers(). See trace_console_rcuidle() call.
If so, there are a couple problems with that. First off, the prototype for that function has changed. Second, that function is called when text is printed, but this is not when the text was created. With the kthreads, the printing can be significantly delayed.
Since printk() is now lockless and console printing is delayed, it becomes a bit tricky to parse the records in the existing code using a tracepoint.
I wonder if creating a NOP function for the kfence probe to attach to would be more appropriate. In printk_sprint() we get the text after space has been reserved, but before the text is committed to the ringbuffer. This is guaranteed to be called from within the printk() context.
I think we just need to fix the existing tracepoint, since it has changed its semantics vs. what it was intended to be:
| commit 95100358491abaa2e9a5483811370059bbca4645 | Author: Johannes Berg johannes.berg@intel.com | Date: Thu Nov 24 20:03:08 2011 +0100 | | printk/tracing: Add console output tracing | | Add a printk.console trace point to record any printk | messages into the trace, regardless of the current | console loglevel. This can help correlate (existing) | printk debugging with other tracing. | | Link: http://lkml.kernel.org/r/1322161388.5366.54.camel@jlt3.sipsolutions.net | | Acked-by: Frederic Weisbecker fweisbec@gmail.com | Cc: Christoph Hellwig hch@infradead.org | Cc: Ingo Molnar mingo@redhat.com | Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl | Acked-by: Thomas Gleixner tglx@linutronix.de | Signed-off-by: Johannes Berg johannes.berg@intel.com | Signed-off-by: Steven Rostedt rostedt@goodmis.org
Specifically using it to "correlate (existing) printk debugging with other tracing" is now broken.
Here is an example of what I am thinking...
--- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2227,6 +2227,10 @@ static u16 printk_sprint(char *text, u16 size, int facility, } } +#ifdef CONFIG_KFENCE_KUNIT_TEST
- printk_kfence_check(text, text_len);
+#endif
- return text_len;
} The probe_console() could attach to a NOP function printk_kfence_check().
Thanks for this! However, I think we can't have a KFENCE-specific helper, it needs to be a tracepoint, because there are more tests that want to check console output (kernel/kcsan/kcsan_test.c did this before the KFENCE test actually).
My proposal would be to fix the tracepoint like so:
| --- a/kernel/printk/printk.c | +++ b/kernel/printk/printk.c | @@ -2002,8 +2002,6 @@ static void call_console_driver(struct console *con, const char *text, size_t le | { | size_t dropped_len; | | - trace_console_rcuidle(text, len); | - | if (con->dropped && dropped_text) { | dropped_len = snprintf(dropped_text, DROPPED_TEXT_MAX, | "** %lu printk messages dropped **\n", | @@ -2178,6 +2176,8 @@ static u16 printk_sprint(char *text, u16 size, int facility, | } | } | | + trace_console_rcuidle(text, text_len); | + | return text_len; | }
This fixes the KFENCE and KCSAN tests.
Unless I hear objections, I'll prepare a patch explaining why we need to fix the tracepoint.
Thanks, -- Marco
On Mon 2022-05-02 09:51:46, Marco Elver wrote:
On Sat, Apr 30, 2022 at 11:14PM +0206, John Ogness wrote: [...]
I am not familiar with how this works. Is the tracepoint getting set on call_console_drivers()? Or on call_console_driver()?
It's at the start of call_console_drivers(). See trace_console_rcuidle() call.
If so, there are a couple problems with that. First off, the prototype for that function has changed. Second, that function is called when text is printed, but this is not when the text was created. With the kthreads, the printing can be significantly delayed.
Since printk() is now lockless and console printing is delayed, it becomes a bit tricky to parse the records in the existing code using a tracepoint.
I wonder if creating a NOP function for the kfence probe to attach to would be more appropriate. In printk_sprint() we get the text after space has been reserved, but before the text is committed to the ringbuffer. This is guaranteed to be called from within the printk() context.
I think we just need to fix the existing tracepoint, since it has changed its semantics vs. what it was intended to be:
| commit 95100358491abaa2e9a5483811370059bbca4645 | Author: Johannes Berg johannes.berg@intel.com | Date: Thu Nov 24 20:03:08 2011 +0100 | | printk/tracing: Add console output tracing | | Add a printk.console trace point to record any printk | messages into the trace, regardless of the current | console loglevel. This can help correlate (existing) | printk debugging with other tracing. | | Link: http://lkml.kernel.org/r/1322161388.5366.54.camel@jlt3.sipsolutions.net | | Acked-by: Frederic Weisbecker fweisbec@gmail.com | Cc: Christoph Hellwig hch@infradead.org | Cc: Ingo Molnar mingo@redhat.com | Acked-by: Peter Zijlstra a.p.zijlstra@chello.nl | Acked-by: Thomas Gleixner tglx@linutronix.de | Signed-off-by: Johannes Berg johannes.berg@intel.com | Signed-off-by: Steven Rostedt rostedt@goodmis.org
Specifically using it to "correlate (existing) printk debugging with other tracing" is now broken.
It is pity that it was done in the console code in the first place.
printk() always used console_trylock() and the message was flushed to the console only when the trylock succeeded. And it was always deferred in NMI or when printed via printk_deferred().
My proposal would be to fix the tracepoint like so:
| --- a/kernel/printk/printk.c | +++ b/kernel/printk/printk.c | @@ -2002,8 +2002,6 @@ static void call_console_driver(struct console *con, const char *text, size_t le | { | size_t dropped_len; | | - trace_console_rcuidle(text, len); | - | if (con->dropped && dropped_text) { | dropped_len = snprintf(dropped_text, DROPPED_TEXT_MAX, | "** %lu printk messages dropped **\n", | @@ -2178,6 +2176,8 @@ static u16 printk_sprint(char *text, u16 size, int facility, | } | } | | + trace_console_rcuidle(text, text_len); | + | return text_len; | }
This fixes the KFENCE and KCSAN tests.
Unless I hear objections, I'll prepare a patch explaining why we need to fix the tracepoint.
It makes perfect sense to me.
Note that there is one difference. The original tracepoint stored the string with the console header (loglevel + timestamp), see info_print_prefix(). The new code will store "only" the plain message.
But the new code should be more user friendly. The printk() message will be sorted with the other tracing messages out of box. Users will not longer need to use the printk timestamp to find when it was actually printed.
Best Regards, Petr
On 2022-05-02, Petr Mladek pmladek@suse.com wrote:
My proposal would be to fix the tracepoint like so:
| --- a/kernel/printk/printk.c | +++ b/kernel/printk/printk.c | @@ -2002,8 +2002,6 @@ static void call_console_driver(struct console *con, const char *text, size_t le | { | size_t dropped_len; | | - trace_console_rcuidle(text, len); | - | if (con->dropped && dropped_text) { | dropped_len = snprintf(dropped_text, DROPPED_TEXT_MAX, | "** %lu printk messages dropped **\n", | @@ -2178,6 +2176,8 @@ static u16 printk_sprint(char *text, u16 size, int facility, | } | } | | + trace_console_rcuidle(text, text_len); | + | return text_len; | }
This fixes the KFENCE and KCSAN tests.
Unless I hear objections, I'll prepare a patch explaining why we need to fix the tracepoint.
It makes perfect sense to me.
This is the easiest place for it. However, it should be clear that in the context of trace_console_rcuidle(), the message is not yet visible to any readers. The message _will_ get committed and definitely _will_ become visible at some point. But it is not (yet) visible at _this_ point. Maybe that is OK for what it is being used for.
If trace_console_rcuidle() must be called at the point of visibility for readers, it becomes more complicated.
John
On Mon, 2 May 2022 at 11:20, John Ogness john.ogness@linutronix.de wrote:
On 2022-05-02, Petr Mladek pmladek@suse.com wrote:
My proposal would be to fix the tracepoint like so:
[...]
This fixes the KFENCE and KCSAN tests.
Unless I hear objections, I'll prepare a patch explaining why we need to fix the tracepoint.
It makes perfect sense to me.
This is the easiest place for it. However, it should be clear that in the context of trace_console_rcuidle(), the message is not yet visible to any readers. The message _will_ get committed and definitely _will_ become visible at some point. But it is not (yet) visible at _this_ point. Maybe that is OK for what it is being used for.
If trace_console_rcuidle() must be called at the point of visibility for readers, it becomes more complicated.
I think that wasn't the original intent, so fixing it up to actually happen on printk() is probably the sanest thing. I sent the patch trying to explain: https://lore.kernel.org/all/20220503073844.4148944-1-elver@google.com/
Thank you both for the valuable pointers!
-- Marco
John