KUnit will fail tests upon observing a lockdep failure. Because lockdep turns itself off after its first failure, only fail the first test and warn users not to expect any future failures from lockdep.
Similar to lib/locking-selftest [1], we check if the status of debug_locks has changed after the execution of a test case. However, we do not reset lockdep afterwards.
Like the locking selftests, we also fix possible preemption count corruption from lock bugs.
Depends on kunit: support failure from dynamic analysis tools [2]
[1] https://elixir.bootlin.com/linux/v5.7.12/source/lib/locking-selftest.c#L1137
[2] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajar...
Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
---
v2 Changes:
- Removed lockdep_reset
- Added warning to users about lockdep shutting off
---
 lib/kunit/test.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index d8189d827368..7e477482457b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -11,6 +11,7 @@
 #include <linux/kref.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/debug_locks.h>
 
 #include "debugfs.h"
 #include "string-stream.h"
@@ -22,6 +23,26 @@ void kunit_fail_current_test(void)
 	kunit_set_failure(current->kunit_test);
 }
 
+static void kunit_check_locking_bugs(struct kunit *test,
+				     unsigned long saved_preempt_count,
+				     bool saved_debug_locks)
+{
+	preempt_count_set(saved_preempt_count);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
+#if IS_ENABLED(CONFIG_LOCKDEP)
+	if (saved_debug_locks && !debug_locks) {
+		kunit_set_failure(test);
+		kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
+		kunit_warn(test, "Further tests will have LOCKDEP disabled.");
+	}
+#endif
+}
+
 static void kunit_print_tap_version(void)
 {
 	static bool kunit_has_printed_tap_version;
@@ -290,6 +311,9 @@ static void kunit_try_run_case(void *data)
 	struct kunit_suite *suite = ctx->suite;
 	struct kunit_case *test_case = ctx->test_case;
 
+	unsigned long saved_preempt_count = preempt_count();
+	bool saved_debug_locks = debug_locks;
+
 	current->kunit_test = test;
 
 	/*
@@ -298,7 +322,8 @@ static void kunit_try_run_case(void *data)
 	 * thread will resume control and handle any necessary clean up.
 	 */
 	kunit_run_case_internal(test, suite, test_case);
-	/* This line may never be reached. */
+	/* These lines may never be reached. */
+	kunit_check_locking_bugs(test, saved_preempt_count, saved_debug_locks);
 	kunit_run_case_cleanup(test, suite);
 }
On Wed, 12 Aug 2020, Uriel Guajardo wrote:
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index d8189d827368..7e477482457b 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -11,6 +11,7 @@
 #include <linux/kref.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/debug_locks.h>
 #include "debugfs.h"
 #include "string-stream.h"
@@ -22,6 +23,26 @@ void kunit_fail_current_test(void)
 	kunit_set_failure(current->kunit_test);
 }
+static void kunit_check_locking_bugs(struct kunit *test,
+				     unsigned long saved_preempt_count,
+				     bool saved_debug_locks)
+{
+	preempt_count_set(saved_preempt_count);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
+#if IS_ENABLED(CONFIG_LOCKDEP)
+	if (saved_debug_locks && !debug_locks) {
+		kunit_set_failure(test);
+		kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
+		kunit_warn(test, "Further tests will have LOCKDEP disabled.");
+	}
+#endif
+}
Nit: I could be wrong but the general approach for this sort of feature is to do conditional compilation combined with "static inline" definitions to handle the case where the feature isn't enabled. Could we tidy this up a bit and haul this stuff out into a conditionally-compiled (if CONFIG_LOCKDEP) kunit lockdep.c file? Then in kunit's lockdep.h we'd have
struct kunit_lockdep {
	int preempt_count;
	bool debug_locks;
};

#if IS_ENABLED(CONFIG_LOCKDEP)
void kunit_test_init_lockdep(struct kunit *test, struct kunit_lockdep *lockdep);
void kunit_test_check_lockdep(struct kunit *test, struct kunit_lockdep *lockdep);
#else
static inline void kunit_test_init_lockdep(struct kunit *test, struct kunit_lockdep *lockdep) { }
static inline void kunit_test_check_lockdep(struct kunit *test, struct kunit_lockdep *lockdep) { }
#endif
The test execution code could then call
struct kunit_lockdep lockdep;
kunit_test_init_lockdep(test, &lockdep);
kunit_test_check_lockdep(test, &lockdep);
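A minimal userspace sketch of how the init/check pair suggested above might behave. Everything kernel-side is mocked: `mock_preempt_count`, `mock_debug_locks`, `MOCK_CONFIG_LOCKDEP`, the reduced `struct kunit` and `run_case()` are assumptions made for illustration and are not kernel or KUnit symbols. The check mirrors the v2 patch (restore the count, fail if lockdep switched itself off).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel state, not the real symbols. */
static int mock_preempt_count;
static bool mock_debug_locks = true;

/* Reduced stand-in for KUnit's per-test context. */
struct kunit {
	bool failed;
};

struct kunit_lockdep {
	int preempt_count;
	bool debug_locks;
};

/* Stand-in for IS_ENABLED(CONFIG_LOCKDEP); define as 0 to build the stubs. */
#ifndef MOCK_CONFIG_LOCKDEP
#define MOCK_CONFIG_LOCKDEP 1
#endif

#if MOCK_CONFIG_LOCKDEP
/* Snapshot the state before the test case runs. */
static void kunit_test_init_lockdep(struct kunit *test,
				    struct kunit_lockdep *lockdep)
{
	assert(test && lockdep);
	lockdep->preempt_count = mock_preempt_count;
	lockdep->debug_locks = mock_debug_locks;
}

/* Restore the count and fail the case if lockdep switched itself off. */
static void kunit_test_check_lockdep(struct kunit *test,
				     struct kunit_lockdep *lockdep)
{
	mock_preempt_count = lockdep->preempt_count;
	if (lockdep->debug_locks && !mock_debug_locks)
		test->failed = true;
}
#else
static inline void kunit_test_init_lockdep(struct kunit *test,
					   struct kunit_lockdep *lockdep) { }
static inline void kunit_test_check_lockdep(struct kunit *test,
					    struct kunit_lockdep *lockdep) { }
#endif

/* The caller needs no #ifdef: both configurations expose the same API. */
static bool run_case(void (*body)(void))
{
	struct kunit test = { .failed = false };
	struct kunit_lockdep lockdep;

	kunit_test_init_lockdep(&test, &lockdep);
	body();
	kunit_test_check_lockdep(&test, &lockdep);
	return test.failed;
}

static void clean_case(void) { }

/* Simulates a lock bug: lockdep turns off and the count is left raised. */
static void buggy_case(void)
{
	mock_debug_locks = false;
	mock_preempt_count++;
}
```

In this sketch, running `buggy_case` a second time does not report a failure, because the saved `debug_locks` value is already false. That matches the commit message's point that only the first test fails.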
If that approach makes sense, we could go a bit further and we might benefit from a bit more generalization here. _If_ the pattern of needing pre- and post- test actions is sustained across multiple analysis tools, could we add generic hooks for this? That would allow any additional dynamic analysis tools to utilize them. So kunit_try_run_case() would then cycle through the registered pre- hooks prior to running the case and post- hooks after, failing if any of the latter returned a failure value.
I'm thinking something like
kunit_register_external_test("lockdep", lockdep_pre, lockdep_post, &kunit_lockdep);
(or we could define a kunit_external_test struct for better extensibility).
A void * would be passed to pre/post, in this case it'd be a pointer to a struct containing the saved preempt count/debug locks, and the registration could be called during kunit initialization. This doesn't need to be done with your change of course but I wanted to float the idea as in addition to uncluttering the test case execution code, it might allow us to build facilities on top of that generic tool support for situations like "I'd like to see if the test passes absent any lockdep issues, so I'd like to disable lockdep-based failure". Such situations are more likely to arise in a world where kunit+tests are built as modules and run multiple times within a single system boot admittedly, but worth considering I think.
For that we'd need a way to select which dynamic tools kunit enables (kernel/module parameters or debugfs could do this), but a generic approach might help with that sort of thing.
An external test under this model wouldn't necessarily have to be external to the area under test; the general criterion for such things would be "something I want to track across multiple test case executions".
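To make the generic-hook idea concrete, here is a rough userspace sketch. None of this exists in KUnit: `kunit_register_external_test()` is only the name floated above, and `struct kunit_external_test`, `kunit_run_with_hooks()`, the fixed-size table and `mock_debug_locks` are assumptions made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KUNIT_MAX_EXTERNAL_TESTS 8	/* arbitrary limit for this sketch */

/* Hypothetical descriptor for one dynamic analysis tool. */
struct kunit_external_test {
	const char *name;
	void (*pre)(void *data);	/* runs before each test case */
	int (*post)(void *data);	/* nonzero means "fail the case" */
	void *data;			/* tool-private saved state */
};

static struct kunit_external_test registered[KUNIT_MAX_EXTERNAL_TESTS];
static size_t nr_registered;

/* Hypothetical registration call, e.g. made during kunit initialization. */
static int kunit_register_external_test(const char *name,
					 void (*pre)(void *),
					 int (*post)(void *), void *data)
{
	assert(name != NULL);
	if (nr_registered == KUNIT_MAX_EXTERNAL_TESTS)
		return -1;
	registered[nr_registered++] = (struct kunit_external_test){
		.name = name, .pre = pre, .post = post, .data = data,
	};
	return 0;
}

/* Run pre-hooks, the case, then post-hooks; true means the case failed. */
static bool kunit_run_with_hooks(void (*test_case)(void))
{
	bool failed = false;
	size_t i;

	for (i = 0; i < nr_registered; i++)
		if (registered[i].pre)
			registered[i].pre(registered[i].data);

	test_case();

	for (i = 0; i < nr_registered; i++)
		if (registered[i].post && registered[i].post(registered[i].data))
			failed = true;

	return failed;
}

/* Mocked lockdep state and hooks, standing in for debug_locks. */
static bool mock_debug_locks = true;

struct kunit_lockdep {
	bool debug_locks;
};

static struct kunit_lockdep kunit_lockdep;

static void lockdep_pre(void *data)
{
	struct kunit_lockdep *saved = data;

	saved->debug_locks = mock_debug_locks;
}

static int lockdep_post(void *data)
{
	struct kunit_lockdep *saved = data;

	return saved->debug_locks && !mock_debug_locks;
}

static void clean_case(void) { }
static void buggy_case(void) { mock_debug_locks = false; }
```

A per-tool enable flag (set from a module parameter or debugfs, as suggested above) could be added to the descriptor and checked in both loops; that part is left out here.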
Again I'm not trying to put you on the hook for any of the above suggestions (having lockdep support like this is fantastic!), but I think it'd be good to see if there's a pattern here we could potentially exploit in other use cases.
Thanks!
Alan
On Thu, Aug 13, 2020 at 4:11 AM Alan Maguire alan.maguire@oracle.com wrote:
Nit: I could be wrong but the general approach for this sort of feature is to do conditional compilation combined with "static inline" definitions to handle the case where the feature isn't enabled. Could we tidy this up a bit and haul this stuff out into a conditionally-compiled (if CONFIG_LOCKDEP) kunit lockdep.c file?
Sure! Apologies if this isn't convention.
Then in kunit's lockdep.h we'd have
struct kunit_lockdep {
	int preempt_count;
	bool debug_locks;
};

#if IS_ENABLED(CONFIG_LOCKDEP)
void kunit_test_init_lockdep(struct kunit *test, struct kunit_lockdep *lockdep);
void kunit_test_check_lockdep(struct kunit *test, struct kunit_lockdep *lockdep);
#else
static inline void kunit_test_init_lockdep(struct kunit *test, struct kunit_lockdep *lockdep) { }
static inline void kunit_test_check_lockdep(struct kunit *test, struct kunit_lockdep *lockdep) { }
#endif
The test execution code could then call
struct kunit_lockdep lockdep;
kunit_test_init_lockdep(test, &lockdep);
kunit_test_check_lockdep(test, &lockdep);
Thanks for these helpful tips. I agree that it'll be cleaner this way. I'll implement this in the next version of the patch.
If that approach makes sense, we could go a bit further and we might benefit from a bit more generalization here. _If_ the pattern of needing pre- and post- test actions is sustained across multiple analysis tools, could we add generic hooks for this? That would allow any additional dynamic analysis tools to utilize them. So
I think this is a great idea. Right now I'm a little hesitant to generalize beyond lockdep, since most analysis tools I've seen don't seem to require this. Most tools fail, report to KUnit, and then continue working without us needing to clean up state. Perhaps the generic hooks could prove useful in other ways that I'm not considering.
In any case, I will go ahead and work on the lockdep-specific hook for KUnit. If you or anyone else thinks it could be useful in other ways in the future, we can make it generic!
kunit_try_run_case() would then cycle through the registered pre- hooks prior to running the case and post- hooks after, failing if any of the latter returned a failure value.
I'm thinking something like
kunit_register_external_test("lockdep", lockdep_pre, lockdep_post, &kunit_lockdep);
(or we could define a kunit_external_test struct for better extensibility).
A void * would be passed to pre/post, in this case it'd be a pointer to a struct containing the saved preempt count/debug locks, and the registration could be called during kunit initialization. This doesn't need to be done with your change of course but I wanted to float the idea as in addition to uncluttering the test case execution code, it might allow us to build facilities on top of that generic tool support for situations like "I'd like to see if the test passes absent any lockdep issues, so I'd like to disable lockdep-based failure". Such situations are more likely to arise in a world where kunit+tests are built as modules and run multiple times within a single system boot admittedly, but worth considering I think.
Interesting!
For that we'd need a way to select which dynamic tools kunit enables (kernel/module parameters or debugfs could do this), but a generic approach might help with that sort of thing.
An external test under this model wouldn't necessarily have to be external to the area under test; the general criterion for such things would be "something I want to track across multiple test case executions".
Again I'm not trying to put you on the hook for any of the above suggestions (having lockdep support like this is fantastic!), but I think it'd be good to see if there's a pattern here we could potentially exploit in other use cases.
No worries, thanks for putting these suggestions out there.
Thanks!
Alan
On Wed, Aug 12, 2020 at 07:33:32PM +0000, Uriel Guajardo wrote:
KUnit will fail tests upon observing a lockdep failure. Because lockdep turns itself off after its first failure, only fail the first test and warn users to not expect any future failures from lockdep.
Similar to lib/locking-selftest [1], we check if the status of debug_locks has changed after the execution of a test case. However, we do not reset lockdep afterwards.
Like the locking selftests, we also fix possible preemption count corruption from lock bugs.
+static void kunit_check_locking_bugs(struct kunit *test,
+				     unsigned long saved_preempt_count,
+				     bool saved_debug_locks)
+{
+	preempt_count_set(saved_preempt_count);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
Urgh, don't silently change these... if they're off that's a hard fail.
	if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
		preempt_count_set(saved_preempt_count);
And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger the below fail.
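A userspace sketch of how the check might look with that suggestion applied. The kernel pieces are mocked: `mock_debug_locks_warn_on()` only imitates the part of `DEBUG_LOCKS_WARN_ON()` that matters here (turn debug_locks off once, warn, return the condition), and `mock_preempt_count`, the reduced `struct kunit` and `run_case()` are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for kernel state. */
static bool mock_debug_locks = true;
static int mock_preempt_count;

struct kunit {
	bool failed;
};

/*
 * Imitation of DEBUG_LOCKS_WARN_ON(): on the first true condition it
 * switches debug_locks off and prints a warning; it returns the condition.
 */
static bool mock_debug_locks_warn_on(bool cond, const char *what)
{
	if (cond && mock_debug_locks) {
		mock_debug_locks = false;
		fprintf(stderr, "DEBUG_LOCKS_WARN_ON(%s)\n", what);
	}
	return cond;
}

static void kunit_check_locking_bugs(struct kunit *test,
				     int saved_preempt_count,
				     bool saved_debug_locks)
{
	assert(test != NULL);

	/* A changed count is reported loudly, then repaired. */
	if (mock_debug_locks_warn_on(mock_preempt_count != saved_preempt_count,
				     "preempt_count() != saved_preempt_count"))
		mock_preempt_count = saved_preempt_count;

	/* The warning above flipped debug_locks, so this check now fires. */
	if (saved_debug_locks && !mock_debug_locks)
		test->failed = true;
}

static bool run_case(void (*body)(void))
{
	struct kunit test = { .failed = false };
	int saved_preempt_count = mock_preempt_count;
	bool saved_debug_locks = mock_debug_locks;

	body();
	kunit_check_locking_bugs(&test, saved_preempt_count, saved_debug_locks);
	return test.failed;
}

/* Balanced disable/enable pair: nothing to report. */
static void balanced_case(void)
{
	mock_preempt_count++;
	mock_preempt_count--;
}

/* Unbalanced disable: the count is left raised when the case returns. */
static void leaky_case(void)
{
	mock_preempt_count++;
}
```

The point of the design is that a corrupted count is no longer repaired silently: the warning disables lock debugging, and the existing debug_locks comparison turns that into a test failure.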
+	if (saved_debug_locks && !debug_locks) {
+		kunit_set_failure(test);
+		kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
+		kunit_warn(test, "Further tests will have LOCKDEP disabled.");
+	}
+}
On Thu, Aug 13, 2020 at 5:36 AM peterz@infradead.org wrote:
Urgh, don't silently change these... if they're off that's a hard fail.
	if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
		preempt_count_set(saved_preempt_count);
And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger the below fail.
Hmm, I see. My original assumption was that lock-related bugs that could corrupt preempt_count would always be caught by lockdep (resulting in debug_locks already being off). Is this not always true? In any case, I think it's better to explicitly show the failure associated with the preemption count as you have done, but I'm still curious.
Also, for further clarification: the check you have made on preempt_count also covers softirq_count, right? My understanding is that softirqs are re-{enabled/disabled} due to the corruption of the preemption count, so no changes should occur if the preemption count remains the same. If it does change, we've already failed from DEBUG_LOCKS_WARN_ON.
On Thu, Aug 13, 2020 at 08:15:27AM -0500, Uriel Guajardo wrote:
On Thu, Aug 13, 2020 at 5:36 AM peterz@infradead.org wrote:
Urgh, don't silently change these... if they're off that's a hard fail.
	if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
		preempt_count_set(saved_preempt_count);
And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger the below fail.
Hmm, I see. My original assumption was that lock-related bugs that could corrupt preempt_count would always be caught by lockdep (resulting in debug_locks already being off). Is this not always true? In any case, I think it's better to explicitly show the failure associated with the preemption count as you have done, but I'm still curious.
Code could have an unbalanced preempt_disable() unrelated to locks.
Also, for further clarification: the check you have made on preempt_count also covers softirq_count, right?
Correct.
My understanding is that softirqs are re-{enabled/disabled} due to the corruption of the preemption count, so no changes should occur if the preemption count remains the same. If it does change, we've already failed from DEBUG_LOCKS_WARN_ON.
local_bh_enable() might call into softirq handling if a softirq got raised while they were disabled; you'll miss that here. The next interrupt will likely run the softirq after that.
This is best-effort error recovery: you got a splat, and all we aim for is living long enough for the user to see it.
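The softirq point can be illustrated with a small userspace model. All names here (`mock_local_bh_disable()`, `mock_raise_softirq()`, `mock_restore_count()`, `mock_irq_exit()` and the counters) are assumptions made for illustration; the model only captures the behavior described above, not the kernel's actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the softirq-disable depth and pending state. */
static int mock_softirq_count;
static bool mock_softirq_pending;
static int mock_softirqs_run;

static void mock_do_softirq(void)
{
	mock_softirq_pending = false;
	mock_softirqs_run++;
}

static void mock_local_bh_disable(void)
{
	mock_softirq_count++;
}

static void mock_raise_softirq(void)
{
	mock_softirq_pending = true;
}

/* Re-enabling runs any softirq that was raised while disabled. */
static void mock_local_bh_enable(void)
{
	assert(mock_softirq_count > 0);
	if (--mock_softirq_count == 0 && mock_softirq_pending)
		mock_do_softirq();
}

/* Recovery path: overwrite the count, which skips the pending check. */
static void mock_restore_count(int saved)
{
	mock_softirq_count = saved;
}

/* Interrupt exit: pending softirqs get their next chance to run here. */
static void mock_irq_exit(void)
{
	if (mock_softirq_count == 0 && mock_softirq_pending)
		mock_do_softirq();
}

/* Buggy test case: disables softirqs, raises one, never re-enables. */
static void buggy_case(void)
{
	mock_local_bh_disable();
	mock_raise_softirq();
}
```

In this model, restoring the saved count after `buggy_case()` leaves the softirq pending, and it only runs once `mock_irq_exit()` is called, whereas a balanced disable/enable pair runs it immediately.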