On Thu, Aug 13, 2020 at 08:15:27AM -0500, Uriel Guajardo wrote:
On Thu, Aug 13, 2020 at 5:36 AM peterz@infradead.org wrote:
On Wed, Aug 12, 2020 at 07:33:32PM +0000, Uriel Guajardo wrote:
KUnit will fail tests upon observing a lockdep failure. Because lockdep turns itself off after its first failure, only fail the first test and warn users to not expect any future failures from lockdep.
Similar to lib/locking-selftest [1], we check if the status of debug_locks has changed after the execution of a test case. However, we do not reset lockdep afterwards.
Like the locking selftests, we also fix possible preemption count corruption from lock bugs.
+static void kunit_check_locking_bugs(struct kunit *test,
+				     unsigned long saved_preempt_count,
+				     bool saved_debug_locks)
+{
+	preempt_count_set(saved_preempt_count);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
Urgh, don't silently change these... if they're off that's a hard fail.
	if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
		preempt_count_set(saved_preempt_count);
And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger the below fail.
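To make the control flow of that suggestion concrete, here is a small userspace model of it. This is only a sketch: every `model_*` name is a hypothetical stand-in, not a kernel symbol, and the "fail when debug_locks went from on to off" rule is an assumption taken from the patch description rather than from the final code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-ins for kernel state (hypothetical, not kernel symbols). */
static int model_debug_locks = 1;          /* models debug_locks */
static unsigned long model_preempt_count;  /* models preempt_count() */

/*
 * Models DEBUG_LOCKS_WARN_ON(): if the condition is true, turn lock
 * debugging off and warn (only the first time), then evaluate to the
 * condition.
 */
static bool model_debug_locks_warn_on(bool cond)
{
	if (cond && model_debug_locks) {
		model_debug_locks = 0;
		fprintf(stderr, "WARNING: DEBUG_LOCKS_WARN_ON triggered\n");
	}
	return cond;
}

/*
 * Post-test check in the shape suggested above. Returns true when the
 * test should be marked as failed (assumed rule: debug_locks was on
 * before the test and is off afterwards).
 */
static bool model_check_locking_bugs(unsigned long saved_preempt_count,
				     int saved_debug_locks)
{
	/* Complain loudly first, then restore as best-effort recovery. */
	if (model_debug_locks_warn_on(model_preempt_count != saved_preempt_count))
		model_preempt_count = saved_preempt_count;

	return saved_debug_locks && !model_debug_locks;
}

/* A test body with an unbalanced preempt_disable(), unrelated to locks. */
static void model_leaky_test(void)
{
	model_preempt_count++;
}

/* A test body that leaves the preemption count untouched. */
static void model_clean_test(void)
{
}

/* Snapshot state, run the test body, then run the check. */
static bool model_run_test(void (*test_fn)(void))
{
	unsigned long saved_preempt_count = model_preempt_count;
	int saved_debug_locks = model_debug_locks;

	test_fn();
	return model_check_locking_bugs(saved_preempt_count, saved_debug_locks);
}
```

In this model the first leaky test is failed and the count is restored; any later leaky test still gets its count restored but is not failed, because lock debugging is already off by then.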
Hmm, I see. My original assumption was that lock-related bugs that could corrupt preempt_count would always be caught by lockdep (resulting in debug_locks already being off). Is this not always true? In any case, I think it's better to explicitly show the failure associated with the preemption count as you have done, but I'm still curious.
Code could have an unbalanced preempt_disable() unrelated to locks.
Also, for further clarification: the check you have made on preempt_count also covers softirq_count, right?
Correct.
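As a rough illustration of why comparing the whole preemption count also covers the softirq part: softirq_count() is a masked view of preempt_count(). The sketch below assumes the usual layout from include/linux/preempt.h (8 preemption bits followed by 8 softirq bits, non-RT); the `MODEL_*` and `model_*` names are hypothetical userspace stand-ins.

```c
#include <assert.h>

/* Assumed bit layout, mirroring include/linux/preempt.h (non-RT). */
#define MODEL_PREEMPT_BITS	8
#define MODEL_SOFTIRQ_BITS	8
#define MODEL_PREEMPT_SHIFT	0
#define MODEL_SOFTIRQ_SHIFT	(MODEL_PREEMPT_SHIFT + MODEL_PREEMPT_BITS)
#define MODEL_SOFTIRQ_MASK \
	(((1UL << MODEL_SOFTIRQ_BITS) - 1) << MODEL_SOFTIRQ_SHIFT)

/* What one local_bh_disable() adds to the count in this model. */
#define MODEL_SOFTIRQ_DISABLE_OFFSET	(2UL << MODEL_SOFTIRQ_SHIFT)

/* Models softirq_count(): just the softirq bits of the full count. */
static unsigned long model_softirq_count(unsigned long preempt_count)
{
	return preempt_count & MODEL_SOFTIRQ_MASK;
}

/*
 * Models the full-count comparison: any change in the softirq bits
 * changes the full count, so the single comparison catches it.
 */
static int model_count_changed(unsigned long saved, unsigned long now)
{
	return saved != now;
}
```

So a test that leaks a local_bh_disable() changes the softirq bits, which necessarily changes the full count that the check compares.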
My understanding is that softirqs are re-{enabled/disabled} due to the corruption of the preemption count, so no changes should occur if the preemption count remains the same. If it does change, we've already failed from DEBUG_LOCKS_WARN_ON.
local_bh_enable() might call into softirq handling if a softirq got raised while they were disabled; you'll miss that here. The next interrupt will likely run the softirq after that.
This is best effort error recovery, you got a splat, all we aim for is living long enough to get the user to see it.