On Wed, Dec 16 2020 at 15:55, Naresh Kamboju wrote:
> On Tue, 15 Dec 2020 at 23:52, Jakub Kicinski <kuba@kernel.org> wrote:
>>> Or you could place checks for being in a BH-disable further up in the code. Or build with CONFIG_DEBUG_INFO=y to allow more precise interpretation of this stack trace.
> I will try to reproduce this warning with DEBUG_INFO=y enabled kernel and get back to you with a better crash log.
>> My money would be on the option that whatever ran on this workqueue before forgot to re-enable BH, but we already have a check for that... Naresh, do you have the full log? Is there nothing like "BUG: workqueue leaked lock" above the splat?
No, because it's in the middle of the work. The workqueue bug triggers when the work has finished.
So cleanup_net() does
    ....
    synchronize_rcu();    <- might sleep. So up to here it should be fine.

    list_for_each_entry_continue_reverse(ops, &pernet_list, list)
        ops_exit_list(ops, &net_exit_list);
ops_exit_list() is called for each ops which then either invokes ops->exit() or ops->exit_batch().
So one of those callbacks fails to re-enable BH. Adding a check for !local_bh_disabled() after each invocation of ops->exit() and ops->exit_batch() should be able to identify the buggy callback.
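A rough userspace sketch of that check, for illustration only. It is not kernel code: bh_disable_depth, model_bh_disable(), model_bh_enable(), struct model_ops and the sample callbacks are all made up here to stand in for the BH-disable count and for the pernet ops. In a real kernel the check would sit right after the ops->exit() and ops->exit_batch() calls in ops_exit_list(), presumably testing something along the lines of softirq_count().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's BH-disable depth. */
static int bh_disable_depth;

static void model_bh_disable(void)
{
	bh_disable_depth++;
}

static void model_bh_enable(void)
{
	assert(bh_disable_depth > 0);	/* an unbalanced enable is a different bug */
	bh_disable_depth--;
}

/* Hypothetical, cut-down analogue of the pernet ops: two optional callbacks. */
struct model_ops {
	const char *name;
	void (*exit)(void);		/* may be NULL */
	void (*exit_batch)(void);	/* may be NULL */
};

/* A well-behaved callback: disable and enable are balanced. */
static void good_exit(void)
{
	model_bh_disable();
	model_bh_enable();
}

/* The kind of bug being hunted: BH is disabled and never re-enabled. */
static void leaky_exit_batch(void)
{
	model_bh_disable();
}

/*
 * Invoke one callback, then check that BH is enabled again.
 * Returns 1 if the callback left BH disabled, 0 otherwise.
 */
static int leaked_bh(const struct model_ops *ops, const char *which,
		     void (*cb)(void))
{
	cb();
	if (bh_disable_depth == 0)
		return 0;

	fprintf(stderr, "%s->%s() returned with BH disabled (depth %d)\n",
		ops->name, which, bh_disable_depth);
	bh_disable_depth = 0;	/* rebalance so later callbacks are judged alone */
	return 1;
}

/*
 * Walk the ops in reverse order, as cleanup_net() does, and return the
 * index of the first ops whose callback leaked a BH disable, or -1.
 */
static int find_leaking_ops(const struct model_ops *ops, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		if (ops[i].exit && leaked_bh(&ops[i], "exit", ops[i].exit))
			return (int)i;
		if (ops[i].exit_batch &&
		    leaked_bh(&ops[i], "exit_batch", ops[i].exit_batch))
			return (int)i;
	}
	return -1;
}

/* Small demo with three made-up ops; the middle one is buggy. */
int demo_find_leak(void)
{
	static const struct model_ops ops[] = {
		{ "ops_a", good_exit, NULL },
		{ "ops_b", NULL, leaky_exit_batch },
		{ "ops_c", good_exit, good_exit },
	};

	bh_disable_depth = 0;
	return find_leaking_ops(ops, sizeof(ops) / sizeof(ops[0]));
}
```

Calling demo_find_leak() from a main() returns the index of ops_b and prints which callback leaked. The point is only that checking right after each callback names the culprit directly, instead of the splat showing up later in unrelated code.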
Thanks,
tglx