On Wed 2021-10-27 13:57:40, Miroslav Benes wrote:
On Tue, 26 Oct 2021, Luis Chamberlain wrote:
On Tue, Oct 26, 2021 at 11:37:30PM +0800, Ming Lei wrote:
On Tue, Oct 26, 2021 at 10:48:18AM +0200, Petr Mladek wrote:
Livepatch code never called kobject_del() under a lock. It would cause the obvious deadlock.
I have to correct myself. IMHO, the deadlock is far from obvious. I always get lost in the code and the documentation is not clear. I always get lost.
Never?
kobject_put() to be precise.
IMHO, the problem is actually with kobject_del() that gets blocked until the sysfs interface gets removed. kobject_put() will have the same problem only when the clean up is not delayed.
When I started working on the support for module/live patches removal, calling kobject_put() under our klp_mutex lock was the obvious first choice given how the code was structured, but I ran into problems with deadlocks immediately. So it was changed to async approach with the workqueue. Thus the mainline code has never suffered from this, but we knew about the issues.
The historic code only waited in the module_exit() callback until the sysfs interface was removed.
OK, then Luis shouldn't consider livepatching as one such issue to solve with one generic solution.
It's not what I was told when the deadlock was found with zram, so I was informed quite the contrary.
From my perspective, it is quite easy to get it wrong due to either a lack
of generic support, or missing rules/documentation. So if this thread leads to "do not share locks between a module removal and a sysfs operation" strict rule, it would be at least something. In the same manner as Luis proposed to document try_module_get() expectations.
The rule "do not share locks between a module removal and a sysfs operation" is not clear to me.
IMHO, there are the following rules:
1. rule: kobject_del() or kobject_put() must not be called under a lock that is used by store()/show() callbacks.
reason: kobject_del() waits until the sysfs interface is destroyed. It has to wait until all store()/show() callbacks are finished.
2. rule: kobject_del()/kobject_put() must not be called from the related store() callbacks.
reason: same as in 1st rule.
3. rule: module_exit() must wait until all release() callbacks are called when kobject are static.
reason: kobject_put() must be called to clean up internal dependencies. The clean up might be done asynchronously and need access to the kobject structure.
Best Regards, Petr
PS: I am sorry if I am messing things. I want to be sure that we are all talking about the same and understand it the same way.