Alexandre reported a bug in the loopback rmmod path subsequent to a failure in gb_pm_runtime_get_sync - something that can happen when you are developing device-side firmware easily. Doing some cursory rmmod testing on gb_loopback then showed a second (and long standing) error pertaining to removal of gb_dev.root.
This series fixes both issues.
Bryan O'Donoghue (2): staging: greybus: loopback: fix gb_pm_runtime_get_sync error handling staging: greybus: loopback: fix oops on rmmod gb_loopback
drivers/staging/greybus/loopback.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
commit e854ff58ed70 ("greybus: loopback: add runtime pm support") introduces pm runtime support to the loopback code. If a gb_pm_runtime_get_sync() fails, currently we immediately return from the loopback thread.
Alexandre reports that later on, subsequent to the afore mentioned failure, doing an rmmod will trigger a kthread_stop() which will will generate a fault. The fault results from dereferencing gb->task in gb_loopback_disconnect().
One way to fix that is to have the loopback thread do_exit() instead of simply returning on gb_pm_runtime_get_sync() failure - however this will leave dangling sysfs entries with no loopback thread to take action based on the sysfs inputs.
This patch fixes the fault by ignoring the gb_pm_runtime_get_sync() result code, this will allow only gb_loopback_disconnect() to terminate the loopback thread. Fix validated in gbsim.
Next steps on this driver entail changing the relationship between /sysfs and the loopback thread - probably making the loopback thread dynamically started/stopped - as opposed to the pre-allocation model currently used but, those changes are for a future series to address.
Reported-by: Alexandre Bailon abailon@baylibre.com Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie --- drivers/staging/greybus/loopback.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index 83f3b9f..88d4f5c 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -886,9 +886,7 @@ static int gb_loopback_fn(void *data) gb_pm_runtime_put_autosuspend(bundle); wait_event_interruptible(gb->wq, gb->type || kthread_should_stop()); - ret = gb_pm_runtime_get_sync(bundle); - if (ret) - return ret; + gb_pm_runtime_get_sync(bundle); }
if (kthread_should_stop())
On Sun, Jan 08, 2017 at 03:27:18PM +0000, Bryan O'Donoghue wrote:
commit e854ff58ed70 ("greybus: loopback: add runtime pm support") introduces pm runtime support to the loopback code. If a gb_pm_runtime_get_sync() fails, currently we immediately return from the loopback thread.
Alexandre reports that later on, subsequent to the afore mentioned failure, doing an rmmod will trigger a kthread_stop() which will will generate a fault. The fault results from dereferencing gb->task in gb_loopback_disconnect().
One way to fix that is to have the loopback thread do_exit() instead of simply returning on gb_pm_runtime_get_sync() failure - however this will leave dangling sysfs entries with no loopback thread to take action based on the sysfs inputs.
This patch fixes the fault by ignoring the gb_pm_runtime_get_sync() result code, this will allow only gb_loopback_disconnect() to terminate the loopback thread. Fix validated in gbsim.
No, you must check the return for resume errors, and not attempt any further I/O in case of failure.
Kill the thread, or somehow report a test failure and wait for user space to retry.
This is not directly related to the sysfs entries, they belong to the bundle and should stay while the device is present. As mentioned before, the sysfs-interface is not the right interface for this type of device, but that's a different story.
Next steps on this driver entail changing the relationship between /sysfs and the loopback thread - probably making the loopback thread dynamically started/stopped - as opposed to the pre-allocation model currently used but, those changes are for a future series to address.
You might get away with using a work struct scheduled from the current sysfs-interface, but long term this should probably be shifted over to using a character-device interface.
Thanks, Johan
On January 9, 2017 11:19:09 AM GMT+00:00, Johan Hovold johan@kernel.org wrote:
On Sun, Jan 08, 2017 at 03:27:18PM +0000, Bryan O'Donoghue wrote:
commit e854ff58ed70 ("greybus: loopback: add runtime pm support") introduces pm runtime support to the loopback code. If a gb_pm_runtime_get_sync() fails, currently we immediately return from
the
loopback thread.
Alexandre reports that later on, subsequent to the afore mentioned
failure,
doing an rmmod will trigger a kthread_stop() which will will generate
a
fault. The fault results from dereferencing gb->task in gb_loopback_disconnect().
One way to fix that is to have the loopback thread do_exit() instead
of
simply returning on gb_pm_runtime_get_sync() failure - however this
will
leave dangling sysfs entries with no loopback thread to take action
based
on the sysfs inputs.
This patch fixes the fault by ignoring the gb_pm_runtime_get_sync()
result
code, this will allow only gb_loopback_disconnect() to terminate the loopback thread. Fix validated in gbsim.
No, you must check the return for resume errors, and not attempt any further I/O in case of failure.
The greybus operations themselves do report an error subsequent to the user-space tool.
Kill the thread, or somehow report a test failure and wait for user space to retry.
Killing the thread would be messy, however yes I take your general point.
It should be possible to use the gb_pm_runtime_get_sync as a determinant and cease further operations.
I spin that into a v2 this evening.
This is not directly related to the sysfs entries, they belong to the bundle and should stay while the device is present. As mentioned before, the sysfs-interface is not the right interface for this type of device, but that's a different story.
Next steps on this driver entail changing the relationship between
/sysfs
and the loopback thread - probably making the loopback thread
dynamically
started/stopped - as opposed to the pre-allocation model currently
used
but, those changes are for a future series to address.
You might get away with using a work struct scheduled from the current sysfs-interface, but long term this should probably be shifted over to using a character-device interface.
Thanks, Johan
On Mon, Jan 09, 2017 at 11:29:50AM +0000, Bryan O'Donoghue wrote:
On January 9, 2017 11:19:09 AM GMT+00:00, Johan Hovold johan@kernel.org wrote:
On Sun, Jan 08, 2017 at 03:27:18PM +0000, Bryan O'Donoghue wrote:
commit e854ff58ed70 ("greybus: loopback: add runtime pm support") introduces pm runtime support to the loopback code. If a gb_pm_runtime_get_sync() fails, currently we immediately return from
the
loopback thread.
Alexandre reports that later on, subsequent to the afore mentioned
failure,
doing an rmmod will trigger a kthread_stop() which will will generate
a
fault. The fault results from dereferencing gb->task in gb_loopback_disconnect().
One way to fix that is to have the loopback thread do_exit() instead
of
simply returning on gb_pm_runtime_get_sync() failure - however this
will
leave dangling sysfs entries with no loopback thread to take action
based
on the sysfs inputs.
This patch fixes the fault by ignoring the gb_pm_runtime_get_sync()
result
code, this will allow only gb_loopback_disconnect() to terminate the loopback thread. Fix validated in gbsim.
No, you must check the return for resume errors, and not attempt any further I/O in case of failure.
The greybus operations themselves do report an error subsequent to the user-space tool.
Then you must already have a mechanism for reporting errors. Just set a flag and don't execute any further operations after resume failure.
Kill the thread, or somehow report a test failure and wait for user space to retry.
Killing the thread would be messy, however yes I take your general point.
Setting a flag and doing a clean exit should do right? Considering a resume-failure as a fatal error is reasonable.
It should be possible to use the gb_pm_runtime_get_sync as a determinant and cease further operations.
But cancelling the current test and allow user-space to restart it would be more well-behaved of course. Not sure it's worth it given the state of things though.
Johan
On January 9, 2017 11:57:36 AM GMT+00:00, Johan Hovold johan@kernel.org wrote:
On Mon, Jan 09, 2017 at 11:29:50AM +0000, Bryan O'Donoghue wrote:
On January 9, 2017 11:19:09 AM GMT+00:00, Johan Hovold
johan@kernel.org wrote:
On Sun, Jan 08, 2017 at 03:27:18PM +0000, Bryan O'Donoghue wrote:
commit e854ff58ed70 ("greybus: loopback: add runtime pm support") introduces pm runtime support to the loopback code. If a gb_pm_runtime_get_sync() fails, currently we immediately return
from
the
loopback thread.
Alexandre reports that later on, subsequent to the afore mentioned
failure,
doing an rmmod will trigger a kthread_stop() which will will
generate
a
fault. The fault results from dereferencing gb->task in gb_loopback_disconnect().
One way to fix that is to have the loopback thread do_exit()
instead
of
simply returning on gb_pm_runtime_get_sync() failure - however
this
will
leave dangling sysfs entries with no loopback thread to take
action
based
on the sysfs inputs.
This patch fixes the fault by ignoring the
gb_pm_runtime_get_sync()
result
code, this will allow only gb_loopback_disconnect() to terminate
the
loopback thread. Fix validated in gbsim.
No, you must check the return for resume errors, and not attempt any further I/O in case of failure.
The greybus operations themselves do report an error subsequent to
the
user-space tool.
Then you must already have a mechanism for reporting errors. Just set a flag and don't execute any further operations after resume failure.
Kill the thread, or somehow report a test failure and wait for user space to retry.
Killing the thread would be messy, however yes I take your general point.
Setting a flag and doing a clean exit should do right? Considering a resume-failure as a fatal error is reasonable.
It should be possible to use the gb_pm_runtime_get_sync as a determinant and cease further operations.
But cancelling the current test and allow user-space to restart it would be more well-behaved of course. Not sure it's worth it given the state of things though.
Sorry yes, I'm not saying anything different.
Johan
--- bod
Doing an rmmod gb_loopback will cause an oops with the following backtrace.
Call Trace: debugfs_remove+0xaf/0x180 gb_loopback_disconnect+0x36/0x160 [gb_loopback] greybus_remove+0x87/0x1a0 [greybus] device_release_driver_internal+0x14a/0x200 driver_detach+0x39/0x60 bus_remove_driver+0x47/0xa0 driver_unregister+0x27/0x50 greybus_deregister_driver+0xd/0x10 [greybus] loopback_exit+0x1c/0x36 [gb_loopback] SyS_delete_module+0x170/0x1b0 entry_SYSCALL_64_fastpath+0x13/0x93
This happens because gb_dev.root is removed prior to greybus_deregister_driver which itself will trigger gb_loopback_disconnect to run - leading to the parent of gb->file being NULL.
Typically up to this point when tearing down greybus modules we have removed gb_es2 prior to removing gb_loopback. In this case gb_es2 will run subordinate greybus->disconnect() routines. A subsequent rmmod of gb_loopback then will not re-run gb_loopback_disconnect().
On project Ara the second module removal flow was the only practiced flow hence it's only now in gbsim where we have more freedom to rmmod that this bug is apparent. The fix itself is pretty straight-forward.
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie --- drivers/staging/greybus/loopback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index 88d4f5c..8f3eb33 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -1260,9 +1260,9 @@ module_init(loopback_init);
static void __exit loopback_exit(void) { - debugfs_remove_recursive(gb_dev.root); greybus_deregister(&gb_loopback_driver); class_unregister(&loopback_class); + debugfs_remove_recursive(gb_dev.root); ida_destroy(&loopback_ida); } module_exit(loopback_exit);
On Sun, Jan 08, 2017 at 03:27:19PM +0000, Bryan O'Donoghue wrote:
Doing an rmmod gb_loopback will cause an oops with the following backtrace.
Call Trace: debugfs_remove+0xaf/0x180 gb_loopback_disconnect+0x36/0x160 [gb_loopback] greybus_remove+0x87/0x1a0 [greybus] device_release_driver_internal+0x14a/0x200 driver_detach+0x39/0x60 bus_remove_driver+0x47/0xa0 driver_unregister+0x27/0x50 greybus_deregister_driver+0xd/0x10 [greybus] loopback_exit+0x1c/0x36 [gb_loopback] SyS_delete_module+0x170/0x1b0 entry_SYSCALL_64_fastpath+0x13/0x93
This happens because gb_dev.root is removed prior to greybus_deregister_driver which itself will trigger gb_loopback_disconnect to run - leading to the parent of gb->file being NULL.
Typically up to this point when tearing down greybus modules we have removed gb_es2 prior to removing gb_loopback. In this case gb_es2 will run subordinate greybus->disconnect() routines. A subsequent rmmod of gb_loopback then will not re-run gb_loopback_disconnect().
On project Ara the second module removal flow was the only practiced flow hence it's only now in gbsim where we have more freedom to rmmod that this bug is apparent. The fix itself is pretty straight-forward.
There was nothing preventing you from unloading bundle-drivers while keeping the host-device driver (e.g. gb_es2) loaded on Project Ara, even if this typically wasn't done, yes.
drivers/staging/greybus/loopback.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index 88d4f5c..8f3eb33 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -1260,9 +1260,9 @@ module_init(loopback_init); static void __exit loopback_exit(void) {
- debugfs_remove_recursive(gb_dev.root); greybus_deregister(&gb_loopback_driver); class_unregister(&loopback_class);
- debugfs_remove_recursive(gb_dev.root);
This is just a plain bug, and in fact, you should only be using debugfs_remove here, as any bundle-specific files should have been cleaned up as part of disconnect.
ida_destroy(&loopback_ida); } module_exit(loopback_exit);
Thanks, Johan