Two simple patches here resulting from using greybus on gbsim and developing support for async. I found a bug in the user-space tool and while doing that decided to update the kernel thread to be better behaved when waiting for test completions.
V2: Use available bundle pointer for dev_dbg printout - Johan
Bryan O'Donoghue (2): staging: greybus: loopback: use gb_loopback_async_wait_all don't spin staging: greybus: loopback_test: Fix race preventing test completion
drivers/staging/greybus/loopback.c | 11 +++++++++++ drivers/staging/greybus/tools/loopback_test.c | 10 +--------- 2 files changed, 12 insertions(+), 9 deletions(-)
Currently the greybus-loopback thread logic spins around waiting for send_count == iteration_max which on real hardware doesn't make a difference to us but in simulation is excruciatingly slow, anti-social and bad manners. Use the existing gb_loopback_async_wait_all() function to gate continuing when the send_count == iteration_max and go to sleep until there's something worthwhile to-do.
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie --- drivers/staging/greybus/loopback.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index 7882306..3184dd3 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -1008,11 +1008,22 @@ static int gb_loopback_fn(void *data)
/* Optionally terminate */ if (gb->send_count == gb->iteration_max) { + mutex_unlock(&gb->mutex); + + /* Wait for synchronous and asynchronus completion */ + gb_loopback_async_wait_all(gb); + + /* Mark complete unless user-space has poked us */ + mutex_lock(&gb->mutex); if (gb->iteration_count == gb->iteration_max) { gb->type = 0; gb->send_count = 0; sysfs_notify(&gb->dev->kobj, NULL, "iteration_count"); + dev_dbg(&bundle->dev, "load test complete\n"); + } else { + dev_dbg(&bundle->dev, + "continuing on with new test set\n"); } mutex_unlock(&gb->mutex); continue;
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
We don't even need to have the poll() loop timeout tests in since the kernel threads themselves should eventually complete - and if not it's a kernel bug. A user of the tool can still terminate tests by doing a ctrl-c.
This patch fixes the race by ensuring the poll() loop waits for all active Interfaces to complete - as opposed to the first active Interface before breaking the loop.
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie --- drivers/staging/greybus/tools/loopback_test.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/staging/greybus/tools/loopback_test.c b/drivers/staging/greybus/tools/loopback_test.c index f7f4cd6..f053d4fe 100644 --- a/drivers/staging/greybus/tools/loopback_test.c +++ b/drivers/staging/greybus/tools/loopback_test.c @@ -747,7 +747,7 @@ static int wait_for_complete(struct loopback_test *t) if (t->poll_timeout.tv_sec != 0) ts = &t->poll_timeout;
- while (1) { + while (!is_complete(t)) {
ret = ppoll(t->fds, t->poll_count, ts, &mask_old); if (ret <= 0) { @@ -763,14 +763,6 @@ static int wait_for_complete(struct loopback_test *t) number_of_events++; } } - - if (number_of_events == t->poll_count) - break; - } - - if (!is_complete(t)) { - fprintf(stderr, "Iteration count did not finish!\n"); - return -1; }
return 0;
On Thu, Dec 22, 2016 at 12:37:28AM +0000, Bryan O'Donoghue wrote:
Currently the greybus-loopback thread logic spins around waiting for send_count == iteration_max which on real hardware doesn't make a difference to us but in simulation is excruciatingly slow, anti-social and bad manners. Use the existing gb_loopback_async_wait_all() function to gate continuing when the send_count == iteration_max and go to sleep until there's something worthwhile to-do.
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie
Reviewed-by: Johan Hovold johan@kernel.org
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
So I've ignored the user-space tool so far, but the description above does not seem to explain why there is a race here. Could please expand this a bit?
We don't even need to have the poll() loop timeout tests in since the kernel threads themselves should eventually complete - and if not it's a kernel bug. A user of the tool can still terminate tests by doing a ctrl-c.
No, we still want to be well-behaved and time out if the test fails to complete. Consider automatic testing. But you don't seem to be changing any behaviour in this respect below?
This patch fixes the race by ensuring the poll() loop waits for all active Interfaces to complete - as opposed to the first active Interface before breaking the loop.
Really? It looks to me like the current code tries to wait for notifications on all fds before breaking the loop. Why do say its just the first?
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie
drivers/staging/greybus/tools/loopback_test.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/staging/greybus/tools/loopback_test.c b/drivers/staging/greybus/tools/loopback_test.c index f7f4cd6..f053d4fe 100644 --- a/drivers/staging/greybus/tools/loopback_test.c +++ b/drivers/staging/greybus/tools/loopback_test.c @@ -747,7 +747,7 @@ static int wait_for_complete(struct loopback_test *t) if (t->poll_timeout.tv_sec != 0) ts = &t->poll_timeout;
- while (1) {
- while (!is_complete(t)) {
ret = ppoll(t->fds, t->poll_count, ts, &mask_old); if (ret <= 0) { @@ -763,14 +763,6 @@ static int wait_for_complete(struct loopback_test *t) number_of_events++; } }
if (number_of_events == t->poll_count)
break;
- }
- if (!is_complete(t)) {
fprintf(stderr, "Iteration count did not finish!\n");
}return -1;
return 0;
Johan
Hi Bryan,
On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold johan@kernel.org wrote:
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
Im not sure why you might be getting this error. I think the while(1) loop should have exited when each interface has sent its event, and all the tests are finished.
If you see "Iteration count did not finish" it means that you got an PRI event for each polled fd, but the iteration_count does not equal iteration_max on one of the interfaces.
if the issue is 100% reproducible and if the patch below fixes the issue it seems that the iteration_count will eventually complete, in which case it makes me think that either an interface sent an event before it had finished the test, or some interface sent more than one event?
it would be nice to trace each event that the application is getting and from which fd it is, and looking at the iteration_count when that happens.
So I've ignored the user-space tool so far, but the description above does not seem to explain why there is a race here. Could please expand this a bit?
We don't even need to have the poll() loop timeout tests in since the kernel threads themselves should eventually complete - and if not it's a kernel bug. A user of the tool can still terminate tests by doing a ctrl-c.
No, we still want to be well-behaved and time out if the test fails to complete. Consider automatic testing. But you don't seem to be changing any behaviour in this respect below?
This patch fixes the race by ensuring the poll() loop waits for all active Interfaces to complete - as opposed to the first active Interface before breaking the loop.
Really? It looks to me like the current code tries to wait for notifications on all fds before breaking the loop. Why do say its just the first?
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie
drivers/staging/greybus/tools/loopback_test.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/staging/greybus/tools/loopback_test.c b/drivers/staging/greybus/tools/loopback_test.c index f7f4cd6..f053d4fe 100644 --- a/drivers/staging/greybus/tools/loopback_test.c +++ b/drivers/staging/greybus/tools/loopback_test.c @@ -747,7 +747,7 @@ static int wait_for_complete(struct loopback_test *t) if (t->poll_timeout.tv_sec != 0) ts = &t->poll_timeout;
while (1) {
while (!is_complete(t)) { ret = ppoll(t->fds, t->poll_count, ts, &mask_old); if (ret <= 0) {
@@ -763,14 +763,6 @@ static int wait_for_complete(struct loopback_test *t) number_of_events++; } }
if (number_of_events == t->poll_count)
break;
if you remove this check you can remove the number_of_events variable and logic to increment it, but i think something else might be hiding here.
}
if (!is_complete(t)) {
fprintf(stderr, "Iteration count did not finish!\n");
return -1; } return 0;
Johan
Regards Axel.
On 02/01/17 17:27, Axel Haslam wrote:
Hi Bryan,
On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold johan@kernel.org wrote:
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
Im not sure why you might be getting this error. I think the while(1) loop should have exited when each interface has sent its event, and all the tests are finished.
Alex,
Feliz año nuevo (Google translate skillz at work)
What's happening is we break the loop when the number_of_events == number of fd indices captured here @ t->poll_count
open_poll_files() { /* Set the poll count equal to the number of handles to track */ t->poll_count = fds_idx; }
wait_for_complete() {
while(1) { for (i = 0; i < t->poll_count; i++) { if(happy) number_of_events++; } if (number_of_events == t->poll_count) break; }
if (!is_complete(t)) { fprintf(stderr, "life stinks\n"); return -BROKEN; } }
is_complete() - then wants all iteration_counts to be equal to maximum or the test fails.
OTOH if the loop doesn't break until all of the tests are complete we never hit that problem. The responsibility should be on kernel-space to ensure all tests complete anyway IMO.
--- bod
On Tue, Jan 3, 2017 at 10:33 AM, Bryan O'Donoghue pure.logic@nexus-software.ie wrote:
On 02/01/17 17:27, Axel Haslam wrote:
Hi Bryan,
On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold johan@kernel.org wrote:
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
Im not sure why you might be getting this error. I think the while(1) loop should have exited when each interface has sent its event, and all the tests are finished.
Alex,
Feliz año nuevo (Google translate skillz at work)
Google got it right!!
What's happening is we break the loop when the number_of_events == number of fd indices captured here @ t->poll_count
is this wrong? this means we will break from the loop once all interfaces have sent the event (they are finished)
open_poll_files() { /* Set the poll count equal to the number of handles to track */ t->poll_count = fds_idx; }
wait_for_complete() {
while(1) { for (i = 0; i < t->poll_count; i++) { if(happy) number_of_events++; } if (number_of_events == t->poll_count) break; } if (!is_complete(t)) { fprintf(stderr, "life stinks\n"); return -BROKEN; }
}
is_complete() - then wants all iteration_counts to be equal to maximum or the test fails.
the test should fail if the iteration count does not equal the max, right?
as i see it, a successful test means: 1- each interfaces should send an event upon completion. 2- the iteration count should equal iteration_max on each of the interfaces
what am i missing?
OTOH if the loop doesn't break until all of the tests are complete we never hit that problem. The responsibility should be on kernel-space to ensure all tests complete anyway IMO.
the user app can bail out early too, if a timeout for the poll is given or in case of a signal interrupt.
bod
On 03/01/17 10:17, Axel Haslam wrote:
as i see it, a successful test means: 1- each interfaces should send an event upon completion.
yes
2- the iteration count should equal iteration_max on each of the interfaces
yes
what am i missing?
count == max implies the kernel thread is still working and we should continue to sleep.
I am interested in why count != max when in user-space number_of_events == t->poll_count - I don't see how that's logically possible, though it makes me wonder why the check is_complete() was put in place - you must have seen this yourself ?
As a fundamental though - unless count == max, the test is not done.
On Tue, Jan 3, 2017 at 12:51 PM, Bryan O'Donoghue pure.logic@nexus-software.ie wrote:
On 03/01/17 10:17, Axel Haslam wrote:
as i see it, a successful test means: 1- each interfaces should send an event upon completion.
yes
2- the iteration count should equal iteration_max on each of the interfaces
yes
what am i missing?
count == max implies the kernel thread is still working and we should continue to sleep.
i think you mean count != max here.
I am interested in why count != max when in user-space number_of_events == t->poll_count - I don't see how that's logically possible, though it
right, i think this is the issue.
makes me wonder why the check is_complete() was put in place - you must have seen this yourself ?
as a general sanity check. in case the event count were wrong for some reason. which seems to be happening here. i dont remember if i was actually seeing it and under which case, i will give it a try and see if i can reproduce it.
looking at your patch it seems the count will eventually complete and equal "max_iterations" maybe the event is somehow sent early.
As a fundamental though - unless count == max, the test is not done.
On Thu, Dec 22, 2016 at 12:37:27AM +0000, Bryan O'Donoghue wrote:
Two simple patches here resulting from using greybus on gbsim and developing support for async. I found a bug in the user-space tool and while doing that decided to update the kernel thread to be better behaved when waiting for test completions.
V2: Use available bundle pointer for dev_dbg printout - Johan
Bryan O'Donoghue (2): staging: greybus: loopback: use gb_loopback_async_wait_all don't spin staging: greybus: loopback_test: Fix race preventing test completion
I've applied just the first patch here.
thanks,
greg k-h