Hi Bryan,
On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold johan@kernel.org wrote:
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
Im not sure why you might be getting this error. I think the while(1) loop should have exited when each interface has sent its event, and all the tests are finished.
If you see "Iteration count did not finish" it means that you got an PRI event for each polled fd, but the iteration_count does not equal iteration_max on one of the interfaces.
if the issue is 100% reproducible and if the patch below fixes the issue it seems that the iteration_count will eventually complete, in which case it makes me think that either an interface sent an event before it had finished the test, or some interface sent more than one event?
it would be nice to trace each event that the application is getting and from which fd it is, and looking at the iteration_count when that happens.
So I've ignored the user-space tool so far, but the description above does not seem to explain why there is a race here. Could please expand this a bit?
We don't even need to have the poll() loop timeout tests in since the kernel threads themselves should eventually complete - and if not it's a kernel bug. A user of the tool can still terminate tests by doing a ctrl-c.
No, we still want to be well-behaved and time out if the test fails to complete. Consider automatic testing. But you don't seem to be changing any behaviour in this respect below?
This patch fixes the race by ensuring the poll() loop waits for all active Interfaces to complete - as opposed to the first active Interface before breaking the loop.
Really? It looks to me like the current code tries to wait for notifications on all fds before breaking the loop. Why do say its just the first?
Signed-off-by: Bryan O'Donoghue pure.logic@nexus-software.ie
drivers/staging/greybus/tools/loopback_test.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/drivers/staging/greybus/tools/loopback_test.c b/drivers/staging/greybus/tools/loopback_test.c index f7f4cd6..f053d4fe 100644 --- a/drivers/staging/greybus/tools/loopback_test.c +++ b/drivers/staging/greybus/tools/loopback_test.c @@ -747,7 +747,7 @@ static int wait_for_complete(struct loopback_test *t) if (t->poll_timeout.tv_sec != 0) ts = &t->poll_timeout;
while (1) {
while (!is_complete(t)) { ret = ppoll(t->fds, t->poll_count, ts, &mask_old); if (ret <= 0) {
@@ -763,14 +763,6 @@ static int wait_for_complete(struct loopback_test *t) number_of_events++; } }
if (number_of_events == t->poll_count)
break;
if you remove this check you can remove the number_of_events variable and logic to increment it, but i think something else might be hiding here.
}
if (!is_complete(t)) {
fprintf(stderr, "Iteration count did not finish!\n");
return -1; } return 0;
Johan
Regards Axel.