On Tue, Jan 3, 2017 at 10:33 AM, Bryan O'Donoghue pure.logic@nexus-software.ie wrote:
On 02/01/17 17:27, Axel Haslam wrote:
Hi Bryan,
On Mon, Jan 2, 2017 at 3:32 PM, Johan Hovold johan@kernel.org wrote:
Adding Axel on CC.
On Thu, Dec 22, 2016 at 12:37:29AM +0000, Bryan O'Donoghue wrote:
commit 9250c0ee2626 ("greybus: Loopback_test: use poll instead of inotify") changes the flow of determining when to break out of a loop polling for loopback test completion.
The clause is_complete() which determines if all tests are complete - as used is subject to a race condition where one of the tests has completed but at least one other test has not. On real hardware this typically doesn't present itself however in gbsim - which is a lot slower due in-part to a panopoly of printouts - we see that running a loopback test to more than one Interface in gbsim will fail in all instances printing out "Iteration count did not finish".
Im not sure why you might be getting this error. I think the while(1) loop should have exited when each interface has sent its event, and all the tests are finished.
Alex,
Feliz año nuevo (Google translate skillz at work)
Google got it right!!
What's happening is we break the loop when the number_of_events == number of fd indices captured here @ t->poll_count
is this wrong? this means we will break from the loop once all interfaces have sent the event (they are finished)
open_poll_files() { /* Set the poll count equal to the number of handles to track */ t->poll_count = fds_idx; }
wait_for_complete() {
while(1) { for (i = 0; i < t->poll_count; i++) { if(happy) number_of_events++; } if (number_of_events == t->poll_count) break; } if (!is_complete(t)) { fprintf(stderr, "life stinks\n"); return -BROKEN; }
}
is_complete() - then wants all iteration_counts to be equal to maximum or the test fails.
the test should fail if the iteration count does not equal the max, right?
as i see it, a successful test means: 1- each interfaces should send an event upon completion. 2- the iteration count should equal iteration_max on each of the interfaces
what am i missing?
OTOH if the loop doesn't break until all of the tests are complete we never hit that problem. The responsibility should be on kernel-space to ensure all tests complete anyway IMO.
the user app can bail out early too, if a timeout for the poll is given or in case of a signal interrupt.
bod