Hi,
I'm trying to run some Android tests using the multi-node API. To make sure both nodes of the multi-node job are in a known state, I use lava-wait with test_started/test_finished signals to synchronise the nodes. The signals are prefixed so that each host-target pair of lava-test-shell actions is identified by a unique name. This works well when only one test is scheduled in a job: if anything goes wrong, the worst case is that the tests time out. However, when several tests are scheduled in a single job, the flow control sometimes fails. This happens when lava-test-shell times out on one node. The sequence in this scenario is:
1. host (wait for test_started from target)
2. target -> test_started -> host
Tests are executed...
3. target (wait for test_finished from host)
4. time out on host
So the target waits for the test_finished signal and eventually times out as well, because the signal never comes. Meanwhile, the host node has already started executing the next lava-test-shell, where it waits for the test_started signal from the target. The nodes therefore go out of sync and the job produces no results. Is there any way to avoid this situation?
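To make the failure easier to reason about, here is a rough Python model of the flow described above. It is only a sketch and not LAVA code: the signal names (test1_test_started and so on), the timeouts (10 on the host, 30 on the target) and the work durations are invented for illustration. It also assumes that a signal sent before the matching wait starts is still delivered, which may not match the real coordinator.

```python
# Toy model of the flow above. This is NOT LAVA code: signal names,
# timeouts and durations are hypothetical values chosen for illustration.

def run(nodes):
    """Simulate nodes given as {node: [(shell, timeout, actions), ...]}.

    Actions are ("send", signal), ("wait", signal) or ("work", duration).
    Returns {node: [(shell, "pass" | "timeout"), ...]}.
    Assumption: a signal sent earlier is still delivered to a later wait.
    """
    sent = {}  # signal name -> logical time at which it was sent
    state = {n: {"t": 0, "start": 0, "shell": 0, "step": 0} for n in nodes}
    results = {n: [] for n in nodes}

    def active(name):
        return state[name]["shell"] < len(nodes[name])

    def deadline(name):
        s = state[name]
        return s["start"] + nodes[name][s["shell"]][1]

    def finish(name, outcome, at):
        # Record the outcome of the current shell and move to the next one.
        s = state[name]
        results[name].append((nodes[name][s["shell"]][0], outcome))
        s.update(t=at, start=at, shell=s["shell"] + 1, step=0)

    def advance(name):
        # Try to run one action; return True if the node made progress.
        if not active(name):
            return False
        s = state[name]
        actions = nodes[name][s["shell"]][2]
        if s["step"] == len(actions):
            finish(name, "pass", s["t"])
            return True
        kind, arg = actions[s["step"]]
        if kind == "wait":
            if arg not in sent:
                return False  # blocked until the signal shows up
            s["t"] = max(s["t"], sent[arg])
        elif kind == "work":
            s["t"] += arg
        else:  # "send"
            sent[arg] = s["t"]
        if s["t"] > deadline(name):
            finish(name, "timeout", deadline(name))
        else:
            s["step"] += 1
        return True

    while any(active(n) for n in nodes):
        progressed = [advance(n) for n in nodes]
        if not any(progressed):
            # Every active node is blocked in a wait, so the wait with
            # the earliest shell deadline is the one that expires.
            stuck = min((n for n in nodes if active(n)), key=deadline)
            finish(stuck, "timeout", deadline(stuck))
    return results


def scenario(host_work):
    """Two tests per node; host_work is how long test1 takes on the host."""
    def host_shell(i):
        return (f"test{i}", 10, [("wait", f"test{i}_test_started"),
                                 ("work", host_work if i == 1 else 2),
                                 ("send", f"test{i}_test_finished")])

    def target_shell(i):
        return (f"test{i}", 30, [("send", f"test{i}_test_started"),
                                 ("wait", f"test{i}_test_finished")])

    return {"host": [host_shell(1), host_shell(2)],
            "target": [target_shell(1), target_shell(2)]}


healthy = run(scenario(host_work=5))   # test1 fits in the host timeout
broken = run(scenario(host_work=15))   # test1 overruns the host timeout
print("healthy:", healthy)
print("broken: ", broken)
```

In this model the healthy run passes all four shells. In the broken run every shell on both nodes ends in a timeout, including test2, which had no problem of its own: the host gives up waiting for test2_test_started before the target has finished timing out in test1.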
Best Regards,
milosz