On 20 May 2014 13:25, Antonio Terceiro <antonio.terceiro@linaro.org> wrote:
On Mon, May 19, 2014 at 07:47:09PM +0100, Milosz Wasilewski wrote:
Hi,
I'm trying to submit a job for TC2 now and I'm stuck in a long queue. There seem to be a few multinode Android jobs that run on dummy-ssh and vexpress-tc2 (workload automation). We only have one dummy-ssh device, so no more than one TC2 can be used with dummy-ssh at the same time. On top of that, we have vexpress-tc2-benchmark, which can also run multinode jobs with dummy-ssh. For some reason, if a couple of multinode jobs are requested for dummy-ssh + vexpress-tc2, all the TC2 boards get reserved and there is no way to submit any other jobs there. While I understand that one board might be in the reserved state, there is no point in reserving all 3 (there is only one dummy-ssh). IMHO this is a bug in multinode.
This is a known issue. The only way we found to keep multinode jobs from starving while they wait for devices is to reserve their devices as they become available, instead of waiting for a moment when all of their requested devices are available simultaneously.
We have not figured out a way to keep multinode jobs from deadlocking that does not involve a far more complicated mechanism.
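To make the effect of reserving devices as they become available concrete, here is a minimal sketch in Python. It is not the LAVA scheduler code; the function name `schedule_incremental` and the job names are hypothetical, and the device counts (one dummy-ssh, three vexpress-tc2) are taken from the report above.

```python
def schedule_incremental(free_devices, jobs):
    """Toy model: each queued job grabs whatever it needs that is idle.

    free_devices: dict mapping device type -> number of idle devices
    jobs: list of (job_name, [required device types]) in queue order
    Returns (reserved, runnable, idle).
    """
    idle = dict(free_devices)
    reserved = {}
    runnable = []
    for name, needs in jobs:
        held = []
        for dev in needs:
            # Reserve the device right away if one is idle, even when
            # the job's other devices are not available yet.
            if idle.get(dev, 0) > 0:
                idle[dev] -= 1
                held.append(dev)
        reserved[name] = held
        if len(held) == len(needs):
            runnable.append(name)
    return reserved, runnable, idle


# Hypothetical queue: three multinode jobs, each wanting both device types.
free = {"dummy-ssh": 1, "vexpress-tc2": 3}
jobs = [("multinode-%d" % i, ["dummy-ssh", "vexpress-tc2"]) for i in (1, 2, 3)]
reserved, runnable, idle = schedule_incremental(free, jobs)

for name, held in reserved.items():
    print(name, "holds", held)
print("runnable:", runnable)
print("idle:", idle)
```

In this toy model only the first job can start, yet the other two each hold a vexpress-tc2 board, so no board is left for single-node jobs. That is the same picture as the queue described above.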
Current status is:
dummy-ssh: 7 jobs in the queue
vexpress-tc2: 3 reserved + 3 jobs in the queue
I know that the proper solution would be to move WA to dynamically allocated VMs, but unfortunately licensing is in the way.
Actually, I am working right now on a patch to allow multiple dummy-ssh devices on the same host, which might solve this specific problem (assuming WA licensing allows multiple simultaneous uses within the same host).
There is no restriction on the number of connections in the license. There were some problems with lava-test-shell, as dummy-ssh is persistent and doesn't reboot. I guess we might run into some problems with multinode, as there are parameters passed between target and host. They are written to a shared file. If several jobs run on dummy-ssh simultaneously, there may be race conditions or the file might be overwritten.
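One possible way around the overwrite problem, sketched below in Python, is to give each job its own parameter file and write it atomically. This is only an illustration under assumptions: the helper names, the `params.json` file name, the per-job directory layout and the example values are all hypothetical and are not how LAVA or WA store these parameters today.

```python
import json
import os
import tempfile


def write_job_params(base_dir, job_id, params):
    """Write params to a file private to job_id (hypothetical layout)."""
    job_dir = os.path.join(base_dir, str(job_id))
    os.makedirs(job_dir, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it,
    # so a reader never sees a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=job_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(params, tmp)
    final_path = os.path.join(job_dir, "params.json")
    os.replace(tmp_path, final_path)
    return final_path


def read_job_params(base_dir, job_id):
    """Read back the parameters written for job_id."""
    with open(os.path.join(base_dir, str(job_id), "params.json")) as f:
        return json.load(f)


# Two simultaneous jobs (made-up IDs and values) no longer share a file.
base = tempfile.mkdtemp()
write_job_params(base, 101, {"target_ip": "10.0.0.1"})
write_job_params(base, 102, {"target_ip": "10.0.0.2"})
params_101 = read_job_params(base, 101)
params_102 = read_job_params(base, 102)
```

Keying the path on the job ID means two jobs never write to the same file, and the rename step avoids a reader picking up a partially written one. Whether this fits the way target and host exchange parameters would need checking against the real code.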
milosz
-- 
Antonio Terceiro
Software Engineer - Linaro
http://www.linaro.org

linaro-validation mailing list
linaro-validation@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-validation