At some point last week - I think because of network connectivity issues - a job got stuck and I cancelled it; when run again it appeared to hang again. I cancelled it again and am now seeing the health check not start (at least no output appears on the job's webpage).
Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see:

... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
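A quick way to see whether output.yaml itself is what the web UI is failing to parse - a rough sketch only, assuming the one-entry-per-line format shown in the excerpt above (PyYAML required; the path is just the job 32 example from this report):

    import yaml  # PyYAML

    # Path from the example above; adjust for the job being inspected.
    path = "/var/lib/lava-server/default/media/job-output/2018/05/23/32/output.yaml"

    # Each log entry is written as its own "- {...}" line, so checking line
    # by line points at the first entry that does not parse (for example a
    # truncated final line left behind by a cancelled job).
    with open(path, errors="replace") as logfile:
        for lineno, line in enumerate(logfile, start=1):
            try:
                yaml.safe_load(line)
            except yaml.YAMLError as exc:
                print("line %d is not valid YAML: %s" % (lineno, exc))
                break
        else:
            print("every line parsed cleanly")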
which other logs are best for checking whether this is an error that should be fed back?
(LAVA 2018.4)
Robert
On 23 May 2018 at 15:36, Robert Marshall robert.marshall@codethink.co.uk wrote:
At some point last week - I think because of network connectivity issues - a job got stuck and I cancelled it; when run again it appeared to hang again. I cancelled it again and am now seeing the health check not start (at least no output appears on the job's webpage).
What is the status of the relevant device(s) and any associated test jobs?
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
Check the status of all daemons, including lava-logs
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status
Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see:

... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
which other logs are best for checking whether this is an error that should be fed back?
(LAVA 2018.4)
Robert
Neil Williams neil.williams@linaro.org writes:
On 23 May 2018 at 15:36, Robert Marshall robert.marshall@codethink.co.uk wrote:
At some point last week - I think because of network connectivity issues - a job got stuck and I cancelled it; when run again it appeared to hang again. I cancelled it again and am now seeing the health check not start (at least no output appears on the job's webpage).
What is the status of the relevant device(s) and any associated test jobs?
The status of the device was Bad - as the problems with the device have now been resolved, maybe it is hard to diagnose further? But I am adding below what I can see.
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
I think this was when it was failing rather than when I was cancelling it.
2018-05-23 14:26:18,620 ERROR [32] Error: b'Traceback (most recent call last):
  File "/usr/bin/lava-run", line 246, in <module>
    sys.exit(main())
  File "/usr/bin/lava-run", line 233, in main
    logger.close()  # pylint: disable=no-member
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 87, in close
    self.handler.close(linger)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 71, in close
    self.context.destroy(linger=linger)
  File "zmq/backend/cython/context.pyx", line 244, in zmq.backend.cython.context.Context.destroy (zmq/backend/cython/context.c:3067)
  File "zmq/backend/cython/context.pyx", line 136, in zmq.backend.cython.context.Context.term (zmq/backend/cython/context.c:2348)
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/context.c:3216)
  File "/usr/bin/lava-run", line 151, in cancelling_handler
    raise JobCanceled("The job was canceled")
lava_dispatcher.action.JobCanceled: The job was canceled '
Though this is maybe more interesting:
2018-05-23 14:26:18,655 ERROR [32] Unable to dump 'description.yaml'
2018-05-23 14:26:18,655 ERROR [32] Compressed data ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 333, in _handle_end
    description = lzma.decompress(compressed_description)
  File "/usr/lib/python3.5/lzma.py", line 340, in decompress
    raise LZMAError("Compressed data ended before the "
_lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
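For what it's worth, that LZMAError is simply what Python's lzma module raises whenever a compressed stream stops before its end-of-stream marker, which fits a job whose output was cut short by a cancel. A standalone illustration (plain Python, not LAVA code):

    import lzma

    # Compress some data, then truncate the stream to simulate a transfer
    # that was interrupted before the end-of-stream marker was written.
    complete = lzma.compress(b"description: example payload\n" * 100)
    truncated = complete[:len(complete) // 2]

    try:
        lzma.decompress(truncated)
    except lzma.LZMAError as exc:
        # Typically prints: Compressed data ended before the
        # end-of-stream marker was reached
        print(exc)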
Check the status of all daemons, including lava-logs
WARNING lava-logs is offline: can't schedule jobs
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status

Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see:

... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
which other logs are best for checking whether this is an error that should be fed back?
(LAVA 2018.4)
Robert
On 29 May 2018 at 15:12, Robert Marshall robert.marshall@codethink.co.uk wrote:
Neil Williams neil.williams@linaro.org writes:
On 23 May 2018 at 15:36, Robert Marshall robert.marshall@codethink.co.uk wrote:
At some point last week - I think because of network connectivity issues - a job got stuck and I cancelled it; when run again it appeared to hang again. I cancelled it again and am now seeing the health check not start (at least no output appears on the job's webpage).
What is the status of the relevant device(s) and any associated test jobs?
The status of the device was Bad - as the problems with the device have now been resolved, maybe it is hard to diagnose further? But I am adding below what I can see.
So a health check failed. You will need to resolve the problem and re-run the health check by setting the health to Unknown.
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
I think this was when it was failing rather than when I was cancelling it.
2018-05-23 14:26:18,620 ERROR [32] Error: b'Traceback (most recent call last):
  File "/usr/bin/lava-run", line 246, in <module>
    sys.exit(main())
  File "/usr/bin/lava-run", line 233, in main
    logger.close()  # pylint: disable=no-member
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 87, in close
    self.handler.close(linger)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 71, in close
    self.context.destroy(linger=linger)
  File "zmq/backend/cython/context.pyx", line 244, in zmq.backend.cython.context.Context.destroy (zmq/backend/cython/context.c:3067)
  File "zmq/backend/cython/context.pyx", line 136, in zmq.backend.cython.context.Context.term (zmq/backend/cython/context.c:2348)
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/context.c:3216)
  File "/usr/bin/lava-run", line 151, in cancelling_handler
    raise JobCanceled("The job was canceled")
lava_dispatcher.action.JobCanceled: The job was canceled '
Though this is maybe more interesting:
2018-05-23 14:26:18,655 ERROR [32] Unable to dump 'description.yaml'
2018-05-23 14:26:18,655 ERROR [32] Compressed data ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 333, in _handle_end
    description = lzma.decompress(compressed_description)
  File "/usr/lib/python3.5/lzma.py", line 340, in decompress
    raise LZMAError("Compressed data ended before the "
_lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
Likely that the test job specifies the wrong compression or that the file is invalid.
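If the invalid-file theory needs ruling out, one check is to decompress a locally downloaded copy of the image end to end; a rough sketch (the filename below is only a placeholder for wherever the .gz was saved):

    import gzip
    import os
    import shutil

    # Placeholder: a locally downloaded copy of the image used by the job.
    path = "stretch-2.img.gz"

    try:
        # Reading the whole member verifies the gzip CRC/length trailer, so
        # a truncated or corrupt download fails here instead of looking fine.
        with gzip.open(path, "rb") as src, open(os.devnull, "wb") as sink:
            shutil.copyfileobj(src, sink)
        print("gzip stream decompressed cleanly")
    except (EOFError, OSError) as exc:
        print("invalid gzip data:", exc)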
Check the status of all daemons, including lava-logs
WARNING lava-logs is offline: can't schedule jobs
Check the rest of that log file and the systemd status of the lava-logs service, and make sure that service can run normally.
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status
Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see:

... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
which other logs are best for checking whether this is an error that should be fed back?
(LAVA 2018.4)
Robert
--
Robert Marshall, Software Developer
Codethink Ltd
Telephone: +44 7762 840 414
3rd Floor, Dale House, 35 Dale Street, MANCHESTER, M1 2HF, United Kingdom
https://www.codethink.co.uk/
We respect your privacy. See https://www.codethink.co.uk/privacy.html
Neil Williams neil.williams@linaro.org writes:
On 29 May 2018 at 15:12, Robert Marshall robert.marshall@codethink.co.uk wrote:
Neil Williams neil.williams@linaro.org writes:
On 23 May 2018 at 15:36, Robert Marshall robert.marshall@codethink.co.uk wrote:
At some point last week - I think because of network connectivity issues - a job got stuck and I cancelled it; when run again it appeared to hang again. I cancelled it again and am now seeing the health check not start (at least no output appears on the job's webpage).
What is the status of the relevant device(s) and any associated test jobs?
The status of the device was Bad - as the problems with the device have now been resolved, maybe it is hard to diagnose further? But I am adding below what I can see.
So a health check failed. You will need to resolve the problem and re-run the health check by setting the health to Unknown.
Yes, that's what I did (setting the health to Unknown) - multiple times - and I was previously getting that message about 'Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.' on a re-run. I'm no longer getting that, so in that sense the issue is resolved.
Check the /var/log/lava-server/lava-master.log for the reasons why the device is not being assigned.
I think this was when it was failing rather than when I was cancelling it.
2018-05-23 14:26:18,620 ERROR [32] Error: b'Traceback (most recent call last):
  File "/usr/bin/lava-run", line 246, in <module>
    sys.exit(main())
  File "/usr/bin/lava-run", line 233, in main
    logger.close()  # pylint: disable=no-member
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 87, in close
    self.handler.close(linger)
  File "/usr/lib/python3/dist-packages/lava_dispatcher/log.py", line 71, in close
    self.context.destroy(linger=linger)
  File "zmq/backend/cython/context.pyx", line 244, in zmq.backend.cython.context.Context.destroy (zmq/backend/cython/context.c:3067)
  File "zmq/backend/cython/context.pyx", line 136, in zmq.backend.cython.context.Context.term (zmq/backend/cython/context.c:2348)
  File "zmq/backend/cython/checkrc.pxd", line 12, in zmq.backend.cython.checkrc._check_rc (zmq/backend/cython/context.c:3216)
  File "/usr/bin/lava-run", line 151, in cancelling_handler
    raise JobCanceled("The job was canceled")
lava_dispatcher.action.JobCanceled: The job was canceled '
Though this is maybe more interesting:
2018-05-23 14:26:18,655 ERROR [32] Unable to dump 'description.yaml'
2018-05-23 14:26:18,655 ERROR [32] Compressed data ended before the end-of-stream marker was reached
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/lava_server/management/commands/lava-master.py", line 333, in _handle_end
    description = lzma.decompress(compressed_description)
  File "/usr/lib/python3.5/lzma.py", line 340, in decompress
    raise LZMAError("Compressed data ended before the "
_lzma.LZMAError: Compressed data ended before the end-of-stream marker was reached
Likely that the test job specifies the wrong compression or that the file is invalid.
Though the test file was unchanged between the version that works and the one that doesn't. I'm guessing there was a networking glitch which set this all off.
Check the status of all daemons, including lava-logs
WARNING lava-logs is offline: can't schedule jobs
Check the rest of that log file and the systemd status of the lava-logs service, and make sure that service can run normally.
sudo service lava-master status
sudo service lava-logs status
sudo service lava-slave status

Looking at the output.yaml (in /var/lib/lava-server/default/media/job-output/2018/05/23/32) I see:

... progress output for downloading https://images.validation.linaro.org/kvm/standard/stretch-2.img.gz

- {"dt": "2018-05-23T07:39:54.728015", "lvl": "debug", "msg": "[common] Preparing overlay tarball in /var/lib/lava/dispatcher/tmp/32/lava-overlay-aye3n2ke"}
- {"dt": "2018-05-23T07:39:54.728root@stretch:/var/lib/lava-server/default/media/job-output/2018/05/23/32
But none of this appears in http://localhost:8080/scheduler/job/32
and at the head of that page I see the message:
Unable to parse invalid logs: This is maybe a bug in LAVA that should be reported.
which other logs are best for checking whether this is an error that should be fed back?
(LAVA 2018.4)
Robert