Hi folks,
After re-running a sample test job[1] to test AEP support, I noticed that the USB path to the probe for the pandaboard was not detected in the container anymore:
https://pmwg.validation.linaro.org/scheduler/job/997#L1014
To verify that the /dev/usb/ directory didn't exist, I tweaked the test job a little to check the contents of /dev inside lxc and ran it again:
https://pmwg.validation.linaro.org/scheduler/job/998#L1009
AFAICT, nothing changed. If I log on to the PMWG server, I can still see the full list of probes used in the PMWG lab.
Any ideas?
Regards, Lisa
[1] https://pmwg.validation.linaro.org/scheduler/job/720#L1037
Hi Lisa,
On Saturday 18 March 2017 12:49 AM, Lisa Nguyen wrote:
After re-running a sample test job[1] to test AEP support, I noticed that the USB path to the probe for the pandaboard was not detected in the container anymore:
https://pmwg.validation.linaro.org/scheduler/job/997#L1014
To verify that the /dev/usb/ directory didn't exist, I tweaked the test job a little to check the contents of /dev inside lxc and ran it again:
This changed considerably in 2017.2 as a result of https://projects.linaro.org/browse/LAVA-814
We no longer use the symlink provided by udev rules; instead, we use the device_info parameter from the device dictionary. See https://staging.validation.linaro.org/static/docs/v2/admin-lxc-deploy.html#a... for how to use device_info.
To use this, the panda device dictionary needs to be updated.
Thank You.
On 19 March 2017 at 22:10, Senthil Kumaran S senthil.kumaran@linaro.org wrote:
This changed considerably in 2017.2 as a result of https://projects.linaro.org/browse/LAVA-814
We no longer use the symlink provided by udev rules; instead, we use the device_info parameter from the device dictionary. See https://staging.validation.linaro.org/static/docs/v2/admin-lxc-deploy.html#a... for how to use device_info.
To use this, the panda device dictionary needs to be updated.
I see that the device dictionary got updated with device_info added.
I don't think I see the path of the probe under /dev, unless I'm missing something in the job definition: https://pmwg.validation.linaro.org/scheduler/job/1335#L1051
Thank You.
Senthil Kumaran http://www.stylesen.org/ http://www.sasenthilkumaran.com/
Hi Lisa,
I need to fix the device dict based on some things that Neil did.
Still on my radar.
Dave
linaro-validation mailing list linaro-validation@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-validation
Hi Dave,
On 12 April 2017 at 08:09, Dave Pigott dave.pigott@linaro.org wrote:
Hi Lisa,
I need to fix the device dict based on some things that Neil did.
Still on my radar.
As we discussed during our sync, the device dictionary looks correct, yet I'm not sure if the probe is detected inside the LXC. I ran another job and it's the same result as the previous one I posted in this thread.
Also I don't know how USB devices are listed inside LXCs, but I would expect to see something like /dev/bus/usb/***, /dev/ttyUSB*, or /dev/ttyS*.
Can you check to see if I'm overlooking something in the log below? https://pmwg.validation.linaro.org/scheduler/job/1389#L1030
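A quick way to answer the "what is actually visible under /dev" question from inside the container is to glob for the candidate names. The following is only a sketch: the patterns come from this thread (ttyACM* is the style reported later in the thread as the one arm-probe needs), and the function name and the dev_root parameter are my own, added so the check can be pointed at a different directory.

```python
import glob
import os

# Patterns taken from this thread; adjust for your own hardware.
PATTERNS = ("bus/usb/*/*", "ttyUSB*", "ttyACM*", "ttyS*", "serial/by-id/*")


def candidate_nodes(dev_root="/dev"):
    """Return {pattern: sorted list of matching paths} below dev_root."""
    found = {}
    for pattern in PATTERNS:
        found[pattern] = sorted(glob.glob(os.path.join(dev_root, pattern)))
    return found


if __name__ == "__main__":
    # Print one line per pattern so the result is easy to spot in a job log.
    for pattern, paths in candidate_nodes().items():
        print(pattern, "->", ", ".join(paths) or "(none)")
```

Run as an inline test step, an empty result for every tty pattern would suggest the USB device was passed through without its tty node.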
On Tue, 18 Apr 2017 12:13:01 -0700 Lisa Nguyen lisa.nguyen@linaro.org wrote:
Hi Dave,
On 12 April 2017 at 08:09, Dave Pigott dave.pigott@linaro.org wrote:
Hi Lisa,
I need to fix the device dict based on some things that Neil did.
Still on my radar.
https://pmwg.validation.linaro.org/scheduler/job/1405#L385
This problem is now fixed. For reference, USB devices whose serial number includes a forward slash (/) will have that slash replaced by an underscore when processed through pyudev. The device dictionary has been updated for pmwg panda-01. A documentation fix is in preparation.
(BTW, 2017.4 includes NFS support for panda, so I used that in my test job; your ramdisk test job will work too.)
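For anyone updating a device dictionary by hand, the slash-to-underscore substitution described above can be sketched as below. This is an illustration, not LAVA code: the helper names are hypothetical, the serial number is made up, and the 'board_id' key is my assumption about how a device_info entry is spelled, so check the LAVA documentation for your version.

```python
def normalise_board_id(serial):
    """Mirror the substitution described above: '/' becomes '_'."""
    return serial.replace("/", "_")


def device_info_entry(serial):
    """Build one device_info entry for a probe serial number.

    The 'board_id' key is an assumption; verify it against the LAVA docs.
    """
    return {"board_id": normalise_board_id(serial)}


if __name__ == "__main__":
    # "AB/1234" is a made-up serial number, used only for illustration.
    print(device_info_entry("AB/1234"))
```

The point is simply that the value stored in the device dictionary has to be the underscore form, otherwise the lookup never matches.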
On 20 April 2017 at 02:57, Neil Williams codehelp@debian.org wrote:
https://pmwg.validation.linaro.org/scheduler/job/1405#L385
This problem is now fixed. For reference, USB devices whose serial number includes a forward slash (/) will have that slash replaced by an underscore when processed through pyudev. The device dictionary has been updated for pmwg panda-01. A documentation fix is in preparation.
(BTW, 2017.4 includes NFS support for panda, so I used that in my test job; your ramdisk test job will work too.)
Thanks for looking into this, Neil.
I wrote a test definition to build, install, and run the arm-probe command-line tool in LAVA, but the probe path cannot be accessed without hitting the 'exclusive' error: https://pmwg.validation.linaro.org/scheduler/job/1489#L3776
In a manual setting, I expect to see output similar to this:
    # configuration: config-panda-lab
    # config_name: pandaboard
    # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
    Configuration: pandaboard
    # date: Wed, 26 Apr 2017 22:47:59 +0100
    # host: pmwg-server-01.pmwglab
    #
    + /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00
    Starting... sending start to 0
    # VDD_ALL VDD ROOT #ff0000 SoC
    #
    # time VDD(V) VDD(A) VDD(W)
    0.000500 5.19 0.0994 0.51568
    0.000600 5.19 0.0704 0.36538
    0.000700 5.19 0.0994 0.51538
    0.000800 5.19 0.0764 0.39608
    0.000900 5.20 0.0654 0.33967
    0.001000 5.19 0.0934 0.48444
    0.001100 5.19 0.0754 0.39097
    0.001200 5.19 0.0734 0.38052
    ...
    ...
However, I was able to verify that the LXC detected the probe by writing an inline test definition to use the lsusb command: https://pmwg.validation.linaro.org/scheduler/job/1489#L2049
On Wed, 26 Apr 2017 14:53:41 -0700 Lisa Nguyen lisa.nguyen@linaro.org wrote:
Thanks for looking into this, Neil.
I wrote a test definition to build, install, and run the arm-probe command-line tool in LAVA, but the probe path cannot be accessed without hitting the 'exclusive' error: https://pmwg.validation.linaro.org/scheduler/job/1489#L3776
OK, I've identified what is wrong here. LAVA is only adding devices from the 'usb' subsystem to the LXC but the energy probe software needs the 'tty' element instead or as well.
I've tried to use mknod to create the device within the LXC but LXC itself seems to do extra work to allow programs like minicom to use the device. In local tests, I can't get minicom to be happy with ttyUSB0 created using mknod inside the LXC, even though the device looks the same as it does outside the LXC. Using lxc-device add works with minicom. (I'm testing with a usb serial device connected to a BBB rather than an energy probe but I'm hoping the energy probe code is essentially doing the same operations as minicom.)
So this will need a code change (which I've already prepared) to find the tty device node and pass that to lxc-device add. We are looking to release 2017.5 relatively soon - once we have had time to do full testing on staging after the much needed reorganisation of staging.validation.linaro.org. (The re-org includes the provision of devices with energy probes attached, so we can include checks on this functionality in future releases.)
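The two steps described above (find the tty node that belongs to the USB device, then hand it to lxc-device add) can be sketched roughly as follows. This is not the change prepared for LAVA; it is a stand-alone illustration that walks sysfs with the standard library instead of pyudev. The function names, the dry_run switch and the container name in the example are all hypothetical.

```python
import os
import subprocess


def tty_nodes_below(usb_sys_path, sys_tty_dir="/sys/class/tty", dev_dir="/dev"):
    """Return /dev nodes for tty devices whose sysfs entry sits below usb_sys_path."""
    usb_real = os.path.realpath(usb_sys_path)
    nodes = []
    for name in sorted(os.listdir(sys_tty_dir)):
        # Entries in /sys/class/tty are symlinks into the device tree.
        real = os.path.realpath(os.path.join(sys_tty_dir, name))
        if real.startswith(usb_real + os.sep):
            nodes.append(os.path.join(dev_dir, name))
    return nodes


def lxc_device_add_cmd(container, node):
    """Build the lxc-device command line for one device node."""
    return ["lxc-device", "-n", container, "add", node]


def add_tty_to_container(container, node, dry_run=True):
    """Return the command; only execute it when dry_run is False."""
    cmd = lxc_device_add_cmd(container, node)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # "lxc-aep-test" is a placeholder container name.
    print(" ".join(add_tty_to_container("lxc-aep-test", "/dev/ttyACM0")))
```

Keeping the command construction separate from its execution makes it easy to log exactly what would be run on the dispatcher before touching a real container.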
--
Neil Williams
On Thu, 27 Apr 2017 08:19:19 +0100 Neil Williams codehelp@debian.org wrote:
CC:'ing Andy to see if there is extra config required to get this to work.
Andy: can you look at https://staging.validation.linaro.org/scheduler/job/174524 and see why the code does not start reading from the probe?
dmesg -T and lsusb -v output is included in the log, from inside the LXC.
OK, I've identified what is wrong here. LAVA is only adding devices from the 'usb' subsystem to the LXC but the energy probe software needs the 'tty' element instead or as well.
I've tried to use mknod to create the device within the LXC but LXC itself seems to do extra work to allow programs like minicom to use the device. In local tests, I can't get minicom to be happy with ttyUSB0 created using mknod inside the LXC, even though the device looks the same as it does outside the LXC. Using lxc-device add works with minicom. (I'm testing with a usb serial device connected to a BBB rather than an energy probe but I'm hoping the energy probe code is essentially doing the same operations as minicom.)
So this will need a code change (which I've already prepared) to find the tty device node and pass that to lxc-device add. We are looking to release 2017.5 relatively soon - once we have had time to do full testing on staging after the much needed reorganisation of staging.validation.linaro.org. (The re-org includes the provision of devices with energy probes attached, so we can include checks on this functionality in future releases.)
https://staging.validation.linaro.org/scheduler/job/174516#results_0_find-pr...
In a manual setting, I expect to see output similar to this:
    # configuration: config-panda-lab
    # config_name: pandaboard
    # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
    Configuration: pandaboard
    # date: Wed, 26 Apr 2017 22:47:59 +0100
    # host: pmwg-server-01.pmwglab
    #
    + /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00
The /dev/serial/by-id/ symlink is not being created automatically, but even after creating it within the test, I cannot get the supplied configuration to work.
    + arm-probe -C ../config -l 10 -x
    Configuration: pandaboard
    done all capture exited
    # configuration: ../config
    # config_name: pandaboard
    # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
    # date: Wed, 17 May 2017 14:58:48 +0100
    # host: lxc-aep-test-174524
    #
    + set +x
    <LAVA_SIGNAL_ENDRUN 2_arm-probe 174524_1.7.4.9>
https://staging.validation.linaro.org/scheduler/job/174524#L5766
    Starting... sending start to 0
    # VDD_ALL VDD ROOT #ff0000 SoC
    #
    # time VDD(V) VDD(A) VDD(W)
    0.000500 5.19 0.0994 0.51568
    0.000600 5.19 0.0704 0.36538
    0.000700 5.19 0.0994 0.51538
    0.000800 5.19 0.0764 0.39608
    0.000900 5.20 0.0654 0.33967
    0.001000 5.19 0.0934 0.48444
    0.001100 5.19 0.0754 0.39097
    0.001200 5.19 0.0734 0.38052
    ...
    ...
However, I was able to verify that the LXC detected the probe by writing an inline test definition to use the lsusb command: https://pmwg.validation.linaro.org/scheduler/job/1489#L2049
https://staging.validation.linaro.org/scheduler/job/174524#L2079
The only thing which can be done at this stage is to pull staging-panda-03 out of the lab so that I can interrogate it on my desk.
Hi folks!
On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
On Thu, 27 Apr 2017 08:19:19 +0100 Neil Williams codehelp@debian.org wrote:
CC:'ing Andy to see if there is extra config required to get this to work.
Andy: can you look at https://staging.validation.linaro.org/scheduler/job/174524 and see why the code does not start reading from the probe?
dmesg -T and lsusb -v output is included in the log, from inside the LXC.
I've just run a local test with an AEP inside lxc on my local machine. As far as I can see, there's nothing particularly magic going on here. The only problem I see is Lisa's config file pointing at the wrong device file. arm-probe needs a ttyACM-style device to talk to. Using:
# lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
I create that device in my container. I build libwebsockets and the arm-probe software in the container, then specify /dev/ttyACM0 in the AEP config file. I can run it just fine:
    root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
    # configuration: panda-aep.cfg
    # config_name: pandaboard
    # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
    Configuration: pandaboard
    # date: Fri, 19 May 2017 16:29:50 +0100
    # host: lxc-aep-test-174524
    #
    + /dev/ttyACM0
    Starting... sending start to 0
    # VDD_ALL VDD ROOT #ff0000 SoC
    #
    # time VDD(V) VDD(A) VDD(W)
    0.000500 5.11 0.0474 0.24196
    0.000600 5.11 0.0364 0.18572
    0.000700 5.11 0.0314 0.16012
    0.000800 5.10 0.0544 0.27734
    0.000900 5.10 0.0234 0.11923
    0.001000 5.11 0.0304 0.15505
    ...
I don't have any problems running things and getting output here.
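Since a failing run in this thread still printed the "#" header but no readings, a test definition could tell the two cases apart by counting data rows. A minimal sketch, assuming a single-channel configuration with four whitespace-separated columns (time, volts, amps, watts) as in the captures quoted in this thread; the function name is hypothetical and this is not part of arm-probe.

```python
def parse_samples(text):
    """Return a list of (time, volts, amps, watts) tuples from arm-probe output."""
    samples = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # header and comment lines
        fields = line.split()
        if len(fields) != 4:
            continue  # status lines such as the device path or "..."
        try:
            samples.append(tuple(float(field) for field in fields))
        except ValueError:
            continue  # four fields, but not all numeric
    return samples


if __name__ == "__main__":
    import sys

    rows = parse_samples(sys.stdin.read())
    print("samples:", len(rows))
    # A non-zero exit status lets the calling test step report a failure.
    sys.exit(0 if rows else 1)
```

Piping the capture through this script gives a pass/fail signal even when arm-probe itself exits without complaint.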
I *have* seen two real bugs here while trying to get things running, though:
1. If the device specified in the config file doesn't exist, or is the wrong type of device, or (maybe) there is any other kind of problem with it, you get *no* useful feedback to say there's a problem. Running things under strace will show the background libarmep process attempt to use the device specified, but there's no error handling. :-(
2. The "-x" option says that the arm-probe program is meant to exit when you've done capturing, but it just sits there forever when I'm testing. I've wrapped it using the "timeout" command to work around that for now.
If I knew where to file those bugs, I would, but it's really not obvious. They're really easy to reproduce, I hope...
In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
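Until the two bugs above are fixed upstream, a wrapper can supply the missing feedback and the missing exit. The sketch below checks that the configured path is a character device before starting, and enforces a time limit in the same spirit as the "timeout" workaround. The function names and the 30 second default are my own choices, and the arm-probe arguments shown in the comment are simply those used earlier in this message.

```python
import os
import stat
import subprocess


def is_char_device(path):
    """True if path exists and is a character device (e.g. /dev/ttyACM0)."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


def run_with_timeout(argv, seconds):
    """Run argv and return (stdout_text, timed_out)."""
    try:
        done = subprocess.run(argv, stdout=subprocess.PIPE, timeout=seconds)
        return done.stdout.decode(errors="replace"), False
    except subprocess.TimeoutExpired as exc:
        # Any partial output collected before the kill may be None.
        partial = exc.stdout or b""
        return partial.decode(errors="replace"), True


def capture(device, argv, seconds=30):
    """Fail loudly on a bad device path, then run the capture with a time limit."""
    if not is_char_device(device):
        raise RuntimeError("%s is missing or is not a character device" % device)
    return run_with_timeout(argv, seconds)


# Example call, using the command line from this message:
# capture("/dev/ttyACM0", ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"])
```

The pre-check only covers the "does not exist" and "wrong type" cases; it cannot detect a node that exists but is attached to some other device.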
Cheers,
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
On 19 May 2017, at 17:02, Neil Williams codehelp@debian.org wrote:
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
—
Hi Neil,
This is blocking https://projects.linaro.org/browse/CTT-124 - can we not do a hot fix?
Thanks
Dave
On Fri, 19 May 2017 17:02:14 +0100 Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
Hi folks!
On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
On Thu, 27 Apr 2017 08:19:19 +0100 Neil Williams codehelp@debian.org wrote:
I've just run a local test with an AEP inside lxc on my local machine. As far as I can see, there's nothing particularly magic going on here. The only problem I see is Lisa's config file pointing at the wrong device file. arm-probe needs a ttyACM-style device to talk to. Using:
# lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
I create that device in my container. I build libwebsockets and the arm-probe software in the container, then specify /dev/ttyACM0 in the AEP config file. I can run it just fine:
root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
# configuration: panda-aep.cfg
# config_name: pandaboard
# trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
Configuration: pandaboard
# date: Fri, 19 May 2017 16:29:50 +0100
# host: lxc-aep-test-174524
#
- /dev/ttyACM0
Starting...
sending start to 0
# VDD_ALL VDD ROOT #ff0000 SoC
#
# time VDD(V) VDD(A) VDD(W)
0.000500 5.11 0.0474 0.24196
0.000600 5.11 0.0364 0.18572
0.000700 5.11 0.0314 0.16012
0.000800 5.10 0.0544 0.27734
0.000900 5.10 0.0234 0.11923
0.001000 5.11 0.0304 0.15505
...
I don't have any problems running things and getting output here.
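Whether a capture "got output" can be checked mechanically. The following is a minimal sketch, assuming the probe prints one sample per line with four numeric columns (time, volts, amps, watts) as in the single-channel run above; the function name parse_samples is hypothetical and is not part of arm-probe.

```python
def parse_samples(text):
    """Return (time, volts, amps, watts) tuples from arm-probe style output."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Header and metadata lines start with '#'; skip them and blanks.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # not a data row, e.g. "Starting..."
        try:
            samples.append(tuple(float(f) for f in fields))
        except ValueError:
            continue  # four words that are not numbers, e.g. status text
    return samples


# Two rows copied from the run quoted in this thread, plus non-data lines.
capture = """\
# time VDD(V) VDD(A) VDD(W)
0.000500 5.11 0.0474 0.24196
0.000600 5.11 0.0364 0.18572
Starting...
sending start to 0
"""
rows = parse_samples(capture)
print(len(rows), rows[0])
```

An empty result then means the run produced headers but no samples, which is the symptom discussed later in this thread.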
I *have* seen two real bugs here while trying to get things running, though:
- If the device specified in the config file doesn't exist, or is the wrong type of device, or (maybe) there is any other kind of problem with it, you get *no* useful feedback to say there's a problem. Running things under strace will show the background libarmep process attempt to use the device specified, but there's no error handling. :-(
- The "-x" option says that the arm-probe program is meant to exit when you've done capturing, but it just sits there forever when I'm testing. I've wrapped it using the "timeout" command to work around that for now.
If I knew where to file those bugs, I would, but it's really not obvious. They're really easy to reproduce, I hope...
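For callers that drive the probe from Python rather than from a shell, the same workaround as the "timeout" command can be sketched with the standard subprocess module. This is only an illustration: run_with_timeout is a hypothetical helper, and the command below is a stand-in for a capture that never exits, not the real arm-probe invocation.

```python
import subprocess
import sys


def run_with_timeout(cmd, seconds):
    """Run cmd, killing it after `seconds`; return (timed_out, output_text)."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
        return False, done.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # subprocess.run() has already killed the child at this point.
        partial = exc.stdout or b""
        return True, partial.decode(errors="replace")


# Stand-in for a capture that never exits on its own.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
timed_out, output = run_with_timeout(hung, 0.5)
print(timed_out)
```

Returning the partial output alongside the timed-out flag keeps whatever the probe printed before it was stopped.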
In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
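Since arm-probe reportedly gives no feedback when the device is missing or of the wrong type, a check can be run before attaching the node to the container. A minimal sketch, assuming a POSIX host; device_problem is a hypothetical helper name, and it only confirms that the path is a character device, not that an AEP is actually behind it.

```python
import os
import stat


def device_problem(path):
    """Return None if path is a character device, else a reason string."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return f"{path}: no such device node"
    except PermissionError:
        return f"{path}: permission denied"
    if not stat.S_ISCHR(mode):
        return f"{path}: exists but is not a character device"
    return None


# Check the node named in this thread before running lxc-device on it.
problem = device_problem("/dev/ttyACM0")
print(problem or "device node looks usable")
```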
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
OK. staging-panda03 is back and has been running tests. This is what we've learnt so far:
0: This does not appear to be an LXC issue. Running the commands manually in the same LXC on the same worker does return data from the probe.
1: Running the same commands in "headless" mode shows that the probe software starts successfully but something within the protocol parser or sampler fails to retrieve data.
2: The websockets dependency is completely unnecessary and has been disabled in the build I've been testing: https://git.linaro.org/lava-team/arm-probe.git/
3: We've added a *lot* of debug to the arm-probe code (https://staging.validation.linaro.org/scheduler/job/174969 which was run using https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7d...) but are not much closer to identifying the precise problem with the code. However, I am satisfied that this is a problem in the arm-probe software when being run in automation.
4: the arm-probe code is appallingly difficult to read and debug. It also seems unnecessarily complex.
5: I plan to remove a lot of the debug from the cloned arm-probe repository (which has also had a few fixes to compile with gcc6) but I'm running out of time to work on the arm-probe software myself.
Someone needs to update the arm-probe software:
a) Remove websockets as a compile-time option, as it only bloats the build in automation where a web-based UI is impossible anyway. I've done this by brute force in my cloned repo; I just patched out the dependency.
b) Improve the code so that it has comments, and so that verbose mode outputs what is happening and why.
c) Identify what is preventing the software from receiving data from the probe when run in automation.
d) Fix the config file so that it allows for changes in the device node name from one probe to another.
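One way to approach point d) is to rewrite the device node in the config text just before each run. The sketch below assumes only what the thread states, namely that the config names the probe by a /dev/ttyACM<N> path; the rest of the AEP config format is not shown here, and retarget_config is a hypothetical helper.

```python
import re

# Assumption: the probe is named in the config by a /dev/ttyACM<N> path.
TTY_ACM = re.compile(r"/dev/ttyACM\d+")


def retarget_config(config_text, device_node):
    """Return config_text with every /dev/ttyACM<N> replaced by device_node."""
    if not TTY_ACM.fullmatch(device_node):
        raise ValueError(f"unexpected device node: {device_node!r}")
    return TTY_ACM.sub(device_node, config_text)


# Illustrative fragment only; the real AEP config contains more than this.
print(retarget_config(" - /dev/ttyACM0", "/dev/ttyACM1"))
```

Rejecting anything that is not a ttyACM path up front avoids silently pointing the probe at the wrong kind of device.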
CC'ing Vincent, so he can read Neil's and Steve's comments above and respond (if he has anything to say) while I'm on holiday until early June.
Neil Williams
http://www.linux.codehelp.co.uk/
linaro-validation mailing list linaro-validation@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-validation
Hi Neil,
On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
1: Running the same commands in "headless" mode shows that the probe software starts successfully but something within the protocol parser or sampler fails to retrieve data.
What do you mean by headless mode?
2: The websockets dependency is completely unnecessary and has been disabled in the build I've been testing: https://git.linaro.org/lava-team/arm-probe.git/
Yes. I do the same. aepd is only useful for the web interface.
3: We've added a *lot* of debug to the arm-probe code (https://staging.validation.linaro.org/scheduler/job/174969 which was run using https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1) but are not much closer to identifying the precise problem with the code. However, I am satisfied that this is a problem in the arm-probe software when being run in automation.
Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?
On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot vincent.guittot@linaro.org wrote:
What do you mean by headless mode?
With no controlling terminal.
LAVA runs as a daemon and forks processes to run the tests. This does not usually cause issues and is fundamental to automation. When I run the same commands in an LXC as a user logged into the machine, I get output. When I run the commands from a daemon, the output is not seen.
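To reproduce the "no controlling terminal" condition from an ordinary login shell, a command can be started in its own session. This is a sketch of that one condition only, assuming a POSIX system; it does not claim to reproduce how LAVA itself forks its test processes, and run_headless and PROBE are hypothetical names.

```python
import subprocess
import sys

# Child program: report whether a controlling terminal can be opened.
PROBE = (
    "import os\n"
    "try:\n"
    "    os.close(os.open('/dev/tty', os.O_RDWR))\n"
    "    print('has controlling terminal')\n"
    "except OSError:\n"
    "    print('no controlling terminal')\n"
)


def run_headless(cmd, seconds=10):
    """Run cmd in a new session (setsid), detached from any terminal."""
    done = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX only: the child calls setsid()
        timeout=seconds,
    )
    return done.stdout.decode(errors="replace")


print(run_headless([sys.executable, "-c", PROBE]).strip())
```

Passing the probe command line to run_headless instead of the small test program would show whether the missing terminal alone changes its behaviour.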
Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?
No. Not workload automation - that is a specific test framework which can use LAVA. I'm talking about the process of running tests on behalf of users without the users being logged in or interacting with the shell.
Steve & I are also on annual leave next week.
On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
LAVA runs as a daemon and forks processes to run the tests. This does not usually cause issues and is fundamental to automation. When I run the same commands in an LXC as a user logged into the machine, I get output. When I run the commands from a daemon, the output is not seen.
Even when you redirect the output to a file?
In workload automation, arm_probe is called in a dedicated process with subprocess.Popen, and we are able to get data in the file. I just wonder what the difference could be in the LAVA case.
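The pattern described here, a dedicated process whose output goes straight to a file, can be sketched as follows. This is not the workload automation code, which is not shown in this thread; capture_to_file is a hypothetical helper and the command is a stand-in that prints one sample-like line.

```python
import os
import subprocess
import sys
import tempfile


def capture_to_file(cmd, out_path, seconds):
    """Run cmd with stdout/stderr sent to out_path; stop it after `seconds`."""
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            cmd, stdin=subprocess.DEVNULL, stdout=out, stderr=subprocess.STDOUT
        )
        try:
            proc.wait(timeout=seconds)
        except subprocess.TimeoutExpired:
            proc.terminate()
            proc.wait()
    # Size in bytes of whatever the command managed to write.
    return os.path.getsize(out_path)


# Stand-in command; a real run would pass the arm-probe command line instead.
cmd = [sys.executable, "-c", "print('0.000500 5.11 0.0474 0.24196')"]
with tempfile.TemporaryDirectory() as tmp:
    size = capture_to_file(cmd, os.path.join(tmp, "capture.txt"), 10)
    print(size > 0)
```

An empty file after the run separates "the probe produced nothing" from "the output was lost on its way to the job log".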
No. Not workload automation - that is a specific test framework which can use LAVA. I'm talking about the process of running tests on behalf of users without the users being logged in or interacting with the shell.
OK. I just wanted to be sure about the context.
This problem has been resolved inside the arm-probe configuration; it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem with running daemonized instead of with a controlling terminal. The actual problem was that the probe software runs more slowly than expected; extending the runtime of the utility allows the probe to output data.
https://staging.validation.linaro.org/scheduler/job/175033#L2038
https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
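The diagnosis above, that a run cut off too early yields no samples while a longer one does, can be illustrated with a stand-in command. The sketch assumes that a data row is any line whose first field is a number; first_working_runtime and the other names are hypothetical, and the "slow" command only imitates a probe that stays silent for a second before printing.

```python
import subprocess
import sys


def is_sample(line):
    """True if the first whitespace-separated field parses as a number."""
    fields = line.split()
    try:
        float(fields[0])
    except (IndexError, ValueError):
        return False
    return True


def data_rows(cmd, seconds):
    """Run cmd for at most `seconds`; return the sample lines it printed."""
    try:
        raw = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL,
            timeout=seconds,
        ).stdout
    except subprocess.TimeoutExpired as exc:
        raw = exc.stdout or b""  # whatever arrived before the child was killed
    return [ln for ln in raw.decode(errors="replace").splitlines() if is_sample(ln)]


def first_working_runtime(cmd, runtimes):
    """Return the first runtime in `runtimes` that yields data, else None."""
    for seconds in runtimes:
        if data_rows(cmd, seconds):
            return seconds
    return None


# Stand-in for a slow probe: silent for one second, then emits one row.
slow = [sys.executable, "-c",
        "import time; time.sleep(1); print('0.000500 5.11 0.0474 0.24196')"]
print(first_working_runtime(slow, [0.2, 5]))
```

With the stand-in, the 0.2 second attempt is killed before any row appears and the 5 second attempt succeeds, mirroring the behaviour reported for the real probe.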
On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot vincent.guittot@linaro.org wrote:
Hi Neil,
Le 24 mai 2017 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org a écrit :
On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 17:02:14 +0100 Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
Hi folks!
On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote: >On Thu, 27 Apr 2017 08:19:19 +0100 >Neil Williams codehelp@debian.org wrote: >
I've just run a local test with an AEP inside lxc on my local machine. As far as I can see, there's nothing particularly magic going on here. The only problem I see is Lisa's config file pointing at the wrong device file. arm-probe needs a ttyACM-style device to talk to. Using:
# lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
I create that device in my container. I build libwebsockets and the arm-probe software in the container, then specify /dev/ttyACM0 in the AEP config file. I can run it just fine:
root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x # configuration: panda-aep.cfg # config_name: pandaboard # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us Configuration: pandaboard # date: Fri, 19 May 2017 16:29:50 +0100 # host: lxc-aep-test-174524 #
- /dev/ttyACM0
Starting... sending start to 0 # VDD_ALL VDD ROOT #ff0000 SoC # # time VDD(V) VDD(A) VDD(W) 0.000500 5.11 0.0474 0.24196 0.000600 5.11 0.0364 0.18572 0.000700 5.11 0.0314 0.16012 0.000800 5.10 0.0544 0.27734 0.000900 5.10 0.0234 0.11923 0.001000 5.11 0.0304 0.15505 ...
I don't have any problems running things and getting output here.
I *have* seen two real bugs here while trying to get things running, though:
- If the device specified in the config file doesn't exist, or
is the wrong type of device, or (maybe) there is any other kind of problem with it, you get *no* useful feedback to say there's a problem. Running things under strace will show the background libarmep process attempt to use the device specified, but there's no error handling. :-(
- The "-x" option says that the arm-probe program is meant to
exit when you've done capturing, but it just sits there forever when I'm testing. I've wrapped it using the "timeout" command to work around that for now.
If I knew where to file those bugs, I would, but it's really not obvious. They're really easy to reproduce, I hope...
In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
OK. staging-panda03 is back and has been running tests. This is what we've learnt so far:
0: This does not appear to be an LXC issue. Running the commands manually on the worker with the same LXC on the same worker does return data from the probe.
1: Running the same commands in "headless" mode shows that the probe software starts successfully but something within the protocol parser or sampler fails to retrieve data.
What do you mean by headless mode?
With no controlling terminal.
LAVA runs as a daemon and forks processes to run the tests. This does not usually cause issues and is fundamental to automation. When I run the same commands in an LXC as a user logged into the machine, I get output. When I run the commands from a daemon, the output is not seen.
even when you redirect the output to a file ?
On workload automation, arm_probe is called in a dedicated process with subprocess.Popen and we are able to get data in the file. Just wonder what could be the difference in lava case
2: The websockets dependency is completely unnecessary and has been disabled in the build I've been testing: https://git.linaro.org/lava-team/arm-probe.git/
Yes. I do the same. aepd is only useful for the web interface.
3: We've added a *lot* of debug to the arm-probe code (https://staging.validation.linaro.org/scheduler/job/174969 which was run using https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b
2958e3045da77d7db25a7cfe48359211aa4cf1)
but are not much closer to identifying the precise problem with the code. However, I am satisfied that this is a problem in the arm-probe software when being run in automation.
Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?
No. Not workload automation - that is a specific test framework which can use LAVA. I'm talking about the process of running tests on behalf of users without the users being logged in or interacting with the shell.
ok. Just to be sure about the context
4: the arm-probe code is appallingly difficult to read and debug. It also seems unnecessarily complex.
5: I plan to remove a lot of the debug from the cloned arm-probe repository (which has also had a few fixes to compile with gcc6) but I'm running out of time to work on the arm-probe software myself.
Someone needs to update the arm-probe software:
a) make websockets optional at compile time, as it only bloats the build in automation, where a web-based UI is impossible anyway. I've done this by brute force in my cloned repo: I just patched out the dependency.
b) improve the code to have comments, and to output what is happening and why when verbose mode is used.
c) identify what is preventing the software from receiving data from the probe when run in automation.
d) fix the config file to allow for changes in the device node name from one probe to another.
--
CC'ing Vincent, so he can read Neil's and Steve's comments above and respond (if he has anything to say) while I'm on holiday until early June.
Steve & I are also on annual leave next week.
--
Neil Williams
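For anyone wanting to reproduce the "no controlling terminal" case outside LAVA, here is a minimal Python sketch. It is not how LAVA itself launches test shells; the helper name run_headless and the log file name in the usage note are made up for illustration. It starts a command in its own session with stdin closed and output sent to a file, roughly in the spirit of the Popen-based approach mentioned above, and stops it after a chosen runtime.

```python
import subprocess
import sys


def run_headless(cmd, out_path, runtime):
    """Run cmd with no controlling terminal, writing its output to out_path.

    Returns True if the command had to be stopped after `runtime` seconds.
    (Hypothetical helper, for illustration only.)
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,   # nothing to read from, as under a daemon
            stdout=out,
            stderr=subprocess.STDOUT,
            start_new_session=True,     # setsid(): detach from the caller's terminal
        )
        try:
            proc.wait(timeout=runtime)
            return False
        except subprocess.TimeoutExpired:
            proc.terminate()
            proc.wait()
            return True
```

For example, run_headless(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], "aep.log", 30) would give the probe 30 seconds. The right runtime for a given setup is something to find by experiment.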
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration, it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem of running daemonized instead of with a controlling terminal. The actual problem was that the probe software is running more slowly than expected and extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except using /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
What about using 2 AEPs?
Regards, Vincent
On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot vincent.guittot@linaro.org wrote:
Hi Neil,
On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 17:02:14 +0100 Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
> Hi folks!
>
> On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
> >On Thu, 27 Apr 2017 08:19:19 +0100
> >Neil Williams codehelp@debian.org wrote:
> >
>
> I've just run a local test with an AEP inside lxc on my local machine. As far as I can see, there's nothing particularly magic going on here. The only problem I see is Lisa's config file pointing at the wrong device file. arm-probe needs a ttyACM-style device to talk to. Using:
>
> # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>
> I create that device in my container. I build libwebsockets and the arm-probe software in the container, then specify /dev/ttyACM0 in the AEP config file. I can run it just fine:
>
> root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
> # configuration: panda-aep.cfg
> # config_name: pandaboard
> # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
> Configuration: pandaboard
> # date: Fri, 19 May 2017 16:29:50 +0100
> # host: lxc-aep-test-174524
> #
> + /dev/ttyACM0
> Starting...
> sending start to 0
> # VDD_ALL VDD ROOT #ff0000 SoC
> #
> #
> time VDD(V) VDD(A) VDD(W)
> 0.000500 5.11 0.0474 0.24196
> 0.000600 5.11 0.0364 0.18572
> 0.000700 5.11 0.0314 0.16012
> 0.000800 5.10 0.0544 0.27734
> 0.000900 5.10 0.0234 0.11923
> 0.001000 5.11 0.0304 0.15505
> ...
>
> I don't have any problems running things and getting output here.
>
> I *have* seen two real bugs here while trying to get things running, though:
>
> 1. If the device specified in the config file doesn't exist, or is the wrong type of device, or (maybe) there is any other kind of problem with it, you get *no* useful feedback to say there's a problem. Running things under strace will show the background libarmep process attempt to use the device specified, but there's no error handling. :-(
>
> 2. The "-x" option says that the arm-probe program is meant to exit when you've done capturing, but it just sits there forever when I'm testing. I've wrapped it using the "timeout" command to work around that for now.
>
> If I knew where to file those bugs, I would, but it's really not obvious. They're really easy to reproduce, I hope...
>
> In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
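The two bugs Steve describes can be worked around from outside arm-probe. The sketch below is Python, with hypothetical helper names check_device and capture (neither exists in arm-probe or LAVA). It checks the device node before starting, so a missing or wrong node gives a readable error, and then caps the run time much as the "timeout" command does.

```python
import os
import stat
import subprocess


def check_device(path):
    """Return None if path is a usable character device, else a reason string."""
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        return f"{path}: {exc.strerror}"
    if not stat.S_ISCHR(mode):
        return f"{path}: not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path}: no read/write permission"
    return None


def capture(cmd, device, seconds):
    """Run cmd for at most `seconds`; return whatever it wrote to stdout."""
    problem = check_device(device)
    if problem:
        raise RuntimeError(problem)
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=seconds)
        return done.stdout.decode(errors="replace")
    except subprocess.TimeoutExpired as exc:
        # The process was killed at the deadline; keep whatever it printed.
        return (exc.stdout or b"").decode(errors="replace")
```

Calling capture() with the arm-probe command line from the message above and "/dev/ttyACM0" as the device would fail early with a clear message if the node had not been added to the container.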
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration, it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem of running daemonized instead of with a controlling terminal. The actual problem was that the probe software is running more slowly than expected and extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except using /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
What about using 2 AEPs?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely need secondary connections and MultiNode to separate the output.
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
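Until the config syntax is improved, the by-id path can be swapped for the node that exists inside the container before arm-probe is started. A minimal Python sketch, assuming the device path sits on a line of its own in the config file; use_container_devices is a hypothetical helper, not part of arm-probe or LAVA.

```python
def use_container_devices(config_text, mapping):
    """Replace device-path lines using `mapping` (old path -> new path).

    Assumption: each device path occupies a whole line of the config file.
    Lines that are not keys of `mapping` are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        out.append(mapping.get(line.strip(), line))
    return "\n".join(out) + "\n"
```

The mapping could come from parameters passed through the test job, so the same test definition works whichever node the probe ends up on.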
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
On 6 June 2017 at 14:03, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration, it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem of running daemonized instead of with a controlling terminal. The actual problem was that the probe software is running more slowly than expected and extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
ok
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except using /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
What about using 2 AEPs?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely
arm-probe supports multiple AEPs, and we are using it with multiple AEPs on the mt8173 EVB. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb

# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC

/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
need secondary connections and MultiNode to separate the output.
Is it something that Lisa can do by herself, or does it need some changes from your side?
Regards, Vincent
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
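To see which probes and channels a config like the one quoted above describes, a small parser is enough. This Python sketch assumes one item per line: comment lines starting with '#', a setup name, then each device path followed by its channel lines. That layout is an assumption about the file format, and parse_probe_config is a made-up name rather than anything provided by arm-probe.

```python
def parse_probe_config(text):
    """Return (setup_name, {device_path: [channel_name, ...]}).

    Assumed layout: '#' comment lines, one bare line holding the setup name,
    then '/dev/...' device lines each followed by their channel lines.
    """
    name = None
    probes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("/dev/"):
            current = probes.setdefault(line, [])
        elif current is None:
            name = line          # first bare line before any device
        else:
            current.append(line.split()[0])
    return name, probes
```

With two device paths in the file, the returned dictionary has two keys, which is a quick way for a test shell to confirm how many probes it is expected to find.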
On 6 June 2017 at 13:11, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:03, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration, it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem of running daemonized instead of with a controlling terminal. The actual problem was that the probe software is running more slowly than expected and extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
ok
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except using /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
What about using 2 AEPs?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely
arm-probe supports multiple AEPs, and we are using it with multiple AEPs on the mt8173 EVB. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb

# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC

/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
These configuration files may need to be generated within the test shell definition at runtime, based on parameters. The test shell will need to work out which device is which probe and this could be awkward without /dev/serial/by-id support. The enumeration order of ttyUSB0 and ttyUSB1 cannot be guaranteed. dmesg remains available inside the LXC, so some automated parsing may be required. If the arm-probe software can be modified to use a more sane configuration file syntax, this could also be addressed there.
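As a rough illustration of the dmesg parsing idea, the Python sketch below pairs each probe's USB serial number with its ttyACM node. The two message formats it matches ("usb <port>: SerialNumber: ..." and "cdc_acm <port>:<interface>: ttyACMn: USB ACM device") are what the kernel usually prints for such devices, but treat them, and the order in which they appear, as assumptions to check against the real log on the worker. map_serial_to_tty is a hypothetical helper.

```python
import re

# Assumed kernel log formats; verify against real dmesg output before relying on them.
SERIAL_RE = re.compile(r"usb (\S+): SerialNumber: (\S+)")
TTY_RE = re.compile(r"cdc_acm (\S+):\S+: (ttyACM\d+): USB ACM device")


def map_serial_to_tty(dmesg_text):
    """Return {serial_number: '/dev/ttyACMn'} from kernel log text."""
    serial_by_port = {}
    tty_by_serial = {}
    for line in dmesg_text.splitlines():
        match = SERIAL_RE.search(line)
        if match:
            # Remember which USB port reported which serial number.
            serial_by_port[match.group(1)] = match.group(2)
            continue
        match = TTY_RE.search(line)
        if match and match.group(1) in serial_by_port:
            serial = serial_by_port[match.group(1)]
            tty_by_serial[serial] = "/dev/" + match.group(2)
    return tty_by_serial
```

The resulting mapping could then be used to write the right device node into each section of a generated config file, regardless of enumeration order.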
need secondary connections and MultiNode to separate the output.
Is it something that Lisa can do by herself, or does it need some changes from your side?
Secondary connections and MultiNode can be adopted by test writers without any changes in LAVA.
https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4 https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#...
Any testjob using MultiNode has a certain level of complexity, so the change is non-trivial.
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
Regards, Vincent
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot vincent.guittot@linaro.org wrote:
> Hi Neil,
>
> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
>
> On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
> > On Fri, 19 May 2017 17:02:14 +0100
> > Neil Williams codehelp@debian.org wrote:
> >
> >> On Fri, 19 May 2017 16:48:11 +0100
> >> Steve McIntyre steve.mcintyre@linaro.org wrote:
> >>
> >> > Hi folks!
> >> >
> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
> >> > >Neil Williams codehelp@debian.org wrote:
> >> > >
> >> >
> >> > I've just run a local test with an AEP inside lxc on my local
> >> > machine. As far as I can see, there's nothing particularly magic
> >> > going on here. The only problem I see is Lisa's config file
> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
> >> > device to talk to. Using:
> >> >
> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
> >> >
> >> > I create that device in my container. I build libwebsockets and
> >> > the arm-probe software in the container, then
> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
> >> > fine:
> >> >
> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
> >> > # configuration: panda-aep.cfg
> >> > # config_name: pandaboard
> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
> >> > Configuration: pandaboard
> >> > # date: Fri, 19 May 2017 16:29:50 +0100
> >> > # host: lxc-aep-test-174524
> >> > #
> >> > + /dev/ttyACM0
> >> > Starting...
> >> > sending start to 0
> >> > # VDD_ALL VDD ROOT #ff0000 SoC
> >> > #
> >> > #
> >> > time VDD(V) VDD(A) VDD(W)
> >> > 0.000500 5.11 0.0474 0.24196
> >> > 0.000600 5.11 0.0364 0.18572
> >> > 0.000700 5.11 0.0314 0.16012
> >> > 0.000800 5.10 0.0544 0.27734
> >> > 0.000900 5.10 0.0234 0.11923
> >> > 0.001000 5.11 0.0304 0.15505
> >> > ...
> >> >
> >> > I don't have any problems running things and getting output here.
> >> >
> >> > I *have* seen two real bugs here while trying to get things
> >> > running, though:
> >> >
> >> > 1. If the device specified in the config file doesn't exist, or
> >> > is the wrong type of device, or (maybe) there is any other kind
> >> > of problem with it, you get *no* useful feedback to say there's a
> >> > problem. Running things under strace will show the background
> >> > libarmep process attempt to use the device specified, but
> >> > there's no error handling. :-(
> >> >
> >> > 2. The "-x" option says that the arm-probe program is meant to
> >> > exit when you've done capturing, but it just sits there forever
> >> > when I'm testing. I've wrapped it using the "timeout" command to
> >> > work around that for now.
> >> >
> >> > If I knew where to file those bugs, I would, but it's really not
> >> > obvious. They're really easy to reproduce, I hope...
> >> >
> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
> >> > says that it creates devices based on their existing entries on
> >> > the host. Double-check that the host (dispatcher) has an
> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
> >>
> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
> >> been using for the tests of the new code to ensure
> >> that /dev/ttyACM0 can be attached to the LXC.
> >>
> >> That panda and AEP will shortly return to staging and then the
> >> changes to LAVA and the required changes to the test definition
> >> can be available for the 2017.6 release.
> >
> > OK. staging-panda03 is back and has been running tests. This is what
> > we've learnt so far:
> >
> > 0: This does not appear to be an LXC issue. Running the commands
> > manually on the worker with the same LXC on the same worker does
> > return data from the probe.
> >
> > 1: Running the same commands in "headless" mode shows that the probe
> > software starts successfully but something within the protocol
> > parser or sampler fails to retrieve data.
>
> What do you mean by headless mode?
With no controlling terminal.
LAVA runs as a daemon and forks processes to run the tests. This does not usually cause issues and is fundamental to automation. When I run the same commands in an LXC as a user logged into the machine, I get output. When I run the commands from a daemon, the output is not seen.
Even when you redirect the output to a file?
In workload automation, arm_probe is called in a dedicated process with subprocess.Popen and we are able to get data in the file. I just wonder what the difference could be in the LAVA case.
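For comparison, here is a minimal sketch of that style of invocation: a child process with no terminal attached, output redirected to a file, and a hard limit on the run time (the same idea as wrapping the command in "timeout"). The function name, the output path and the run time are assumptions for illustration only; this is not the workload automation or LAVA code.

```python
import subprocess
import sys


def capture_to_file(cmd, out_path, run_seconds):
    """Run cmd with stdout/stderr redirected to out_path.

    Returns the exit code, or None if the process was still running
    after run_seconds and had to be stopped.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,  # nothing to read from, as under a daemon
            stdout=out,
            stderr=subprocess.STDOUT,
        )
        try:
            return proc.wait(timeout=run_seconds)
        except subprocess.TimeoutExpired:
            # Stop the process ourselves instead of waiting for it to exit.
            proc.terminate()
            proc.wait()
            return None


# Hypothetical use with the command line quoted earlier in this thread:
# capture_to_file(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"],
#                 "aep-output.log", run_seconds=10)
```

If the file stays empty with a short run time but fills with a longer one, that points at timing rather than at the missing terminal.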
> >
> > 2: The websockets dependency is completely unnecessary and has been
> > disabled in the build I've been testing:
> > https://git.linaro.org/lava-team/arm-probe.git/
>
> Yes. I do the same. aepd is only useful for the web interface.
>
> >
> > 3: We've added a *lot* of debug to the arm-probe code
> > (https://staging.validation.linaro.org/scheduler/job/174969 which
> > was run using
> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
> > but are not much closer to identifying the precise problem with the
> > code. However, I am satisfied that this is a problem in the
> > arm-probe software when being run in automation.
>
> Can you give details about "this is a problem in arm probe software
> when being run in automation"? Do you mean workload automation?
No. Not workload automation - that is a specific test framework which can use LAVA. I'm talking about the process of running tests on behalf of users without the users being logged in or interacting with the shell.
ok. Just to be sure about the context
> >
> > 4: the arm-probe code is appallingly difficult to read and debug. It
> > also seems unnecessarily complex.
> >
> > 5: I plan to remove a lot of the debug from the cloned arm-probe
> > repository (which has also had a few fixes to compile with gcc6) but
> > I'm running out of time to work on the arm-probe software myself.
> >
> > Someone needs to update the arm-probe software:
> >
> > a) to remove websockets as a compile-time option as this only bloats
> > the build in automation where a web based UI is impossible anyway.
> > I've done this by brute force in my cloned repo, I just patched out
> > the dependency.
> >
> > b) improve the code to have comments and output about what is
> > happening and why when verbose mode is used.
> >
> > c) Identify what is preventing the software from receiving data from
> > the probe when run in automation.
> >
> > d) the config file still needs fixes to allow for changes in the
> > device node name from one probe to another.
> >
> > --
>
> CC'ing Vincent, so he can read Neil's and Steve's comments above and
> respond (if he has anything to say) while I'm on holiday until early
> June.
Steve & I are also on annual leave next week.
--
Neil Williams
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:32, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:25, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 13:11, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:03, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration; it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem of running daemonized instead of with a controlling terminal. The actual problem was that the probe software runs more slowly than expected, and extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
ok
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except using /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
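As an illustration of the kind of change involved, here is a minimal sketch that swaps the device path lines of an existing config for the nodes that are actually present inside the container. It assumes that a device path is any line starting with "/", as in the config files shown in this thread; the function name is hypothetical and this is not part of arm-probe or LAVA.

```python
def replace_device_paths(config_text, new_paths):
    """Replace each device path line, in order, with an entry of new_paths.

    Assumption: a device path line is any line that starts with '/'.
    All other lines (comments, setup name, channels) are kept unchanged.
    """
    remaining = list(new_paths)
    out = []
    for line in config_text.splitlines():
        if line.startswith("/"):
            if not remaining:
                raise ValueError("more device lines than replacement paths")
            line = remaining.pop(0)
        out.append(line)
    if remaining:
        raise ValueError("more replacement paths than device lines")
    return "\n".join(out) + "\n"
```

Raising on a count mismatch is deliberate: silently keeping a by-id path that does not exist in the container would give no useful feedback from arm-probe.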
What about using 2 AEPs ?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely
arm-probe supports multiple AEPs and we are using multiple AEPs with the mt8173-evb. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb
# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
These configuration files may need to be generated within the test shell definition at runtime, based on parameters. The test shell will need to work out which device is which probe and this could be awkward without /dev/serial/by-id support. The enumeration order of ttyUSB0 and ttyUSB1 cannot be guaranteed. dmesg remains available inside the LXC, so some automated parsing may be required. If the arm-probe
To be honest, I don't like proceeding this way; it is just error prone.
software can be modified to use a more sane configuration file syntax, this could also be addressed there.
I don't understand why the config file is insane or how changing it would help with this problem.
If the config file is to be generated for each test job, the syntax is awkward to handle as it would need a line inserted instead of supporting a parser or similar.
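To make the dmesg idea concrete, here is a rough sketch of working out which ttyACM node belongs to which probe by pairing the "SerialNumber:" and "cdc_acm ... ttyACMn" kernel messages on their USB port id. The exact message wording and the serial strings are assumptions based on typical kernel output, and the function name is hypothetical; treat this as an illustration of why such parsing is fragile, not as a recommendation.

```python
import re

# Assumed kernel message shapes, e.g.:
#   usb 1-1.2: SerialNumber: S_NO81730001
#   cdc_acm 1-1.2:1.0: ttyACM0: USB ACM device
SERIAL_RE = re.compile(r"usb (\S+): SerialNumber: (\S+)")
TTY_RE = re.compile(r"cdc_acm (\S+?):\d+\.\d+: (ttyACM\d+): USB ACM device")


def map_serial_to_tty(dmesg_text):
    """Return {serial number: /dev/ttyACMn} for ports where both were seen."""
    serial_by_port = {}
    tty_by_port = {}
    for line in dmesg_text.splitlines():
        match = SERIAL_RE.search(line)
        if match:
            serial_by_port[match.group(1)] = match.group(2)
            continue
        match = TTY_RE.search(line)
        if match:
            tty_by_port[match.group(1)] = "/dev/" + match.group(2)
    # Later messages for the same port overwrite earlier ones (re-plugging).
    return {
        serial_by_port[port]: tty
        for port, tty in tty_by_port.items()
        if port in serial_by_port
    }
```

The resulting mapping could then be used to decide which device path line goes above which block of channels when the config is written out for a test job.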
need secondary connections and MultiNode to separate the output.
Is it something that Lisa can do by herself, or does it need some changes on your side?
Secondary connections and MultiNode can be adopted by test writers without any changes in LAVA.
https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4 https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#...
Any testjob using MultiNode has a certain level of complexity, so the change is non-trivial.
Does it also mean that the data from the 2 probes will not be in the same file, whereas arm-probe already merges the data from all the AEPs listed in its config file into one single output?
OK, then if that is what is desired then this can be done without using secondary connections and therefore without MultiNode. I was expecting that the two would run simultaneously, causing issues with interleaving.
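With a single merged output, post-processing in the test shell stays simple. A minimal sketch of reading the column data, assuming the layout seen in the sample run quoted earlier in this thread ('#' comment lines, a header row, then whitespace-separated numbers); the function names are hypothetical:

```python
def parse_samples(output_text):
    """Return the numeric rows of arm-probe output as lists of floats.

    Comment lines starting with '#' are skipped, and so is any line that
    is not purely numeric (status messages, the column header row).
    """
    rows = []
    for line in output_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        try:
            rows.append([float(field) for field in fields])
        except ValueError:
            continue
    return rows


def mean_column(rows, index):
    """Average one column, e.g. index 3 for the first power column."""
    values = [row[index] for row in rows]
    return sum(values) / len(values)
```

With more than one probe in the config there would be more columns per row, so the column index would have to come from the header row rather than being hard-coded.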
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
This has already been handled: in the case of the mt8173-evb, everything is already done and working on our server with the current arm-probe, AEPs and workload automation.
Regards, Vincent
Regards, Vincent
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
Hey guys,
On Tue, Jun 06, 2017 at 03:24:38PM +0100, Neil Williams wrote:
On 6 June 2017 at 14:32, Vincent Guittot vincent.guittot@linaro.org wrote:
...
arm-probe supports multiple AEPs and we are using multiple AEPs with the mt8173-evb. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb
# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
Vincent: how have you configured your tests with these specific /dev paths? Are your tests set up to only use specific test boards with a specific arm-probe config file (device tags?), or are you generating the config file somehow for each test?
At the moment, in LAVA we're passing the appropriate /dev/ttyACMx device node into the LXC as we start things up. That's clearly not going to scale here once you're wanting to use more than one AEP in a single test. So, I'm working on this right now. *Hopefully* I'll be able to replicate the /dev/serial/by-id paths - I'll update you shortly.
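For anyone reproducing this by hand, here is a minimal sketch of the manual equivalent: check on the host that each node really is a character device (arm-probe itself gives no feedback when it is not), then build one lxc-device call per node, using the same command form quoted earlier in this thread. The function names are hypothetical and this is not how LAVA itself is implemented.

```python
import os
import stat


def is_char_device(path):
    """True only if path exists and is a character device node."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


def lxc_device_commands(container, device_nodes):
    """Build one 'lxc-device -n <container> add <node>' argv per node."""
    return [["lxc-device", "-n", container, "add", node] for node in device_nodes]


# Hypothetical use on the dispatcher, e.g. with subprocess.check_call:
# for node in ("/dev/ttyACM0", "/dev/ttyACM1"):
#     if not is_char_device(node):
#         raise SystemExit("missing or wrong type of device: " + node)
# commands = lxc_device_commands("lxc-aep-test-174524",
#                                ["/dev/ttyACM0", "/dev/ttyACM1"])
```

This still leaves the problem of knowing which ttyACM node is which probe, which is why replicating the /dev/serial/by-id paths would be the better answer.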
I don't see why the config file is insane, or how changing it would help with this problem.
If the config file is to be generated for each test job, the syntax is awkward to handle as it would need a line inserted instead of supporting a parser or similar.
*nod* IMHO this is not a great design for the config file, but let's see what we can do.
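To make the "generate the config file for each test job" idea concrete, here is a minimal sketch in Python. It only reproduces the line order visible in the config file quoted in this thread (header comments, setup name, then each device path followed by its channel lines). The function name render_config and its arguments are hypothetical, and whether the arm-probe parser needs particular indentation or blank lines is an assumption that has not been checked against the arm-probe source.

```python
from typing import Dict, List


def render_config(setup_name: str, probes: Dict[str, List[str]]) -> str:
    """Build arm-probe style config text.

    probes maps a device path to its channel lines, which are copied
    verbatim from a known-good config file (assumed layout, see above).
    """
    lines = ["# arm-probe configuration file", "#", "# setup name", setup_name]
    for device_path, channel_lines in probes.items():
        lines.append("# <device path>")
        lines.append(device_path)      # the only line that changes per job
        lines.extend(channel_lines)    # calibration data stays untouched
    return "\n".join(lines) + "\n"
```

With this shape, a test job would only need to pass in the device node it was given (for example /dev/ttyACM0) and reuse the per-board channel lines unchanged.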
...
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
This has already been handled; in the case of the mt8173evb, everything is already done and working on our server with the current arm-probe, AEPs and workload automation.
OK, cool.
Cheers,
On 6 June 2017 at 16:55, Steve McIntyre steve.mcintyre@linaro.org wrote:
Vincent: how have you configured your tests with these specific /dev paths? Are your tests set up to only use specific test boards with a specific arm-probe config file (device tags?), or are you generating the config file somehow for each test?
We have one AEP config file per board. As we can use /dev/serial/by-id/ paths in our current test environment, we don't have to update it: the config file is always valid. We then select the right config file according to the board we want to use for our test.
At the moment, in LAVA we're passing the appropriate /dev/ttyACMx device node into the LXC as we start things up. That's clearly not going to scale here once you're wanting to use more than one AEP in a single test. So, I'm working on this right now. *Hopefully* I'll be able to replicate the /dev/serial/by-id paths - I'll update you shortly.
It would help us if you are able to replicate the /dev/serial/by-id paths in the LXC, because it would then just be a matter of providing the right config file name.
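Since a fixed per-board config file is only valid while the device paths in it exist, and arm-probe reportedly gives no useful feedback when they do not, a pre-flight check before starting a capture may be worthwhile. The sketch below is an assumption-laden illustration: it treats any config line starting with /dev/ as a device path (inferred from the file quoted in this thread, not from the arm-probe parser), and the helper names are hypothetical.

```python
import os
import stat
from typing import List


def device_paths(config_text: str) -> List[str]:
    """Return the lines that look like device paths (assumed: start with /dev/)."""
    stripped = (line.strip() for line in config_text.splitlines())
    return [line for line in stripped if line.startswith("/dev/")]


def check_devices(config_text: str) -> List[str]:
    """Return one human-readable problem per unusable device path."""
    problems = []
    for path in device_paths(config_text):
        try:
            mode = os.stat(path).st_mode   # follows by-id symlinks
        except OSError as exc:
            problems.append(f"{path}: {exc.strerror}")
            continue
        if not stat.S_ISCHR(mode):
            problems.append(f"{path}: not a character device")
    return problems
```

An empty list means every path named in the config file resolves to a character device inside the container; anything else can be printed to the job log before arm-probe is started.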
I don't see why the config file is insane, or how changing it would help with this problem.
If the config file is to be generated for each test job, the syntax is awkward to handle as it would need a line inserted instead of supporting a parser or similar.
*nod* IMHO this is not a great design for the config file, but let's see what we can do.
...
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
This has already been handled; in the case of the mt8173evb, everything is already done and working on our server with the current arm-probe, AEPs and workload automation.
OK, cool.
Cheers,
Steve McIntyre steve.mcintyre@linaro.org http://www.linaro.org/ Linaro.org | Open source software for ARM SoCs
On Tue, Jun 06, 2017 at 05:10:12PM +0200, Vincent Guittot wrote:
On 6 June 2017 at 16:55, Steve McIntyre steve.mcintyre@linaro.org wrote:
Vincent: how have you configured your tests with these specific /dev paths? Are your tests set up to only use specific test boards with a specific arm-probe config file (device tags?), or are you generating the config file somehow for each test?
We have one AEP config file per board. As we can use /dev/serial/by-id/ paths in our current test environment, we don't have to update it: the config file is always valid. We then select the right config file according to the board we want to use for our test.
OK. So you're selecting a specific board for your test - that's useful to know!
At the moment, in LAVA we're passing the appropriate /dev/ttyACMx device node into the LXC as we start things up. That's clearly not going to scale here once you're wanting to use more than one AEP in a single test. So, I'm working on this right now. *Hopefully* I'll be able to replicate the /dev/serial/by-id paths - I'll update you shortly.
It would help us if you are able to replicate the /dev/serial/by-id paths in the LXC, because it would then just be a matter of providing the right config file name.
OK, will get back to you...
Cheers,
On 6 June 2017 at 16:24, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 14:32, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:25, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 13:11, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:03, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
> This problem has been resolved inside the arm-probe configuration, it
> is not a fault within LAVA. There was a concern that the probe was not
> showing data output because of a theoretical problem of running
> daemonized instead of with a controlling terminal. The actual problem
> was that the probe software is running more slowly than expected and
> extending the runtime of the utility allows the probe to output data.
> https://staging.validation.linaro.org/scheduler/job/175033#L2038
> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
ok
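The runtime point above can be illustrated with a small wrapper that lets a capture run for a fixed time and then stops it, writing the output to a file much as the "timeout" command or a subprocess.Popen call would. This is only a sketch: the helper name capture and the 5-second grace period are assumptions, not anything taken from LAVA or workload automation.

```python
import subprocess
from typing import Sequence


def capture(command: Sequence[str], seconds: float, out_path: str) -> bool:
    """Run command for at most `seconds`, sending its output to out_path.

    Returns True if the command exited by itself, False if it had to be
    stopped when the time limit expired.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(command, stdout=out, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=seconds)
            return True
        except subprocess.TimeoutExpired:
            proc.terminate()               # polite stop first
            try:
                proc.wait(timeout=5)       # assumed grace period
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            return False
```

For example, capture(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], 30, "aep.log") would reuse the command line shown elsewhere in this thread; the 30-second limit is an arbitrary illustration and would need to be long enough for the probe to start producing data.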
> (The verbose option was later dropped to output only the interesting data.)
>
> The configuration file in the git repo needs to be modified.
>
> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out which modification was needed? I can't see any obvious difference except the use of /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
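Because the by-id links exist on the host but not inside the container, one possible approach is to resolve them on the host and hand the real device nodes to the container. The sketch below is illustrative only: resolve_by_id and lxc_add_command are hypothetical helper names, and the lxc-device invocation simply mirrors the "lxc-device -n <name> add <node>" form quoted elsewhere in this thread. It does not recreate the by-id names inside the LXC.

```python
import os
from typing import Dict, List


def resolve_by_id(by_id_dir: str = "/dev/serial/by-id") -> Dict[str, str]:
    """Map each stable by-id name to the device node it points at."""
    if not os.path.isdir(by_id_dir):
        return {}                          # no serial devices, or no udev
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in sorted(os.listdir(by_id_dir))
    }


def lxc_add_command(container: str, node: str) -> List[str]:
    """Build (but do not run) the command that adds a node to a container."""
    return ["lxc-device", "-n", container, "add", node]
```

The mapping also gives the test shell a way to tell which probe is which without relying on enumeration order, since the by-id name carries the probe's serial number.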
What about using 2 AEPs?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely
arm-probe supports multiple AEPs, and we are using it with multiple AEPs on the mtk8173 evb. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb
# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
These configuration files may need to be generated within the test shell definition at runtime, based on parameters. The test shell will need to work out which device is which probe and this could be awkward without /dev/serial/by-id support. The enumeration order of ttyUSB0 and ttyUSB1 cannot be guaranteed. dmesg remains available inside the LXC, so some automated parsing may be required. If the arm-probe
To be honest, I don't like proceeding this way; it is just error-prone.
software can be modified to use a more sane configuration file syntax, this could also be addressed there.
I don't see why the config file is insane, or how changing it would help with this problem.
If the config file is to be generated for each test job, the syntax is awkward to handle as it would need a line inserted instead of supporting a parser or similar.
need secondary connections and MultiNode to separate the output.
Is this something that Lisa can do by herself, or does it need some changes on your side?
Secondary connections and MultiNode can be adopted by test writers without any changes in LAVA.
https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4 https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#...
Any testjob using MultiNode has a certain level of complexity, so the change is non-trivial.
Does it also mean that the data from the 2 probes will not be in the same file? arm-probe already merges the data from all the AEPs listed in its config file into one single output.
OK, then if that is what is desired then this can be done without using secondary connections and therefore without MultiNode. I was
Great
expecting that the two would run simultaneously, causing issues with interleaving.
I haven't used more than 2 AEPs simultaneously, but I remember Andy Green using 3 AEPs.
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
This has already been handled; in the case of the mt8173evb, everything is already done and working on our server with the current arm-probe, AEPs and workload automation.
Regards, Vincent
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
>
>
> On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
>> On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
>>> On Wed, 24 May 2017 21:07:45 +0200
>>> Vincent Guittot vincent.guittot@linaro.org wrote:
>>>
>>>> Hi Neil,
>>>>
>>>> On 24 May 2017 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org
>>>> wrote:
>>>>
>>>> On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>> > Neil Williams codehelp@debian.org wrote:
>>>> >
>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>> >> Steve McIntyre steve.mcintyre@linaro.org wrote:
>>>> >>
>>>> >> > Hi folks!
>>>> >> >
>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>> >> > >Neil Williams codehelp@debian.org wrote:
>>>> >> > >
>>>> >> >
>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>> >> > going on here. The only problem I see is Lisa's config file
>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>> >> > device to talk to. Using:
>>>> >> >
>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>> >> >
>>>> >> > I create that device in my container. I build libwebsockets and
>>>> >> > the arm-probe software in the container, then
>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>> >> > fine:
>>>> >> >
>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C
>>>> >> > panda-aep.cfg -l10 -x # configuration: panda-aep.cfg
>>>> >> > # config_name: pandaboard
>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W)
>>>> >> > 400us Configuration: pandaboard
>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>> >> > # host: lxc-aep-test-174524
>>>> >> > #
>>>> >> > + /dev/ttyACM0
>>>> >> > Starting...
>>>> >> > sending start to 0
>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>> >> > #
>>>> >> > #
>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>> >> > ...
>>>> >> >
>>>> >> > I don't have any problems running things and getting output here.
>>>> >> >
>>>> >> > I *have* seen two real bugs here while trying to get things
>>>> >> > running, though:
>>>> >> >
>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>>> >> > problem. Running things under strace will show the background
>>>> >> > libarmep process attempt to use the device specified, but
>>>> >> > there's no error handling. :-(
>>>> >> >
>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>> >> > exit when you've done capturing, but it just sits there forever
>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>> >> > work around that for now.
>>>> >> >
>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>> >> >
>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>> >> > says that it creates devices based on their existing entries on
>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>> >>
>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>> >> been using for the tests of the new code to ensure
>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>> >>
>>>> >> That panda and AEP will shortly return to staging and then the
>>>> >> changes to LAVA and the required changes to the test definition
>>>> >> can be available for the 2017.6 release.
>>>> >
>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>> > we've learnt so far:
>>>> >
>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>> > manually on the worker with the same LXC on the same worker does
>>>> > return data from the probe.
>>>> >
>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>> > software starts successfully but something within the protocol
>>>> > parser or sampler fails to retrieve data.
>>>>
>>>>
>>>> What do you mean by headless mode?
>>>
>>> With no controlling terminal.
>>>
>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>> not usually cause issues and is fundamental to automation. When I run
>>> the same commands in an LXC as a user logged into the machine, I get
>>> output. When I run the commands from a daemon, the output is not seen.
>>
>> even when you redirect the output to a file ?
>>
>> On workload automation, arm_probe is called in a dedicated process
>> with subprocess.Popen and we are able to get data in the file.
>> Just wonder what could be the difference in lava case
>>
>>>
>>>> >
>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>> > disabled in the build I've been testing:
>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>
>>>>
>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>
>>>>
>>>> >
>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>> > was run using
>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>> > but are not much closer to identifying the precise problem with the
>>>> > code. However, I am satisfied that this is a problem in the
>>>> > arm-probe software when being run in automation.
>>>>
>>>>
>>>> Can you give details about "this is a problem in arm probe software
>>>> when being run in automation"? Do you mean workload automation?
>>>
>>> No. Not workload automation - that is a specific test framework which
>>> can use LAVA. I'm talking about the process of running tests on behalf
>>> of users without the users being logged in or interacting with the
>>> shell.
>>
>> ok. Just to be sure about the context
>>
>>>
>>>> >
>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>> > also seems unnecessarily complex.
>>>> >
>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>> > I'm running out of time to work on the arm-probe software myself.
>>>> >
>>>> > Someone needs to update the arm-probe software:
>>>> >
>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>> > the build in automation where a web based UI is impossible anyway.
>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>> > the dependency.
>>>> >
>>>> > b) improve the code to have comments and output about what is
>>>> > happening and why when verbose mode is used.
>>>> >
>>>> > c) Identify what is preventing the software from receiving data from
>>>> > the probe when run in automation.
>>>> >
>>>> > d) the config file still needs fixes to allow for changes in the
>>>> > device node name from one probe to another.
>>>> >
>>>> > --
>>>>
>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>> respond (if he has anything to say) while I'm on holiday until early
>>>> June.
>>>
>>> Steve & I are also on annual leave next week.
>>>
>>> --
>>>
>>>
>>> Neil Williams
>>> =============
>>> http://www.linux.codehelp.co.uk/
>>>
>
>
>
> --
>
> Neil Williams
> =============
> neil.williams@linaro.org
> http://www.linux.codehelp.co.uk/
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/
linaro-validation@lists.linaro.org