Hi Neil,
On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 17:02:14 +0100 Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
Hi folks!
On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
On Thu, 27 Apr 2017 08:19:19 +0100 Neil Williams codehelp@debian.org wrote:
I've just run a local test with an AEP inside lxc on my local machine. As far as I can see, there's nothing particularly magic going on here. The only problem I see is Lisa's config file pointing at the wrong device file. arm-probe needs a ttyACM-style device to talk to. Using:
# lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
I create that device in my container. I build libwebsockets and the arm-probe software in the container, then specify /dev/ttyACM0 in the AEP config file. I can run it just fine:
root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
# configuration: panda-aep.cfg
# config_name: pandaboard
# trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
Configuration: pandaboard
# date: Fri, 19 May 2017 16:29:50 +0100
# host: lxc-aep-test-174524
#
 - /dev/ttyACM0
Starting...
sending start to 0
# VDD_ALL VDD ROOT #ff0000 SoC
#
# time VDD(V) VDD(A) VDD(W)
0.000500 5.11 0.0474 0.24196
0.000600 5.11 0.0364 0.18572
0.000700 5.11 0.0314 0.16012
0.000800 5.10 0.0544 0.27734
0.000900 5.10 0.0234 0.11923
0.001000 5.11 0.0304 0.15505
...
I don't have any problems running things and getting output here.
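The sample rows in that output are whitespace-separated columns (time, volts, amps, watts), and the header lines start with '#'. A minimal Python sketch of parsing them, assuming that single-channel layout holds; `Sample` and `parse_samples` are hypothetical names for illustration, not part of arm-probe:

```python
from typing import Iterable, List, NamedTuple


class Sample(NamedTuple):
    time_s: float
    volts: float
    amps: float
    watts: float


def parse_samples(lines: Iterable[str]) -> List[Sample]:
    """Collect four-column numeric rows, skipping '#' headers and status text."""
    samples = []
    for line in lines:
        if line.lstrip().startswith("#"):
            continue  # header or metadata line
        fields = line.split()
        if len(fields) != 4:
            continue  # assumption: one channel, so exactly four columns
        try:
            samples.append(Sample(*(float(f) for f in fields)))
        except ValueError:
            # Four words but not numbers, e.g. "sending start to 0".
            continue
    return samples
```

An empty result from a run that printed "Starting..." would be one way to spot the "probe starts but no data arrives" case in automation.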
I *have* seen two real bugs here while trying to get things running, though:
- If the device specified in the config file doesn't exist, or is the wrong type of device, or (maybe) there is any other kind of problem with it, you get *no* useful feedback to say there's a problem. Running things under strace will show the background libarmep process attempt to use the device specified, but there's no error handling. :-(
- The "-x" option says that the arm-probe program is meant to exit when you've done capturing, but it just sits there forever when I'm testing. I've wrapped it using the "timeout" command to work around that for now.
If I knew where to file those bugs, I would, but it's really not obvious. They're really easy to reproduce, I hope...
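Until the "-x" behaviour is fixed, the same timeout workaround can be applied from a script. A rough Python sketch, assuming the caller only needs stdout; `run_with_timeout` is a hypothetical helper, and the 30 second limit in the usage comment is an arbitrary choice:

```python
import subprocess
from typing import List, Tuple


def run_with_timeout(cmd: List[str], seconds: float) -> Tuple[str, bool]:
    """Run cmd; return (captured stdout, timed_out).

    subprocess.run() kills the child itself when the timeout expires,
    so a capture tool that never exits cannot hang the caller.
    """
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE, timeout=seconds)
        return done.stdout.decode(errors="replace"), False
    except subprocess.TimeoutExpired as exc:
        # exc.output holds whatever was read before the kill (may be None).
        return (exc.output or b"").decode(errors="replace"), True


# Hypothetical usage with the command line from the test above:
# output, timed_out = run_with_timeout(
#     ["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], 30)
```

Returning the timed-out flag separately lets a test job distinguish "exited by itself" from "had to be killed", which is the bug being worked around.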
In terms of the /dev/ttyACM0 creation, the lxc-device man page says that it creates devices based on their existing entries on the host. Double-check that the host (dispatcher) has an appropriate /dev/ttyACM0 if you're still seeing problems?
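Because arm-probe gives no feedback about a missing or wrong device, a check before launching it may save some strace sessions. A small Python sketch, assuming a character-device test plus read/write access is a good enough sanity check; `check_char_device` is a hypothetical helper, not something arm-probe or LAVA provides:

```python
import os
import stat


def check_char_device(path: str) -> str:
    """Return '' if path looks usable as a serial device, else a reason."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return f"{path}: no such device node"
    except OSError as exc:
        return f"{path}: cannot stat ({exc.strerror})"
    if not stat.S_ISCHR(mode):
        return f"{path}: exists but is not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path}: not readable and writable by this user"
    return ""


# Hypothetical usage, on the host and again inside the container:
# problem = check_char_device("/dev/ttyACM0")
# if problem:
#     raise SystemExit(problem)
```

This only confirms that the node exists with the right type and permissions; it cannot tell whether the device behind it is actually an AEP.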
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
OK. staging-panda03 is back and has been running tests. This is what we've learnt so far:
0: This does not appear to be an LXC issue. Running the commands manually on the worker with the same LXC on the same worker does return data from the probe.
1: Running the same commands in "headless" mode shows that the probe software starts successfully but something within the protocol parser or sampler fails to retrieve data.
What do you mean by headless mode?
2: The websockets dependency is completely unnecessary and has been disabled in the build I've been testing: https://git.linaro.org/lava-team/arm-probe.git/
Yes. I do the same. aepd is only useful for the web interface.
3: We've added a *lot* of debug to the arm-probe code (https://staging.validation.linaro.org/scheduler/job/174969 which was run using https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1) but are not much closer to identifying the precise problem with the code. However, I am satisfied that this is a problem in the arm-probe software when being run in automation.
Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?
4: the arm-probe code is appallingly difficult to read and debug. It also seems unnecessarily complex.
5: I plan to remove a lot of the debug from the cloned arm-probe repository (which has also had a few fixes to compile with gcc6) but I'm running out of time to work on the arm-probe software myself.
Someone needs to update the arm-probe software:
a) to remove websockets as a compile-time option, as this only bloats the build in automation where a web-based UI is impossible anyway. I've done this by brute force in my cloned repo: I just patched out the dependency.
b) to improve the code so that it has comments, and output about what is happening and why when verbose mode is used.
c) to identify what is preventing the software from receiving data from the probe when run in automation.
d) to fix the config file handling to allow for changes in the device node name from one probe to another.
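On point d), one possible starting point is to enumerate the ttyACM nodes that exist instead of hard-coding /dev/ttyACM0. A minimal Python sketch, assuming the probe always shows up as some /dev/ttyACM<N>; `find_acm_nodes` is a hypothetical helper, and it only lists candidates (other CDC-ACM devices would match too):

```python
import glob
import os
import re
from typing import List

_ACM_RE = re.compile(r"ttyACM(\d+)$")


def find_acm_nodes(dev_dir: str = "/dev") -> List[str]:
    """List ttyACM<N> nodes under dev_dir, ordered by their numeric suffix."""
    matches = []
    for path in glob.glob(os.path.join(dev_dir, "ttyACM*")):
        m = _ACM_RE.search(path)
        if m:
            matches.append((int(m.group(1)), path))
    return [path for _, path in sorted(matches)]
```

The selected node would still have to be written into the AEP config file and added to the container with lxc-device, as in the manual test above.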
--
CC'ing Vincent, so he can read Neil's and Steve's comments above and respond (if he has anything to say) while I'm on holiday until early June.
Neil Williams
http://www.linux.codehelp.co.uk/
linaro-validation mailing list linaro-validation@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-validation