On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
This problem has been resolved inside the arm-probe configuration; it is not a fault within LAVA. There was a concern that the probe was not showing data output because of a theoretical problem with running daemonized instead of with a controlling terminal. The actual problem was that the probe software runs more slowly than expected; extending the runtime of the utility allows the probe to output data. https://staging.validation.linaro.org/scheduler/job/175033#L2038 https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
(The verbose option was later dropped to output only the interesting data.)
The configuration file in the git repo needs to be modified.
https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out the modification that was needed? I can't see any obvious difference except the use of /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
What about using 2 AEPs?
Regards, Vincent
On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
On Wed, 24 May 2017 21:07:45 +0200 Vincent Guittot vincent.guittot@linaro.org wrote:
Hi Neil,
On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 17:02:14 +0100 Neil Williams codehelp@debian.org wrote:
On Fri, 19 May 2017 16:48:11 +0100 Steve McIntyre steve.mcintyre@linaro.org wrote:
> Hi folks!
>
> On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
> >On Thu, 27 Apr 2017 08:19:19 +0100
> >Neil Williams codehelp@debian.org wrote:
> >
>
> I've just run a local test with an AEP inside lxc on my local
> machine. As far as I can see, there's nothing particularly magic
> going on here. The only problem I see is Lisa's config file
> pointing at the wrong device file. arm-probe needs a ttyACM-style
> device to talk to. Using:
>
> # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>
> I create that device in my container. I build libwebsockets and
> the arm-probe software in the container, then
> specify /dev/ttyACM0 in the AEP config file. I can run it just
> fine:
>
> root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
> # configuration: panda-aep.cfg
> # config_name: pandaboard
> # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
> Configuration: pandaboard
> # date: Fri, 19 May 2017 16:29:50 +0100
> # host: lxc-aep-test-174524
> #
> + /dev/ttyACM0
> Starting...
> sending start to 0
> # VDD_ALL VDD ROOT #ff0000 SoC
> #
> #
> time VDD(V) VDD(A) VDD(W)
> 0.000500 5.11 0.0474 0.24196
> 0.000600 5.11 0.0364 0.18572
> 0.000700 5.11 0.0314 0.16012
> 0.000800 5.10 0.0544 0.27734
> 0.000900 5.10 0.0234 0.11923
> 0.001000 5.11 0.0304 0.15505
> ...
>
> I don't have any problems running things and getting output here.
>
> I *have* seen two real bugs here while trying to get things
> running, though:
>
> 1. If the device specified in the config file doesn't exist, or
> is the wrong type of device, or (maybe) there is any other kind
> of problem with it, you get *no* useful feedback to say there's a
> problem. Running things under strace will show the background
> libarmep process attempt to use the device specified, but
> there's no error handling. :-(
>
> 2. The "-x" option says that the arm-probe program is meant to
> exit when you've done capturing, but it just sits there forever
> when I'm testing. I've wrapped it using the "timeout" command to
> work around that for now.
>
> If I knew where to file those bugs, I would, but it's really not
> obvious. They're really easy to reproduce, I hope...
>
> In terms of the /dev/ttyACM0 creation, the lxc-device man page
> says that it creates devices based on their existing entries on
> the host. Double-check that the host (dispatcher) has an
> appropriate /dev/ttyACM0 if you're still seeing problems?
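Since the first bug above means a missing or wrong device produces no feedback, a test definition could check the device node itself before starting the probe. The following is only a sketch of such a pre-flight check, not part of arm-probe or LAVA; the function name check_serial_device is hypothetical, and it assumes the probe is expected to appear as a character device such as /dev/ttyACM0.

```python
import os
import stat


def check_serial_device(path):
    """Return path if it is a readable, writable character device.

    Hypothetical helper: raises a descriptive error instead of failing
    silently when the device node is missing or of the wrong type.
    """
    try:
        mode = os.stat(path).st_mode  # follows symlinks such as /dev/serial/by-id/*
    except FileNotFoundError:
        raise FileNotFoundError(
            f"{path} does not exist; was it added to the container?"
        ) from None
    if not stat.S_ISCHR(mode):
        raise ValueError(f"{path} exists but is not a character device")
    if not os.access(path, os.R_OK | os.W_OK):
        raise PermissionError(f"{path} is not readable and writable by this user")
    return path
```

Calling check_serial_device("/dev/ttyACM0") before launching the probe would turn the silent failure into an explicit error in the job log.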
Steve was using staging-panda03 with the ARM Energy Probe which I'd been using for the tests of the new code to ensure that /dev/ttyACM0 can be attached to the LXC.
That panda and AEP will shortly return to staging and then the changes to LAVA and the required changes to the test definition can be available for the 2017.6 release.
OK. staging-panda03 is back and has been running tests. This is what we've learnt so far:
0: This does not appear to be an LXC issue. Running the commands manually on the worker with the same LXC on the same worker does return data from the probe.
1: Running the same commands in "headless" mode shows that the probe software starts successfully but something within the protocol parser or sampler fails to retrieve data.
What do you mean by headless mode?
With no controlling terminal.
LAVA runs as a daemon and forks processes to run the tests. This does not usually cause issues and is fundamental to automation. When I run the same commands in an LXC as a user logged into the machine, I get output. When I run the commands from a daemon, the output is not seen.
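To tell the two situations apart from inside a test, a process can check whether it has a controlling terminal at all. This is a minimal sketch, assuming a Linux or other POSIX system where opening /dev/tty fails when the process has no controlling terminal; the function name is hypothetical.

```python
import os


def has_controlling_terminal():
    """Return True if this process has a controlling terminal.

    On POSIX systems, opening /dev/tty fails with an OSError when the
    process has no controlling terminal (for example, under a daemon).
    """
    try:
        fd = os.open("/dev/tty", os.O_RDWR | os.O_NOCTTY)
    except OSError:
        return False
    os.close(fd)
    return True


if __name__ == "__main__":
    print("controlling terminal:", has_controlling_terminal())
```

Logging this value at the start of a test run shows whether a given run was interactive or "headless".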
Even when you redirect the output to a file?
In workload automation, arm_probe is called in a dedicated process with subprocess.Popen and we are able to get data in the file. I just wonder what the difference could be in the LAVA case.
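For comparison, the approach described here can be sketched as follows. This is not the workload automation code itself, only an illustration under assumptions: capture_to_file and its parameters are hypothetical names, the child is started in a new session so that it has no controlling terminal, and the process is stopped after a fixed runtime because the probe utility may not exit on its own.

```python
import subprocess


def capture_to_file(cmd, out_path, runtime_s):
    """Run cmd with stdout and stderr redirected to out_path.

    The child gets no stdin and is started in its own session, so it has
    no controlling terminal. If it is still running after runtime_s
    seconds it is terminated. Returns the child's exit code.
    """
    with open(out_path, "wb") as out:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # calls setsid() in the child
        )
        try:
            proc.wait(timeout=runtime_s)
        except subprocess.TimeoutExpired:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
    return proc.returncode
```

A call might look like capture_to_file(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], "aep.log", 30), where the log file name and the 30-second runtime are placeholder values; a runtime that is too short would leave the file empty if the probe software is slow to start.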
2: The websockets dependency is completely unnecessary and has been disabled in the build I've been testing: https://git.linaro.org/lava-team/arm-probe.git/
Yes. I do the same. aepd is only useful for the web interface.
3: We've added a *lot* of debug to the arm-probe code (https://staging.validation.linaro.org/scheduler/job/174969 which was run using https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1) but are not much closer to identifying the precise problem with the code. However, I am satisfied that this is a problem in the arm-probe software when being run in automation.
Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?
No. Not workload automation - that is a specific test framework which can use LAVA. I'm talking about the process of running tests on behalf of users without the users being logged in or interacting with the shell.
OK. I just wanted to be sure about the context.
4: the arm-probe code is appallingly difficult to read and debug. It also seems unnecessarily complex.
5: I plan to remove a lot of the debug from the cloned arm-probe repository (which has also had a few fixes to compile with gcc6) but I'm running out of time to work on the arm-probe software myself.
Someone needs to update the arm-probe software:
a) to remove websockets as a compile-time option, as this only bloats the build in automation where a web-based UI is impossible anyway. I've done this by brute force in my cloned repo: I just patched out the dependency.
b) to improve the code so that it has comments, and so that verbose mode outputs what is happening and why.
c) to identify what is preventing the software from receiving data from the probe when run in automation.
d) to fix the config file, which still needs to allow for changes in the device node name from one probe to another.
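One possible way to approach item d) is to look up the probe under /dev/serial/by-id and resolve each stable name to its current device node before writing the config file. This is a sketch only: the function name is hypothetical, the glob pattern is an assumption based on the by-id name quoted elsewhere in this thread, and nothing here is taken from arm-probe or LAVA.

```python
import glob
import os


def resolve_probe_devices(by_id_dir="/dev/serial/by-id",
                          pattern="*ARM_Energy_Probe*"):
    """Return the real device nodes behind matching by-id symlinks.

    Each entry in by_id_dir is a symlink with a stable name; realpath
    follows it to the current node (for example a ttyACM device).
    An empty list means no matching probe was found.
    """
    matches = sorted(glob.glob(os.path.join(by_id_dir, pattern)))
    return [os.path.realpath(match) for match in matches]


if __name__ == "__main__":
    for node in resolve_probe_devices():
        print(node)
```

Because the function returns a list, the same lookup would also cover a setup with more than one probe attached.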
--
CC'ing Vincent, so he can read Neil's and Steve's comments above and respond (if he has anything to say) while I'm on holiday until early June.
Steve & I are also on annual leave next week.
--
Neil Williams
neil.williams@linaro.org http://www.linux.codehelp.co.uk/