On 6 June 2017 at 16:24, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 14:32, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:25, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 13:11, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 14:03, Neil Williams neil.williams@linaro.org wrote:
On 6 June 2017 at 12:53, Vincent Guittot vincent.guittot@linaro.org wrote:
On 6 June 2017 at 13:38, Neil Williams neil.williams@linaro.org wrote:
> This problem has been resolved inside the arm-probe configuration, it
> is not a fault within LAVA. There was a concern that the probe was not
> showing data output because of a theoretical problem of running
> daemonized instead of with a controlling terminal. The actual problem
> was that the probe software is running more slowly than expected and
> extending the runtime of the utility allows the probe to output data.
> https://staging.validation.linaro.org/scheduler/job/175033#L2038
> https://git.linaro.org/lava-team/refactoring.git/commit/?id=7916e6c3db5188e2...
OK, so the 2-second timeout was your problem.
That and the problem with the config file.
ok
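The effect of a capture window that is too short can be sketched as follows. This is a hypothetical wrapper, not the LAVA test definition: it runs a command with its output redirected to a file and stops it after a configurable number of seconds, similar in spirit to wrapping the call in the `timeout` command. The function name `capture` and the 30-second default are assumptions made for illustration.

```python
import subprocess


def capture(cmd, outfile, seconds=30):
    """Run cmd with stdout/stderr sent to outfile; stop it after `seconds`.

    Returns True if the command exited by itself, False if it had to be
    stopped because the time limit was reached.
    """
    with open(outfile, "wb") as out:
        # No controlling terminal is needed: stdin is /dev/null and all
        # output goes to the file, as it would when run from a daemon.
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=subprocess.STDOUT,
        )
        try:
            proc.wait(timeout=seconds)
            return True
        except subprocess.TimeoutExpired:
            proc.terminate()
            proc.wait()
            return False
```

With the command line quoted elsewhere in this thread, a call might look like `capture(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], "aep.log", seconds=30)`; a limit that is too small simply ends the run before the probe has produced any samples.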
> (The verbose option was later dropped to output only the interesting data.)
>
> The configuration file in the git repo needs to be modified.
>
> https://git.linaro.org/lava-team/refactoring.git/tree/testdefs/aep-config?id...
Can you point out which modification was needed? I can't see any obvious difference except the use of /dev/ttyACM0 instead of /dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO44440001-if00. Is that the difference?
Yes, because inside the LXC, /dev/serial/by-id does not get created (there is no udev support for that inside containers).
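One possible workaround, sketched here under assumptions: resolve the by-id symlinks on the host, where udev does create them, and pass the resulting device node names on to the test job. The function name `resolve_by_id` and the idea of running it on the dispatcher are illustrative only; nothing like this is confirmed to exist in LAVA or arm-probe.

```python
import os


def resolve_by_id(by_id_dir="/dev/serial/by-id"):
    """Map each by-id symlink name to the real device node it points at.

    Returns an empty dict when the directory does not exist, which is
    the situation described for the inside of the container.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # realpath follows the symlink, e.g. to /dev/ttyACM0
        mapping[name] = os.path.realpath(link)
    return mapping


if __name__ == "__main__":
    for name, node in resolve_by_id().items():
        print(name, "->", node)
```

The mapping is only valid for as long as the probes stay plugged in, since the ttyACM numbering can change when devices re-enumerate.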
What about using 2 AEPs?
That would have to be fixed either in the test shell definitions (e.g. using parameters passed through the test job) or within the arm-probe code itself. I have no idea at this stage whether the arm-probe software can cope with multiple probes - in LAVA that would likely
arm-probe supports multiple AEPs and we are using multiple AEPs with the mt8173-evb. arm-probe just relies on the config file to get the path of each AEP. I have put the content of the config file below:
# arm-probe configuration file
#
# setup name
mt8173-evb

# <device path>
/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730001-if00
VDD_CA57_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Cache A57_CACHE #ff0000 SoC
VDD_CA57_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core0 A57_CORE #ff0000 SoC
VDD_CA57_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A57/Core1 A57_CORE #ff0000 SoC

/dev/serial/by-id/usb-NXP_SEMICOND_ARM_Energy_Probe_S_NO81730000-if00
VDD_CA53_0 0.500000 1 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Cache A53_CACHE #ff0000 SoC
VDD_CA53_1 0.100000 2 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core0 A53_CORE #ff0000 SoC
VDD_CA53_2 0.100000 3 -0.179000 13.363000 -0.000000 0.163300 0 SoC/A53/Core1 A53_CORE #ff0000 SoC
These configuration files may need to be generated within the test shell definition at runtime, based on parameters. The test shell will need to work out which device is which probe and this could be awkward without /dev/serial/by-id support. The enumeration order of ttyUSB0 and ttyUSB1 cannot be guaranteed. dmesg remains available inside the LXC, so some automated parsing may be required. If the arm-probe
To be honest, I don't like that way of proceeding; it is just error-prone.
software can be modified to use a more sane configuration file syntax, this could also be addressed there.
I don't see why the config file is insane, or how changing it would help with this problem.
If the config file is to be generated for each test job, the syntax is awkward to handle as it would need a line inserted instead of supporting a parser or similar.
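As a rough illustration of generating the file per test job, the sketch below writes one device path followed by its channel lines for each probe, mirroring the layout of the mt8173-evb file quoted earlier in this thread. The function name `render_config` and its input structure are hypothetical, and the placement of the setup name is an assumption. The exact syntax arm-probe accepts (for example, whether channel lines need leading whitespace) should be checked against its own sample configuration.

```python
def render_config(setup_name, probes):
    """Build arm-probe style configuration text.

    probes is a list of (device_path, [channel_line, ...]) tuples, in the
    order the probes should appear in the file. Channel lines are passed
    through unchanged; this sketch does not validate them.
    """
    lines = [
        "# arm-probe configuration file",
        "#",
        "# setup name",
        setup_name,  # assumed to sit on its own line after the comment
    ]
    for device_path, channels in probes:
        lines.append("# <device path>")
        lines.append(device_path)
        lines.extend(channels)
    return "\n".join(lines) + "\n"
```

The only per-job value here is the device path, which is the line that would otherwise have to be inserted by hand into a static file.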
need secondary connections and MultiNode to separate the output.
Is it something that Lisa can do by herself, or does it need some changes on your side?
Secondary connections and MultiNode can be adopted by test writers without any changes in LAVA.
https://validation.linaro.org/static/docs/v2/dispatcher-design.html#index-4 https://validation.linaro.org/static/docs/v2/pipeline-writer-secondary.html#...
Any testjob using MultiNode has a certain level of complexity, so the change is non-trivial.
Does it also mean that the data from the 2 probes will not be in the same file, whereas arm-probe already merges the data from the multiple AEPs listed in its config file into one single output?
OK, then if that is what is desired then this can be done without using secondary connections and therefore without MultiNode. I was
Great
expecting that the two would run simultaneously, causing issues with interleaving.
I haven't used more than 2 AEPs simultaneously, but I remember Andy Green using 3 AEPs.
Note also that physically fitting more AEPs will involve work by the LAB team - especially for devices like the panda, because the power connector which comes with the AEP does not fit the panda and a one-off daughter board is required.
This has already been handled, and in the case of the mt8173-evb everything is already done and working on our server with the current arm-probe, AEPs and workload automation.
Regards, Vincent
The syntax of the arm-probe configuration file does not make this easy but that section could be patched to use a more sane structure. That isn't related to the LAVA support though.
>
>
> On 29 May 2017 at 16:45, Vincent Guittot vincent.guittot@linaro.org wrote:
>> On 25 May 2017 at 10:03, Neil Williams codehelp@debian.org wrote:
>>> On Wed, 24 May 2017 21:07:45 +0200
>>> Vincent Guittot vincent.guittot@linaro.org wrote:
>>>
>>>> Hi Neil,
>>>>
>>>> On 24 May 2017 at 7:42 PM, "Lisa Nguyen" lisa.nguyen@linaro.org wrote:
>>>>
>>>> On 24 May 2017 at 17:02, Neil Williams codehelp@debian.org wrote:
>>>> > On Fri, 19 May 2017 17:02:14 +0100
>>>> > Neil Williams codehelp@debian.org wrote:
>>>> >
>>>> >> On Fri, 19 May 2017 16:48:11 +0100
>>>> >> Steve McIntyre steve.mcintyre@linaro.org wrote:
>>>> >>
>>>> >> > Hi folks!
>>>> >> >
>>>> >> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>>>> >> > >On Thu, 27 Apr 2017 08:19:19 +0100
>>>> >> > >Neil Williams codehelp@debian.org wrote:
>>>> >> > >
>>>> >> >
>>>> >> > I've just run a local test with an AEP inside lxc on my local
>>>> >> > machine. As far as I can see, there's nothing particularly magic
>>>> >> > going on here. The only problem I see is Lisa's config file
>>>> >> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>>>> >> > device to talk to. Using:
>>>> >> >
>>>> >> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>>>> >> >
>>>> >> > I create that device in my container. I build libwebsockets and
>>>> >> > the arm-probe software in the container, then
>>>> >> > specify /dev/ttyACM0 in the AEP config file. I can run it just
>>>> >> > fine:
>>>> >> >
>>>> >> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>>>> >> > # configuration: panda-aep.cfg
>>>> >> > # config_name: pandaboard
>>>> >> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W)
>>>> >> > 400us Configuration: pandaboard
>>>> >> > # date: Fri, 19 May 2017 16:29:50 +0100
>>>> >> > # host: lxc-aep-test-174524
>>>> >> > #
>>>> >> > + /dev/ttyACM0
>>>> >> > Starting...
>>>> >> > sending start to 0
>>>> >> > # VDD_ALL VDD ROOT #ff0000 SoC
>>>> >> > #
>>>> >> > #
>>>> >> > time VDD(V) VDD(A) VDD(W)
>>>> >> > 0.000500 5.11 0.0474 0.24196
>>>> >> > 0.000600 5.11 0.0364 0.18572
>>>> >> > 0.000700 5.11 0.0314 0.16012
>>>> >> > 0.000800 5.10 0.0544 0.27734
>>>> >> > 0.000900 5.10 0.0234 0.11923
>>>> >> > 0.001000 5.11 0.0304 0.15505
>>>> >> > ...
>>>> >> >
>>>> >> > I don't have any problems running things and getting output here.
>>>> >> >
>>>> >> > I *have* seen two real bugs here while trying to get things
>>>> >> > running, though:
>>>> >> >
>>>> >> > 1. If the device specified in the config file doesn't exist, or
>>>> >> > is the wrong type of device, or (maybe) there is any other kind
>>>> >> > of problem with it, you get *no* useful feedback to say there's a
>>>> >> > problem. Running things under strace will show the background
>>>> >> > libarmep process attempt to use the device specified, but
>>>> >> > there's no error handling. :-(
>>>> >> >
>>>> >> > 2. The "-x" option says that the arm-probe program is meant to
>>>> >> > exit when you've done capturing, but it just sits there forever
>>>> >> > when I'm testing. I've wrapped it using the "timeout" command to
>>>> >> > work around that for now.
>>>> >> >
>>>> >> > If I knew where to file those bugs, I would, but it's really not
>>>> >> > obvious. They're really easy to reproduce, I hope...
>>>> >> >
>>>> >> > In terms of the /dev/ttyACM0 creation, the lxc-device man page
>>>> >> > says that it creates devices based on their existing entries on
>>>> >> > the host. Double-check that the host (dispatcher) has an
>>>> >> > appropriate /dev/ttyACM0 if you're still seeing problems?
>>>> >>
>>>> >> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>>>> >> been using for the tests of the new code to ensure
>>>> >> that /dev/ttyACM0 can be attached to the LXC.
>>>> >>
>>>> >> That panda and AEP will shortly return to staging and then the
>>>> >> changes to LAVA and the required changes to the test definition
>>>> >> can be available for the 2017.6 release.
>>>> >
>>>> > OK. staging-panda03 is back and has been running tests. This is what
>>>> > we've learnt so far:
>>>> >
>>>> > 0: This does not appear to be an LXC issue. Running the commands
>>>> > manually on the worker with the same LXC on the same worker does
>>>> > return data from the probe.
>>>> >
>>>> > 1: Running the same commands in "headless" mode shows that the probe
>>>> > software starts successfully but something within the protocol
>>>> > parser or sampler fails to retrieve data.
>>>>
>>>>
>>>> What do you mean by headless mode?
>>>
>>> With no controlling terminal.
>>>
>>> LAVA runs as a daemon and forks processes to run the tests. This does
>>> not usually cause issues and is fundamental to automation. When I run
>>> the same commands in an LXC as a user logged into the machine, I get
>>> output. When I run the commands from a daemon, the output is not seen.
>>
>> even when you redirect the output to a file ?
>>
>> On workload automation, arm_probe is called in a dedicated process
>> with subprocess.Popen and we are able to get data in the file.
>> Just wonder what could be the difference in lava case
>>
>>>
>>>> >
>>>> > 2: The websockets dependency is completely unnecessary and has been
>>>> > disabled in the build I've been testing:
>>>> > https://git.linaro.org/lava-team/arm-probe.git/
>>>>
>>>>
>>>> Yes. I do the same. aepd is only useful for the web interface.
>>>>
>>>>
>>>> >
>>>> > 3: We've added a *lot* of debug to the arm-probe code
>>>> > (https://staging.validation.linaro.org/scheduler/job/174969 which
>>>> > was run using
>>>> > https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
>>>> > but are not much closer to identifying the precise problem with the
>>>> > code. However, I am satisfied that this is a problem in the
>>>> > arm-probe software when being run in automation.
>>>>
>>>>
>>>> Can you give details about "this is a problem in arm probe software
>>>> when being run in automation"? Do you mean workload automation?
>>>
>>> No. Not workload automation - that is a specific test framework which
>>> can use LAVA. I'm talking about the process of running tests on behalf
>>> of users without the users being logged in or interacting with the
>>> shell.
>>
>> ok. Just to be sure about the context
>>
>>>
>>>> >
>>>> > 4: the arm-probe code is appallingly difficult to read and debug. It
>>>> > also seems unnecessarily complex.
>>>> >
>>>> > 5: I plan to remove a lot of the debug from the cloned arm-probe
>>>> > repository (which has also had a few fixes to compile with gcc6) but
>>>> > I'm running out of time to work on the arm-probe software myself.
>>>> >
>>>> > Someone needs to update the arm-probe software:
>>>> >
>>>> > a) to remove websockets as a compile-time option as this only bloats
>>>> > the build in automation where a web based UI is impossible anyway.
>>>> > I've done this by brute force in my cloned repo, I just patched out
>>>> > the dependency.
>>>> >
>>>> > b) improve the code to have comments and output about what is
>>>> > happening and why when verbose mode is used.
>>>> >
>>>> > c) Identify what is preventing the software from receiving data from
>>>> > the probe when run in automation.
>>>> >
>>>> > d) the config file still needs fixes to allow for changes in the
>>>> > device node name from one probe to another.
>>>> >
>>>> > --
>>>>
>>>> CC'ing Vincent, so he can read Neil's and Steve's comments above and
>>>> respond (if he has anything to say) while I'm on holiday until early
>>>> June.
>>>
>>> Steve & I are also on annual leave next week.
>>>
>>> --
>>>
>>>
>>> Neil Williams
>>> =============
>>> http://www.linux.codehelp.co.uk/
>>>
>
>
>
> --
>
> Neil Williams
> =============
> neil.williams@linaro.org
> http://www.linux.codehelp.co.uk/
--
Neil Williams
neil.williams@linaro.org
http://www.linux.codehelp.co.uk/