Hi Neil,

On 24 May 2017 at 7:42 PM, "Lisa Nguyen" <lisa.nguyen@linaro.org> wrote:
On 24 May 2017 at 17:02, Neil Williams <codehelp@debian.org> wrote:
> On Fri, 19 May 2017 17:02:14 +0100
> Neil Williams <codehelp@debian.org> wrote:
>
>> On Fri, 19 May 2017 16:48:11 +0100
>> Steve McIntyre <steve.mcintyre@linaro.org> wrote:
>>
>> > Hi folks!
>> >
>> > On Wed, May 17, 2017 at 03:05:41PM +0100, Neil Williams wrote:
>> > >On Thu, 27 Apr 2017 08:19:19 +0100
>> > >Neil Williams <codehelp@debian.org> wrote:
>> > >
>> >
>> > I've just run a local test with an AEP inside lxc on my local
>> > machine. As far as I can see, there's nothing particularly magic
>> > going on here. The only problem I see is Lisa's config file
>> > pointing at the wrong device file. arm-probe needs a ttyACM-style
>> > device to talk to. Using:
>> >
>> > # lxc-device -n lxc-aep-test-174524 add /dev/ttyACM0
>> >
>> > I create that device in my container. I build libwebsockets and the
>> > arm-probe software in the container, then specify /dev/ttyACM0 in
>> > the AEP config file. I can run it just fine:
>> >
>> > root@lxc-aep-test-174524:/arm-probe# ./arm-probe/arm-probe -C panda-aep.cfg -l10 -x
>> > # configuration: panda-aep.cfg
>> > # config_name: pandaboard
>> > # trigger: 0.400000V (hyst 0.200000V) 0.000000W (hyst 0.200000W) 400us
>> > Configuration: pandaboard
>> > # date: Fri, 19 May 2017 16:29:50 +0100
>> > # host: lxc-aep-test-174524
>> > #
>> > + /dev/ttyACM0
>> > Starting...
>> > sending start to 0
>> > # VDD_ALL       VDD     ROOT    #ff0000 SoC
>> > #
>> > #
>> > time  VDD(V) VDD(A) VDD(W)
>> > 0.000500  5.11 0.0474 0.24196
>> > 0.000600  5.11 0.0364 0.18572
>> > 0.000700  5.11 0.0314 0.16012
>> > 0.000800  5.10 0.0544 0.27734
>> > 0.000900  5.10 0.0234 0.11923
>> > 0.001000  5.11 0.0304 0.15505
>> > ...
>> >
>> > I don't have any problems running things and getting output here.
>> >
>> > I *have* seen two real bugs here while trying to get things running,
>> > though:
>> >
>> >  1. If the device specified in the config file doesn't exist, or is
>> >     the wrong type of device, or (maybe) there is any other kind of
>> >     problem with it, you get *no* useful feedback to say there's a
>> >     problem. Running things under strace will show the background
>> >     libarmep process attempt to use the device specified, but
>> >     there's no error handling. :-(
>> >
>> >  2. The "-x" option says that the arm-probe program is meant to exit
>> >     when you've done capturing, but it just sits there forever when
>> >     I'm testing. I've wrapped it using the "timeout" command to work
>> >     around that for now.
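
For reference, the "timeout" workaround for the "-x" problem can also be done from Python when the probe is driven by a script. This is only a rough sketch: run_with_timeout is a made-up helper name, not part of arm-probe or LAVA, and it relies on subprocess.run() killing the child once the timeout expires.

```python
import subprocess


def run_with_timeout(cmd, seconds):
    """Run cmd and return (output_text, timed_out).

    Hypothetical helper: the child is killed if it is still running
    after `seconds`, which mimics wrapping it in the "timeout" command.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=seconds,
        )
        return done.stdout.decode(errors="replace"), False
    except subprocess.TimeoutExpired as exc:
        # Output captured before the kill may be attached to the
        # exception; it can also be None if nothing was read.
        partial = exc.stdout or b""
        return partial.decode(errors="replace"), True
```

Calling run_with_timeout(["./arm-probe/arm-probe", "-C", "panda-aep.cfg", "-l10", "-x"], 30) with the command line from the session above should then hand back whatever was printed before the 30 seconds ran out, plus a flag saying the process had to be killed.
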
>> >
>> > If I knew where to file those bugs, I would, but it's really not
>> > obvious. They're really easy to reproduce, I hope...
>> >
>> > In terms of the /dev/ttyACM0 creation, the lxc-device man page says
>> > that it creates devices based on their existing entries on the
>> > host. Double-check that the host (dispatcher) has an appropriate
>> > /dev/ttyACM0 if you're still seeing problems?
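
Since arm-probe gives no feedback about a bad device, a small pre-flight check on the dispatcher (and again inside the container) may save some time. The following is a sketch under assumptions: check_char_device is a hypothetical helper, and it only tests that the node exists, is a character device and is readable and writable. It does not prove that an AEP is actually behind it.

```python
import os
import stat


def check_char_device(path):
    """Return None if path looks usable, otherwise a short reason string.

    Hypothetical pre-flight check; it does not talk to the probe.
    """
    try:
        mode = os.stat(path).st_mode
    except OSError as exc:
        # Covers a missing node, a dangling symlink, bad permissions
        # on a parent directory, and so on.
        return f"{path}: {exc.strerror}"
    if not stat.S_ISCHR(mode):
        return f"{path}: not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return f"{path}: no read/write permission"
    return None
```

Running check_char_device("/dev/ttyACM0") before starting arm-probe would at least turn a silent failure into a readable message.
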
>>
>> Steve was using staging-panda03 with the ARM Energy Probe which I'd
>> been using for the tests of the new code to ensure that /dev/ttyACM0
>> can be attached to the LXC.
>>
>> That panda and AEP will shortly return to staging and then the changes
>> to LAVA and the required changes to the test definition can be
>> available for the 2017.6 release.
>
> OK. staging-panda03 is back and has been running tests. This is what
> we've learnt so far:
>
> 0: This does not appear to be an LXC issue. Running the commands
> manually on the worker with the same LXC on the same worker does return
> data from the probe.
>
> 1: Running the same commands in "headless" mode shows that the probe
> software starts successfully but something within the protocol parser
> or sampler fails to retrieve data.

What do you mean by headless mode?

>
> 2: The websockets dependency is completely unnecessary and has been
> disabled in the build I've been testing:
> https://git.linaro.org/lava-team/arm-probe.git/

Yes. I do the same. aepd is only useful for the web interface.

>
> 3: We've added a *lot* of debug to the arm-probe code
> (https://staging.validation.linaro.org/scheduler/job/174969 which was
> run using
> https://git.linaro.org/lava-team/arm-probe.git/commit/?id=9b2958e3045da77d7db25a7cfe48359211aa4cf1)
> but are not much closer to identifying the precise problem with the
> code. However, I am satisfied that this is a problem in the arm-probe
> software when being run in automation.

Can you give details about "this is a problem in arm probe software when being run in automation"? Do you mean workload automation?

>
> 4: the arm-probe code is appallingly difficult to read and debug. It
> also seems unnecessarily complex.
>
> 5: I plan to remove a lot of the debug from the cloned arm-probe
> repository (which has also had a few fixes to compile with gcc6) but
> I'm running out of time to work on the arm-probe software myself.
>
> Someone needs to update the arm-probe software:
>
> a) to remove websockets as a compile-time option as this only bloats
> the build in automation where a web based UI is impossible anyway. I've
> done this by brute force in my cloned repo, I just patched out the
> dependency.
>
> b) improve the code to have comments and output about what is happening
> and why when verbose mode is used.
>
> c) Identify what is preventing the software from receiving data from
> the probe when run in automation.
>
> d) the config file still needs fixes to allow for changes in the device
> node name from one probe to another.
>
> --

CC'ing Vincent, so he can read Neil's and Steve's comments above and
respond (if he has anything to say) while I'm on holiday until early
June.

>
>
> Neil Williams
> =============
> http://www.linux.codehelp.co.uk/
>