Hi all,
Just a heads up: I started creating three cloud nodes today (two for the toolchain and one for the backup production system, so we can seamlessly upgrade control) and hit a problem. It's reporting (not very clearly - you have to dig) that we have run out of floating IPs. I allocated 192.168.1.48/29, which should give us 8 addresses (just enough, as it happens), and I can't see why it's holding onto the other two. I'm investigating, but I also suggest moving up to 192.168.1.48/28 so that we can have 16 instances.
Obviously, once we move to 192.1.0.0/16 this problem will go away and we'll be able to allocate a whole tranche of IPs to cloud nodes.
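For reference, a quick sketch of the subnet arithmetic with the stdlib ipaddress module (and a guess at the missing two: clouds typically reserve a few addresses per subnet for things like the network, broadcast, and gateway, which would eat into the 8):

```python
import ipaddress

# A /29 carries 8 addresses, a /28 carries 16.
for cidr in ("192.168.1.48/29", "192.168.1.48/28"):
    net = ipaddress.ip_network(cidr)
    print(cidr, "->", net.num_addresses, "addresses")
# 192.168.1.48/29 -> 8 addresses
# 192.168.1.48/28 -> 16 addresses
```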
Upshot is, I may have to restart some services and do a db sync, so when I know how the land lies, I'll schedule some cloud downtime. In theory there shouldn't be any, but just to be on the safe side.
Thanks
Dave
Hi all,
Andy put me onto a good idea for forcing the edge case to trigger my code. He suggested I go on with conmux and interfere with it. The problem is that timing your keystrokes just right turns out to be practically impossible, and you end up making the job fail, but in the wrong way. Thinking about it, though, the obvious thing to do is to take the board offline, go on with conmux, disable eth0, and then put the board back online.
I'm just about to do that. As long as it passes, I'll put it back to looping.
Thanks
Dave
Hi all,
I deployed an update onto staging, and I can see that it's there, but somehow it seems to be running the old code. Take a look at:
http://staging.validation.linaro.org/scheduler/job/34608
This was run after my update, but it still complains about "target". What have I done wrong? I did an su as the instance manager, and then ran "$HOME/bin/update-staging".
What have I missed?
Thanks
Dave
Hi all,
I just ran a CTS job on staging and found that the CPU usage of the telnet process is nearly 100%.
Does anyone have any idea what might be causing that?
Below is a link to the collectd information installed on staging:
http://staging-metrics.validation.linaro.org:8080/collectd/bin/index.cgi?ho…
The high CPU usage started at 13:40 on CPU3 and moved to CPU1 at 14:30.
Below is the output of the top command:
top - 15:18:26 up 7 days, 5:27, 1 user, load average: 1.81, 1.80, 1.46
Tasks: 144 total, 2 running, 142 sleeping, 0 stopped, 0 zombie
Cpu(s): 2.5%us, 23.1%sy, 0.0%ni, 74.0%id, 0.3%wa, 0.0%hi, 0.0%si,
0.0%st
Mem: 8178504k total, 4866776k used, 3311728k free, 66408k buffers
Swap: 0k total, 0k used, 0k free, 3870820k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31383 root 20 0 27516 1504 1224 R 100 0.0 97:09.40 telnet
3626 root 20 0 2437m 92m 8496 S 0 1.2 0:22.77 java
13383 lava-sta 20 0 198m 47m 5180 S 0 0.6 0:18.85 uwsgi
1 root 20 0 24460 2336 1244 S 0 0.0 0:00.98 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0 0.0 0:08.43 ksoftirqd/0
Thanks,
Yongqin Liu
Hey Guys,
I just looked into the Panda10 health check failures over the past 24 hours.
The good news is my bad code isn't to blame. The bad news is that the SD
card appears to be having some issues.
Hi all,
I've just created a very basic munin installation for the lab:
http://munin.validation.linaro.org/
The monitoring wonk I consulted said that munin is perhaps not the
greatest way of getting graphs of your system but that it's probably the
easiest to set up. Better than nothing :-)
To add a system to munin you need to:
1) apt-get install munin-node on the system
2) Edit /etc/munin/munin-node.conf on the system to contain:
host_name XXX.validation.linaro.org
allow ^192\.168\.1\.32$
3) sudo service munin-node restart on the system
4) Add the following to /etc/munin/munin.conf on linaro-gateway:
[XXX.validation.linaro.org]
address 192.168.1.YYY
use_node_name yes
and that's it! The data viewable at http://munin.validation.linaro.org/
is generated by a */5 cron, so it takes a while for a new host to
appear. If someone wants to add dogfood, the compute nodes, the fast
model instances etc etc be my guest...
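As an aside, the allow line in step 2 is a regular expression matched against the address of the connecting munin master (assumed here to be the gateway at 192.168.1.32), which is why it needs the backslashes and anchors:

```python
import re

# munin-node's "allow" directive is a regex tested against the
# connecting master's IP address.
allow = re.compile(r"^192\.168\.1\.32$")

print(bool(allow.match("192.168.1.32")))   # True: the gateway may poll us
print(bool(allow.match("192.168.1.132")))  # False: the anchors reject other hosts
```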
Once all the systems are added, the next thing is to start looking at
adding more us-specific metrics -- scheduler queue lengths, request
numbers and duration from django or apache, various postgres stats etc
etc. It would also be nice to add "events" to the graphs such as
rollouts and job start/ends but I don't know if that is supported.
Cheers,
mwh
Hi All,
Due to a glitch with UEFI and the latest kernels, we are forced to leave the TC2s offline until the issue is resolved. Ryan Harkin and I have been working to try and resolve this, but the best we could do is to get them to pass their health check (using sticking plaster, string and a large hammer) but they would then fail every test that was submitted to them, which would be kind of pointless. We're working actively to fix this problem, and I'll let you know when we're back up and running.
Thanks, and apologies once again,
Dave
Hi all,
It's all kinds of rough but I've just crossed a milestone: I ran the
dispatcher and had DS-5 capture energy data from my host while it was
running, using this branch (lots of which Andy wrote):
https://code.launchpad.net/~mwhudson/lava-dispatcher/signals/+merge/131128
Currently the output of streamline -report is attached to the test
result as an attribute, which is just awful. Either it should be parsed
into interesting data, or the -report output should be attached to test
run in a useful way (or both). But it's a start! I'm attaching the
test definition and job file I used.
Cheers,
mwh
Hey Guys,
I just hit a really annoying issue while trying to upgrade control to
our latest lava-dispatcher code.
Everything works great in dogfood and staging. However, I guess the
python version on control is just different enough to cause a problem
with our new use of "configglue". The issue is with our "boot_cmds" that
are set by our device-type .conf files. The faulty snippet is roughly:
string_to_list(boot_cmds)
on a "normal" system, this produces an array of commands. On control we
get an encoding mess that doesn't work with u-boot, e.g.:
['m\x00\x00\x00m\x00\x00\x00c\x00\x00\x00 .......
I think the easiest fix is to change our master.py to call:
string_to_list(boot_cmds.encode('ascii'))
I'm doing another round of unit testing to prove this works before
attempting to deploy.
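For what it's worth, here's my speculative reconstruction of the symptom (not verified against control): the null-padded output looks exactly like text in a wide encoding such as UTF-32-LE being treated as a byte string, which would explain why forcing ASCII first produces something u-boot can digest:

```python
# Speculative reconstruction: a boot command run through a wide
# encoding gains three NUL bytes per character.
cmd = "mmc init"  # representative u-boot command, for illustration
wide = cmd.encode("utf-32-le")
print(repr(wide[:12]))  # b'm\x00\x00\x00m\x00\x00\x00c\x00\x00\x00'

# The proposed fix: hand u-boot plain ASCII bytes instead.
plain = cmd.encode("ascii")
print(repr(plain))      # b'mmc init'
```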
For now I've marked all the devices that execute from control as
offline. If the fix takes too long, I'll just revert to the previous
lava deployment
-andy
It's often difficult to achieve pre-planned hacking goals at Connect, but
Michael and I spent a little time thinking about this topic and wanted
to try and lay out an agenda for LAVA next week.
The general thought process we took for these was:
* Is it beneficial to work on as a group?
* Is it something that we'll benefit from even if we only wind up having
20 minutes, and could it also work well if we find the time to work on
it for 2 hours?
With that in mind we are thinking about these items:
= Galaxy Nexus Fastboot Hacking
Have a "learn fastboot" session based on the email thread from last week.
= NI battery simulator
I can show how this works
Zach can try and help with the TCP disconnect issue
= Versatile Express intro hacking
Dave can give us some education on how the VExpress works/boots/etc.
Possibly grab Ryan/Tixy to join.
= Deployment Type Improvements
I think this will roughly be a "get Antonio to teach us Chef" session.
Maybe think about how to get Chef built into open stack image with
cloud-init.
= monitoring – adding app-specific metrics to munin
Michael can talk to us a bit about Munin and how to add new custom
metrics. Then maybe we can hack on adding some, like:
* web/django stuff
* postgres stuff
* job stuff
- num flocks
- device type wait time at various %iles
- device type utilizations
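If we do hack on custom metrics, munin plugins are just executables speaking a tiny text protocol: called with "config" they print graph metadata, called with no argument they print current values. A minimal sketch, with the metric name and the placeholder value entirely made up:

```python
#!/usr/bin/env python
# Hypothetical munin plugin sketch for a LAVA metric.
# Munin invokes a plugin with "config" for metadata, and with
# no argument to fetch current values.
import sys

def emit(arg=None):
    if arg == "config":
        return ["graph_title LAVA scheduler queue length",
                "graph_vlabel jobs",
                "queued.label queued jobs"]
    # Placeholder value; a real plugin would query the LAVA database.
    return ["queued.value 0"]

if __name__ == "__main__":
    print("\n".join(emit(sys.argv[1] if len(sys.argv) > 1 else None)))
```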