Hi all,
I've just created a very basic munin installation for the lab:
http://munin.validation.linaro.org/
The monitoring wonk I consulted said that munin is perhaps not the greatest way of getting graphs of your system but that it's probably the easiest to set up. Better than nothing :-)
To add a system to munin you need to:
1) apt-get install munin-node on the system
2) Edit /etc/munin/munin-node.conf on the system to contain:

       host_name XXX.validation.linaro.org
       allow ^192.168.1.32$

3) sudo service munin-node restart on the system
4) Add the following to /etc/munin/munin.conf on linaro-gateway:

       [XXX.validation.linaro.org]
           address 192.168.1.YYY
           use_node_name yes

and that's it! The data viewable at http://munin.validation.linaro.org/ is regenerated by a cron job that runs every five minutes (*/5), so it can take a few minutes for a new host to appear. If someone wants to add dogfood, the compute nodes, the fast model instances etc., be my guest...
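For anyone adding several hosts, the two config fragments from the steps above could be generated rather than typed by hand. This is only a rough sketch: the function names are made up for illustration, and the host name and address octet you pass in are whatever applies to your system (they stand in for the XXX and YYY placeholders).

```shell
#!/bin/sh
# Sketch only: prints the config fragments from the steps above.
# Function names are hypothetical, not part of munin.

# Lines for /etc/munin/munin-node.conf on the new system.
# $1: short host name (the XXX placeholder)
node_conf_fragment() {
    printf 'host_name %s.validation.linaro.org\n' "$1"
    printf 'allow ^192.168.1.32$\n'
}

# Stanza for /etc/munin/munin.conf on linaro-gateway.
# $1: short host name (XXX), $2: last octet of its address (YYY)
master_conf_stanza() {
    printf '[%s.validation.linaro.org]\n' "$1"
    printf '    address 192.168.1.%s\n' "$2"
    printf '    use_node_name yes\n'
}
```

Something like "master_conf_stanza dogfood 40" would print a stanza for a host called dogfood at 192.168.1.40 (the address here is invented for the example). Review the output before pasting it into the real config files.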
Once all the systems are added, the next thing is to start looking at adding more us-specific metrics -- scheduler queue lengths, request numbers and duration from django or apache, various postgres stats etc etc. It would also be nice to add "events" to the graphs such as rollouts and job start/ends but I don't know if that is supported.
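As a starting point for the custom metrics, a munin plugin is just an executable that prints graph metadata when called with "config" and prints values otherwise. Below is a minimal sketch for a scheduler queue length graph. How the queue length is actually obtained is not decided here, so queue_length is a placeholder that reads a hypothetical LAB_QUEUE_LENGTH variable; the graph title, category and field name are likewise just assumptions.

```shell
#!/bin/sh
# Sketch of a munin plugin reporting the scheduler queue length.

# Placeholder: a real plugin would query the scheduler or its database.
# LAB_QUEUE_LENGTH is a hypothetical stand-in so the sketch runs as-is.
queue_length() {
    echo "${LAB_QUEUE_LENGTH:-0}"
}

plugin_main() {
    case "$1" in
        config)
            # Graph metadata, requested by munin-node with "config".
            echo 'graph_title Scheduler queue length'
            echo 'graph_vlabel jobs'
            echo 'graph_category lab'
            echo 'queued.label queued jobs'
            ;;
        *)
            # Normal fetch: one "<field>.value <number>" line per field.
            echo "queued.value $(queue_length)"
            ;;
    esac
}

plugin_main "${1:-}"
```

On a Debian-style install the plugin would normally be linked into /etc/munin/plugins/ and munin-node restarted; munin-run can be used to check its output before waiting for the next cron run.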
Cheers, mwh