Re: [RFC] Energy/power monitoring within the kernel

24 Oct 2012

      On Wed, 2012-10-24 at 01:40 +0100, Thomas Renninger wrote:
...
...
More and more people are getting interested in the subject of power
(energy) consumption monitoring. We have some external tools like
"battery simulators", energy probes etc., but some targets can measure
their power usage on their own.
Traditionally such data should be exposed to the user via the hwmon sysfs
interface, and that's exactly what I did for "my" platform - I have
a /sys/class/hwmon/hwmon*/device/energy*_input and this was good
enough to draw pretty graphs in userspace. Everyone was happy...
Now I am getting new requests to do more with this data. In particular
I'm asked how to add such information to ftrace/perf output.
Why? What is the gain?
Perf events can be triggered at any point in the kernel.
A cpufreq event is triggered when the frequency gets changed.
CPU idle events are triggered when the kernel requests to enter an idle state
or exits one.
When would you trigger a thermal or a power event?
There is the possibility of (critical) thermal limits.
But if I understand this correctly you want this for debugging and
I guess you have everything interesting one can do with temperature
values:

- read the temperature
- draw some nice graphs from the results

Hm, I guess I know what you want to do:
In your temperature/energy graph, you want to have some dots
when relevant HW states (frequency, sleep states, DDR power,...)
changed. Then you are able to see the effects over a timeline.
So you have to bring the existing frequency/idle perf events together
with temperature readings.
The cleanest solution could be to enhance the existing userspace apps
(pytimechart/perf timechart) and let them add another line
(temperature/energy), but the data would not come from perf, but
from sysfs/hwmon.
Not sure whether this works out with the timechart tools.
Anyway, this sounds like a userspace only problem.
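
As an illustration of the userspace-only approach described above, here is
a minimal sketch that interleaves trace events with hwmon readings on one
timeline. It is not taken from pytimechart or perf timechart; the Sample
type, the merge_timelines() helper and all data values are made up for the
example. It also assumes both streams are already sorted and stamped
against the same clock, which would need checking in a real setup.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    """One point on the timeline (hypothetical type for this sketch)."""
    timestamp: float  # seconds; both streams are assumed to share a clock
    source: str       # e.g. a trace event name or "energy"
    value: str


def merge_timelines(*streams: Iterable[Sample]) -> Iterator[Sample]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda s: s.timestamp)


# Made-up data standing in for parsed trace output and sysfs readings.
trace_events = [
    Sample(0.10, "cpu_frequency", "800000"),
    Sample(0.35, "cpu_idle", "state=2"),
    Sample(0.70, "cpu_frequency", "1200000"),
]
hwmon_samples = [
    Sample(0.00, "energy", "1000"),
    Sample(0.50, "energy", "1450"),
    Sample(1.00, "energy", "2100"),
]

timeline = list(merge_timelines(trace_events, hwmon_samples))
for s in timeline:
    print(f"{s.timestamp:6.2f}  {s.source:<14} {s.value}")
```

A plotting tool could then draw the energy samples as an extra line and the
frequency/idle events as markers on it.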
Ok, so it is actually what I'm working on right now. Not with the
standard perf tool (there are other users of that API ;-) but indeed I'm
trying to "enrich" the data stream coming from kernel with user-space
originating values. I am a little bit concerned about the effect of extra
syscalls (accessing the value and gettimeofday to generate a timestamp)
at higher sampling rates, but most likely it won't be a problem. Can
report once I know more, if this is of interest to anyone.
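
To make the per-sample cost concrete, here is a rough sketch of such a
user-space sampling loop. It is only an illustration, not the tool being
discussed: sample_energy() is a hypothetical helper, and a temporary file
stands in for a real /sys/class/hwmon/hwmon*/device/energy*_input
attribute so the example runs anywhere. Each sample costs one
open/read/close on the attribute plus one clock read.

```python
import os
import tempfile
import time
from typing import List, Tuple


def sample_energy(path: str, count: int, interval_s: float) -> List[Tuple[float, int]]:
    """Read an integer attribute `count` times, timestamping each reading.

    Hypothetical helper for illustration only.
    """
    samples = []
    for _ in range(count):
        # sysfs attributes are re-opened for every reading here, which keeps
        # the loop simple at the cost of extra syscalls per sample.
        with open(path) as f:
            raw = f.read()
        stamp = time.time()  # wall-clock time, comparable to gettimeofday()
        samples.append((stamp, int(raw.strip())))
        time.sleep(interval_s)
    return samples


# Stand-in for a real energy*_input file (assumption: one integer per read).
with tempfile.NamedTemporaryFile("w", suffix="_input", delete=False) as fake:
    fake.write("123456\n")

demo = sample_energy(fake.name, count=3, interval_s=0.01)
os.unlink(fake.name)

for stamp, reading in demo:
    print(f"{stamp:.6f} {reading}")
```

Timing this loop at the intended sampling rate would give a first estimate
of the overhead in question.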
Anyway, there are at least two debug/trace related use cases that cannot
be satisfied that way (of course one could argue about their
usefulness):
1. ftrace-over-network (https://lwn.net/Articles/410200/) which is
particularly appealing for "embedded users", where there's virtually no
useful userspace available (think Android). Here a (functional) trace
event is embedded into a normal trace and available "for free" at the
host side.
2. perf groups - the general idea is that one event (let it be a cycle
counter interrupt or even a timer) triggers a read of other values (e.g.
a cache counter or - in this case - an energy counter). The aim is to have
regular "snapshots" of the system state. I'm not sure if the standard
perf tool can do this, but I do :-)
And last, but not least, there are the non-debug/trace clients for
energy data as discussed in other mails in this thread. Of course the
trace event won't really satisfy their needs either.
Thanks for your feedback!
Paweł
