Hello Artem,
On 02/09/20 8:55 pm, Artem Bityutskiy wrote:
On Wed, 2020-09-02 at 17:15 +0530, Pratik Rajesh Sampat wrote:
Measure cpuidle latencies on wakeup and compare them with the advertised wakeup latencies for each idle state.
Thank you for pointing me to your talk. It was very interesting! I certainly did not know that the Intel architecture is aware of timers and pre-wakes the CPUs, which makes the timer experiment observations void.
It looks like the measurements include more than just the C-state wake; they also include the overhead of waking up the process, the context switch, and potentially any interrupts that happen on that CPU. I am not saying this is not interesting data, it surely is, but it is going to be larger than what you see in cpuidle latency tables. Potentially significantly larger.
The measurements will definitely include more overhead than just the C-state wakeup.
However, we are also collecting a baseline measurement wherein we run the same test on a 100% busy CPU, and the latency measured there can be considered the kernel-userspace overhead. The rest of the measurements should be interpreted with this baseline in mind.
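To make the baseline idea concrete, here is a minimal sketch of how the busy-CPU measurement could be subtracted from the idle-CPU measurement. The function name, the use of the median, and the sample values are all assumptions made for illustration; they are not taken from the patch.

```python
from statistics import median

def wakeup_latency_ns(idle_samples_ns, busy_baseline_ns):
    """Estimate idle-state wakeup cost from two sets of latency samples.

    Hypothetical helper: the busy-CPU samples approximate the
    kernel-userspace overhead, so subtracting their median from the
    median of the idle-CPU samples leaves the idle-exit component.
    """
    baseline = median(busy_baseline_ns)
    measured = median(idle_samples_ns)
    # Clamp at zero: noise can make the baseline exceed the measurement.
    return max(0, measured - baseline)

# Made-up sample values in nanoseconds, for illustration only.
baseline_samples = [1900, 2000, 2100]
idle_samples = [11800, 12000, 12500]
estimate = wakeup_latency_ns(idle_samples, baseline_samples)
print(estimate)
```

The median is used here only because it is less sensitive to the occasional interrupt-induced outlier than the mean; any robust statistic would serve the same purpose.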
Therefore, I am not sure this program should be advertised as "cpuidle measurement". It really measures the "IPI latency" in case of the IPI method.
Now, with the newfound knowledge of timers on Intel, I understand that this really only seems to measure IPI latency and not timer latency, although the two observations shouldn't be too far apart anyway.
A baseline measurement for each case of IPI and timers is taken at 100 percent CPU usage to quantify the kernel-userspace overhead during execution.
At least on Intel platforms, this will mean that the IPI method won't cover deep C-states like, say, PC6, because one CPU is busy. Again, not saying this is not interesting, just pointing out the limitation.
That's a valid point. We have similar deep idle states on POWER too. The idea here is that this test should be run on an already idle system. Of course, there will be kernel jitter along the way, which can cause a slight skew in observations across some CPUs, but I believe the observations overall should be stable.
Another solution to this could be using isolcpus, but that just increases the complexity all the more. If you have suggestions for any other way to guarantee idleness, that would be great.
I was working on somewhat similar stuff for x86 platforms, and I am almost ready to publish it on GitHub. I can notify you when I do so if you are interested. But here is a small presentation of the approach that I gave at Plumbers last year:
https://youtu.be/Opk92aQyvt0?t=8266
(the link points to the start of my talk)
Sure thing. Do notify me when it comes up. I would be happy to have a look at it.
-- Thanks! Pratik