Assume that a CPU time "A" is read from /proc/stat and, after a while, a CPU time "B" is read. If T = B - A < 0 and T is stored as an unsigned integer, it wraps around to a very large number. As a result, the CPU usage calculated this way will be abnormally high. This looks like a problem that should be fixed.
original link: https://lore.kernel.org/lkml/20220813000102.42051-1-hucool.lihua@huawei.com/
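The wraparound described above can be sketched in a few lines of C. This is a minimal illustration, not code from the patch: the function names cpu_delta_naive and cpu_delta_clamped are hypothetical, and the 64-bit counter width is an assumption about how a monitoring tool might store the samples.

```c
#include <stdint.h>

/*
 * Naive delta between two samples of a CPU time counter.
 * If cur < prev (time appears to go backward), the unsigned
 * subtraction wraps around and yields a huge value.
 */
static uint64_t cpu_delta_naive(uint64_t prev, uint64_t cur)
{
    return cur - prev;
}

/*
 * Defensive variant (an assumption, not the kernel fix):
 * treat a backward step as "no progress" instead of wrapping.
 */
static uint64_t cpu_delta_clamped(uint64_t prev, uint64_t cur)
{
    return cur >= prev ? cur - prev : 0;
}
```

With the values from the report, cpu_delta_naive(319, 318) wraps to the maximum 64-bit value, while cpu_delta_clamped(319, 318) returns 0. Clamping only hides the symptom in user space; it does not address the cause discussed in the thread.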
On 2022/8/15 16:15, Peter Zijlstra wrote:
On Sat, Aug 13, 2022 at 08:01:02AM +0800, Li Hua wrote:
The statistical time goes backward: the value read first is 319, and the value read again is 318. As follows:

first:
    cat /proc/stat | grep cpu1
    cpu1 319 0 496 41665 0 0 0 0 0 0
then:
    cat /proc/stat | grep cpu1
    cpu1 318 0 497 41674 0 0 0 0 0 0
Time goes back, which is counterintuitive.
After debugging this, the problem turns out to be caused by the implementation of kcpustat_cpu_fetch_vtime. As follows:
              CPU0                                            CPU1
First:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu) + vtime->utime + delta;
                                                              rq->curr is in user mode
  ---> When CPU1 rq->curr is running in userspace, utime and delta need to be added
                                                              ---> rq->curr->vtime->utime is less than 1 tick
Then:
show_stat():
  ->kcpustat_cpu_fetch()
    ->kcpustat_cpu_fetch_vtime()
      ->cpustat[CPUTIME_USER] = kcpustat_cpu(cpu);
                                                              rq->curr is in kernel mode
  ---> When CPU1 rq->curr is running in kernel space, only kcpustat is read
This is unreadable, what?!?
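For readers trying to follow the quoted sequence, the two reads can be modelled roughly as below. This is a simplified sketch under stated assumptions, not kernel code: struct cpu_model and fetch_user_time are hypothetical names, the fields only stand in for kcpustat_cpu(cpu), vtime->utime and delta from the report, and all values are in arbitrary time units.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-CPU accounting state. */
struct cpu_model {
    uint64_t kcpustat_user; /* user time already flushed to kcpustat */
    uint64_t pending_utime; /* models vtime->utime, still below 1 tick */
    uint64_t delta;         /* time elapsed since the last vtime update */
    bool in_user;           /* whether rq->curr is in user mode */
};

/*
 * Models the read path described in the report: pending user time
 * is added only while the running task is in user mode.
 */
static uint64_t fetch_user_time(const struct cpu_model *c)
{
    if (c->in_user)
        return c->kcpustat_user + c->pending_utime + c->delta;
    return c->kcpustat_user;
}
```

In this model, a first read with in_user set returns kcpustat_user plus the pending amounts. If the task then enters kernel mode before the pending time has been flushed into kcpustat_user, a second read returns the smaller base value, which matches the 319 then 318 sequence in the report.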