On Thu, Jun 07, 2012 at 11:25:30PM -0400, KOSAKI Motohiro wrote: [...]
> As I already told you, vmevent shouldn't deal with timers at all. It is NOT familiar to the embedded world, because the time subsystem is one of the most complex ones in Linux. Our 'time' is not a simple concept. time.h says there are at least 5 possibilities a user might want.
> include/linux/time.h
> #define CLOCK_REALTIME 0
> #define CLOCK_MONOTONIC 1
> #define CLOCK_MONOTONIC_RAW 4
> #define CLOCK_REALTIME_COARSE 5
> #define CLOCK_MONOTONIC_COARSE 6
> And some people want to change timer slack to optimize power consumption.
> So, don't reinvent the wheel. Just use the POSIX timer APIs.
I'm puzzled: why do you mention POSIX timers in the context of an in-kernel user? Besides, none of the POSIX timers are deferrable.
The whole point of vmevent is to be lightweight and save power. Vmevent does all the work in the kernel, and it uses deferrable timers/workqueues to save power, which is the proper in-kernel API for doing so.
If you're saying that we should set up a timer in userland and constantly read /proc/vmstat, then we will cause a CPU wakeup every 100ms, which is not acceptable. Well, we can try to introduce deferrable timers for userspace. But that would still add a lot more overhead for our task, as this solution adds two more context switches to read and parse /proc/vmstat. I guess this is not a show-stopper though, so we can discuss this.
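For reference, here is a rough sketch of what the userland variant would look like, assuming a plain nanosleep()-based loop. The helper names (vmstat_lookup, vmstat_read, vmstat_poll) and the buffer size are made up for illustration; they are not part of vmevent or of any existing tool. Note that nothing in this loop is deferrable: every iteration is a real timer expiry followed by a trip into the kernel to read the file and a parse pass in userspace.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() */

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/*
 * Look up one counter in text formatted like /proc/vmstat
 * ("name value\n" per line). Returns 0 and stores the value in *out
 * on success, -1 if the name is absent or its value does not parse.
 */
static int vmstat_lookup(const char *text, const char *name,
                         unsigned long long *out)
{
    const char *line = text;
    size_t len;

    assert(text && name && out);
    len = strlen(name);

    while (*line) {
        /* Require the full name followed by a space, so that
         * "nr_free" does not match "nr_free_pages". */
        if (strncmp(line, name, len) == 0 && line[len] == ' ')
            return sscanf(line + len, "%llu", out) == 1 ? 0 : -1;

        line = strchr(line, '\n');
        if (!line)
            break;
        line++;
    }
    return -1;
}

/* Read /proc/vmstat into buf (NUL-terminated). Returns bytes read or -1. */
static long vmstat_read(char *buf, size_t size)
{
    FILE *f = fopen("/proc/vmstat", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return (long)n;
}

/*
 * Sample one counter `rounds` times, sleeping interval_ms between
 * samples. Returns the number of samples that were read successfully.
 * Each round costs one wakeup plus the read and the parse.
 */
static int vmstat_poll(const char *name, long interval_ms, int rounds)
{
    static char buf[65536];   /* assumed to be large enough for the file */
    struct timespec ts;
    unsigned long long value;
    int ok = 0;
    int i;

    ts.tv_sec = interval_ms / 1000;
    ts.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (i = 0; i < rounds; i++) {
        if (vmstat_read(buf, sizeof(buf)) >= 0 &&
            vmstat_lookup(buf, name, &value) == 0) {
            printf("%s %llu\n", name, value);
            ok++;
        }
        nanosleep(&ts, NULL);   /* not deferrable: always wakes the CPU */
    }
    return ok;
}
```

Calling vmstat_poll("nr_free_pages", 100, 10) from main() would sample that counter ten times at the 100ms interval discussed above.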
Leonid, Pekka, what do you think about the idea?
Thanks,