(5/1/12 8:20 PM), Anton Vorontsov wrote:
Hello Rik,
Thanks for looking into this!
On Tue, May 01, 2012 at 05:04:21PM -0400, Rik van Riel wrote:
On 05/01/2012 09:18 AM, Anton Vorontsov wrote:
This patch implements a new event type that triggers whenever a value becomes greater than a user-specified threshold; it complements the 'less-than' trigger type.
Also, let's implement a one-shot mode for the events: when it is set, userspace will receive only one notification per boundary crossing.
Now, when both LT and GT are set on the same level, the event type works as a cross event type: it triggers whenever a value crosses the threshold from the lesser-values side to the greater-values side, and vice versa.
We use these event types in a userspace low-memory killer: we get a notification when memory becomes low, so we start freeing memory by killing unneeded processes, and we get another notification when memory crosses the threshold from the other side, so we know that we have freed enough memory.
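To make the LT/GT/one-shot semantics concrete, here is a minimal C sketch of the per-sample decision. It is not the vmevent implementation: the flag names EV_LT, EV_GT and EV_ONESHOT, struct ev_watch, ev_should_fire() and ev_count_fires() are all hypothetical, and it assumes that a value exactly equal to the threshold is on neither side (it never fires and does not reset the crossing state).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names for illustration; NOT the real vmevent ABI. */
#define EV_LT      (1u << 0)  /* fire while value < threshold */
#define EV_GT      (1u << 1)  /* fire while value > threshold */
#define EV_ONESHOT (1u << 2)  /* fire once per boundary crossing */

struct ev_watch {
	unsigned int flags;
	unsigned long threshold;
	int last_side;  /* -1 below, +1 above, 0 = no sample seen yet */
};

/* Returns true if this sample should produce a notification. */
static bool ev_should_fire(struct ev_watch *w, unsigned long value)
{
	int side;
	bool crossed;

	if (value == w->threshold)
		return false;  /* assumption: "equal" is on neither side */

	side = value < w->threshold ? -1 : 1;
	crossed = side != w->last_side;
	w->last_side = side;  /* track the side even when we do not fire */

	if (side < 0 && !(w->flags & EV_LT))
		return false;
	if (side > 0 && !(w->flags & EV_GT))
		return false;

	/* One-shot: only the first sample on the new side fires. */
	return (w->flags & EV_ONESHOT) ? crossed : true;
}

/* Feed a series of samples and count how many notifications fire. */
static size_t ev_count_fires(struct ev_watch *w,
			     const unsigned long *samples, size_t n)
{
	size_t fired = 0;

	for (size_t i = 0; i < n; i++)
		if (ev_should_fire(w, samples[i]))
			fired++;
	return fired;
}
```

With EV_LT | EV_GT | EV_ONESHOT and a threshold of 85, the samples 100, 80, 70, 90, 120, 130, 60 fire four times (at 100, 80, 90 and 60), once per crossing; with EV_LT alone and no one-shot, every below-threshold sample (80, 70, 60) fires.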
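The killer loop described above (start killing on the low-memory event, stop once the value crosses the same threshold from the other side) might look roughly like the sketch below. Everything in it is hypothetical: struct proc, pick_victim() and handle_low_memory() are illustrative names, picking the largest killable process is an assumed policy, marking a process dead stands in for actually sending it a signal, and its memory is assumed to be reclaimed immediately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a userspace low-memory killer. */
struct proc {
	const char *name;
	unsigned long rss_kb;  /* memory assumed freed when it is killed */
	bool killable;         /* an "unneeded" process that may be killed */
	bool alive;
};

/* Assumed policy: pick the largest live, killable process (or NULL). */
static struct proc *pick_victim(struct proc *p, size_t n)
{
	struct proc *victim = NULL;

	for (size_t i = 0; i < n; i++)
		if (p[i].alive && p[i].killable &&
		    (!victim || p[i].rss_kb > victim->rss_kb))
			victim = &p[i];
	return victim;
}

/*
 * Called when the "free memory below threshold" (LT) event fires.
 * Kills processes until free memory is greater than the threshold
 * (the point where the GT side would fire), or no candidates remain.
 * Returns the number of processes killed.
 */
static unsigned handle_low_memory(struct proc *p, size_t n,
				  unsigned long *free_kb,
				  unsigned long threshold_kb)
{
	unsigned killed = 0;

	while (*free_kb <= threshold_kb) {
		struct proc *v = pick_victim(p, n);

		if (!v)
			break;  /* nothing left that we are allowed to kill */
		v->alive = false;       /* stand-in for kill(pid, SIGKILL) */
		*free_kb += v->rss_kb;  /* assume instant reclaim */
		killed++;
	}
	return killed;
}
```

In a real daemon the loop would instead block on the event file descriptor and let the kernel's GT notification end the killing phase; the polling loop here only models that decision.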
How are these vmevents supposed to work with cgroups?
Currently these are independent subsystems. If you have memcg enabled, you can do almost anything* with the memory, as memcg has all the needed hooks in the mm/ subsystem (it is more like a "memory management tracer" nowadays :-).
But cgroups have their cost: both a performance penalty and memory wastage. For example, in the best case memcg constantly consumes 0.5% of RAM to track memory usage; that is 5 MB on a 1 GB "embedded" machine. To some people it just feels wrong to waste that memory on mere notifications.
Of course, this alone could be considered a lame argument for creating another subsystem (instead of "fixing" the current one). But, as explained below, vmevent is just a convenient ABI.
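The 5 MB figure is just the quoted 0.5% overhead applied to the machine's RAM (the percentage is taken from the text, not measured here); in binary units the same arithmetic gives a slightly larger number:

```latex
\[
0.005 \times 1\,\mathrm{GB} = 0.005 \times 10^{9}\,\mathrm{B} = 5 \times 10^{6}\,\mathrm{B} = 5\,\mathrm{MB},
\qquad
0.005 \times 1\,\mathrm{GiB} = 0.005 \times 1024\,\mathrm{MiB} = 5.12\,\mathrm{MiB}.
\]
```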
What do we do when a cgroup nears its limit, and there is no more swap space available?
What do we do when a cgroup nears its limit, and there is swap space available?
As of now, this is all orthogonal to vmevent. Vmevent doesn't know about cgroups. If the kernel has memcg enabled, one should probably* go with it (or, better, with its ABI). At least for now.
It would be nice to be able to share the same code for embedded, desktop and server workloads...
It would be great indeed, but so far I don't see much that vmevent could share. Plus, sharing the code at this point is not that interesting; it's a mere 500 lines of code (compared to more than 10K lines for cgroups, not including the memcg_ hooks and logic that are spread all over mm/).
Today the vmevent code is mostly an ABI implementation; there is very little memory management logic in it (in contrast to memcg).
But if it doesn't work for the desktop/server area, it shouldn't be merged. We have to consider the best design before kernel inclusion. They cannot be discussed separately.