Hello Rik,
Thanks for looking into this!
On Tue, May 01, 2012 at 05:04:21PM -0400, Rik van Riel wrote:
> On 05/01/2012 09:18 AM, Anton Vorontsov wrote:
> > This patch implements a new event type: it triggers whenever a value becomes greater than a user-specified threshold. It complements the 'less-than' trigger type.
> >
> > Also, let's implement a one-shot mode for the events: when set, userspace will receive only one notification per boundary crossing.
> >
> > Now, when both LT and GT are set on the same level, the event works as a cross event type: it triggers whenever a value crosses the threshold from the lesser-values side to the greater-values side, and vice versa.
> >
> > We use these event types in a userspace low-memory killer: we get a notification when memory becomes low, so we start freeing memory by killing unneeded processes, and we get a notification when memory crosses the threshold from the other side, so we know that we have freed enough memory.
> How are these vmevents supposed to work with cgroups?
Currently these are independent subsystems. If you have memcg enabled, you can do almost anything* with the memory, as memcg has all the needed hooks in the mm/ subsystem (it is more like a "memory management tracer" nowadays :-).
But cgroups have their cost, both a performance penalty and memory wastage. For example, in the best case, memcg constantly consumes 0.5% of RAM just to track memory usage; this is 5 MB on a 1 GB "embedded" machine. To some people it feels just wrong to waste that memory for mere notifications.
Of course, this alone can be considered a lame argument for making another subsystem (instead of "fixing" the current one). But see below: vmevent is just a convenient ABI.
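To show what that ABI looks like from userspace, here is a rough sketch of the low-memory killer flow described above. Take it as an illustration only: vmevent_fd() is the new syscall from this series, and the structure and constant names (struct vmevent_config, VMEVENT_ATTR_NR_FREE_PAGES, the VMEVENT_ATTR_STATE_* flags) follow the RFC patches, so they may well change:

#include <string.h>
#include <unistd.h>
#include "vmevent.h"	/* uapi header from this patch series */

int main(void)
{
	struct vmevent_config config;
	struct {
		struct vmevent_event event;
		struct vmevent_attr attr;
	} buf;
	int fd;

	memset(&config, 0, sizeof(config));
	config.size = sizeof(config);
	config.sample_period_ns = 1000ULL * 1000 * 1000;	/* sample once a second */
	config.counter = 1;

	/*
	 * LT and GT on the same attribute with the same value: the "cross"
	 * type from this patch. One event fires when free pages fall below
	 * the threshold, another when they rise back above it.
	 */
	config.attrs[0].type = VMEVENT_ATTR_NR_FREE_PAGES;
	config.attrs[0].state = VMEVENT_ATTR_STATE_VALUE_LT |
				VMEVENT_ATTR_STATE_VALUE_GT |
				VMEVENT_ATTR_STATE_ONE_SHOT;
	config.attrs[0].value = 4096;	/* threshold, in pages */

	fd = vmevent_fd(&config);	/* wrapper around syscall(__NR_vmevent_fd, ...) */

	for (;;) {
		read(fd, &buf, sizeof(buf));	/* blocks until the threshold is crossed */
		if (buf.attr.value < 4096) {
			/* memory is low: start killing unneeded processes */
		} else {
			/* crossed back: we have freed enough memory */
		}
	}
}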
> What do we do when a cgroup nears its limit, and there is no more swap space available?
>
> What do we do when a cgroup nears its limit, and there is swap space available?
As of now, this is all orthogonal to vmevent; vmevent doesn't know about cgroups. If the kernel has memcg enabled, one should probably* go with it (or better, with its ABI). At least for now.
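For comparison, memcg already has an eventfd-based threshold ABI: you register an eventfd and a threshold against memory.usage_in_bytes by writing to cgroup.event_control, and the eventfd fires each time usage crosses the threshold in either direction. A minimal sketch, assuming the memory controller is mounted at /sys/fs/cgroup/memory and with error handling omitted for brevity:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/eventfd.h>

int main(void)
{
	int usage = open("/sys/fs/cgroup/memory/memory.usage_in_bytes", O_RDONLY);
	int ctrl = open("/sys/fs/cgroup/memory/cgroup.event_control", O_WRONLY);
	int efd = eventfd(0, 0);
	char buf[64];
	uint64_t ticks;

	/* Format: "<event_fd> <fd of memory.usage_in_bytes> <threshold in bytes>" */
	snprintf(buf, sizeof(buf), "%d %d %llu", efd, usage,
		 (unsigned long long)(64 << 20));	/* notify around 64 MB */
	write(ctrl, buf, strlen(buf));

	/* Blocks until usage crosses the 64 MB threshold, in either direction. */
	read(efd, &ticks, sizeof(ticks));
	return 0;
}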
> It would be nice to be able to share the same code for embedded, desktop and server workloads...
It would be great indeed, but so far I don't see much that vmevent could share. Plus, sharing the code at this point is not that interesting: it's a mere 500 lines of code (compared to more than 10K lines for cgroups, and that's not counting the memcg_ hooks and logic that are spread all over mm/).
Today the vmevent code is mostly an ABI implementation; there is very little memory management logic in it (in contrast to memcg).
Personally, I would rather consider sharing the ABI at some point, i.e. making a memcg backend for vmevent. That would be pretty cool. And once done, vmevent would be cgroups-aware (if memcg is enabled, of course; if not, vmevent would still work, with no memcg-related overhead).
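Purely as a thought experiment (none of these names exist in any tree), such a backend split might look like this: the vmevent core keeps the fd handling and the ABI, and only asks a backend for the current attribute values, so the same userspace program works with or without memcg:

/* Hypothetical kernel-side split; all names here are made up. */
struct vmevent_watch;			/* per-fd state owned by the vmevent core */

struct vmevent_backend {
	/* Return the current value of one attribute (e.g. free pages). */
	unsigned long (*sample)(struct vmevent_watch *watch, unsigned int type);
};

/* Default backend: global VM counters, what vmevent samples today. */
extern const struct vmevent_backend vmevent_global_backend;

/* With CONFIG_MEMCG: the same attributes, but scoped to a cgroup. */
extern const struct vmevent_backend vmevent_memcg_backend;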
* For low memory notifications, there are still some unresolved issues with memcg. Mainly, slab accounting for the root cgroup: the slab accounting currently under development doesn't account for the kernel's internal memory consumption, and it doesn't account slab memory for the root cgroup at all.
A few days ago I asked[1] why memcg doesn't do all this, and whether that is a design decision or just an implementation detail (so that we have a chance to fix it).
But so far there has been no feedback. We'll see how things turn out.
[1] http://lkml.org/lkml/2012/4/30/115
Thanks!