On Wed, Nov 07, 2012 at 02:11:10PM +0200, Kirill A. Shutemov wrote: [...]
We can have plenty of "free" memory, of which, say, 90% is caches and 10% is idle. But we do want to differentiate these types of memory (without going into the details), i.e. we want to get notified when the kernel is reclaiming. We also want to know when the memory comes from swapping others' pages out. Actually, we don't call it swap, it's "new allocations cost becomes high": it might be a result of many factors (swapping, fragmentation, etc.), and userland might analyze the situation when this happens.
Exposing all the VM details to userland is not an option
IIUC, you want MemFree + Buffers + Cached + SwapCached, right? It's already exposed to userspace.
How? If you mean vmstat, then no, that interface is not efficient at all: we have to poll it from userland, which is a no-go for embedded (although, as a workaround, it can be done via deferrable timers in userland, which I posted a few months ago).
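For illustration only, here is a minimal sketch of what the polling side looks like from userland, assuming the figure is read from /proc/meminfo and summed as MemFree + Buffers + Cached + SwapCached. The function names (parse_meminfo, easily_available_kb, read_available_kb) are hypothetical and not part of any posted patch.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; lines without a numeric value are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def easily_available_kb(fields):
    """Sum MemFree + Buffers + Cached + SwapCached; missing fields count as 0."""
    wanted = ("MemFree", "Buffers", "Cached", "SwapCached")
    return sum(fields.get(key, 0) for key in wanted)


def read_available_kb(path="/proc/meminfo"):
    """Read the file once and return the sum (assumes a Linux-style /proc)."""
    with open(path) as f:
        return easily_available_kb(parse_meminfo(f.read()))
```

Userland would have to call something like read_available_kb() on every timer tick, and the resulting number still says nothing about what it costs to actually get that memory.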
But even with polling vmstat via deferrable timers, we are left with the ugly timer-based approach (and no way to catch the pre-OOM conditions). With vmpressure_fd() we get synchronous notifications right from the core (upon which you can, if you want to, analyze the vmstat).
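To make the contrast with polling concrete, here is a rough sketch of the notification-driven pattern: userland blocks on a file descriptor and only does work when something is delivered. This does not use the vmpressure_fd() interface itself. An ordinary pipe stands in for the notification fd, and the names wait_for_event and demo, plus the payload bytes, are assumptions made purely for illustration.

```python
import os
import select


def wait_for_event(fd, timeout=None):
    """Block until fd becomes readable, then return the bytes read.

    Returns None if the timeout (in seconds) expires first. With
    timeout=None the call sleeps indefinitely, so no timer wakeups occur.
    """
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None
    return os.read(fd, 64)


def demo():
    """Simulate one notification using a pipe as a stand-in fd."""
    r, w = os.pipe()  # NOT the real interface, just something select() can wait on
    try:
        os.write(w, b"pressure")  # pretend the kernel signalled an event
        return wait_for_event(r, timeout=1.0)
    finally:
        os.close(r)
        os.close(w)
```

The point of the pattern is that the process stays asleep until there is an event, and any closer look at vmstat happens only after the wakeup.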
- The last time I checked, the cgroups memory controller did not (and I guess still does not) account kernel-owned slabs. I asked several times why, but nobody answered.
Almost there. Glauber works on it.
It's good to hear, but still, the number of "used KBs" is a bad (or irrelevant) metric for the pressure. We'd still need to analyze the memory in more detail, and "'limit - used' KBs" doesn't tell us anything about the cost of the available memory.
Thanks, Anton.