Hi Luiz,
Thanks for your email!
On Fri, Nov 30, 2012 at 03:47:25PM -0200, Luiz Capitulino wrote: [...]
But there is one rather major issue: we're crossing the kernel-userspace boundary. And with this scheme we'll have to cross the boundary four times: query / reply-available / control / reply-shrunk (and repeat if necessary, every SHRINK_BATCH pages). Plus, it has to be done somewhat synchronously (all four stages), and/or we have to make a "userspace shrinker" thread work in parallel with the normal shrinker, and here, I'm afraid, we'll see more strange interactions. :)
Wouldn't this be just like kswapd?
Sure, this is similar, but only for indirect reclaim (obviously).
Honestly, I have no idea how we'd do this for direct reclaim: with Andrew's idea it must all be synchronous, so playing ping-pong with userland during direct reclaim will be hard.
So, the best thing to do with direct reclaim, IMHO, is to just send a notification.
But there is good news: for this kind of fine-grained control we have a better interface, where we don't have to communicate [very often] w/ the kernel. These are "volatile ranges", where userland itself marks chunks of data as "I might need it, but I won't cry if you recycle it; but when I access it next time, let me know if you actually recycled it". Yes, userland is no longer able to decide exactly which page it permits to be recycled, but we don't have use cases where we actually care that much. And if we do, we'd rather introduce volatile LRUs with different priorities, or something like that.
I'm new to this stuff so please take this with a grain of salt, but I'm not sure volatile ranges would be a good fit for our use case: we want to make (kvm) guests reduce their memory when the host is under memory pressure.
Yes, for this kind of thing you want a simple notification.
I wasn't saying that volatile ranges must be a substitute for notifications, quite the opposite: I was saying that you can do volatile ranges in userland by using a "userland-shrinker".
It can even be wrapped into a library, with the same mmap() libc interface. But it will be inefficient.
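To make the "volatile ranges in userland" idea a bit more concrete, here is a rough sketch in C. It is not code from any posted patch: struct vrange and all the vrange_*() helpers are made-up names, and it simply assumes the process gets a memory pressure notification by some means and then calls vrange_shrink() itself. The only real calls are mmap() and madvise(MADV_DONTNEED), which on Linux drops the pages of a private anonymous mapping so that the next access sees zero-filled memory.

```c
#define _DEFAULT_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical userland "volatile range"; all names are illustrative. */
struct vrange {
	void   *addr;
	size_t  len;
	bool    is_volatile;	/* userland said "you may recycle this" */
	bool    purged;		/* the userland shrinker actually recycled it */
};

/* Back the range with private anonymous memory. Returns 0 on success. */
static int vrange_alloc(struct vrange *r, size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;
	r->addr = p;
	r->len = len;
	r->is_volatile = false;
	r->purged = false;
	return 0;
}

/* "I might need it, but I won't cry if you recycle it." */
static void vrange_mark_volatile(struct vrange *r)
{
	r->is_volatile = true;
}

/* Pin the range again; returns true if its contents were recycled. */
static bool vrange_mark_nonvolatile(struct vrange *r)
{
	bool lost = r->purged;

	r->is_volatile = false;
	r->purged = false;
	return lost;
}

/*
 * Assumed to be called by the application when it receives a memory
 * pressure notification. Drops every volatile range that has not been
 * dropped yet and returns the number of bytes handed back.
 */
static size_t vrange_shrink(struct vrange *ranges, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		struct vrange *r = &ranges[i];

		if (!r->is_volatile || r->purged)
			continue;
		if (madvise(r->addr, r->len, MADV_DONTNEED) == 0) {
			r->purged = true;
			freed += r->len;
		}
	}
	return freed;
}
```

Note that in this sketch the application has to track every range itself, and each notification costs a wakeup plus one madvise() call per range, with no knowledge of which pages the kernel would have preferred to take.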
Thanks, Anton.