On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote:
On Tue 09-10-18 14:00:34, Mel Gorman wrote:
On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote:
[Sorry for being slow in responding but I was mostly offline for the last few days]
On Tue 09-10-18 10:48:25, Mel Gorman wrote: [...]
This goes back to my point that the MADV_HUGEPAGE hint should not make promises about locality and that introducing MADV_LOCAL for specialised libraries may be more appropriate with the initial semantic being how it treats MADV_HUGEPAGE regions.
I agree with your other points and am not going to repeat them. I am not sure madvise is the best API for the purpose though. We are talking about memory policy here and there is an existing API for that, so I would _prefer_ to reuse it for this purpose.
I flip-flopped on that one in my head multiple times on the basis of how strict it should be. Memory policies tend to be black or white -- bind here, interleave there, etc. It wasn't clear to me what the best policy would be to describe "allocate local as best as you can but allow fallbacks if necessary".
MPOL_PREFERRED is not black and white. In fact I asked David earlier if he could check whether MPOL_PREFERRED would already be a good fit for this. Still the point is it requires privilege (and for a good reason).
I was thinking about MPOL_NODE_PROXIMITY with the following semantic:
- try hard to allocate from a local or very close NUMA node(s), even when
  that requires expensive operations like memory reclaim/compaction, before falling back to other, more distant NUMA nodes.
If MPOL_PREFERRED can't work, something like this could be added.
I think "madvise vs mbind" is more an issue of "no permission vs permission required". And if the process ends up swapping out all other processes that already have their memory allocated in the node, I think it is correct to require some permission, in which case mbind looks like a better fit. MPOL_PREFERRED also looks like the first candidate for investigation, as it's already not black and white, allows spillover, and may in fact already do the right thing if set on top of MADV_HUGEPAGE.
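To make "MPOL_PREFERRED set on top of MADV_HUGEPAGE" concrete, here is a rough userspace sketch. alloc_thp_preferred() is a made-up helper name, and the raw syscall plus the locally defined MPOL_PREFERRED value are only there so it builds without libnuma; with libnuma you would include <numaif.h> and call mbind() directly. Both hints are treated as best effort. Whether the THP fault path then actually behaves as wanted is exactly the part that still needs investigating, so take this as an illustration of the API combination, not of the resulting allocation behaviour.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from <linux/mempolicy.h>; defined here so the sketch builds without libnuma. */
#ifndef MPOL_PREFERRED
#define MPOL_PREFERRED 1
#endif

/*
 * Hypothetical helper: map an anonymous region, ask for THP on it, and
 * set a preferred (not strict) NUMA node for it.
 *
 * Returns the mapping, or NULL if mmap() itself fails. Failures of the
 * two hints are counted in *hint_errors but are not fatal, since both
 * are advisory.
 */
static void *alloc_thp_preferred(size_t len, unsigned int node, int *hint_errors)
{
	unsigned long nodemask;
	void *p;

	/* Conservative bound: keep the node bit away from the end of the mask. */
	assert(len > 0 && node < 8 * sizeof(nodemask) - 1);
	*hint_errors = 0;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

#ifdef MADV_HUGEPAGE
	/* Hint only: the kernel may still back the range with small pages. */
	if (madvise(p, len, MADV_HUGEPAGE) != 0) {
		perror("madvise(MADV_HUGEPAGE)");
		(*hint_errors)++;
	}
#else
	(*hint_errors)++;
#endif

#ifdef SYS_mbind
	/* Prefer 'node' but allow spillover to other nodes. */
	nodemask = 1UL << node;
	if (syscall(SYS_mbind, p, len, MPOL_PREFERRED, &nodemask,
		    8 * sizeof(nodemask), 0U) != 0) {
		perror("mbind(MPOL_PREFERRED)");
		(*hint_errors)++;
	}
#else
	(void)nodemask;
	(*hint_errors)++;
#endif

	return p;
}
```

A caller would do something like alloc_thp_preferred(4UL << 20, 0, &errs), touch the memory to trigger the faults, and munmap() it when done. A nonzero errs only means one of the hints was refused (for example no NUMA support, or mbind filtered in a sandbox); the mapping is still usable.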
Thanks, Andrea