On 03.12.20 09:28, Michal Hocko wrote:
On Wed 02-12-20 21:22:36, David Hildenbrand wrote:
On 02.12.20 20:26, Minchan Kim wrote:
On Wed, Dec 02, 2020 at 07:51:07PM +0100, Michal Hocko wrote:
[...]
I am still not sure a specific flag is a good interface. Really can this be gfp_mask instead?
I am not strong(even, I did it with GFP_NORETRY) but David wanted to have special mode and I agreed when he mentioned ALLOC_CONTIG_HARD as one of options in future(it would be hard to indicate that mode with gfp flags).
I can't tell regarding the CMA interface, but for the alloc_contig() interface I think modes make sense. Yes, it's different to other allocaters, but the contig range allocater is different already. E.g., the CMA allocater mostly hides "which exact PFNs you try to allocate".
Yes, alloc_contig_range is a low level API but it already has a gfp_mask parameter. Adding yet another allocation mode sounds like API convolution to me.
Well, if another parameter is a concern, we can introduce
alloc_contig_range_fast()
instead.
In the contig range allocater, gfp flags are currently used to express how to allocate pages used as migration targets. I don't think mangling in other gfp flags (or even overloading them) makes things a lot clearer. E.g., GFP_NORETRY: don't retry to allocate migration targets? don't retry to migrate pages? both?
As I said, other aspects might be harder to model (e.g., don't drain LRU) and hiding them behind generic gfp flags (e.g., GFP_NORETRY) feels wrong.
With the mode, we're expressing details for the necessary page migration. Suggestions on how to model that are welcome.
The question is whether the caller should really have such an intimate knowledge and control of the alloc_contig_range implementation. This all are implementation details. Should really anybody think about how many times migration retries or control LRU draining? Those can change in the
The question is not "how many times", rather "if at all". I can understand the possible performance improvements by letting the caller handle things (lru draining, pcp draining) like that when issuing gazillions of alloc_contig_range() calls.
future and I do not think we really want to go over all users grown over that time and try to deduce what was the intention behind.
That's why I think we need a clear mechanism to express the expected behavior - something we can properly document and users can actually understand to optimize for - and we can fix them up when the documented behavior changes. Mangling this into somewhat-fitting gfp flags makes the interface harder to use and more error-prone IMHO.
I think we should aim at easy and very highlevel behavior:
- GFP_NOWAIT - unsupported currently IIRC but something that something that should be possible to implement. Isolation is non blocking, migration could be skipped
- GFP_KERNEL - default behavior whatever that means
- GFP_NORETRY - opportunistic allocation as lightweight as we can get. Failures to be expected also for transient reasons.
- GFP_RETRY_MAYFAIL - try hard but not as hard as to trigger disruption (e.g. via oom killer).
I think we currently see demand for 3 modes for alloc_contig_range()
a) normal
As is. Try, but don't try too hard. E.g., drain LRU, drain PCP, retry a couple of times. Failures in some cases (short-term pinning, PCP races) are still possible and acceptable.
GFP_RETRY_MAYFAIL ?
E.g., "Allocations with this flag may fail, but only when there is genuinely little unused memory." - current description does not match at all. When allocating ranges things behave completely different.
b) fast
Try, but fail fast. Leave optimizations that can improve the result to the caller. E.g., don't drain LRU, don't drain PCP, don't retry. Frequent failures are expected and acceptable.
__GFP_NORETRY ?
E.g., "The VM implementation will try only very lightweight memory direct reclaim to get some memory under memory pressure" - again, I think current description does not really match.
c) hard
Try hard, E.g., temporarily disabling the PCP. Certainly not __GFP_NOFAIL, that would be highly dangerous. So no flags / GFP_KERNEL?
- __GFP_THIS_NODE - stick to a node without fallback
- we can support zone modifiers although there is no existing user.
- __GFP_NOWARN - obvious
And that is it. Or maybe I am seeing that oversimplified.
Again, I think most flags make sense for the migration target allocation path and mainly deal with OOM situations and reclaim. For the migration path - which is specific to the alloc_contig_range() allocater - they don't really apply and create more confusion than they actually help - IMHO.