On 03.12.20 12:47, Michal Hocko wrote:
On Thu 03-12-20 10:47:02, David Hildenbrand wrote:
On 03.12.20 09:28, Michal Hocko wrote:
[...]
I think we should aim at easy and very high-level behavior:
- GFP_NOWAIT - unsupported currently IIRC, but something that should be possible to implement. Isolation is non-blocking, migration could be skipped.
- GFP_KERNEL - default behavior whatever that means
- __GFP_NORETRY - opportunistic allocation, as lightweight as we can get. Failures are to be expected, also for transient reasons.
- __GFP_RETRY_MAYFAIL - try hard, but not so hard as to trigger disruption (e.g. via the OOM killer).
I think we currently see demand for 3 modes for alloc_contig_range()
a) normal
As is. Try, but don't try too hard. E.g., drain LRU, drain PCP, retry a couple of times. Failures in some cases (short-term pinning, PCP races) are still possible and acceptable.
__GFP_RETRY_MAYFAIL ?
Normal shouldn't really require anybody to think hard about gfp flags. To most people that really means GFP_KERNEL.
E.g., "Allocations with this flag may fail, but only when there is genuinely little unused memory." - the current description does not match at all. When allocating ranges, things behave completely differently.
b) fast
Try, but fail fast. Leave optimizations that can improve the result to the caller. E.g., don't drain LRU, don't drain PCP, don't retry. Frequent failures are expected and acceptable.
__GFP_NORETRY ?
E.g., "The VM implementation will try only very lightweight memory direct reclaim to get some memory under memory pressure" - again, I think the current description does not really match.
Agreed. As mentioned above this would be an opportunistic allocation mode.
c) hard
Try hard, e.g., by temporarily disabling the PCP. Certainly not __GFP_NOFAIL, that would be highly dangerous. So no flags / GFP_KERNEL?
NOFAIL semantics are out of the question. Should we have a mode to try harder than the default? I dunno. Do we have users? I think RETRY_MAYFAIL is a middle ground between the default and NORETRY, which is just too easy to fail. This is the case for the allocator as well. And from what I have seen, people are already using MAYFAIL in order to prevent the OOM killer, so this is a generally recognized pattern.
virtio-mem might be one user. It might first try in normal mode to get as much memory out as possible, but switch to hard mode when it might make sense.
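To make the three modes concrete, here is a minimal userspace sketch of them as a set of knobs. All names (`acr_mode`, `acr_policy`, `acr_policy_for`) are hypothetical and do not exist in the kernel. The retry counts, and the choice to still drain the LRU and PCP in hard mode, are assumptions for illustration only; the thread just says "a couple of times" for normal mode.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; none of this is kernel API. */
enum acr_mode {
	ACR_MODE_FAST,   /* b) fail fast, caller handles optimizations */
	ACR_MODE_NORMAL, /* a) as is: try, but not too hard */
	ACR_MODE_HARD,   /* c) try hard, but never NOFAIL */
};

struct acr_policy {
	bool drain_lru;   /* drain LRU caches before isolating */
	bool drain_pcp;   /* drain per-CPU page lists */
	bool disable_pcp; /* temporarily disable the PCP during the attempt */
	int max_retries;  /* extra attempts after the first failure */
};

static struct acr_policy acr_policy_for(enum acr_mode mode)
{
	/* Start from the cheapest behavior: no draining, no retries. */
	struct acr_policy p = { false, false, false, 0 };

	switch (mode) {
	case ACR_MODE_FAST:
		break;
	case ACR_MODE_NORMAL:
		p.drain_lru = true;
		p.drain_pcp = true;
		p.max_retries = 2; /* assumed value for "a couple of times" */
		break;
	case ACR_MODE_HARD:
		p.drain_lru = true;
		p.drain_pcp = true;
		p.disable_pcp = true; /* closes the PCP race window */
		p.max_retries = 5;    /* assumed; bounded, so failure stays possible */
		break;
	}
	return p;
}
```

A caller like virtio-mem could then start with `acr_policy_for(ACR_MODE_NORMAL)` and only switch to `ACR_MODE_HARD` when it makes sense.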
- __GFP_THISNODE - stick to a node without fallback
- we can support zone modifiers although there is no existing user.
- __GFP_NOWARN - obvious
And that is it. Or maybe I am seeing that oversimplified.
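As a rough sketch of the gfp-based proposal, the effort-related flags from the list could be folded into a single effort level like this. The `SK_*` bits are made-up stand-ins with arbitrary values, not the kernel's definitions, and `acr_effort_from_gfp()` is a hypothetical helper. The only kernel fact relied on is that GFP_NOWAIT does not allow direct reclaim. The node, zone and __GFP_NOWARN modifiers are left out because they do not affect effort.

```c
/* Made-up stand-ins for gfp bits; the values are arbitrary. */
#define SK_DIRECT_RECLAIM 0x1u /* caller may block; not set for GFP_NOWAIT */
#define SK_NORETRY        0x2u /* models __GFP_NORETRY */
#define SK_RETRY_MAYFAIL  0x4u /* models __GFP_RETRY_MAYFAIL */

/* Hypothetical effort levels, ordered from cheapest to most expensive. */
enum acr_effort {
	ACR_NONBLOCKING,   /* GFP_NOWAIT: non-blocking isolation, skip migration */
	ACR_OPPORTUNISTIC, /* NORETRY: lightweight, transient failures expected */
	ACR_DEFAULT,       /* GFP_KERNEL: default behavior */
	ACR_TRY_HARD,      /* RETRY_MAYFAIL: try hard, but no OOM killer */
};

static enum acr_effort acr_effort_from_gfp(unsigned int gfp)
{
	/* Not being allowed to block overrides every other hint. */
	if (!(gfp & SK_DIRECT_RECLAIM))
		return ACR_NONBLOCKING;
	/* Assumption: if both hints are set, the cheaper one wins. */
	if (gfp & SK_NORETRY)
		return ACR_OPPORTUNISTIC;
	if (gfp & SK_RETRY_MAYFAIL)
		return ACR_TRY_HARD;
	return ACR_DEFAULT;
}
```

The point of the sketch is that the caller only states how much disruption it tolerates; whether migration happens underneath is not visible in the interface.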
Again, I think most flags make sense for the migration target allocation path and mainly deal with OOM situations and reclaim. For the migration path - which is specific to the alloc_contig_range() allocator - they don't really apply and create more confusion than they resolve - IMHO.
Migration is really an implementation detail of this interface. You shouldn't even be thinking about there being migration underneath, not to mention actually trying to control it.
CMA? I tend to agree. alloc_contig_range? I disagree.