On Mon 29-08-22 12:40:05, Michal Hocko wrote:
On Sun 28-08-22 13:50:09, Yu Zhao wrote:
On Tue, Aug 23, 2022 at 2:36 AM Michal Hocko mhocko@suse.com wrote:
[...]
You cannot really make any assumptions about oom_reaper and how quickly it is going to free the memory.
Agreed. But here we are talking about heuristics, not dependencies on certain behaviors. Assume we are playing a guessing game: there are multiple mm_structs available for reclaim, would the oom-killed ones be more profitable on average? I'd say no, because I assume it's more likely than not that the oom reaper is doing, or is about to do, its work. Note that the assumption is about likelihood, hence arguably valid.
Well, my main counter argument would be that we do not really want to carve a last resort mechanism (which the oom reaper is) into any heuristic, because any future changes to that mechanism will be much harder to justify and make. There is a maintenance cost that should be considered. While you might be right that this change would be beneficial, there is no actual proof of that. Historically we've had several examples of such behavior which was really hard to change later on because the effect would be really hard to evaluate.
Forgot to mention a recent change as a clear example of a change which would carry a higher burden to evaluate. e4a38402c36e ("oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup") has changed the wake up logic to be triggered after a timeout. This means that the task will be sitting there on the queue without any actual reclaim done on it. The timeout itself can be changed in the future and I would really hate to argue that changing it from $FOO to $FOO + epsilon breaks a very subtle dependency somewhere deep in the reclaim path. From the oom reaper POV any timeout is reasonable because this is the _last_ resort to resolve OOM stall/deadlock when the victim cannot exit on its own for whatever reason. This is a considerably different objective from "we want to optimize which tasks to scan to reclaim efficiently".
See my point?
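
A toy model of the dependency described above, as a rough sketch only. This is Python, not kernel code: the names Victim, reaper_has_started and skipped_without_reaping are made up for illustration, the "skip oom victims" heuristic is a hypothetical stand-in for the proposed reclaim behavior, and the delay values are arbitrary rather than taken from the commit.

```python
from dataclasses import dataclass


@dataclass
class Victim:
    """Toy stand-in for an OOM-killed task waiting on the reaper queue."""
    name: str
    queued_at: float  # time (seconds) at which the task was queued for reaping


def reaper_has_started(victim: Victim, now: float, delay: float) -> bool:
    """True once the assumed wake-up timeout has expired for this victim."""
    return now - victim.queued_at >= delay


def skipped_without_reaping(victim: Victim, now: float, delay: float) -> bool:
    """True when a 'skip OOM victims' heuristic would leave this task untouched.

    The hypothetical heuristic always skips OOM victims, so the task is
    neither scanned by reclaim nor reaped while the timeout is still pending.
    """
    return not reaper_has_started(victim, now, delay)


if __name__ == "__main__":
    victim = Victim("victim", queued_at=10.0)
    for delay in (2.0, 2.5):  # $FOO and $FOO + epsilon, arbitrary values
        print(delay, skipped_without_reaping(victim, now=12.0, delay=delay))
```

At the same point in time the answer flips from False to True purely because the timeout grew by epsilon. In this model, that flip is the subtle coupling between the reaper timeout and the reclaim heuristic.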