Re: [PATCH v19 5/8] mm: introduce memfd_secret system call to create "secret" memory areas

18 May 2021


      On Tue 18-05-21 12:35:36, David Hildenbrand wrote:
...
On 18.05.21 12:31, Michal Hocko wrote:
...
On Tue 18-05-21 12:06:42, David Hildenbrand wrote:
...
On 18.05.21 11:59, Michal Hocko wrote:
...
On Sun 16-05-21 10:29:24, Mike Rapoport wrote:
...
On Fri, May 14, 2021 at 11:25:43AM +0200, David Hildenbrand wrote:
[...]
...
...
> +		if (!page)
> +			return VM_FAULT_OOM;
> +
> +		err = set_direct_map_invalid_noflush(page, 1);
> +		if (err) {
> +			put_page(page);
> +			return vmf_error(err);
Would we want to translate that to a proper VM_FAULT_..., which would most
probably be VM_FAULT_OOM when we fail to allocate a pagetable?
That's what vmf_error does, it translates -ESOMETHING to VM_FAULT_XYZ.
I haven't read through the rest but this has just caught my attention.
Is it really reasonable to trigger the oom killer when you cannot
invalidate the direct mapping. From a quick look at the code it is quite
unlikely to se ENOMEM from that path (it allocates small pages) but this
can become quite sublte over time. Shouldn't this simply SIGBUS if it
cannot manipulate the direct mapping regardless of the underlying reason
for that?
OTOH, it means our kernel zones are depleted, so we'd better reclaim somehow
...
Killing a userspace seems to be just a bad way around that.
Although I have to say openly that I am not a great fan of VM_FAULT_OOM
in general. It is usually a a wrong way to tell the handle the failure
because it happens outside of the allocation context so you lose all the
details (e.g. allocation constrains, numa policy etc.). Also whenever
there is ENOMEM then the allocation itself has already made sure that
all the reclaim attempts have been already depleted. Just consider an
allocation with GFP_NOWAIT/NO_RETRY or similar to fail and propagate
ENOMEM up the call stack. Turning that into the OOM killer sounds like a
bad idea to me.  But that is a more general topic. I have tried to bring
this up in the past but there was not much of an interest to fix it as
it was not a pressing problem...
I'm certainly interested; it would mean that we actually want to try
recovering from VM_FAULT_OOM in various cases, and as you state, we might
have to supply more information to make that work reliably.
Or maybe we want to get rid of VM_FAULT_OOM altogether... But this is
really tangent to this discussion. The only relation is that this would
be another place to check when somebody wants to go that direction.
...
Having that said, I guess what we have here is just the same as when our
process fails to allocate a generic page table in __handle_mm_fault(), when
we fail p4d_alloc() and friends ...
...
From a quick look it is really similar in a sense that it effectively never
happens and if it does then it certainly does the wrong thing. The point
I was trying to make is that there is likely no need to go that way.
Fundamentally, not being able to handle direct map for the page fault
sounds like what SIGBUS should be used for. From my POV it is similar to
ENOSPC when FS cannot allocate metadata on the storage.
-- 
Michal Hocko
SUSE Labs

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v19 5/8] mm: introduce memfd_secret system call to create "secret" memory areas