On Thu, Dec 14, 2023 at 2:31 AM Linus Torvalds torvalds@linux-foundation.org wrote:
On Wed, 13 Dec 2023 at 16:36, Jeff Xu jeffxu@google.com wrote:
IOW, when would you *ever* say "seal this area, but MADV_DONTNEED is ok"?
The MADV_DONTNEED is OK for file-backed mapping.
Right. It makes no semantic difference. So there's no point to it.
My point was that you added this magic flag for "not ok for RO anon mapping".
It's such a *completely* random flag, that I go "that's just crazy random - make sealing _always_ disallow that case".
So what I object to in this series is basically random small details that should just eb part of the basic act of sealing.
I think sealing should just mean "you can't do any operations that have semantic meaning for the mapping, because it is SEALED".
So I think sealing should automatically mean "can't do MADV_DONTNEED on anon memory", because that's basically equivalent to a munmap/remap operation.
In Chrome, we have a use case to allow MADV_DONTNEED on sealed memory. We have a pkey-tagged heap and code region for JIT code. The regions are writable by page permissions, but we use the pkey to control write access. These regions are mmapped at process startup and we want to seal them to ensure that the pkey and page permissions can't change. Since these regions are used for dynamic allocations, we still need a way to release unneeded resources, i.e. madvise(DONTNEED) unused pages on free().
AIUI, the madvise(DONTNEED) should effectively only change the content of anonymous pages, i.e. it's similar to a memset(0) in that case. That's why we added this special case: if you want to madvise(DONTNEED) an anonymous page, you should have write permissions to the page.
In our allocator, on free we can then release resources via: * allow pkey writes * madvise(DONTNEED) * disallow pkey writes