Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot

3 Jun 2025

On Tue, Jun 03, 2025 at 08:21:02PM +0200, Jann Horn wrote:
...
> When fork() encounters possibly-pinned pages, those pages are immediately
> copied instead of just marking PTEs to make CoW happen later. If the parent
> is multithreaded, this can cause the child to see memory contents that are
> inconsistent in multiple ways:

> We are copying the contents of a page with a memcpy() while userspace
> may be writing to it. This can cause the resulting data in the child to
> be inconsistent.

This is an interesting problem, but we'll get to it later.
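
As a rough userspace analogue of the race quoted above (not the kernel's copy path), the sketch below has one thread update two words in a fixed order while another copies them word by word. The names pair, writer and count_torn_snapshots are made up for illustration. The writer always bumps x before y, so y <= x holds at every instant; a copy with y > x is a state that never existed in memory.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-word object; the writer keeps y <= x <= y + 1. */
struct pair {
	atomic_ulong x;
	atomic_ulong y;
};

static struct pair shared;
static atomic_bool stop_writer;

/* Stand-in for a userspace thread that keeps writing during the copy. */
static void *writer(void *arg)
{
	(void)arg;
	while (!atomic_load_explicit(&stop_writer, memory_order_relaxed)) {
		atomic_fetch_add_explicit(&shared.x, 1, memory_order_relaxed);
		atomic_fetch_add_explicit(&shared.y, 1, memory_order_relaxed);
	}
	return NULL;
}

/*
 * Copy the object word by word, x first and then y, 'rounds' times.
 * Returns how many copies had y > x (a state the writer never
 * produced), or -1 if the writer thread could not be started.
 */
long count_torn_snapshots(long rounds)
{
	pthread_t t;
	long torn = 0;

	atomic_store(&stop_writer, false);
	if (pthread_create(&t, NULL, writer, NULL) != 0)
		return -1;

	for (long i = 0; i < rounds; i++) {
		unsigned long x = atomic_load_explicit(&shared.x, memory_order_relaxed);
		unsigned long y = atomic_load_explicit(&shared.y, memory_order_relaxed);

		if (y > x)
			torn++;
	}

	atomic_store(&stop_writer, true);
	pthread_join(t, NULL);
	return torn;
}
```

How often this triggers depends on the machine and on scheduling; a count of zero on a given run proves nothing.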
...

> After we've copied this page, future writes to other pages may
> continue to be visible to the child while future writes to this page are
> no longer visible to the child.

Yes, and this is not fixable. It's also a problem for the regular write-protect
PTE path, where inevitably only part of the address space will be write-protected
at any given moment. This would only be fixable if, e.g., we suspended every thread
on a multi-threaded fork.
...
> This means the child could theoretically see incoherent states where
> allocator freelists point to objects that are actually in use or stuff like
> that. A mitigating factor is that, unless userspace already has a deadlock
> bug, userspace can pretty much only observe such issues when fancy lockless
> data structures are used (because if another thread was in the middle of
> mutating data during fork() and the post-fork child tried to take the mutex
> protecting that data, it might wait forever).

Ok, so the issue here is that atomics + memcpy (or our kernel variants) may
observe tearing. This is indeed a problem, and POSIX doesn't _really_
tell us anything about this. _However_, POSIX says:
...
    Any locks held by any thread in the calling process that have been set to be process-shared
    shall not be held by the child process. For locks held by any thread in the calling process
    that have not been set to be process-shared, any attempt by the child process to perform
    any operation on the lock results in undefined behavior (regardless of whether the calling
    process is single-threaded or multi-threaded).

The interesting bit here is "For locks held by any thread [...] any attempt by
the child [...] results in UB". I don't think it's entirely far-fetched to say
the spirit of the law is that atomics may also be UB (just like a lock[1] that was
held by a separate thread, then unlocked mid-concurrent-fork, is in a UB state).
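
To illustrate the quoted POSIX rule, here is a minimal sketch (data_lock, mid_update and child_view_of_mid_update are hypothetical names): a worker thread sits inside a critical section while the main thread forks. The child inherits the memory, including the "update in progress" flag, but not the worker thread, so nothing in the child will ever finish the update or release the lock. The child deliberately never touches data_lock, since POSIX leaves that undefined.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t data_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int mid_update;      /* 1 while the worker is mid-update */
static atomic_int release_worker;  /* set by the parent after fork() */

/* Worker: takes the lock and stays inside the critical section. */
static void *worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&data_lock);
	atomic_store(&mid_update, 1);
	while (!atomic_load(&release_worker))
		sched_yield();
	atomic_store(&mid_update, 0);
	pthread_mutex_unlock(&data_lock);
	return NULL;
}

/*
 * Fork while the worker holds data_lock. Returns the value of
 * mid_update as seen by the child, or -1 on error.
 */
int child_view_of_mid_update(void)
{
	pthread_t t;
	pid_t pid;
	int status = 0;

	atomic_store(&mid_update, 0);
	atomic_store(&release_worker, 0);
	if (pthread_create(&t, NULL, worker, NULL) != 0)
		return -1;

	/* Wait until the worker is inside the critical section. */
	while (!atomic_load(&mid_update))
		sched_yield();

	pid = fork();
	if (pid == 0) {
		/*
		 * Child: only the forking thread exists here. The worker
		 * is gone, so the flag below can never be cleared and
		 * data_lock must not be touched.
		 */
		_exit(atomic_load(&mid_update));
	}

	/* Parent: let the worker finish, then collect the child. */
	atomic_store(&release_worker, 1);
	pthread_join(t, NULL);
	if (pid < 0 || waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Because the parent waits for the flag before forking, the child reports 1: it is stuck with a half-done update and a lock it has no defined way to operate on.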

In any case, I think the bottom line is that fork memory snapshot coherency is
a fallacy. It's really impossible to reach without adding insane constraints
(like the aforementioned thread suspend + resume). It's not even possible
when going through the normal write-protect paths that have been conceptually
stable since the BSDs in the 1980s (due to the write-protect-a-page-at-a-time
problem).

Thus, personally I don't think this is worth fixing.

[1] This (at least in theory) covers every lock, so it also encompasses pthread spinlocks.

-- 
Pedro
