On 23.10.24 11:18, Lorenzo Stoakes wrote:
On Wed, Oct 23, 2024 at 11:13:47AM +0200, David Hildenbrand wrote:
On 23.10.24 11:06, Vlastimil Babka wrote:
On 10/23/24 10:56, Dmitry Vyukov wrote:
Overall while I sympathise with this, it feels dangerous and a pretty major change, because there'll be something somewhere that will break because it expects faults to be swallowed that we no longer do swallow.
So I'd say it'd be something we should defer, but of course it's a highly user-facing change so how easy that would be I don't know.
But I definitely don't think an 'introduce the ability to do cheap PROT_NONE guards' series is the place to also fundamentally change how user access page faults are handled within the kernel :)
Will delivering signals on kernel access be a backwards compatible change? Or will we need a different API? MADV_GUARD_POISON_KERNEL? It's just somewhat painful to detect/update all userspace if we add this feature in future. Can we say signal delivery on kernel accesses is unspecified?
Would adding signal delivery only to guard PTEs help the ASAN etc. use case enough? Wouldn't it instead be possible to add some prctl to opt the whole ASANized process in to having all existing kernel-access faults delivered as signals instead of -EFAULT?
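For reference, the behaviour being discussed (a fault on a kernel access is swallowed and turned into -EFAULT, while a direct userspace access raises a signal) can be observed today with an ordinary PROT_NONE mapping. The following is only a minimal sketch, assuming Linux and using a plain PROT_NONE page as a stand-in for a guard PTE; it does not use the new madvise() marker from the series, and the helper names are made up for illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_jmp;

/* SIGSEGV handler: jump back to the recovery point in user_access_faults(). */
static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_jmp, 1);
}

/* Hypothetical helper: map one inaccessible page standing in for a guard page. */
static void *map_guard_page(void)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(NULL, (size_t)page, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Kernel access: ask write() to copy one byte out of the guard page.
 * Returns the errno seen (EFAULT is expected on Linux), 0 if the write
 * unexpectedly succeeded, or -1 if the pipe could not be created.
 */
static int kernel_access_errno(const void *guard)
{
	int fds[2];
	ssize_t n;
	int err;

	if (pipe(fds) != 0)
		return -1;

	errno = 0;
	n = write(fds[1], guard, 1);
	err = (n < 0) ? errno : 0;

	close(fds[0]);
	close(fds[1]);
	return err;
}

/*
 * User access: read the guard page directly.
 * Returns 1 if SIGSEGV was delivered, 0 if the read went through,
 * or -1 if the handler could not be installed.
 */
static int user_access_faults(const volatile char *guard)
{
	struct sigaction sa, old;
	int faulted;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0)
		return -1;

	if (sigsetjmp(fault_jmp, 1) == 0) {
		(void)*guard;	/* faults; control resumes in the else branch */
		faulted = 0;
	} else {
		faulted = 1;
	}

	sigaction(SIGSEGV, &old, NULL);
	return faulted;
}
```

Calling kernel_access_errno() on the mapped page should report EFAULT with no signal raised, whereas user_access_faults() on the same page should report that SIGSEGV was delivered. That asymmetry is what a sanitizer would like to close, and what existing userspace may be relying on.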
Not sure if it is an "instead", you might have to deliver the signal in addition to letting the syscall fail (not that I would be an expert on signal delivery :D ).
prctl sounds better, or some way to configure the behavior on VMA ranges; otherwise we would need yet another marker, which is not the end of the world but would make it slightly more confusing.
Yeah prctl() sounds sensible, and since we are explicitly adding a marker for guard pages here we can do this as a follow up too without breaking any userland expectations, i.e. 'new feature to make guard pages signal' is not going to contradict the default behaviour.
So all makes sense to me, but I do think best as a follow up! :)
Yeah, fully agreed. And my gut feeling is that it might not be that easy ... :)
In the end, what we want is *some* notification that a guard PTE was accessed. Likely the notification need not be completely synchronous (although that would be ideal), and it need not be a signal.
Maybe having a different way to obtain that information from user space would work.