Hi, Axel,
On Wed, May 17, 2023 at 03:28:36PM -0700, Axel Rasmussen wrote:
I do plan a v2, if for no other reason than to update the documentation. Happy to add a cover letter with it as well.
+Jiaqi back to CC. This is one piece of a larger memory poisoning / recovery design Jiaqi is working on, so he may have some ideas why MADV_HWPOISON or MADV_PGERR will or won't work.
One idea is, at least for our use case, we have to have the range be userfaultfd registered, because we need to intercept the first access and check at that point whether or not it should be poisoned. But, I think in principle a scheme like this could work:
1. Intercept first access with UFFD
2. Issue MADV_HWPOISON or MADV_PGERR or etc to put a pte denoting the
   poisoned page in place
3. UFFDIO_WAKE to have the faulting thread retry, see the new entry, and SIGBUS
It's arguably slightly weird, since normally UFFD events are resolved with UFFDIO_* operations, but I don't see why it *couldn't* work.
Then again, I am not super familiar with MADV_HWPOISON; I will have to do a bit of reading to understand whether its semantics are the same (future accesses to this address get SIGBUS).
Yes, it'll be great if this can be checked before sending v2. What you said matches exactly what I had in mind. I hope it will already work, or we can always discuss what is missing.
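
To make the three steps above concrete, here is a rough, untested userspace sketch. The helper names (page_start, poison_on_fault) are made up for illustration, and it only uses MADV_HWPOISON, since MADV_PGERR does not exist today. It assumes the range is already registered with userfaultfd and that the caller has CAP_SYS_ADMIN on a CONFIG_MEMORY_FAILURE kernel. Whether madvise() can install the poisoned entry on a page that is still missing, rather than faulting on it itself, is exactly the part that needs checking.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Round an address down to the start of its page.
 * page_size must be a power of two. */
static uint64_t page_start(uint64_t addr, uint64_t page_size)
{
	return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: handle one pending fault on uffd by poisoning the
 * page instead of resolving it with UFFDIO_COPY / UFFDIO_CONTINUE.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int poison_on_fault(int uffd, uint64_t page_size)
{
	struct uffd_msg msg;
	struct uffdio_range range;
	ssize_t n;

	/* Step 1: intercept the first access. */
	n = read(uffd, &msg, sizeof(msg));
	if (n < 0)
		return -1;
	if (n != (ssize_t)sizeof(msg) || msg.event != UFFD_EVENT_PAGEFAULT) {
		errno = EINVAL;
		return -1;
	}

	range.start = page_start(msg.arg.pagefault.address, page_size);
	range.len = page_size;

	/*
	 * Step 2: ask the kernel to mark the page poisoned.
	 * ASSUMPTION: this works on a not-yet-populated page in a
	 * uffd-registered range. That has not been verified here.
	 */
	if (madvise((void *)(uintptr_t)range.start, range.len, MADV_HWPOISON))
		return -1;

	/* Step 3: wake the faulting thread so it retries and gets SIGBUS. */
	return ioctl(uffd, UFFDIO_WAKE, &range);
}
```

A real handler would first decide whether the faulting page should be poisoned at all, and fall back to the usual UFFDIO_* resolution otherwise.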