On Fri, Oct 20, 2023 at 3:02 AM David Hildenbrand david@redhat.com wrote:
On 19.10.23 21:53, Peter Xu wrote:
On Thu, Oct 19, 2023 at 05:41:01PM +0200, David Hildenbrand wrote:
That's not my main point. It can easily become a maintenance burden without any real use cases yet that we are willing to support.
That's why I requested a few times that we can discuss the complexity of cross-mm support already here, and I'm all ears if I missed something on the "maintenance burden" part..
I started by listing what I think might be different, and we can easily speedup single-mm with things like "if (ctx->mm != mm)" checks with e.g. memcg, just like what this patch already did with pgtable depositions.
We keep saying "maintenance burden" but we refuse to discuss what is that..
Let's recap
(1) We have person A up-streaming code written by person B, whereby B is not involved in the discussions nor seems to be active to maintain that code.
Worse, the code that is getting up-streamed was originally based on a different kernel version that has significant differences in some key areas -- for example, page pinning, exclusive vs. shared.
I claim that nobody here fully understands the code at hand (just look at the previous discussions), and reviewers have to sort out the mess that was created by the very way this stuff is getting upstreamed here.
We're already struggling to get the single-mm case working correctly.
(2) Cross-mm was not even announced anywhere nor mentioned which use it would have; I had to stumble over this while digging through the code. Further, is it even *tested*? AFAIKS in patch #3 no. Why do we have to make the life of reviewers harder by forcing them to review code that currently *nobody* on this earth needs?
(3) You said "What else we can benefit from single mm? One less mmap read lock, but probably that's all we can get;" and I presented two non-obvious issues. I did not even look any further because I really have better things to do than review complicated code without real use cases at hand. As I said "maybe that works as expected, I don't know and I have no time to spare on reviewing features with no real use cases)"; apparently I was right by just guessing that memcg handling is missing.
The sub-feature in question (cross-mm) has no solid use cases; at this point I am not even convinced the use case you raised requires *userfaultfd*; for the purpose of moving a whole VMA worth of pages between two processes; I don't see the immediate need to get userfaultfd involved and move individual pages under page lock etc.
You make a compelling case against cross-mm support. While I can't force Andrea to participate in upstreaming nor do I have his background, keeping it simple, as you requested, is doable. That's what I plan on doing by splitting the patch and I think we all agreed to that. I'll also see if I can easily add a separate patch to test cross-mm support. I do apologize for the extra effort required from reviewers to cover for the gaps in my patches. I'm doing my best to minimize that and I really appreciate your time.
I'll leave that to Suren and Lokesh to decide. For me the worst case is one more flag which might be confusing, which is not the end of the world.. Suren, you may need to work more thoroughly to remove cross-mm implications if so, just like when renaming REMAP to MOVE.
I'm asking myself why you are pushing so hard to include complexity "just because we can"; doesn't make any sense to me, honestly.
Maybe you have some other real use cases that ultimately require userfaultfd for cross-mm that you cannot share?
Will the world end when we have to use a separate flag so we can open this pandora's box when really required?
Again, moving anon pages within a process is a known thing; we do that already via mremap; the only difference here really is, that we have to get the rmap right because we don't adjust VMAs. It's a shame we don't try to combine both code paths, maybe it's not easily possible like we did with mprotect vs. uffd-wp.
That's a good point. With cross-mm support baked in, the overlap was not obvious to me. I'll see how much we can reuse from the mremap path.
Moving anon pages between process is currently only done via COW, where all things (page pinning, memcg, ...) have been figured out and are simply working as expected. Making uffd special by coding-up their own thing does not sound compelling to me.
I am clearly against any unwarranted features+complexity. Again, I will stop arguing further, the whole thing of "include it just because we can" to avoid a flag (that we might never even see) doesn't make any sense to me and likely never will.
The whole way this feature is getting upstreamed is just messed up IMHO and I the reasoning used in this thread to stick as-close-as-possible to some code person B wrote some years ago (e.g., naming, sub-features) is far out of my comprehension.
I don't think staying as-close-as-possible to the original version was the way I was driving this so far. At least that was not my conscious intention. I'm open to further suggestions whenever it makes sense to deviate from it. Thanks, Suren.
-- Cheers,
David / dhildenb