On Wed, Dec 23, 2020 at 1:39 PM Andrea Arcangeli aarcange@redhat.com wrote:
On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
Thanks for the details.
I hope we can find a way put the page_mapcount back where there's a page_count right now.
I really don't think that's ever going to happen - at least if you're talking about do_wp_page().
I refuse to make the *simple* core operations VM have to jump through hoops, and the old COW mapcount logic was that. I *much* prefer the newer COW code, because the rules are more straightforward.
page_count is far from optimal, but it is a feature it finally allowed us to notice that various code (clear_refs_write included apparently even after the fix) leaves stale too permissive TLB entries when it shouldn't.
I absolutely agree that page_count isn't exactly optimal, but "mapcount" is just so much worse.
page_count() is at least _logical_, and has a very clear meaning: "this page has other users". mapcount() means something else, and is simply not sufficient or relevant wrt COW.
That doesn't mean that page_mapcount() is wrong - it's just that it's wrong for COW. page_mapcount() is great for rmap, so that we can see when we need to shoot down a memory mapping of a page that gets released (truncate being the classic example).
I think that the mapcount games we used to have were horrible. I absolutely much prefer where we are now wrt COW.
The modern rules for COW handling are:
- if we got a COW fault there might be another user, we copy (and this is where the page count makes so much logical sense).
- if somebody needs to pin the page in the VM, we either make sure that it is pre-COWed and we
(a) either never turn it back into a COW page (ie the fork()-time stuff we have for pinned pages)
(b) or there is some explicit marker on the page or in the page table (ie the userfaultfd_pte_wp thing).
those are _so_ much more straightforward than the very complex rules we used to have that were actively buggy, in addition to requiring the page lock. So they were buggy and slow.
And yes, I had forgotten about that userfaultfd_pte_wp() because I was being myopic and only looking at wp_page_copy(). So using that as a way to make sure that a page doesn't go through COW is a good way to avoid the COW race, but I think that thing requires a bit in the page table which might be a problem on some architectures?
Linus