On Tue, Jul 17, 2018 at 09:00:53AM +0000, Michal Hocko wrote:
On Mon 16-07-18 23:38:46, Kirill A. Shutemov wrote:
On Mon, Jul 16, 2018 at 07:40:42PM +0200, Michal Hocko wrote:
On Mon 16-07-18 17:47:39, Kirill A. Shutemov wrote:
On Mon, Jul 16, 2018 at 04:22:45PM +0200, Michal Hocko wrote:
On Mon 16-07-18 17:04:41, Kirill A. Shutemov wrote:
On Mon, Jul 16, 2018 at 01:30:28PM +0000, Michal Hocko wrote:
> On Tue 10-07-18 13:48:58, Andrew Morton wrote:
> > On Tue, 10 Jul 2018 16:48:20 +0300 "Kirill A. Shutemov" kirill.shutemov@linux.intel.com wrote:
> >
> > > vma_is_anonymous() relies on ->vm_ops being NULL to detect anonymous
> > > VMA. This is unreliable as ->mmap may not set ->vm_ops.
> > >
> > > False-positive vma_is_anonymous() may lead to crashes:
> > >
> > > ...
> > >
> > > This can be fixed by assigning anonymous VMAs own vm_ops and not relying
> > > on it being NULL.
> > >
> > > If ->mmap() failed to set ->vm_ops, mmap_region() will set it to
> > > dummy_vm_ops. This way we will have non-NULL ->vm_ops for all VMAs.
> >
> > Is there a smaller, simpler fix which we can use for backporting
> > purposes and save the larger rework for development kernels?
>
> Why cannot we simply keep anon vma with null vm_ops and set dummy_vm_ops
> for all users who do not initialize it in their mmap callbacks?
> Basically have a sanity check&fixup in call_mmap?
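For readers following along, the check and the fixup suggested in the quoted text can be modelled in plain userspace C. This is only a sketch under loud assumptions: the structs are simplified stand-ins rather than the kernel definitions, and lazy_driver_mmap() and call_mmap_with_fixup() are hypothetical names invented for illustration, not existing kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct vm_operations_struct {
	int unused;
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Placeholder ops table, standing in for dummy_vm_ops from the thread. */
static const struct vm_operations_struct dummy_vm_ops = { 0 };

/* The check under discussion: a VMA is treated as anonymous iff ->vm_ops is NULL. */
static bool vma_is_anonymous(const struct vm_area_struct *vma)
{
	return vma->vm_ops == NULL;
}

/* Hypothetical ->mmap handler type for this model. */
typedef int (*mmap_handler)(struct vm_area_struct *vma);

/* Hypothetical driver handler that succeeds but forgets to set ->vm_ops. */
static int lazy_driver_mmap(struct vm_area_struct *vma)
{
	(void)vma;
	return 0;
}

/*
 * Sketch of the "sanity check & fixup" idea: after a successful ->mmap,
 * any VMA still lacking ->vm_ops gets the dummy table, so it can no
 * longer be mistaken for an anonymous VMA.
 */
static int call_mmap_with_fixup(mmap_handler handler, struct vm_area_struct *vma)
{
	int err = handler(vma);

	if (!err && vma->vm_ops == NULL)
		vma->vm_ops = &dummy_vm_ops;
	return err;
}
```

In this model a VMA passed straight to lazy_driver_mmap() still satisfies vma_is_anonymous(), which is the false positive, while one routed through call_mmap_with_fixup() does not. Note that such a blanket fixup would equally apply to any handler that leaves ->vm_ops NULL on purpose.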
As I said, there's a corner case of MAP_PRIVATE of /dev/zero.
This is really creative. I really didn't think about that. I am wondering whether this really has to be handled as a private anonymous mapping implicitly. Why does vma_is_anonymous() have to succeed for these mappings? Why cannot we simply handle it as any other file-backed PRIVATE mapping?
Because it's an established way to create anonymous mappings in Linux, and we cannot break the semantics.
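As a reminder of what the userspace side of this idiom looks like, here is a minimal sketch of a MAP_PRIVATE mapping of /dev/zero. The helper name private_dev_zero_demo() is made up for this example, and the code only checks behaviour visible from userspace (zero-filled reads, private writes); it says nothing about how the kernel classifies the VMA internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page of /dev/zero with MAP_PRIVATE, read it, then write to it.
 * Returns 0 if the page reads as zero and the write is visible in the
 * mapping, 1 if the contents are unexpected, -1 on a setup error.
 */
static int private_dev_zero_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	int fd, ok;

	if (page <= 0)
		return -1;

	fd = open("/dev/zero", O_RDWR);
	if (fd < 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the descriptor is closed */
	if (p == MAP_FAILED)
		return -1;

	ok = (p[0] == 0);                 /* read fault: zero-filled memory */
	p[0] = 42;                        /* write fault: private copy */
	ok = ok && p[0] == 42 && p[page - 1] == 0;

	munmap(p, (size_t)page);
	return ok ? 0 : 1;
}
```

From the caller's point of view the result is indistinguishable from mmap() with MAP_PRIVATE | MAP_ANONYMOUS, which is why existing programs depend on it.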
How exactly would the semantics break? You would still get zero pages on read faults and anonymous pages on CoW. So basically the same thing as for any other file-backed MAP_PRIVATE mapping.
You are wrong about the zero page.
Well, if we redirect ->fault to do_anonymous_page and
Yeah. And it will make write fault to allocate *two* pages. One in do_anonymous_page() and one in do_cow_fault(). Just no.
We have a reason why anon VMAs are handled separately. It's possible to unify them, but it requires substantial groundwork.
And you won't get THP.
huge_fault to do_huge_pmd_anonymous_page then we should emulate the standard anonymous mapping.
And I'm sure there are more differences. Just grep for vma_is_anonymous().
I am sorry to push on this, but if we have one odd case I would rather handle it and have a simple _rule_ that every mmap provider _has_ to provide vm_ops, with a trivial fixup in a single place, rather than patch in the subtle placeholders you were proposing.
I will not insist of course but this looks less fragile to me.
You propose quite a big redesign of how we handle anonymous VMAs. Feel free to propose the patch(set). But I don't think it would fly for stable@.