+cc David
On Fri, Mar 07, 2025 at 02:35:12PM +0000, Ryan Roberts wrote:
On 07/03/2025 13:59, Lorenzo Stoakes wrote:
On Fri, Mar 07, 2025 at 01:42:13PM +0000, Ryan Roberts wrote:
On 07/03/2025 13:04, Lorenzo Stoakes wrote:
On Fri, Mar 07, 2025 at 12:33:06PM +0000, Ryan Roberts wrote:
Instead of writing a pte directly into the table, use the set_pte_at() helper, which gives the arch visibility of the change.
In this instance we are guaranteed that the pte was originally none and is being modified to a not-present pte, so there was unlikely to be a bug in practice (at least not on arm64). But it's bad practice to write the page table memory directly without arch involvement.
Cc: stable@vger.kernel.org Fixes: 662df3e5c376 ("mm: madvise: implement lightweight guard page mechanism") Signed-off-by: Ryan Roberts ryan.roberts@arm.com
mm/madvise.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/madvise.c b/mm/madvise.c index 388dc289b5d1..6170f4acc14f 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1101,7 +1101,7 @@ static int guard_install_set_pte(unsigned long addr, unsigned long next, unsigned long *nr_pages = (unsigned long *)walk->private;
/* Simply install a PTE marker, this causes segfault on access. */
- *ptep = make_pte_marker(PTE_MARKER_GUARD);
- set_pte_at(walk->mm, addr, ptep, make_pte_marker(PTE_MARKER_GUARD));
I agree with you, but I think perhaps the arg name here is misleading :) If you look at mm/pagewalk.c and specifically, in walk_pte_range_inner():
if (ops->install_pte && pte_none(ptep_get(pte))) { pte_t new_pte; err = ops->install_pte(addr, addr + PAGE_SIZE, &new_pte, walk); if (err) break; set_pte_at(walk->mm, addr, pte, new_pte); ... }
So the ptep being assigned here is a stack value, new_pte, which we simply assign to, and _then_ the page walker code handles the set_pte_at() for us.
So we are indeed doing the right thing here, just in a different place :P
Ahh my bad. In that case, please ignore the patch.
But out of interest, why are you doing it like this? I find it a bit confusing as all the other ops (e.g. pte_entry()) work directly on the pgtable's pte without the intermediate.
In those cases it's read-only, the data's already there, you can just go ahead and manipulate it (and would expect to be able to do so).
It's certainly not read-only in general. Just having a quick look to verify, the very first callback I landed on was clear_refs_pte_range(), which implements .pmd_entry to clear the softdirty and access flags from a leaf pmd or from all the child ptes.
Yup sorry I misspoke, working some long hours atm so forgive me :) what I meant to say is that we either read or modify existing.
And yes users do do potentially crazy things and yada yada.
David and I have spoken quite a few times about implementing generic page table code that could help abstract a lot of things, and it feels like this logic could all be rejigged in some fashion as to prevent the kind of 'everybody does their own handler' logic.q
I guess I felt it was more _dangerous_ as you are establishing _new_ mappings here, with the page tables being constructed for you up to the PTE level.
And wanted to 'lock things down' somewhat.
But indeed, all this cries out for a need for a more generalised, robust interface that handles some of what the downstream users of this are doing.
When setting things are a little different, I'd rather not open up things to a user being able to do *whatever*, but rather limit to the smallest scope possible for installing the PTE.
Understandable, but personally I think it will lead to potential misunderstandings:
- it will get copy/pasted as an example of how to set a pte (which is wrong;
you have to use set_pte_at()/set_ptes()). There is currently only a single other case of direct dereferencing a pte to set it (in write_protect_page()).
Yeah, at least renaming the param could help, as 'ptep' implies you really do have a pointer to the page table entry.
If we didn't return an error we could just return the PTE value or something... hm.
- new users of .install_pte may assume (like I did) that the passed in ptep is
pointing to the pgtable and they will manipulate it with arch helpers. arm64 arch helpers all assume they are only ever passed pointers into pgtable memory. It will end horribly if that is not the case.
It will end very horribly indeed :P or perhaps with more of a fizzle than anticipated...
And also of course, it allows us to _mandate_ that set_pte_at() is used so we do the right thing re: arches :)
I could have named the parameter better though, in guard_install_pte_entry() would be better to have called it 'new_pte' or something.
I'd suggest at least describing this in the documentation in pagewalk.h. Or better yet, you could make the pte the return value for the function. Then it is clear because you have no pointer. You'd lose the error code but the only user of this currently can't fail anyway.
Haha and here you make the same point I did above... great minds :)
I mean yeah returning a pte would make it clearer what you're doing, but then it makes it different from every other callback... but this already is different :)
I do very much want the ability to return an error value to stop the walk (if you return >0 you can indicate to caller that a non-error stop occurred for instance, something I use on the reading side).
But we do need to improve this one way or another, at the very least the documentation/comments.
David - any thoughts?
I'm not necessarily against just making this consitent, but I like this property of us controlling what happens instead of just giving a pointer into the page table - the principle of exposing the least possible.
ANWYAY, I will add to my ever expanding whiteboard TODO list [literally the only todo that work for me] to look at this, will definitely improve docs at very least.
Anyway, just my 2 pence.
Your input is very much appreciated! Though, with inflation, I think we had better say 2 pounds... ;)
Thanks, Ryan
Cheers!
Thanks, Ryan
(*nr_pages)++;
return 0;
2.43.0
Thanks for looking at this by the way, obviously I appreciate your point in chasing up cases like this as endeavoured to do the right thing here, albeit abstracted away :)