Re: [PATCH v12 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs

11 Apr 2023

      On Fri, 7 Apr 2023 at 13:11, Muhammad Usama Anjum
usama.anjum@collabora.com wrote:
...
On 4/7/23 3:14 PM, Michał Mirosław wrote:
...
On Fri, 7 Apr 2023 at 12:04, Muhammad Usama Anjum
usama.anjum@collabora.com wrote:
...
On 4/7/23 12:34 PM, Michał Mirosław wrote:
...
On Thu, 6 Apr 2023 at 23:04, Muhammad Usama Anjum
usama.anjum@collabora.com wrote:
...
On 4/7/23 1:00 AM, Michał Mirosław wrote:
...
On Thu, 6 Apr 2023 at 19:58, Muhammad Usama Anjum
usama.anjum@collabora.com wrote:
[...]
...
...
...
...
>>> +       /*
>>> +        * Allocate smaller buffer to get output from inside the page walk
>>> +        * functions and walk page range in PAGEMAP_WALK_SIZE size chunks. As
>>> +        * we want to return output to user in compact form where no two
>>> +        * consecutive regions should be continuous and have the same flags.
>>> +        * So store the latest element in p.cur between different walks and
>>> +        * store the p.cur at the end of the walk to the user buffer.
>>> +        */
>>> +       p.vec = kmalloc_array(p.vec_len, sizeof(struct page_region),
>>> +                             GFP_KERNEL);
>>> +       if (!p.vec)
>>> +               return -ENOMEM;
>>> +
>>> +       walk_start = walk_end = start;
>>> +       while (walk_end < end && !ret) {
>>
>> The loop will stop if a previous iteration returned ENOSPC (and the
>> error will be lost) - is it intended?
> It is intentional. -ENOSPC means that the user buffer is full even though
> there was more memory to walk over. We don't treat this error. So when
> buffer gets full, we stop walking over further as user buffer has gotten
> full and return as success.
Thanks. What's the difference between -ENOSPC and
PM_SCAN_FOUND_MAX_PAGES? They seem to result in the same effect (code
flow).
-ENOSPC --> user buffer has been filled completely
PM_SCAN_FOUND_MAX_PAGES --> max_pages have been found, user buffer may
                            still have more space
What is the difference in code behaviour when those two cases are
compared? (I'd expect none.)
There is difference:
We add data to user buffer. If it succeeds with return code 0, we engage
the WP. If it succeeds with PM_SCAN_FOUND_MAX_PAGES, we still engage the
WP. But if we get -ENOSPC, we don't perform engage as the data wasn't added
to the user buffer.
Thanks! I see it now. I see a few more corner cases here:

If we did engage WP but fail to copy the vector we return -EFAULT

but the WP is already engaged. I'm not sure this is something worth
guarding against, but documenting that would be helpful I think.
Sure.
...

If uffd_wp_range() fails, but we have already processed pages

earlier, we should treat the error like ENOSPC and back out the failed
range (the earier changes would be lost otherwise).
Backing out is easier to do for hugepages. But for normal pages, we'll have
to write some code to find where the current data was added (in cur or in
vec) and back out from that. I'll have to write some more code to avoid the
side-effects as well.
If I read the code correctly, the last page should always be in `cur`
and on failure only a single page is needed to be backed out. Did I
miss something?
...
But aren't we going over-engineering here? Error occurred and we are trying
to keep the previously generated correct data and returning successfully
still to the user? I don't think we should do this. An error is error. We
should return the error simply even if the memory flags would get lost. We
don't know what caused the error in uffd_wp_range(). Under normal
situation, we there shouldn't have had error.
In this case it means that on (intermittent) allocation error we get
inconsistent or non-deterministic results. I wouldn't want to be the
one debugging this later - I'd prefer either the syscall be
"exception-safe" (give consistent and predictable output) or kill the
process.
Best Regards
Michał Mirosław

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH v12 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs