On Thu, 22 Jun 2023 at 12:21, Muhammad Usama Anjum usama.anjum@collabora.com wrote:
On 6/22/23 12:45 AM, Andrei Vagin wrote:
On Wed, Jun 21, 2023 at 11:34:54AM +0500, Muhammad Usama Anjum wrote:
On 6/20/23 11:03 PM, Andrei Vagin wrote:
[...]
should it be PM_SCAN_FOUND_MAX_PAGES? Otherwise, it fails the ioctl even if it has handled some pages already.
It is a double edge sword. If we don't return error, user will never know that he needs to specify more max_pages or his output buffer is small and not coverig the entire range. The real purpose here that user gets aware that he needs to specify full hugetlb range and found pages should cover the entire range as well.
but if PM_SCAN_OP_WP is set, this error will be fatal, because some pages can be marked write-protected, but they are not be reported to user-space.
I think the ioctl has to report back the end address of the handled region. It is like read and write syscalls that can return less data than requested.
This is good point. I'll abort the walk here instead of retuning the error to user.
It would be great if the ioctl returned the address the walk finished at. This would remove the special case for "buffer full after full page was added" and if it happens that despite `max_pages` limit was reached but no more pages would need to be added the caller would not have to call the ioctl again on the remaining range.
[...]
How long can it take to run this loop? Should we interrupt it if a signal has been queued?
I'm following mincore and pagemap_read here. There is no such thing there. IOCTL is performing what user has requested.
In case of pagemap, its buffer is limited by MAX_RW_COUNT (0x7ffff000), so it can handle maximum 0xffffe00 pages in one iteration. Do you have any limits for input parameters?
If the execution time is a concern, user should have called the IOCTL on shorter address range.
It doesn't work this way. There can be many users and signals has to be delivered in a reasonable time. One of the obvious examples when a signal has to be delivered asap is OOM.
This IOCTL is just like mincore, but with other flags and functionalities. Mincore doesn't put any limit like this. I don't think we should put any limit here as well.
I don't think we should treat mincore() as a good API example. Its interface has similar performance problems to what select() or poll() does. In this ioctl's case, we can limit the output at this end (large anyway) as it won't affect the performance if for x TiB of memory the call is made twice instead of once. (Returning the end of the walk reached would be much help here.)
Best Regards Michał Mirosław