On 8/11/23 6:32 PM, Michał Mirosław wrote:
On Fri, Aug 11, 2023 at 05:02:44PM +0500, Muhammad Usama Anjum wrote:
On 8/10/23 10:32 PM, Michał Mirosław wrote:
On Wed, 9 Aug 2023 at 08:16, Muhammad Usama Anjum usama.anjum@collabora.com wrote:
[...]
--- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c
[...]
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static unsigned long pagemap_thp_category(pmd_t pmd) +{
unsigned long categories = PAGE_IS_HUGE;
if (pmd_present(pmd)) {
categories |= PAGE_IS_PRESENT;
if (!pmd_uffd_wp(pmd))
categories |= PAGE_IS_WRITTEN;
if (is_zero_pfn(pmd_pfn(pmd)))
categories |= PAGE_IS_PFNZERO;
} else if (is_swap_pmd(pmd)) {
categories |= PAGE_IS_SWAPPED;
if (!pmd_swp_uffd_wp(pmd))
categories |= PAGE_IS_WRITTEN;
}
return categories;
+}
I guess THPs can't be file-backed currently, but can we somehow mark this assumption so it can be easily found if the capability arrives?
Yeah, THPs cannot be file backed. Lets not care for some feature which may not arrive in several years or eternity.
Yes, it might not arrive. But please add at least a comment, so that it is clearly visible that lack if PAGE_IS_FILE here is intentional.
I'll add a comment.
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+#ifdef CONFIG_HUGETLB_PAGE +static unsigned long pagemap_hugetlb_category(pte_t pte) +{
unsigned long categories = PAGE_IS_HUGE;
if (pte_present(pte)) {
categories |= PAGE_IS_PRESENT;
if (!huge_pte_uffd_wp(pte))
categories |= PAGE_IS_WRITTEN;
if (!PageAnon(pte_page(pte)))
categories |= PAGE_IS_FILE;
if (is_zero_pfn(pte_pfn(pte)))
categories |= PAGE_IS_PFNZERO;
} else if (is_swap_pte(pte)) {
categories |= PAGE_IS_SWAPPED;
if (!pte_swp_uffd_wp_any(pte))
categories |= PAGE_IS_WRITTEN;
}
BTW, can a HugeTLB page be file-backed and swapped out?
Accourding to pagemap_hugetlb_range(), file-backed HugeTLB page cannot be swapped.
Here too a comment that leaving out this case is intentional would be useful.
Sure.
[...]
walk_start = p.arg.start;
for (; walk_start < p.arg.end; walk_start = p.arg.walk_end) {
[...[
ret = mmap_read_lock_killable(mm);
if (ret)
break;
ret = walk_page_range(mm, walk_start, p.arg.end,
&pagemap_scan_ops, &p);
mmap_read_unlock(mm);
[...]
if (ret != -ENOSPC || p.arg.vec_len - 1 == 0 ||
p.found_pages == p.arg.max_pages)
break;
The second condition is equivalent to `p.arg.vec_len == 1`, but why is that an ending condition? Isn't the last entry enough to gather one more range? (The walk could have returned -ENOSPC due to buffer full and after flushing it could continue with the last free entry.)
Now we are walking the entire range walk_page_range(). We don't break loop when we get -ENOSPC as this error may only mean that the temporary buffer is full. So we need check if max pages have been found or output buffer is full or ret is 0 or any other error. When p.arg.vec_len = 1 is end condition as the last entry is in cur. As we have walked over the entire range, cur must be full after which the walk returned.
So current condition is necessary. I've double checked it. I'll change it to `p.arg.vec_len == 1`.
If we have walked the whole range, then the loop will end anyway due to `walk_start < walk_end` not held in the `for()`'s condition.
Sorry, for not explaining to-the-point. Why would we walk the entire range when we should recognize that the output buffer is full and break the loop?
I've test cases written for this case. If I remove `p.arg.vec_len == 1` check, there is infinite loop for walking. So we are doing correct thing here.
[...]
+/*
- struct pm_scan_arg - Pagemap ioctl argument
- @size: Size of the structure
- @flags: Flags for the IOCTL
- @start: Starting address of the region
- @end: Ending address of the region
- @walk_end Address where the scan stopped (written by kernel).
walk_end == end informs that the scan completed on entire range.
Can we ensure this holds also for the tagged pointers?
No, we cannot.
So this need explanation in the comment here. (Though I'd still like to know how the address tags are supposed to be used from someone that knows them.)
I've looked at some documentations (presentations/talks) about tags. Tags is more like userspace feature. Kernel should just ignore them for our use case. I'll add comment.
Best Regards Michał Mirosław