On 5/22/23 3:24 PM, Muhammad Usama Anjum wrote:
On 4/26/23 7:13 PM, Peter Xu wrote:
Hi, Muhammad,
On Wed, Apr 26, 2023 at 12:06:23PM +0500, Muhammad Usama Anjum wrote:
On 4/20/23 11:01 AM, Muhammad Usama Anjum wrote:
+/* Supported flags */ +#define PM_SCAN_OP_GET (1 << 0) +#define PM_SCAN_OP_WP (1 << 1)
We have only these flag options available in PAGEMAP_SCAN IOCTL. PM_SCAN_OP_GET must always be specified for this IOCTL. PM_SCAN_OP_WP can be specified as need. But PM_SCAN_OP_WP cannot be specified without PM_SCAN_OP_GET. (This was removed after you had asked me to not duplicate functionality which can be achieved by UFFDIO_WRITEPROTECT.)
- PM_SCAN_OP_GET | PM_SCAN_OP_WP
vs 2) UFFDIO_WRITEPROTECT
After removing the usage of uffd_wp_range() from PAGEMAP_SCAN IOCTL, we are getting really good performance which is comparable just like we are depending on SOFT_DIRTY flags in the PTE. But when we want to perform wp, PM_SCAN_OP_GET | PM_SCAN_OP_WP is more desirable than UFFDIO_WRITEPROTECT performance and behavior wise.
I've got the results from someone else that UFFDIO_WRITEPROTECT block pagefaults somehow which PAGEMAP_IOCTL doesn't. I still need to verify this as I don't have tests comparing them one-to-one.
What are your thoughts about it? Have you thought about making UFFDIO_WRITEPROTECT perform better?
I'm sorry to mention the word "performance" here. Actually we want better performance to emulate Windows syscall. That is why we are adding this functionality. So either we need to see what can be improved in UFFDIO_WRITEPROTECT or can I please add only PM_SCAN_OP_WP back in pagemap_ioctl?
I'm fine if you want to add it back if it works for you. Though before that, could you remind me why there can be a difference on performance?
I've looked at the code again and I think I've found something. Lets look at exact performance numbers:
I've run 2 different tests. In first test UFFDIO_WRITEPROTECT is being used for engaging WP. In second test PM_SCAN_OP_WP is being used. I've measured the average write time to the same memory which is being WP-ed and total time of execution of these APIs:
**avg write time:** | No of pages | 2000 | 8192 | 100000 | 500000 | |------------------------|------|------|--------|--------| | UFFDIO_WRITEPROTECT | 2200 | 2300 | 4100 | 4200 | | PM_SCAN_OP_WP | 2000 | 2300 | 2500 | 2800 |
**Execution time measured in rdtsc:** | No of pages | 2000 | 8192 | 100000 | 500000 | |------------------------|------|-------|--------|--------| | UFFDIO_WRITEPROTECT | 3200 | 14000 | 59000 | 58000 | | PM_SCAN_OP_WP | 1900 | 7000 | 38000 | 40000 |
Avg write time for UFFDIO_WRITEPROTECT is 1.3 times slow. The execution time is 1.5 times slower in the case of UFFDIO_WRITEPROTECT. So UFFDIO_WRITEPROTECT is making writes slower to the pages and execution time is also slower.
This proves that PM_SCAN_OP_WP is better than UFFDIO_WRITEPROTECT. Although PM_SCAN_OP_WP and UFFDIO_WRITEPROTECT have been implemented differently. We should have seen no difference in performance. But we have quite a lot of difference in performance here. PM_SCAN_OP_WP takes read mm lock, uses walk_page_range() to walk over pages which finds VMAs from address ranges to walk over them and pagemap_scan_pmd_entry() is handling most of the work including tlb flushing. UFFDIO_WRITEPROTECT is also taking the mm lock and iterating from all the different page directories until a pte is found and then flags are updated there and tlb is flushed for every pte.
My next deduction would be that we are getting worse performance as we are flushing tlb for one page at a time in case of UFFDIO_WRITEPROTECT. While we flush tlb for 512 pages (moslty) at a time in case of PM_SCAN_OP_WP. I've just verified this by adding some logs to the change_pte_range() and pagemap_scan_pmd_entry(). Logs are attached. I've allocated memory of 1000 pages and write-protected it with UFFDIO_WRITEPROTECT and PM_SCAN_OP_WP. The logs show that UFFDIO_WRITEPROTECT has flushed tlb 1000 times of size 1 page each time. While PM_SCAN_OP_WP has flushed only 3 times of bigger sizes. I've learned over my last experience that tlb flush is very expensive. Probably this is what we need to improve if we don't want to add PM_SCAN_OP_WP?
The UFFDIO_WRITEPROTECT uses change_pte_range() which is very generic function and I'm not sure if can try to not do tlb flushes if uffd_wp is true. We can try to do flush somewhere else and hopefully we should do only one flush if possible. It will not be so straight forward to move away from generic fundtion. Thoughts?
I've just tested this theory of not doing per pte flushes and only did one flush on entire range in uffd_wp_range(). But it didn't improve the situation either. I was wrong that tlb flushes may be the cause.
Thanks,