On Tue, Oct 11, 2022 at 6:52 AM Andrei Vagin avagin@gmail.com wrote:
On Mon, Oct 03, 2022 at 04:21:22PM +0500, Muhammad Usama Anjum wrote:
On 9/28/22 10:24 PM, Andrei Vagin wrote:
On Wed, Sep 21, 2022 at 11:26 AM Muhammad Usama Anjum usama.anjum@collabora.com wrote:
Hi,
Thank you for reviewing.
On 9/19/22 7:58 PM, Andrei Vagin wrote:
This ioctl can be used by the CRIU project and other applications which require soft-dirty PTE bit information. The following operations are supported in this ioctl:
- Get the pages that are soft-dirty.
I think this interface doesn't have to be limited by the soft-dirty bits only. For example, CRIU needs to know whether file, present and swap bits are set or not.
These operations can be performed by pagemap procfs file. Definitely performing them through IOCTL will be faster. But I'm trying to add a simple IOCTL by which some specific PTE bit can be read and cleared atomically. This IOCTL can be extended to include other bits like file, present and swap bits by keeping the interface simple. The following mask advice is nice. But if we add that kind of masking, it'll start to look like a filter on top of pagemap. My intention is to not duplicate the functionality already provided by the pagemap. One may ask, then why am I adding "get the soft-dirty pages" functionality? I'm adding it to complement the get and clear operation. The "get" and "get and clear" operations with special flag (PAGEMAP_SD_NO_REUSED_REGIONS) can give results quicker by not splitting the VMAs.
This simple interface is good only for a limited number of use-cases. The interface that I suggest doesn't duplicate more code than this one, but it is much more universal. It will be a big mess if you add a separate API for each specific use-case.
I mean we should be able to specify for what pages we need to get info for. An ioctl argument can have these four fields:
- required bits (rmask & mask == mask) - all bits from this mask have to be set.
- any of these bits (amask & mask != 0) - any of these bits is set.
- exclude masks (emask & mask == 0) = none of these bits are set.
- return mask - bits that have to be reported to user.
The required mask (rmask) makes sense to me. At the moment, I only know about the practical use case for the required mask. Can you share how can any and exclude masks help for the CRIU?
I looked at should_dump_page in the CRIU code: https://github.com/checkpoint-restore/criu/blob/45641ab26d7bb78706a6215fdef8...
When CRIU dumps file private mappings, it needs to get pages that have PME_PRESENT or PME_SWAP but don't have PME_FILE.
I would really like to see the mask discussed will be adopted. With it CRIU will be able to migrate huge sparse VMAs assuming that a single hole is processed in O(1) time.
Use cases for migrating sparse VMAs are binaries sanitized with ASAN, MSAN or TSAN [1]. All of these sanitizers produce sparse mappings of shadow memory [2]. Being able to migrate such binaries allows to highly reduce the amount of work needed to identify and fix post-migration crashes, which happen constantly.
- Clear the pages which are soft-dirty.
- The optional flag to ignore the VM_SOFTDIRTY and only track per page
soft-dirty PTE bit
There are two decisions which have been taken about how to get the output from the syscall.
- Return offsets of the pages from the start in the vec
We can conside to return regions that contains pages with the same set of bits.
struct page_region { void *start; long size; u64 bitmap; }
And ioctl returns arrays of page_region-s. I believe it will be more compact form for many cases.
Thank you for mentioning this. I'd considered this while development. But I gave up and used the simple array to return the offsets of the pages as in the problem I'm trying to solve, the dirty pages may be present amid non-dirty pages. The range may not be useful in that case.
This is a good example. If we expect more than two consequent pages on average, the "region" interface looks more prefered. I don't know your use-case, but in the case of CRIU, this assumption looks reasonable.
Plus one for page_region data structure. It will make ASAN shadow memory representation much more compact as well as any other classical use-case.
[1] https://github.com/google/sanitizers [2] https://github.com/google/sanitizers/wiki/AddressSanitizerAlgorithm#64-bit
Best, Danylo