On Tue, Dec 02, 2025 at 11:50:31AM +0000, Nikita Kalyazin wrote:
It looks fine indeed, but it looks slightly weird then, as you'll have two ways to populate the page cache. Logically here atomicity is indeed not needed when you trap both MISSING + MINOR.
I reran the test based on the UFFDIO_COPY prototype I had using your series [2], and UFFDIO_COPY is slower than write() to populate 512 MiB: 237 vs 202 ms (+17%). Even though UFFDIO_COPY alone is functionally sufficient, I would prefer to have an option to use write() where possible and only fall back to UFFDIO_COPY for userspace faults, to get better performance.
Yes, write() should be fine.
Especially for gmem, I guess write() support is needed when VMAs cannot be mapped at all in a strict CoCo context, so it needs to be available one way or another.
IIUC it's because UFFDIO_COPY (or memcpy(), I recall you used to test that instead) will involve pgtable operations. So I wonder: if the VMA mapping the gmem is still accessed at some point later (either private->shared convertible ones for device DMAs for CoCo, or the fully shared non-CoCo use case), then the pgtable overhead will simply happen later for a write()-style fault resolution.
From that POV, the above numbers make sense.
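To make the two population paths concrete, here is a minimal userspace sketch of what I mean. It is not the test from this thread: it assumes an ordinary unlinked temp file as a stand-in for gmem (write() on guest_memfd is what the series under discussion proposes), and all helper names are hypothetical. populate_with_write() only fills the page cache, while populate_with_memcpy() goes through a shared mapping, so every first touch also faults and installs a PTE.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: an unlinked temp file of `len` bytes.
 * Assumption: a regular file stands in for gmem here. */
static int make_backing_file(size_t len)
{
    char path[] = "/tmp/populate-demo-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* write()-style population: fills the page cache, touches no page tables. */
static int populate_with_write(int fd, const char *src, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, src + done, len - done, (off_t)done);

        if (n <= 0)
            return -1;
        done += (size_t)n;
    }
    return 0;
}

/* Mapping-based population: each first touch faults and installs a PTE. */
static int populate_with_memcpy(int fd, const char *src, size_t len)
{
    void *dst;

    assert(len > 0); /* mmap() rejects a zero length */
    dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (dst == MAP_FAILED)
        return -1;
    memcpy(dst, src, len);
    return munmap(dst, len);
}

/* Returns 1 if the first `len` bytes of the file equal `src`, else 0. */
static int contents_match(int fd, const char *src, size_t len)
{
    char buf[4096];
    size_t done = 0;

    while (done < len) {
        size_t want = len - done < sizeof(buf) ? len - done : sizeof(buf);
        ssize_t n = pread(fd, buf, want, (off_t)done);

        if (n <= 0 || memcmp(buf, src + done, (size_t)n) != 0)
            return 0;
        done += (size_t)n;
    }
    return 1;
}
```

Both paths end with the same file contents; the difference is only whether the pgtable work is paid at population time or deferred until the mapping is first accessed. Timing each helper over the same buffer would be one way to look at that cost, though the numbers would not be comparable to the gmem results above.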
Thanks for the extra testing results.
[2] https://lore.kernel.org/all/7666ee96-6f09-4dc1-8cb2-002a2d2a29cf@amazon.com