On Mon, Sep 15, 2025 at 4:01 AM Nikita Kalyazin kalyazin@amazon.com wrote:
On 13/09/2025 01:32, Vishal Annapurve wrote:
On Fri, Sep 12, 2025 at 3:35 PM James Houghton jthoughton@google.com wrote:
if (folio_test_uptodate(folio)) {
folio_unlock(folio);
folio_put(folio);
return -ENOSPC;
Does it actually matter for the folio not to be uptodate? It seems unnecessarily restrictive not to be able to overwrite data if we're saying that this is only usable for unencrypted memory anyway.
In the context of direct map removal [1] it does actually because when we mark a folio as prepared, we remove it from the direct map making it inaccessible to the way write() performs the copy. It does not matter if direct map removal isn't enabled though. Do you think it should be conditional?
Oh, good point. It's simpler (both to implement and to describe) to disallow a second write() call in all cases (no matter if the direct map for the page has been removed or if the contents have been encrypted), so I'm all for leaving it unconditional like you have now. Thanks!
Are we deviating from the way read/write semantics work for the other filesystems? I don't think other filesystems carry this restriction of one-time-write only. Do we strictly need the differing semantics?
Yes, I believe we are deviating from other "regular" filesystems, but I don't think what we propose is too uncommon as in "special" filesystems such as sysfs subsequent calls to attributes like "remove" will likely fail as well (not due to up-to-date flag though).
Maybe it would be simpler to not overload uptodate flag and just not allow read/write if folio is not mapped in the direct map for non-conf VMs (assuming there could be other ways to deduce that information).
The only such interface I'm aware of is kernel_page_present() so the check may look like:
for (i = 0; i < folio_nr_pages(folio); i++) { struct page *page = folio_page(folio, i); if (!kernel_page_present(page)) { folio_unlock(folio); folio_put(folio); return -ENOSPC; } }
However, kernel_page_present() is not currently exported to modules.
I think it should be exposed if there is no cleaner way to deduce if a folio is mapped in the direct map. That being said, we should probably cleanly separate the series to add write population support and the series for removal from direct map [1] and figure out the order in which they need to be merged upstream. I would still vote for not overloading folio_test_uptodate() in either series.
Ackerley and Fuad are planning to post a series just for supporting in-place conversion for 4K pages which is going to introduce a maple tree for storing private/shared-ness of ranges. We could possibly augment the support to track directmap removal there. RFC version [2] is a good reference for now.
[1] https://lore.kernel.org/kvm/20250912091708.17502-1-roypat@amazon.co.uk/ [2] https://lore.kernel.org/kvm/d3832fd95a03aad562705872cbda5b3d248ca321.1747264...