I also don't know how you handle things like folio_test_hugetlb(), given possible assumptions that the VMA must be a hugetlb VMA. I confess I haven't checked the rest of the patchset yet; reading a large series without a git tree is sometimes challenging for me.
I'm thinking of basically never involving folio_test_hugetlb(), and the VMAs used by guest_memfd will also never be HugeTLB VMAs. That's because only the HugeTLB allocator is used; by the time the folio is mapped to userspace, it will already have been split. After the page is split, the folio loses its HugeTLB status. guest_memfd folios will never be mapped to userspace while they still have HugeTLB status.
We absolutely must convert these hugetlb folios to non-hugetlb folios.
That is one of the reasons why I raised at LPC that we should focus on leaving hugetlb out of the picture and rather have a global pool, and the option to move folios from the global pool back and forth to hugetlb or to guest_memfd.
What exactly that would look like is TBD.
For the time being, I think we could add a "hack" to take hugetlb folios from hugetlb for our purposes, but we would absolutely have to convert them to non-hugetlb folios, especially when we split them to small folios and start using the mapcount. But it doesn't feel quite clean.
Simply starting with a separate global pool (e.g., boot-time allocation similar to what hugetlb does, or CMA) might be cleaner, and a lot of stuff could be factored out from hugetlb code to achieve that.
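
To make the ownership rules discussed above a bit more concrete, here is a rough userspace toy model in C. It is not kernel code: every name in it (struct toy_folio, the toy_* helpers, the owner enum) is made up for illustration, and the rule that a folio must be order 0 before it is mapped is an assumption of the model based on the description above. It only tries to capture two points: a folio handed to guest_memfd must lose its hugetlb status, and it must never be mapped to userspace while that status is still set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these types or helpers exist in the kernel. */
enum toy_owner { TOY_GLOBAL_POOL, TOY_HUGETLB, TOY_GUEST_MEMFD };

struct toy_folio {
	enum toy_owner owner;
	bool hugetlb_flag;    /* stands in for what folio_test_hugetlb() reports */
	unsigned int order;   /* 0 means a small folio */
	bool mapped_to_user;
};

/* Hand a folio from the (hypothetical) global pool to a consumer. */
static int toy_pool_take(struct toy_folio *f, enum toy_owner to)
{
	if (f->owner != TOY_GLOBAL_POOL || to == TOY_GLOBAL_POOL)
		return -1;
	f->owner = to;
	/* Only folios owned by hugetlb carry the hugetlb status. */
	f->hugetlb_flag = (to == TOY_HUGETLB);
	return 0;
}

/* Give a folio back to the global pool; refuse while it is still mapped. */
static int toy_pool_return(struct toy_folio *f)
{
	if (f->owner == TOY_GLOBAL_POOL || f->mapped_to_user)
		return -1;
	f->owner = TOY_GLOBAL_POOL;
	f->hugetlb_flag = false;
	return 0;
}

/* The interim "hack": take a folio straight from hugetlb and convert it. */
static int toy_gmem_take_from_hugetlb(struct toy_folio *f)
{
	if (f->owner != TOY_HUGETLB || f->mapped_to_user)
		return -1;
	f->owner = TOY_GUEST_MEMFD;
	f->hugetlb_flag = false;   /* conversion to a non-hugetlb folio */
	return 0;
}

/* Split to small folios; only allowed once the hugetlb status is gone. */
static int toy_gmem_split(struct toy_folio *f)
{
	if (f->owner != TOY_GUEST_MEMFD || f->hugetlb_flag)
		return -1;
	f->order = 0;
	return 0;
}

/* Map to userspace; only split, non-hugetlb guest_memfd folios qualify. */
static int toy_gmem_map_user(struct toy_folio *f)
{
	if (f->owner != TOY_GUEST_MEMFD || f->hugetlb_flag || f->order != 0)
		return -1;
	/* Invariant from the discussion: never mapped with hugetlb status. */
	assert(!f->hugetlb_flag);
	f->mapped_to_user = true;
	return 0;
}
```

In this model, a folio taken by hugetlb cannot be mapped through the guest_memfd path until it has gone through toy_gmem_take_from_hugetlb() and toy_gmem_split(). With a real global pool, the toy_gmem_take_from_hugetlb() step would presumably disappear, since guest_memfd would take folios from the pool directly and they would never carry hugetlb status in the first place.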