On Wed, Apr 23, 2025 at 08:55:51AM -0300, Jason Gunthorpe wrote:
On Wed, Apr 23, 2025 at 08:05:49AM +0000, Tian, Kevin wrote:
It's not a good idea having the kernel trust the VMM.
It certainly shouldn't trust it, but it can validate the VMM's choice and generate a failure if it isn't good.
Also I'm not sure the contiguity is guaranteed all the time with huge pages (e.g. if just using THP).
If things are aligned then the contiguity will work out. I.e. a 64K aligned allocation on a 2M GPA is fine. I don't think there are edge cases where a GPA will be fragmented. It does rely on the VMM always getting some kind of huge page and then pinning it in iommufd.
QEMU does ensure the alignment when using system huge pages, and I haven't seen any edge cases yet.
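To illustrate the alignment argument, here is a minimal userspace sketch. It assumes the GPA range is backed by one physically contiguous huge page per 2M-aligned GPA block, which is the very thing the VMM has to guarantee. The helper names are hypothetical and not part of any kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a non-zero power of two. */
static bool is_pow2(uint64_t x)
{
	return x && !(x & (x - 1));
}

/*
 * Hypothetical helper: does [gpa, gpa + size) stay inside a single
 * huge-page-sized, huge-page-aligned GPA block? Both sizes must be
 * powers of two. A naturally aligned allocation no larger than the
 * huge page always passes; an unaligned one may straddle two blocks.
 */
static bool range_within_huge_page(uint64_t gpa, uint64_t size,
				   uint64_t hpage_size)
{
	uint64_t mask;

	assert(is_pow2(size) && is_pow2(hpage_size));
	if (size > hpage_size)
		return false;

	/* The first and last byte must fall in the same huge page block. */
	mask = ~(hpage_size - 1);
	return (gpa & mask) == ((gpa + size - 1) & mask);
}
```

With a 2M huge page, a 64K queue at a 64K-aligned GPA never crosses a block boundary, while the same queue placed at a 32K offset from the last 64K slot does.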
IMHO this is bad HW design, but it is what it is..
btw does smmu only read the cmdq or also update some fields in the queue? If the latter, then it also brings a security hole as a malicious VMM could violate the contiguity requirement to instruct the smmu to touch pages which don't belong to it...
This really must be prevented. I haven't looked closely here, but the GPA -> PA mapping should go through the IOAS and that should generate a page list and that should be validated for contiguity.
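As a rough sketch of the contiguity validation on the resulting page list (plain userspace C over an array of page frame numbers; the helper name is hypothetical and this is not the iommufd implementation):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not an iommufd API: check that a list of page
 * frame numbers describes one physically contiguous run, i.e. each
 * entry is exactly one greater than the previous one.
 */
static bool pfns_are_contiguous(const uint64_t *pfns, size_t npages)
{
	size_t i;

	assert(pfns || npages == 0);
	for (i = 1; i < npages; i++) {
		if (pfns[i] != pfns[i - 1] + 1)
			return false;
	}
	return true;
}
```

A failing check here would be the point at which the kernel rejects the VMM's choice instead of trusting it.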
It also needs to act like an mdev and lock down the part of the IOAS that provides that memory, so the pin can't be released and cause UAF problems.
If I capture this correctly, the GPA->PA mapping is already done at the IOAS level for the S2 HWPT/domain, i.e. pages are already pinned. So we just need a pair of for-driver APIs to validate the contiguity and refcount the pages by calling iopt_area_add_access().
Thanks
Nicolin