From: Vasant Hegde <vasant.hegde@amd.com>
Sent: Tuesday, May 20, 2025 4:39 PM
Hi Nicolin,
On 5/19/2025 11:44 PM, Nicolin Chen wrote:
On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
Jason, Nicolin, Kevin,
On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
+/**
+ * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
+ * @size: sizeof(struct iommu_hw_queue_alloc)
+ * @flags: Must be 0
+ * @viommu_id: Virtual IOMMU ID to associate the HW queue with
+ * @type: One of enum iommu_hw_queue_type
+ * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
+ *         model
+ * @out_hw_queue_id: The ID of the new HW queue
+ * @base_addr: Base address of the queue memory in guest physical address space
+ * @length: Length of the queue memory in the guest physical address space
+ *
+ * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
+ * allows HW to access a guest queue memory described by @base_addr and @length.
+ * Upon success, the underlying physical pages of the guest queue memory will be
+ * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
+ * destroyed.
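For readers following along, the fields in the quoted kernel-doc could be laid out roughly as below. This is only a sketch: the struct name `hw_queue_alloc_sketch`, the helper `hw_queue_alloc_init()`, and the field widths and ordering are assumptions made for illustration, not taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative mirror of the documented fields. The widths and the order
 * are assumptions; the real uAPI header is authoritative.
 */
struct hw_queue_alloc_sketch {
	uint32_t size;            /* sizeof() of this struct, set by the caller */
	uint32_t flags;           /* must be 0 */
	uint32_t viommu_id;       /* vIOMMU the queue is associated with */
	uint32_t type;            /* HW queue type */
	uint32_t index;           /* logical queue index per vIOMMU */
	uint32_t out_hw_queue_id; /* filled in on success */
	uint64_t base_addr;       /* queue base, guest physical address */
	uint64_t length;          /* queue length in bytes */
};

/* Six 32-bit fields followed by two 64-bit fields: no implicit padding. */
_Static_assert(sizeof(struct hw_queue_alloc_sketch) == 40,
	       "unexpected padding in sketch layout");
_Static_assert(offsetof(struct hw_queue_alloc_sketch, base_addr) == 24,
	       "base_addr should follow the six 32-bit fields");

/* Hypothetical helper: build a request with flags and the output ID zeroed. */
static struct hw_queue_alloc_sketch
hw_queue_alloc_init(uint32_t viommu_id, uint32_t type, uint32_t index,
		    uint64_t base_addr, uint64_t length)
{
	struct hw_queue_alloc_sketch req = {
		.size = sizeof(req),
		.flags = 0,
		.viommu_id = viommu_id,
		.type = type,
		.index = index,
		.out_hw_queue_id = 0,
		.base_addr = base_addr,
		.length = length,
	};
	return req;
}
```

A VMM would fill such a request with the guest-physical base and length of the guest queue and pass it to the ioctl; the pinning question below is about what the kernel then does with `base_addr` and `length`.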
Do we have a way to make the pinning optional?
As I understand AMD's system the iommu HW itself translates the base_addr through the S2 page table automatically, so it doesn't need pinned memory and physical addresses but just the IOVA.
Correct. HW will translate GPA -> SPA automatically using the information below.
The AMD IOMMU needs a special device ID set up with the GPA -> SPA mapping per VM, and it is programmed in the VF Control BAR (VFCntlMMIO Offset {16’b[GuestID], 6’b01_0000} Guest Miscellaneous Control Register). IOMMU HW will use this address for GPA to SPA translation for buffers like the command buffer.
So HW will use the base address (GPA) and the head/tail pointer to get the offset from the base. Then it will use GPA -> SPA translation.
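The two steps described above (base plus head/tail offset, then GPA -> SPA translation) can be sketched as follows. This is a toy model only: `struct s2_entry`, `struct s2_table`, `queue_entry_gpa()` and `s2_translate()` are hypothetical names, the 4 KiB page size is assumed, and a flat array stands in for the real stage 2 page table walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB pages */
#define SKETCH_PAGE_MASK  ((1ull << SKETCH_PAGE_SHIFT) - 1)

/* Hypothetical stage 2 entry: one guest frame mapped to one system frame. */
struct s2_entry {
	uint64_t gfn;  /* guest frame number */
	uint64_t sfn;  /* system frame number */
	bool present;  /* false once the page has been unmapped */
};

struct s2_table {
	const struct s2_entry *entries;
	size_t count;
};

/* Step 1: GPA of the entry at a head/tail offset, wrapped to the queue. */
static uint64_t queue_entry_gpa(uint64_t base_gpa, uint64_t length,
				uint64_t offset)
{
	if (length == 0)
		return base_gpa;
	return base_gpa + (offset % length);
}

/*
 * Step 2: translate GPA -> SPA. Returns true and fills *spa on a hit;
 * returns false to model a translation fault for a missing mapping.
 */
static bool s2_translate(const struct s2_table *t, uint64_t gpa,
			 uint64_t *spa)
{
	uint64_t gfn = gpa >> SKETCH_PAGE_SHIFT;

	for (size_t i = 0; i < t->count; i++) {
		if (t->entries[i].present && t->entries[i].gfn == gfn) {
			*spa = (t->entries[i].sfn << SKETCH_PAGE_SHIFT) |
			       (gpa & SKETCH_PAGE_MASK);
			return true;
		}
	}
	return false;
}
```

The point of the sketch is that every access goes through the table: the hardware never holds on to a system physical address for the queue, so a missing mapping shows up as a failed lookup.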
Perhaps for this reason the pinning should be done with a function call from the driver?
We still need to make sure the memory backing these pages is present so that IOMMU HW can access it.
Is pinning at the time of guest boot enough here, or do we need to increase the reference count in the queue_alloc() path?
For NVIDIA's vCMDQ that reads host PA directly, pages should be pinned once when stage 2 mappings are created for the guest RAM, and iommu_hw_queue_alloc() should pin the pages again to prevent the gPA from being unmapped in the stage 2 page table. Otherwise it will be a security hole, as HW continues to read the unmapped memory through physical address space.
I understand that the AMD Command Buffer also needs the S2 mappings to be present in order to work correctly. But what happens if the queue memory isn't pinned (or even gets unmapped)? Will it raise a translation fault, as opposed to HW reading the unmapped memory?
If a page is unmapped, then stage 2 (the host page table) gets updated. The IOMMU will not be able to find the page and will log a fault.
As long as the fault is contained only to the relevant queue, yes, we don't need another pinning from the driver.
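The distinction reached in this exchange can be modelled with a small sketch. Everything here is hypothetical (the `hwq_*` names, `struct guest_region`, and the error codes chosen); it only illustrates the reasoning that a queue reading host PA directly needs an extra pin that blocks unmapping, while a queue that translates through stage 2 on each access sees a fault instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* How the HW queue reaches guest queue memory (hypothetical model). */
enum hwq_access {
	HWQ_ACCESS_HOST_PA,       /* HW reads host PA directly */
	HWQ_ACCESS_S2_TRANSLATED, /* HW translates GPA via stage 2 per access */
};

enum hwq_result { HWQ_OK, HWQ_FAULT };

/* Toy stand-in for the guest queue memory mapped in the IOAS. */
struct guest_region {
	bool mapped;
	unsigned int pin_count;
};

/* Queue allocation: take an extra pin only when HW bypasses stage 2. */
static int hwq_alloc(struct guest_region *r, enum hwq_access access)
{
	if (!r->mapped)
		return -ENOENT;
	if (access == HWQ_ACCESS_HOST_PA)
		r->pin_count++;
	return 0;
}

/* Unmapping is refused while a queue still holds a pin on the region. */
static int region_unmap(struct guest_region *r)
{
	if (r->pin_count)
		return -EBUSY;
	r->mapped = false;
	return 0;
}

/* A stage-2-translated access faults once the mapping is gone. */
static enum hwq_result hwq_access_via_s2(const struct guest_region *r)
{
	return r->mapped ? HWQ_OK : HWQ_FAULT;
}
```

In this model the host-PA case never reaches a stale read because the unmap itself fails, and the translated case stays safe because the access fails; whether the fault stays contained to one queue is a hardware property the sketch does not capture.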