RE: [PATCH v4 11/23] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC ioctl

23 May 2025

From: Vasant Hegde <vasant.hegde@amd.com>
Sent: Tuesday, May 20, 2025 4:39 PM

Hi Nicolin,

On 5/19/2025 11:44 PM, Nicolin Chen wrote:
On Mon, May 19, 2025 at 10:59:49PM +0530, Vasant Hegde wrote:
Jason, Nicolin, Kevin,

On 5/15/2025 9:36 PM, Jason Gunthorpe wrote:
On Thu, May 08, 2025 at 08:02:32PM -0700, Nicolin Chen wrote:
+/**
+ * struct iommu_hw_queue_alloc - ioctl(IOMMU_HW_QUEUE_ALLOC)
+ * @size: sizeof(struct iommu_hw_queue_alloc)
+ * @flags: Must be 0
+ * @viommu_id: Virtual IOMMU ID to associate the HW queue with
+ * @type: One of enum iommu_hw_queue_type
+ * @index: The logical index to the HW queue per virtual IOMMU for a multi-queue
+ *         model
+ * @out_hw_queue_id: The ID of the new HW queue
+ * @base_addr: Base address of the queue memory in guest physical address space
+ * @length: Length of the queue memory in the guest physical address space
+ *
+ * Allocate a HW queue object for a vIOMMU-specific HW-accelerated queue, which
+ * allows HW to access a guest queue memory described by @base_addr and @length.
+ * Upon success, the underlying physical pages of the guest queue memory will be
+ * pinned to prevent VMM from unmapping them in the IOAS until the HW queue gets
+ * destroyed.

Do we have a way to make the pinning optional?
As I understand it, on AMD's system the IOMMU HW itself translates the
base_addr through the S2 page table automatically, so it doesn't need
pinned memory or physical addresses, just the IOVA.

Correct. HW will translate GPA -> SPA automatically using the information
below. The AMD IOMMU needs a special device ID to set up the GPA -> SPA
mapping per VM, and it is programmed in the VF Control BAR (VFCntlMMIO offset
{16'b[GuestID], 6'b01_0000}, the Guest Miscellaneous Control Register). The
IOMMU HW will use this for GPA -> SPA translation of buffers like the command
buffer. So the HW uses the base address (GPA) and the head/tail pointer to get
the offset from the base, then applies the GPA -> SPA translation.

Perhaps for this reason the pinning should be done with a function
call from the driver?

We still need to make sure the pages backing the queue are present in memory
so that the IOMMU HW can access them. Is pinning at guest boot time enough
here, or do we need to take an additional reference in the queue_alloc() path?

For NVIDIA's vCMDQ, which reads host PAs directly, pages should be
pinned once when the stage 2 mappings are created for the guest RAM,
and iommu_hw_queue_alloc() should pin the pages again to prevent
the gPA from being unmapped in the stage 2 page table. Otherwise
it would be a security hole, as the HW would continue to read the
unmapped memory through the physical address space.

I understand that the AMD Command Buffer also needs the S2 mappings
to be present in order to work correctly. But what happens to queue
memory that isn't pinned (or even gets unmapped)? Will it raise a
translation fault, vs. the HW reading the unmapped memory?

If a page is unmapped, the stage 2 (host) page table gets updated. The
IOMMU will then fail to find the page and log a fault.

As long as the fault is contained to the relevant queue, then yes,
we don't need the driver to take another pin.