On Mon, Apr 28, 2025 at 05:42:27PM +0530, Vasant Hegde wrote:
+/**
- struct iommu_vcmdq_alloc - ioctl(IOMMU_VCMDQ_ALLOC)
- @size: sizeof(struct iommu_vcmdq_alloc)
- @flags: Must be 0
- @viommu_id: Virtual IOMMU ID to associate the virtual command queue with
- @type: One of enum iommu_vcmdq_type
- @index: The logical index to the virtual command queue per virtual IOMMU, for
a multi-queue model
- @out_vcmdq_id: The ID of the new virtual command queue
- @addr: Base address of the queue memory in the guest physical address space
Sorry. I didn't get this part.
So here `addr` is the command queue base address, like
- NVIDIA's virtual command queue
- AMD vIOMMU's command buffer
... and it will allocate a vcmdq for each buffer type. Is that the correct understanding?
Yes. For AMD "vIOMMU", it needs a new type for iommufd vIOMMU: IOMMU_VIOMMU_TYPE_AMD_VIOMMU,
For AMD "vIOMMU" command buffer, it needs a new type too: IOMMU_VCMDQ_TYPE_AMD_VIOMMU, /* Kdoc it to be Command Buffer */
Then, use the IOMMUFD_CMD_VIOMMU_ALLOC ioctl to allocate a vIOMMU obj, and use IOMMUFD_CMD_VCMDQ_ALLOC ioctl(s) to allocate vCMDQ objs.
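To make the two-step flow concrete, here is a minimal sketch of how a VMM might fill in the vCMDQ allocation command, assuming a uAPI struct that mirrors the kdoc quoted above. The field layout, the type value, and the ioctl invocation shown in the comment are illustrative, not the final uAPI:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout, mirroring the kdoc for struct iommu_vcmdq_alloc above */
struct iommu_vcmdq_alloc {
	uint32_t size;
	uint32_t flags;
	uint32_t viommu_id;
	uint32_t type;		/* e.g. the proposed IOMMU_VCMDQ_TYPE_AMD_VIOMMU */
	uint32_t index;		/* logical queue index per vIOMMU */
	uint32_t out_vcmdq_id;	/* filled by the kernel on success */
	uint64_t addr;		/* base address of the queue in guest PA space */
	uint64_t length;	/* queue size in bytes */
};

/* Build the command the VMM would then pass to something like
 * ioctl(iommufd, IOMMUFD_CMD_VCMDQ_ALLOC, &cmd). */
static struct iommu_vcmdq_alloc
build_vcmdq_cmd(uint32_t viommu_id, uint32_t type, uint32_t index,
		uint64_t guest_pa, uint64_t len)
{
	struct iommu_vcmdq_alloc cmd;

	memset(&cmd, 0, sizeof(cmd));
	cmd.size = sizeof(cmd);
	cmd.flags = 0;		/* must be 0 per the kdoc */
	cmd.viommu_id = viommu_id;
	cmd.type = type;
	cmd.index = index;
	cmd.addr = guest_pa;
	cmd.length = len;
	return cmd;
}
```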
In case of AMD vIOMMU, the buffer base address is programmed in one register (e.g. MMIO Offset 0008h Command Buffer Base Address Register), while buffer enable/disable is done via another register (e.g. MMIO Offset 0018h IOMMU Control Register). We need to communicate both to the hypervisor. Not sure this API can accommodate that, as addr seems to be mandatory.
NVIDIA's CMDQV has all three of them too. What we do here is to let VMM trap the buffer base address (in guest physical address space) and forward it to the kernel using this @addr. Then, the kernel translates this @addr to the host physical address space, and programs the physical address and size into the register.
[1] https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/specifi...
Thanks for the doc. So, AMD has:
Command Buffer Base Address Register [MMIO Offset 0008h]
  "used to program the system physical base address and size of the command buffer.
   The command buffer occupies contiguous physical memory starting at the programmed
   base address, up to the programmed size."
Command Buffer Head Pointer Register [MMIO Offset 2000h]
Command Buffer Tail Pointer Register [MMIO Offset 2008h]
IIUIC, AMD should do the same: the VMM traps the VM's Command Buffer Base Address register, so that when the guest kernel allocates a command buffer and programs that register, the VMM captures the guest PA and size. Then, the VMM allocates a vCMDQ object (for this command buffer), forwarding the buffer address and size via @addr and @length to the host kernel. The kernel then translates the guest PA to a host PA to program the HW.
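As a rough sketch of that trap handler, the VMM would decode the guest's register write into the (@addr, @length) pair. The bit positions below follow my reading of the AMD IOMMU spec (ComBase in bits 51:12, ComLen in bits 59:56, with 2^ComLen entries of 16 bytes each); treat them as assumptions to be checked against the current spec revision:

```c
#include <assert.h>
#include <stdint.h>

struct vcmdq_params {
	uint64_t guest_pa;	/* what the VMM would forward as @addr */
	uint64_t length;	/* what the VMM would forward as @length */
};

/* Decode a guest write to the Command Buffer Base Address Register
 * [MMIO Offset 0008h]. Field positions are assumptions per the lead-in. */
static struct vcmdq_params decode_cmdbuf_base(uint64_t reg)
{
	struct vcmdq_params p;
	uint64_t com_base = (reg >> 12) & ((1ULL << 40) - 1);	/* bits 51:12 */
	uint32_t com_len = (uint32_t)(reg >> 56) & 0xf;		/* bits 59:56 */

	p.guest_pa = com_base << 12;		/* 4K-aligned base address */
	p.length = (1ULL << com_len) * 16;	/* 2^ComLen 16-byte entries */
	return p;
}
```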
We can see that the Head/Tail registers are in a different MMIO page (offset by two 4K pages), much like NVIDIA CMDQV, which allows the VMM to mmap that MMIO page of the Head/Tail registers for the guest OS to directly control the HW (i.e. the VMM doesn't trap these two registers).
When the guest OS wants to issue a new command, the guest kernel can just fill the next free entry in the guest command buffer and program the Tail register (backed by an mmap'd MMIO page); then the HW will read the commands from the Head entry up to the Tail entry in the guest command buffer.
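A minimal sketch of that guest-side submission path, with the mmap'd Tail register modeled as a plain pointer; the 16-byte entry size and power-of-two ring layout are assumptions for illustration, not the exact HW format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CMD_ENTRY_SIZE	16	/* one 128-bit command, assumed */

struct guest_cmdq {
	uint8_t *ring;		/* guest command buffer memory */
	uint32_t num_entries;	/* power of two */
	uint32_t tail;		/* software copy of the Tail index */
	volatile uint32_t *tail_reg;	/* mmap'd Tail Pointer register page */
};

/* Returns 0 on success, -1 if the ring is full. @head is the index the
 * HW last reported via the Head Pointer register. */
static int guest_cmdq_submit(struct guest_cmdq *q, uint32_t head,
			     const void *cmd)
{
	uint32_t next = (q->tail + 1) & (q->num_entries - 1);

	if (next == head)
		return -1;	/* full: HW hasn't consumed up to Head yet */

	memcpy(q->ring + (uint64_t)q->tail * CMD_ENTRY_SIZE, cmd,
	       CMD_ENTRY_SIZE);
	q->tail = next;
	*q->tail_reg = next;	/* doorbell: HW reads entries Head..Tail */
	return 0;
}
```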
@@ -170,3 +170,97 @@ int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
 	iommufd_put_object(ucmd->ictx, &viommu->obj);
 	return rc;
 }
+void iommufd_vcmdq_destroy(struct iommufd_object *obj)
+{
I didn't understand the destroy flow in general. Can you please help me understand:
Is the VMM expected to track all buffers and call this interface, or will iommufd take care of it? What happens if the VM crashes?
In a normal routine, the VMM gets a vCMDQ object ID for each vCMDQ object it allocated. So, it should track all the IDs and release them when the VM shuts down.
The iommufd core does track all the objects that belong to an iommufd context (ictx), and automatically releases them. But it can't resolve certain dependencies on other FDs; e.g. vEVENTQ and FAULT QUEUE return another FD that user space listens on and must close properly to destroy the QUEUE object.
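The VMM-side bookkeeping described above could look like the following sketch: record every object ID returned by the alloc ioctls and release them in reverse order at VM shutdown (children such as vCMDQs were allocated after their vIOMMU, so reverse order tears them down first). The tracker and the destroy-order output array are illustrative; a real VMM would issue an IOMMU_DESTROY-style ioctl per ID:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_OBJECTS 64	/* arbitrary bound for this sketch */

struct obj_tracker {
	uint32_t ids[MAX_OBJECTS];	/* in allocation order */
	unsigned int count;
};

/* Record an ID returned by an alloc ioctl; -1 if the tracker is full. */
static int track_object(struct obj_tracker *t, uint32_t id)
{
	if (t->count >= MAX_OBJECTS)
		return -1;
	t->ids[t->count++] = id;
	return 0;
}

/* Release everything in reverse allocation order. Writes the IDs into
 * @out in destruction order (where the destroy ioctl would be issued)
 * and returns how many objects were destroyed. */
static unsigned int teardown_all(struct obj_tracker *t, uint32_t *out)
{
	unsigned int n = 0;

	while (t->count) {
		/* ioctl(iommufd, IOMMU_DESTROY, ...) would go here */
		out[n++] = t->ids[--t->count];
	}
	return n;
}
```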
- /* The underlying physical pages must be pinned in the IOAS */
- rc = iopt_pin_pages(&viommu->hwpt->ioas->iopt, cmd->addr, cmd->length,
-                     pages, 0);
Why do we need this? Is it not already pinned as part of VFIO binding?
I think this could be clearer:

/*
 * The underlying physical pages must be pinned to prevent them from
 * being unmapped (via IOMMUFD_CMD_IOAS_UNMAP) during the life cycle
 * of the vCMDQ object.
 */
Thanks Nicolin