On Thu, Aug 10, 2023 at 10:14:37AM -0700, Nicolin Chen wrote:
On Thu, Aug 10, 2023 at 12:57:04PM -0300, Jason Gunthorpe wrote:
On Thu, Aug 10, 2023 at 02:49:59AM +0000, Tian, Kevin wrote:
From: Nicolin Chen <nicolinc@nvidia.com> Sent: Thursday, August 10, 2023 4:17 AM
On Wed, Aug 09, 2023 at 04:19:01PM -0300, Jason Gunthorpe wrote:
On Wed, Aug 09, 2023 at 12:12:25PM -0700, Nicolin Chen wrote:
On Wed, Aug 09, 2023 at 01:24:56PM -0300, Jason Gunthorpe wrote:
Similarly for managing the array of invalidation commands.
You mean an embedded uptr inside a driver user data struct, right? Sure, that should go through the new helper too.
If we are committed that all drivers have to process an array, then put the array in the top-level struct, pass it in the same user_data struct, and use another helper to allow the driver to iterate through it.
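Roughly something like this, with all the names being illustrative rather than a proposed uAPI:

struct iommu_hwpt_invalidate_example {
        __u32 size;             /* sizeof() of the struct, for extensibility */
        __u32 hwpt_id;          /* which HWPT/domain to invalidate */
        __u64 data_uptr;        /* user pointer to the array of driver requests */
        __u32 data_type;        /* which driver-specific format data_uptr holds */
        __u32 entry_len;        /* size of one array entry */
        __u32 entry_num;        /* in: entries provided; out: entries handled */
        __u32 __reserved;
};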
I see. Both VTD and SMMU pass a uptr to an array of invalidation commands/requests. The only difference is that SMMU's array is a ring buffer rather than a plain one indexed from the beginning. But the helper could take two index inputs, which should work for the VTD case too. If another IOMMU driver only supports one request rather than an array of requests, we can treat that as a single-entry array.
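A rough sketch of the helper I'm picturing (all names made up): SMMU would pass the ring's cons/prod indices plus the queue size, VTD would pass start=0 and end=nr with any q_size bigger than nr so nothing wraps, and a single-request driver would just pass end=1:

static int example_for_each_req(void __user *uptr, u32 entry_len, u32 q_size,
                                u32 start, u32 end,
                                int (*fn)(void __user *entry, void *cb_arg),
                                void *cb_arg)
{
        u32 nr = (end + q_size - start) % q_size;       /* entries between the two indices */
        u32 i = start;
        int rc;

        while (nr--) {
                rc = fn(uptr + (size_t)i * entry_len, cb_arg);
                if (rc)
                        return rc;              /* stop at the failing entry */
                i = (i + 1) % q_size;           /* wraps around for the ring case */
        }
        return 0;
}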
I like this approach.
Do we need to worry about the ring wrapping around? The VMM already has to scan the ring and extract the invalidation commands, so wouldn't it just linearize them anyway?
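i.e. something roughly like this on the VMM side, where struct guest_cmd and the rest of the names are purely illustrative:

#include <stddef.h>

/* Stand-in for whatever the raw command format is */
struct guest_cmd { unsigned long long dw[2]; };

static size_t vmm_linearize_cmds(const struct guest_cmd *ring, size_t q_size,
                                 size_t cons, size_t prod,
                                 struct guest_cmd *out)
{
        size_t n = 0;

        while (cons != prod) {
                out[n++] = ring[cons];          /* copy out of the shared guest queue */
                cons = (cons + 1) % q_size;     /* the wrap disappears in the copy */
        }
        return n;                               /* flat count handed to the kernel */
}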
I haven't had the chance to send the latest vSMMU series yet, but in it I pass the raw user CMDQ down for the host to go through, as it'd be easier to stall the consumer index movement when a command in the middle fails.
Don't some commands have to be executed by the VMM?
Even so, it seems straightforward enough for the kernel to report the number of commands it executed and the VMM can adjust the virtual consumer index.
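Roughly, on the VMM side, with hypothetical names and using the illustrative entry_num write-back idea from the struct sketch above:

/* Hypothetical VMM-side queue state and error-injection hook */
struct vcmdq { unsigned int cons; unsigned int q_size; };
void vsmmu_inject_cmdq_error(struct vcmdq *q);

static void vmm_advance_vcons(struct vcmdq *q, unsigned int submitted,
                              unsigned int done)
{
        /* 'done' is whatever count the kernel reported back */
        q->cons = (q->cons + done) % q->q_size;
        if (done != submitted)
                vsmmu_inject_cmdq_error(q);     /* surface the failing command to the guest */
}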
Is there a use case for invalidation-only SW-emulated rings, and do we care about optimizing for the wrap-around case?
Hmm, why a SW emulated ring?
That is what you are building. The VMM catches the write to the producer pointer, and the VMM SW bundles the commands up to call into the kernel.
Yes to the latter question. The SMMU kernel driver has Q_WRP and other such helpers, so it wasn't difficult to process the user CMDQ in the same raw form. But it does complicate the common code if we want to do it there.
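For reference, the driver treats a queue pointer as an index plus one wrap bit above it, roughly like this (paraphrasing the Q_IDX/Q_WRP helpers, with renamed macros):

#define EX_Q_IDX(shift, p)      ((p) & ((1U << (shift)) - 1))
#define EX_Q_WRP(shift, p)      ((p) & (1U << (shift)))

/* empty: index and wrap bit both match; full: index matches, wrap bit differs */
static bool ex_queue_empty(unsigned int shift, u32 prod, u32 cons)
{
        return EX_Q_IDX(shift, prod) == EX_Q_IDX(shift, cons) &&
               EX_Q_WRP(shift, prod) == EX_Q_WRP(shift, cons);
}

static bool ex_queue_full(unsigned int shift, u32 prod, u32 cons)
{
        return EX_Q_IDX(shift, prod) == EX_Q_IDX(shift, cons) &&
               EX_Q_WRP(shift, prod) != EX_Q_WRP(shift, cons);
}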
Optimizing the wrap-around case means that when the producer/consumer pointers pass the end of the queue memory we execute one ioctl toward the kernel, not two. That is possibly a very minor optimization; it depends on how big the queues are and how often multi-entry batches will be present.
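Concretely, if the kernel only accepts a linear span, a wrapped [cons, prod) range has to be handed over as two chunks and hence two calls; names here are again hypothetical:

struct guest_cmd { unsigned long long dw[2]; };         /* raw command format, as before */
void issue_invalidate(const struct guest_cmd *cmds, unsigned int n);

static void submit_range(const struct guest_cmd *q_base, unsigned int q_size,
                         unsigned int cons, unsigned int prod)
{
        if (prod >= cons) {
                issue_invalidate(q_base + cons, prod - cons);   /* no wrap: one call */
        } else {
                issue_invalidate(q_base + cons, q_size - cons); /* tail, up to the end of the queue */
                issue_invalidate(q_base, prod);                 /* head, after the wrap */
        }
}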
Jason