On Fri, Jun 16, 2023 at 12:32:32PM +0100, Jean-Philippe Brucker wrote:
> We might need to revisit supporting stop markers: request that each device driver declares whether their device uses stop markers on unbind() ("This mechanism must indicate that a Stop Marker Message will be generated." says the spec, but doesn't say if the function always uses one or the other mechanism so it's per-unbind). Then we still have to synchronize unbind() with the fault handler to deal with the pending stop marker, which might have already gone through or be generated later.
An explicit API to wait for the stop marker makes sense, with the expectation that well-behaved devices will generate it and well-behaved drivers will wait for it.
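To make the ordering problem concrete, here is a rough user-space sketch of the bookkeeping such an API would need. Every name in it (pasid_ctx, pasid_handle_event(), pasid_unbind(), pasid_reusable()) is made up for illustration and is not an existing kernel interface. It also assumes that a device which does not use stop markers has had its queue drained some other way before unbind. The point is only that the marker can reach the fault handler either before or after unbind() runs, so both sides have to record what they saw.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum pasid_state {
	PASID_ACTIVE,   /* page requests are serviced normally */
	PASID_DRAINING, /* unbind requested, stop marker still outstanding */
	PASID_FREE,     /* safe to reuse the PASID */
};

struct pasid_ctx {
	enum pasid_state state;
	bool uses_stop_marker; /* declared by the device driver (assumption) */
	bool marker_seen;      /* stop marker already reached the handler */
	unsigned int dropped;  /* stale page requests discarded */
};

/* Fault-handler side. Returns true if the page request should be serviced. */
bool pasid_handle_event(struct pasid_ctx *c, bool is_stop_marker)
{
	if (is_stop_marker) {
		/* Remember the marker even if unbind() has not run yet. */
		c->marker_seen = true;
		if (c->state == PASID_DRAINING)
			c->state = PASID_FREE;
		return false;
	}
	if (c->state != PASID_ACTIVE || c->marker_seen) {
		/* Request for a PASID that is going away: discard it. */
		c->dropped++;
		return false;
	}
	return true;
}

/* Unbind side. Completes at once only if nothing is left to wait for. */
void pasid_unbind(struct pasid_ctx *c)
{
	if (!c->uses_stop_marker || c->marker_seen)
		c->state = PASID_FREE;
	else
		c->state = PASID_DRAINING;
}

/* What a "wait for stop marker" call would poll or sleep on. */
bool pasid_reusable(const struct pasid_ctx *c)
{
	return c->state == PASID_FREE;
}
```

A real implementation would sleep on a completion rather than poll, and would need locking between the two paths; the sketch leaves both out.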
Things like VFIO should have a way to barrier/drain the PRI queue after issuing FLR, i.e. the VMM processing FLR should also barrier the real HW queues and flush them to VM visibility.
> with stop markers, the host needs to flush the PRI queue when a PASID is detached. I guess on Intel detaching the PASID goes through the host which can flush the host queue. On Arm we'll probably need to flush the queue when receiving a PASID cache invalidation, which the guest issues after clearing a PASID table entry.
We are trying to get ARM to a point where invalidations don't need to be trapped. It would be good to not rely on that anyplace.
Jason