On Thu, Nov 30, 2023 at 08:45:23PM -0400, Jason Gunthorpe wrote:
On Thu, Nov 30, 2023 at 12:41:20PM -0800, Nicolin Chen wrote:
So userspace would have to read the event FD before returning to be correct?
Maybe the kernel can somehow return a flag to indicate the event fd has data in it?
If yes then all errors would flow through the event fd?
I think it'd be nicer to return an immediate error so that the guest CMDQ stops and raises a fault there accordingly, similar to returning -EIO for a bad STE in your SMMU part-3 series.
If the "return a flag" is an errno of the ioctl, it could work by reading from a separate memory region that belongs to the event fd. Yet, in this case, an eventfd signal (assuming there is one to trigger the VMM's fault handler) becomes unnecessary, since the invalidation ioctl is already handling it?
My concern is how all this fits together, and whether we push the right things to the right places in the right order when an error occurs.
I did not study the spec carefully to see what exactly is supposed to happen here, and I don't see things in Linux that make me think it particularly cares..
ie Linux doesn't seem like it will know that an async event was even triggered while processing the sync to generate an EIO. It looks like it just gets ETIMEDOUT? Presumably we should be checking the event queue to detect a pushed error?
It is worth understanding whether the spec has language that requires a certain order, so we can try to follow it.
Oh, I gave some misinformation in my previous reply. Actually, the eventq doesn't report a CERROR. The global error interrupt does.
7.1 has that sequence:
1) CMDQ stops
2) Log current index to the CONS register
3) Log error code to the CONS register
4) Set bit-0 "CMDQ error" of GERROR register to raise an irq.
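To make the sequence above concrete, here is a rough software model of it. This is only a sketch: struct smmu_model, model_cmdq_error() and the MODEL_* index mask are hypothetical names made up for illustration, and the ERR field position (bits 30:24 of CONS) is my reading of the spec, so please double-check it. Real hardware also signals the error by toggling the GERROR bit against GERRORN rather than simply setting it; that part is simplified here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: error code in CONS[30:24], "CMDQ error" in GERROR bit 0. */
#define CMDQ_CONS_ERR_SHIFT	24
#define CMDQ_CONS_ERR_MASK	(0x7fu << CMDQ_CONS_ERR_SHIFT)
#define GERROR_CMDQ_ERR		(1u << 0)
/* Hypothetical: low bits of CONS that hold the queue index in this model. */
#define MODEL_INDEX_MASK	0xfffffu

/* Hypothetical software model of the registers involved in a CMDQ error. */
struct smmu_model {
	uint32_t cmdq_cons;	/* index in the low bits, error code in ERR */
	uint32_t gerror;
	int cmdq_stopped;
};

/* Walk through steps 1) to 4) for a faulty command at 'index'. */
static void model_cmdq_error(struct smmu_model *s, uint32_t index,
			     uint32_t cerror)
{
	assert(cerror <= 0x7f);

	s->cmdq_stopped = 1;					/* 1) */
	s->cmdq_cons = index & MODEL_INDEX_MASK;		/* 2) */
	s->cmdq_cons |= cerror << CMDQ_CONS_ERR_SHIFT;		/* 3) */
	s->gerror |= GERROR_CMDQ_ERR;				/* 4), simplified */
}

/* What a handler of the global error irq would read back from CONS. */
static uint32_t model_cons_index(const struct smmu_model *s)
{
	return s->cmdq_cons & MODEL_INDEX_MASK;
}

static uint32_t model_cons_err(const struct smmu_model *s)
{
	return (s->cmdq_cons & CMDQ_CONS_ERR_MASK) >> CMDQ_CONS_ERR_SHIFT;
}
```

Note that nothing in this state says which master or domain issued the faulty command; only the index and the error code are recorded.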
FWIW, both gerror and cmdq are global, so we can't know which master or domain the error is for. The only way, then, is to get the errno from the arm_smmu_cmdq_issue_cmd_with_sync call in our user invalidate function, where we can then get the error code. But this feels very much synchronous, since both the error code and the faulty CONS index could simply be returned without an async eventfd.
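A minimal sketch of what such a synchronous return could look like, purely to illustrate the idea. Everything here is hypothetical: struct user_inv_result is not an existing uAPI, mock_issue_cmds_with_sync() is just a stand-in for the real arm_smmu_cmdq_issue_cmd_with_sync() path (whose signature is not reproduced here), and using -EIO with an error code of 1 is an assumption for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical result handed back to the VMM by the invalidate ioctl. */
struct user_inv_result {
	uint32_t cerror;	/* error code, 0 if none */
	uint32_t cons;		/* index of the faulty command, if any */
};

/*
 * Stand-in for issuing a batch of guest commands followed by a sync.
 * 'bad_idx' simulates the position of a faulty command in the batch;
 * pass a negative value for a batch without errors.
 */
static int mock_issue_cmds_with_sync(uint32_t ncmds, int bad_idx,
				     struct user_inv_result *res)
{
	if (bad_idx >= 0 && (uint32_t)bad_idx < ncmds) {
		res->cerror = 1;		/* assumed error code */
		res->cons = (uint32_t)bad_idx;	/* where the queue stopped */
		return -EIO;			/* assumed errno */
	}

	res->cerror = 0;
	res->cons = ncmds;			/* all commands consumed */
	return 0;
}

/*
 * Hypothetical user invalidate handler: the errno and the result are
 * both available by the time the call returns, so no async event is
 * needed to report them.
 */
static int mock_user_invalidate(uint32_t ncmds, int bad_idx,
				struct user_inv_result *res)
{
	return mock_issue_cmds_with_sync(ncmds, bad_idx, res);
}
```

With something along these lines, the VMM would see the errno from the ioctl and could fill the guest's CONS index and error code directly from the returned result.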
Thanks
Nic