On Tue, Nov 28, 2023 at 08:03:35AM +0000, Tian, Kevin wrote:
From: Nicolin Chen <nicolinc@nvidia.com>
Sent: Tuesday, November 28, 2023 3:53 AM
On Fri, Nov 24, 2023 at 02:36:29AM +0000, Tian, Kevin wrote:
> > >> + * @out_driver_error_code: Report a driver specific error code upon failure.
> > >> + *                         It's optional, driver has a choice to fill it or not.
> > >
> > > Being optional how does the user tell whether the code is filled or not?
Well, naming it "error_code" indicates that zero means no error while non-zero means something? An error return from this ioctl could also tell user space to look up this driver error code, if it cares.
probably over-thinking but I'm not sure whether zero is guaranteed to mean no error in all implementations...
Well, you are right. Usually HW conveniently raises a flag in a register to indicate something is wrong, yet it is probably unsafe to assume that always holds.
this reminds me of one open. What about an implementation having a hierarchical error code layout, e.g. one main error register with each bit representing an error category, then multiple error code registers, one per error category? In this case a single out_driver_error_code probably cannot carry that raw information.
Hmm, good point.
Instead the iommu driver may need to define a customized error code convention in uapi header which is converted from the raw error information.
From this angle should we simply say that the error code definition must be included in the uapi header? If raw error information can be carried by this field then this hw can simply say that the error code format is the same as the hw spec defines.
With that explicit information then the viommu can easily tell whether error code is filled or not based on its own convention.
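To make the idea concrete, here is a minimal sketch of such a convention, assuming a made-up hardware layout. Nothing below comes from the posted series or from a real IOMMU spec: every `demo_` / `DEMO_` name, the bit positions, and the choice to report only the lowest-numbered flagged category are hypothetical. The point is only that an explicit "valid" bit lets the consumer tell a filled code from an unfilled one without relying on zero meaning "no error".

```c
#include <stdint.h>

/*
 * Hypothetical hardware layout (an assumption for this sketch): one main
 * error register with one bit per category, plus one raw code register
 * per category.
 */
#define DEMO_NUM_CATEGORIES	4u

struct demo_hw_errors {
	uint32_t main;				/* bit n set: category n has an error */
	uint16_t detail[DEMO_NUM_CATEGORIES];	/* raw per-category error code */
};

/*
 * Hypothetical uAPI convention: bit 31 marks the field as filled,
 * bits 23:16 carry the category index, bits 15:0 carry the raw code.
 */
#define DEMO_ERR_VALID		(1u << 31)
#define DEMO_ERR_CAT_SHIFT	16
#define DEMO_ERR_CAT_MASK	0xffu
#define DEMO_ERR_CODE_MASK	0xffffu

/* Convert raw hierarchical registers into the single converted code. */
static uint32_t demo_encode_error(const struct demo_hw_errors *hw)
{
	for (uint32_t cat = 0; cat < DEMO_NUM_CATEGORIES; cat++) {
		if (hw->main & (1u << cat))
			return DEMO_ERR_VALID |
			       (cat << DEMO_ERR_CAT_SHIFT) |
			       hw->detail[cat];
	}
	return 0;	/* no category flagged: field is not filled */
}

/* Consumer side: decide "filled or not" purely from the convention. */
static int demo_error_is_filled(uint32_t code)
{
	return (code & DEMO_ERR_VALID) != 0;
}

static uint32_t demo_error_category(uint32_t code)
{
	return (code >> DEMO_ERR_CAT_SHIFT) & DEMO_ERR_CAT_MASK;
}

static uint32_t demo_error_detail(uint32_t code)
{
	return code & DEMO_ERR_CODE_MASK;
}
```

With this layout a raw detail code of 0 in a flagged category still decodes as "filled", which is the case a plain zero check would miss.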
That'd be to put this error_code field into the driver uAPI structure right?
I also thought about making this out_driver_error_code per HW. Yet, an error can be either per array or per entry/request. The array-related error should be reported in the array structure that is a core uAPI, vs. the per-HW entry structure. Though we could still report an array error in the entry structure at the first entry (or indexed by "array->entry_num")?
why would there be an array error? array is just a software entity containing actual HW invalidation cmds. If there is any error with the array itself it should be reported via ioctl errno.
User array reading is a software operation, but kernel array reading is a hardware operation that can raise an error, for example when the memory location of the array is incorrect.
With that being said, I think errno (-EIO) could do the job, as you suggested too.
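A rough sketch of that split, for illustration only: a failure to read the array itself comes back as -EIO from the call, while a failure of one command is recorded in that entry's own error field. All `demo_` names are hypothetical and not from the series; returning -EINVAL for a per-entry failure and modelling an unreadable array as a NULL pointer are assumptions of this sketch, as is using `entry_num` on return to index the failing entry.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-HW entry: one invalidation command plus its own error. */
struct demo_inv_entry {
	uint32_t cmd;
	uint32_t error_code;	/* driver-defined, filled on a per-entry failure */
};

/* Hypothetical core array structure: carries no driver-specific error. */
struct demo_inv_array {
	struct demo_inv_entry *entries;	/* NULL models an unreadable array */
	uint32_t entry_num;		/* in: entries given; out: entries handled */
};

/* Stand-in for issuing a command to hardware; cmd 0 is treated as invalid. */
static uint32_t demo_issue_cmd(uint32_t cmd)
{
	return cmd == 0 ? 0x1u : 0;
}

static int demo_invalidate(struct demo_inv_array *array)
{
	uint32_t requested = array->entry_num;

	array->entry_num = 0;

	/* Array-level failure: reported through the return value only. */
	if (!array->entries && requested)
		return -EIO;

	for (uint32_t i = 0; i < requested; i++) {
		struct demo_inv_entry *e = &array->entries[i];

		e->error_code = demo_issue_cmd(e->cmd);
		if (e->error_code) {
			/* entry_num now indexes the entry that failed. */
			array->entry_num = i;
			return -EINVAL;
		}
		array->entry_num = i + 1;
	}
	return 0;
}
```

A caller would check the return value first, then look at `entries[entry_num].error_code` only when the failure is not an array-level one.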
Thanks
Nic
Jason, what is your opinion? I didn't spot big issues except this one. Hope it can make it into 6.8.