Hi Dan,
On Wed, Apr 20, 2022 at 04:27:27PM -0500, Dan Vacura wrote:
On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
This indeed fixes an issue, so I think we can merge the patch, but I also believe we need further improvements on top (of course if you would like to improve the implementation in a v4, I won't complain :-))
It looks like Greg has already accepted the change and it's in linux-next. We can discuss here how to better handle these -EXDEV errors for future improvements, as it seems like it's been an issue in the past as well: https://www.mail-archive.com/linux-usb@vger.kernel.org/msg105615.html
As replied in v2 (sorry for the late reply), it seems that this error can occur under normal conditions. This means we shouldn't cancel the queue, at least when the error is intermittent (if all URBs fail that's another story).
My impression was that canceling the queue was still necessary, as we may be partway through the current frame. Perhaps we don't need to flush all the frames from the queue, but at a minimum we need to reset the buf_used value.
I think we have three classes of errors:
- "Packet-level" errors, resulting in either data loss or erroneous data being transferred to the host for one (or more) packets in a frame. When such errors occur, we should probably notify the application (on the gadget side), but we can continue sending the rest of the frame.
- "Frame-level" errors, resulting in errors in the rest of the frame. When such an error occurs, we should notify the application, and stop sending data for the current frame, moving to the next frame.
- "Stream-level" errors, resulting in errors in all subsequent frames. When such an error occurs, we should notify the application and stop sending data until the application takes corrective measures.
I'm not sure packet-level errors make sense as a separate class: if data is lost, maybe we would be better off just cancelling the current frame and moving to the next one.
For both packet-level errors and frame-level errors, the buffer should be marked as erroneous to notify the application, but there should be no need to cancel the queue and drop all queued buffers. We can just move to the next buffer.
For stream-level errors, I would cancel the queue, and additionally prevent new buffers from being queued until the application stops and restarts the stream.
Finally, which class an error belongs to may not be an intrinsic property of the error itself: packet-level or frame-level errors that occur too often may warrant cancelling the queue (I'm not sure how to quantify "too often" though).
Does this make sense ?
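To make the three classes more concrete, here is a rough, standalone sketch of the policy described above. It is only a model, not driver code: every name in it (err_class, stream_model, handle_error, ESCALATE_THRESHOLD) is hypothetical, and the threshold value is an arbitrary placeholder for "too often".

```c
#include <stdbool.h>

/* Hypothetical names; none of these exist in the UVC gadget driver. */
enum err_class { ERR_NONE, ERR_PACKET, ERR_FRAME, ERR_STREAM };

struct stream_model {
	bool buf_error;             /* current buffer is returned flagged as erroneous */
	bool skip_frame;            /* stop sending the rest of the current frame */
	bool queue_cancelled;       /* all queued buffers have been dropped */
	bool queue_blocked;         /* new buffers rejected until the stream restarts */
	unsigned int recent_errors; /* consecutive errored requests */
};

/* Arbitrary placeholder: how many consecutive errors count as "too often". */
#define ESCALATE_THRESHOLD 8

static void handle_error(struct stream_model *s, enum err_class c)
{
	if (c == ERR_NONE) {
		/* A successful request ends the run of consecutive errors. */
		s->recent_errors = 0;
		return;
	}

	/* Packet- or frame-level errors that repeat are treated as stream-level. */
	if (c != ERR_STREAM && ++s->recent_errors >= ESCALATE_THRESHOLD)
		c = ERR_STREAM;

	/* Every class notifies the application through the buffer state. */
	s->buf_error = true;

	switch (c) {
	case ERR_FRAME:
		s->skip_frame = true;
		break;
	case ERR_STREAM:
		s->skip_frame = true;
		s->queue_cancelled = true;
		s->queue_blocked = true;
		break;
	default:
		/* ERR_PACKET: keep sending the rest of the frame. */
		break;
	}
}
```

Only the stream-level branch touches the queue; the other two just mark the buffer and, for frame-level errors, move on to the next one.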
We likely need to differentiate between -EXDEV and other errors in uvc_video_complete(), as I'd like to be conservative and cancel the queue for unknown errors. We also need to improve the queue cancellation implementation so that userspace gets an error when queuing further buffers.
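As a sketch of what that differentiation could look like, the helper below maps a request status to an action. It is a standalone model rather than a patch: classify_status() and the completion_action values are made-up names, and treating -ESHUTDOWN as a plain "stop" is an assumption about how disconnects show up in the completion handler.

```c
#include <errno.h>

/* Hypothetical actions a completion handler could take. */
enum completion_action {
	ACT_CONTINUE,          /* request completed fine */
	ACT_MARK_BUFFER_ERROR, /* flag the current buffer, keep the queue */
	ACT_STOP,              /* endpoint is going away, nothing to recover */
	ACT_CANCEL_QUEUE,      /* unknown error: be conservative */
};

static enum completion_action classify_status(int status)
{
	switch (status) {
	case 0:
		return ACT_CONTINUE;
	case -EXDEV:
		/* Can occur under normal conditions, so don't cancel the queue. */
		return ACT_MARK_BUFFER_ERROR;
	case -ESHUTDOWN:
		/* Assumption: reported when the endpoint is disabled. */
		return ACT_STOP;
	default:
		return ACT_CANCEL_QUEUE;
	}
}
```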
We already report the error to userspace, via the state passed to vb2_buffer_done(). When userspace dequeues the buffer, it can check whether v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR set to see if things failed, and then decide what to do, such as re-queuing that frame. However, this does not appear to happen reliably, since I believe the pump thread is independent of the uvc_video_complete() callback. As a result, the complete callback of the failed URB may be associated with a buffer that was already released back to the userspace client.
Good point. That would only be the case for errors in the last request(s) for a frame, right ?
In this case, I don't know if there's anything to be done, since a new buffer and subsequent URBs might already be queued up. You suggested an error on a subsequent buffer queue, but I don't know how helpful that'd be at this point, perhaps in the scenario that all URBs are failing?
Should we delay sending the buffer back to userspace until all the requests for the buffer have completed ?
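Something along these lines is what I have in mind, as a standalone model only. The frame_buf structure and both helpers are hypothetical, and a real implementation would need to take the queue lock around these updates since the pump and the completion handler run independently.

```c
#include <stdbool.h>

/* Hypothetical per-buffer bookkeeping, not the driver's actual structure. */
struct frame_buf {
	unsigned int pending_reqs; /* requests queued but not yet completed */
	bool fully_encoded;        /* last request of the frame has been queued */
	bool error;                /* at least one request failed */
	bool returned;             /* buffer handed back to userspace */
};

/* Called by the pump each time it queues a request carrying this buffer. */
static void buf_request_queued(struct frame_buf *b, bool last)
{
	b->pending_reqs++;
	if (last)
		b->fully_encoded = true;
}

/*
 * Called from the completion handler. The buffer is returned only once
 * the whole frame has been queued and every request has completed, so a
 * late failure can still be reported on the right buffer.
 */
static bool buf_request_completed(struct frame_buf *b, int status)
{
	if (status)
		b->error = true;

	b->pending_reqs--;
	if (b->fully_encoded && b->pending_reqs == 0)
		b->returned = true; /* a driver would call vb2_buffer_done() here */

	return b->returned;
}
```

The cost is that buffers come back to userspace slightly later, by up to the time it takes the last requests of a frame to complete.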