On 23/01/17 15:13, Johan Hovold wrote:
On Mon, Jan 23, 2017 at 02:32:48PM +0000, Bryan O'Donoghue wrote:
On 23/01/17 12:04, Johan Hovold wrote:
+static void gb_operation_timeout(unsigned long arg) +{
- struct gb_operation *operation = (void *)arg;
- if (gb_operation_result_set(operation, -ETIMEDOUT)) {
/*
* A stuck request message will be cancelled from the
* workqueue.
*/
queue_work(gb_operation_completion_wq, &operation->work);
- }
+}
If queue_work fails, you will never wake up the waiter.
How could queue_work fail here? The operation result is always set exactly once before being queued so this is fine.
Perhaps it relates to a bug elsewhere, though I struggle to see how we could feasibly re-use the work associated with an in-flight gb_message.
Could you trap the result code and BUG_ON()/WARN_ON() for it so that we can debug this a little bit further ?
- ret = wait_for_completion_interruptible_timeout(&operation->completion,
timeout_jiffies);
- ret = wait_for_completion_interruptible(&operation->completion);
Note, that's a case I explicitly handle (failure to queue the work) in the async set I sent out.
Yep, I noticed that, but that was for queueing your timeout work at submission. And you queued unconditionally, so you could potentially fail to queue if an operation was submitted twice, even if that would in itself be a driver bug.
Have you run this through loopback without any # of in-flight operations constrained ?