On Fri, 2018-01-12 at 08:23 +0200, Leon Romanovsky wrote:
On Thu, Jan 11, 2018 at 07:07:06PM +0000, Bart Van Assche wrote:
On Thu, 2018-01-11 at 21:00 +0200, Leon Romanovsky wrote:
On Thu, Jan 11, 2018 at 04:02:33PM +0000, Bart Van Assche wrote:
On Thu, 2018-01-11 at 08:22 +0200, Leon Romanovsky wrote:
The proposed patch definitely decreases the chance of races, but it is not fixing them. There is a chance to have change in qp state immediately after your "if ..." check.
Hello Leon,
Please have a look at rxe_qp_error() and you will see that the patch I posted is a proper fix. In the scenario you described rxe_qp_error() will trigger a run of rxe_completer().
Bart,
What am I missing?
CPU1 CPU2 if (unlikely.... <--- /* move the qp to the error state */ void rxe_qp_error(struct rxe_qp *qp) { qp->req.state = QP_STATE_ERROR; qp->resp.state = QP_STATE_ERROR; qp->attr.qp_state = IB_QPS_ERR; ---> rxe_run_task(&qp->req.task, must_sched);
It is more or less the same as without "if (unlikely..."
Hello Leon,
In the above the part of rxe_qp_error() that I was referring to in my e-mail is missing:
if (qp_type(qp) == IB_QPT_RC) rxe_run_task(&qp->comp.task, 1);
But it is exactly where race exists, as long QP isn't protected, it can switch CPUs and create race.
Hello Leon,
Can you clarify which race you are referring to? rxe_run_task() uses the tasklet mechanism and tasklets are guaranteed to run on at most one CPU at a time. See also the "Top and Bottom Halves" chapter in Linux Device Drivers, 3rd edition. See also the tasklet_schedule() implementation in <linux/interrupt.h> and in kernel/softirq.c.
Thanks,
Bart.