When do_task() exhausts its RXE_MAX_ITERATIONS budget, it unconditionally sets the task state to TASK_STATE_IDLE to reschedule. This overwrites the TASK_STATE_DRAINING state that may have been concurrently set by rxe_cleanup_task() or rxe_disable_task().
This race condition breaks the cleanup and disable logic, which expects the task to stop processing new work. The cleanup code may proceed while do_task() reschedules itself, leading to a potential use-after-free.
This bug was introduced during the migration from tasklets to workqueues, where the special handling for the draining case was lost.
Fix this by restoring the original behavior. If the state is TASK_STATE_DRAINING when iterations are exhausted, continue the loop by setting cont to 1. This allows new iterations to finish the remaining work and reach the switch statement, which properly transitions the state to TASK_STATE_DRAINED and stops the task as intended.
Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
Cc: stable@vger.kernel.org
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
---
 drivers/infiniband/sw/rxe/rxe_task.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/rxe/rxe_task.c b/drivers/infiniband/sw/rxe/rxe_task.c
index 6f8f353e9583..f522820b950c 100644
--- a/drivers/infiniband/sw/rxe/rxe_task.c
+++ b/drivers/infiniband/sw/rxe/rxe_task.c
@@ -132,8 +132,12 @@ static void do_task(struct rxe_task *task)
 		 * yield the cpu and reschedule the task
 		 */
 		if (!ret) {
-			task->state = TASK_STATE_IDLE;
-			resched = 1;
+			if (task->state != TASK_STATE_DRAINING) {
+				task->state = TASK_STATE_IDLE;
+				resched = 1;
+			} else {
+				cont = 1;
+			}
 			goto exit;
 		}
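For readers following the state logic, here is a minimal userspace model of the decision do_task() makes after its inner loop, before and after this patch. It is not kernel code. The state names come from rxe_task.c, but their numeric values, struct pass_result and model_after_iterations() are hypothetical. The switch is also simplified: only the DRAINING to DRAINED transition described in the commit message is modelled, and every other state falls back to IDLE.

```c
#include <assert.h>
#include <stdbool.h>

/* State names as in rxe_task.c; the numeric values are illustrative. */
enum task_state {
	TASK_STATE_IDLE,
	TASK_STATE_BUSY,
	TASK_STATE_ARMED,
	TASK_STATE_DRAINING,
	TASK_STATE_DRAINED,
};

/* What one pass of do_task()'s outer loop leaves behind in the model. */
struct pass_result {
	enum task_state state;
	bool resched;	/* yield the cpu and requeue the task */
	bool cont;	/* run another pass of the outer loop */
};

/*
 * Hypothetical model of the code that runs with task->lock held after
 * the inner loop. more_work mirrors "!ret": the RXE_MAX_ITERATIONS
 * budget ran out while work remained. patched selects the behaviour
 * proposed in this patch.
 */
static struct pass_result model_after_iterations(enum task_state state,
						 bool more_work, bool patched)
{
	struct pass_result r = { state, false, false };

	assert(state <= TASK_STATE_DRAINED);

	if (more_work) {
		if (!patched || state != TASK_STATE_DRAINING) {
			/* unpatched: a DRAINING state is overwritten here */
			r.state = TASK_STATE_IDLE;
			r.resched = true;
		} else {
			/* patched: keep DRAINING and run another pass */
			r.cont = true;
		}
		return r;	/* corresponds to "goto exit" */
	}

	/* Simplified switch: only the draining transition is modelled. */
	switch (state) {
	case TASK_STATE_DRAINING:
		r.state = TASK_STATE_DRAINED;
		break;
	default:
		r.state = TASK_STATE_IDLE;
		break;
	}
	return r;
}
```

Calling the model with TASK_STATE_DRAINING and more_work set shows the difference. The unpatched path returns IDLE with resched set. The patched path keeps DRAINING and sets cont, so the next pass reaches the switch and ends in DRAINED.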
On 9/17/25 3:06 AM, Gui-Dong Han wrote:
> When do_task() exhausts its RXE_MAX_ITERATIONS budget, it unconditionally
From the source code, it checks the ret value and then sets the state to TASK_STATE_IDLE, so it is not unconditional.
> sets the task state to TASK_STATE_IDLE to reschedule. This overwrites the TASK_STATE_DRAINING state that may have been concurrently set by rxe_cleanup_task() or rxe_disable_task().
From the source code, there is a spinlock protecting the state, so this cannot cause a race condition.
> This race condition breaks the cleanup and disable logic, which expects the task to stop processing new work. The cleanup code may proceed while do_task() reschedules itself, leading to a potential use-after-free.
Can you post the call trace from when this problem occurred?
Hi, Jason && Leon
Please comment on this problem.
Thanks a lot.
Yanjun.Zhu
On Thu, Sep 18, 2025 at 3:31 AM yanjun.zhu <yanjun.zhu@linux.dev> wrote:
> On 9/17/25 3:06 AM, Gui-Dong Han wrote:
>> When do_task() exhausts its RXE_MAX_ITERATIONS budget, it unconditionally
> From the source code, it checks the ret value and then sets the state to TASK_STATE_IDLE, so it is not unconditional.
Hi Yanjun,
Thanks for your review. Let me clarify a few points.
You are correct that the code checks the ret value. The if (!ret) branch specifically handles the case where the RXE_MAX_ITERATIONS limit is reached while work still remains. My use of "unconditionally" refers to the action inside this branch, which sets the state to TASK_STATE_IDLE without a secondary check on task->state. The original tasklet implementation effectively checked both conditions in this scenario.
>> sets the task state to TASK_STATE_IDLE to reschedule. This overwrites the TASK_STATE_DRAINING state that may have been concurrently set by rxe_cleanup_task() or rxe_disable_task().
> From the source code, there is a spinlock protecting the state, so this cannot cause a race condition.
While a spinlock protects individual state changes, rxe_cleanup_task() and rxe_disable_task() do not hold it for their entire duration. Each acquires the lock to set TASK_STATE_DRAINING, but then releases it while waiting in the while (!is_done(task)) loop. The race window opens when do_task() acquires the lock during this wait period, which allows it to overwrite the TASK_STATE_DRAINING state.
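To illustrate why the lock alone does not close this window, here is a minimal userspace sketch. It assumes a pthread mutex in place of task->lock and replays the interleaving described above in a single thread rather than racing real threads. The names model_task, model_cleanup_begin(), model_worker_exhausted() and model_replay_race() are hypothetical and are not rxe functions. Every state access in the sketch is made under the lock, yet the DRAINING value is still lost.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum task_state {
	TASK_STATE_IDLE,
	TASK_STATE_BUSY,
	TASK_STATE_DRAINING,
	TASK_STATE_DRAINED,
};

/* Hypothetical stand-in for struct rxe_task; the mutex plays task->lock. */
struct model_task {
	pthread_mutex_t lock;
	enum task_state state;
	bool resched;
};

/* Cleanup side: publish DRAINING under the lock, then release it
 * before waiting for the task to finish (the wait is not modelled). */
static void model_cleanup_begin(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->state = TASK_STATE_DRAINING;
	pthread_mutex_unlock(&t->lock);
}

/* Worker side: the unpatched "budget exhausted" branch. It holds the
 * lock, but writes IDLE without checking the current state. */
static void model_worker_exhausted(struct model_task *t)
{
	pthread_mutex_lock(&t->lock);
	t->state = TASK_STATE_IDLE;
	t->resched = true;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Replays the window sequentially: the worker takes the lock after the
 * cleanup side has set DRAINING and dropped it. Returns the state the
 * cleanup side sees on its next locked check; *resched_out reports
 * whether the worker asked to be requeued.
 */
static enum task_state model_replay_race(bool *resched_out)
{
	struct model_task t = { .state = TASK_STATE_BUSY, .resched = false };
	enum task_state seen;
	int rc = pthread_mutex_init(&t.lock, NULL);

	assert(rc == 0);
	(void)rc;

	model_cleanup_begin(&t);
	model_worker_exhausted(&t);

	pthread_mutex_lock(&t.lock);
	seen = t.state;
	*resched_out = t.resched;
	pthread_mutex_unlock(&t.lock);

	pthread_mutex_destroy(&t.lock);
	return seen;
}
```

In this replay, model_replay_race() returns TASK_STATE_IDLE with the resched flag set. Each individual write is serialized, but the sequence as a whole is not. This is the window the patch addresses by checking task->state before overwriting it.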
>> This race condition breaks the cleanup and disable logic, which expects the task to stop processing new work. The cleanup code may proceed while do_task() reschedules itself, leading to a potential use-after-free.
> Can you post the call trace from when this problem occurred?
This issue was identified through code inspection and a static analysis tool we are developing to detect TOCTOU bugs in the kernel, so I do not have a runtime call trace. The bug is confirmed by inspecting the Fixes commit (9b4b7c1f9f54), which lost the special handling for the draining case during the migration from tasklets to workqueues.
Regards,
Han
linux-stable-mirror@lists.linaro.org