Re: [PATCH] avoid race condition between start_xmit and cm_rep_handler

16 Aug 2018

      On 8/16/18 12:54 AM, Leon Romanovsky wrote:
...
On Wed, Aug 15, 2018 at 09:11:49PM -0400, Aaron Knister wrote:
...
Inside of start_xmit() the call to check if the connection is up and the
queueing of the packets for later transmission is not atomic which
leaves a window where cm_rep_handler can run, set the connection up,
dequeue pending packets and leave the subsequently queued packets by
start_xmit() sitting on neigh->queue until they're dropped when the
connection is torn down. This only applies to connected mode. These
dropped packets can really upset TCP, for example,  and cause
multi-minute delays in transmission for open connections.
I've got a reproducer available if it's needed.
Here's the code in start_xmit where we check to see if the connection
is up:
    if (ipoib_cm_get(neigh)) {
            if (ipoib_cm_up(neigh)) {
                    ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
                    goto unref;
            }
    }

The race occurs if cm_rep_handler execution occurs after the above
connection check (specifically if it gets to the point where it acquires
priv->lock to dequeue pending skb's) but before the below code snippet
in start_xmit where packets are queued.
    if (skb_queue_len(&neigh->queue) < IPOIB_MAX_PATH_REC_QUEUE) {
            push_pseudo_header(skb, phdr->hwaddr);
            spin_lock_irqsave(&priv->lock, flags);
            __skb_queue_tail(&neigh->queue, skb);
            spin_unlock_irqrestore(&priv->lock, flags);
    } else {
            ++dev->stats.tx_dropped;
            dev_kfree_skb_any(skb);
    }

The patch re-checks ipoib_cm_up with priv->lock held to avoid this
race condition. Since odds are the conn should be up most of the time
(and thus the connection *not* down most of the time) we don't hold the
lock for the first check attempt to avoid a slowdown from unecessary
locking for the majority of the packets transmitted during the
connection's life.
Cc: stable@vger.kernel.org
Tested-by: Ira Weiny ira.weiny@intel.com
Signed-off-by: Aaron Knister aaron.s.knister@nasa.gov

Sorry, but no mainly for two reasons:

Don't lock/unlock in different functions.
Don't create unbalanced number of lock/unlocks.

Thanks
Thanks, Leon. I'm on-board with not locking/unlocking between functions. That felt a little weird, and I think I can work around that. I'm curious, though, about the unbalanced number of lock/unlocks because I don't see that looking at the patch. Could you help me understand your concern? Having said that, this struck me as appearing unbalanced:
+		spin_lock_irqsave(&priv->lock, flags);
+		if (ipoib_cm_up(neigh)) {
+			spin_unlock_irqrestore(&priv->lock, flags);
+			ipoib_cm_send(dev, skb, ipoib_cm_get(neigh));
+		} else  {
+			defer_neigh_skb(skb, dev, neigh, phdr, &flags);
+		}
but that call to defer_neigh_skb would call spin_unlock_irqrestore because it passes in &flags to defer_neigh_skb. It's not obvious, though, which is probably an issue. I'm trying to balance only holding priv->lock where absolutely necessary with not typing chunks of code out twice but with subtle differences.
I'll re-work this and re-submit.
-Aaron
-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH] avoid race condition between start_xmit and cm_rep_handler