Hi,
I also tried to reproduce this in a targeted way, and ran into the same difficulty as you: satisfying the first condition "(sk->sk_wmem_queued >> 1) > limit". I will not have bandwidth in the coming days to try to reproduce it this way. Maybe simply forcing a very small send buffer via sysctl net.ipv4.tcp_wmem would do the trick?
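Something along these lines might force a tiny send buffer (the values below are only illustrative, not tested against this bug):

```shell
# Shrink the TCP send buffer autotuning range system-wide:
# the three values are min, default, max (bytes).
sysctl -w net.ipv4.tcp_wmem="4096 4096 8192"
```

Alternatively, since setting SO_SNDBUF via setsockopt() disables send-buffer autotuning for that socket, a small explicit SO_SNDBUF in the test program could serve the same purpose.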
I suspect the bug is easier to trigger with the MPTCP patch, as in my original reproduction, because of the way that patch manages the TCP subflow buffers (it can temporarily overfill them, satisfying the first condition more often).
One more thing: the stack trace you shared earlier seems to be caused by a different issue (a corrupted socket?), so it will not be solved by the patch we submitted.
kind regards,
Tim
On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
> Hi Tim,
>
> I tried to reproduce it with packetdrill and with a user application,
> but I can't. The first condition "(sk->sk_wmem_queued >> 1) > limit"
> can't be satisfied. This condition is there to avoid tiny SO_SNDBUF
> values set by the user. It also adds some room due to the fact that
> tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by
> about one full TSO skb (64KB size).
>
>	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>		     skb != tcp_rtx_queue_head(sk) &&
>		     skb != tcp_rtx_queue_tail(sk))) {
>		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>		return -ENOMEM;
>	}
>
> Can you try to reproduce it with packetdrill or a C socket application?
On 2019/9/3 14:58, Tim Froidcoeur wrote:
> Hi,
>
> I also tried to reproduce this in a targeted way, and ran into the same
> difficulty as you: satisfying the first condition
> "(sk->sk_wmem_queued >> 1) > limit". I will not have bandwidth in the
> coming days to try to reproduce it this way. Maybe simply forcing a very
> small send buffer via sysctl net.ipv4.tcp_wmem would do the trick?
>
> I suspect the bug is easier to trigger with the MPTCP patch, as in my
> original reproduction, because of the way that patch manages the TCP
> subflow buffers (it can temporarily overfill them, satisfying the first
> condition more often).
>
> One more thing: the stack trace you shared earlier seems to be caused by
> a different issue (a corrupted socket?), so it will not be solved by the
> patch we submitted.

The trace shows that a zero window probe can hit the BUG_ON in skb_queue_prev; this is reproduced on our platform with syzkaller, and it is resolved by your fix patch. What I still need to understand is why the first condition can be satisfied. Eric, do you have any comments on how to reproduce it, given that the first condition, (sk->sk_wmem_queued >> 1) > limit, is hard to make true?

> kind regards,
> Tim
>
> On Tue, Sep 3, 2019 at 5:22 AM maowenan <maowenan@huawei.com> wrote:
>> Hi Tim,
>>
>> I tried to reproduce it with packetdrill and with a user application,
>> but I can't. The first condition "(sk->sk_wmem_queued >> 1) > limit"
>> can't be satisfied. This condition is there to avoid tiny SO_SNDBUF
>> values set by the user. It also adds some room due to the fact that
>> tcp_sendmsg() and tcp_sendpage() might overshoot sk_wmem_queued by
>> about one full TSO skb (64KB size).
>>
>>	limit = sk->sk_sndbuf + 2 * SKB_TRUESIZE(GSO_MAX_SIZE);
>>	if (unlikely((sk->sk_wmem_queued >> 1) > limit &&
>>		     skb != tcp_rtx_queue_head(sk) &&
>>		     skb != tcp_rtx_queue_tail(sk))) {
>>		NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPWQUEUETOOBIG);
>>		return -ENOMEM;
>>	}
>>
>> Can you try to reproduce it with packetdrill or a C socket application?
linux-stable-mirror@lists.linaro.org