On Sun, Apr 26, 2020 at 2:38 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
On 4/26/20 1:26 PM, Eric Dumazet wrote:
On 4/26/20 12:42 PM, Jason A. Donenfeld wrote:
On Sun, Apr 26, 2020 at 1:40 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
On 4/26/20 10:57 AM, syzbot wrote:
syzbot has bisected this bug to:
commit e7096c131e5161fa3b8e52a650d7719d2857adfd
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Sun Dec 8 23:27:34 2019 +0000

    net: WireGuard secure network tunnel
bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=15258fcfe00000
start commit:   b2768df2 Merge branch 'for-linus' of git://git.kernel.org/..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=17258fcfe00000
console output: https://syzkaller.appspot.com/x/log.txt?x=13258fcfe00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=b7a70e992f2f9b68
dashboard link: https://syzkaller.appspot.com/bug?extid=0251e883fe39e7a0cb0a
userspace arch: i386
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f5f47fe00000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11e8efb4100000
Reported-by: syzbot+0251e883fe39e7a0cb0a@syzkaller.appspotmail.com
Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
For information about bisection process see: https://goo.gl/tpsmEJ#bisection
I have not looked at the repro closely, but WireGuard has some workers that might loop forever; cond_resched() might help a bit.
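To illustrate the idea behind the cond_resched() suggestion, here is a minimal userspace sketch, not WireGuard's actual worker code. It assumes a hypothetical `struct work_queue` and uses POSIX `sched_yield()` as a stand-in for the kernel's cond_resched(); the names `drain_queue`, `yield_point`, and the `batch` parameter are invented for this example. In the kernel, cond_resched() is usually called on every loop iteration and only reschedules when needed; the batching here just makes the yield points easy to count.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical work queue, reduced to counters for illustration. */
struct work_queue {
	size_t pending;   /* items still waiting to be processed */
	size_t processed; /* items handled so far */
	size_t yields;    /* number of times the worker offered up the CPU */
};

/* Userspace stand-in for the kernel's cond_resched() (an assumption). */
static void yield_point(struct work_queue *wq)
{
	sched_yield();
	wq->yields++;
}

/*
 * Drain the queue, offering to give up the CPU after every `batch`
 * items so that a long-running loop cannot monopolize it.
 */
static void drain_queue(struct work_queue *wq, size_t batch)
{
	assert(batch > 0);

	while (wq->pending > 0) {
		wq->pending--;
		wq->processed++;

		if (wq->processed % batch == 0)
			yield_point(wq);
	}
}
```

A loop written this way still terminates only when the queue empties; the yield point merely keeps other tasks runnable while it works, which is why it can help with a stall report without fixing an underlying infinite loop.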
I'm working on this right now. Having a bit of a difficult time getting it to reproduce locally...
The reports show the stall happening always at:
	static struct sk_buff *
	sfq_dequeue(struct Qdisc *sch)
	{
		struct sfq_sched_data *q = qdisc_priv(sch);
		struct sk_buff *skb;
		sfq_index a, next_a;
		struct sfq_slot *slot;

		/* No active slots */
		if (q->tail == NULL)
			return NULL;

	next_slot:
		a = q->tail->next;
		slot = &q->slots[a];
Which is kind of interesting, because it's not like that should block or anything, unless there's some kasan faulting happening.
I am not really sure WireGuard is involved, the repro does not rely on it anyway.
Yes, do not spend too much time on this.
syzbot found its way into crazy qdisc settings these last days.
(I sent a patch yesterday for the choke qdisc; it seems similar checks are needed in sfq.)
Ah, whew, okay. I had just begun instrumenting sfq (the highly technical term for "adding printks everywhere") to figure out what's going on. Looks like you've got a handle on it, so I'll let you have at it.
On the brighter side, it seems like Dmitry's and my effort to get full coverage of WireGuard has paid off in the sense that tons of packets wind up being shoveled through it in one way or another, which is good.