On Tue, 16 Apr 2024 at 15:28, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sat, 13 Apr 2024 02:42:51 +0100 Dmitry Safonov via B4 Relay wrote:
> > Started as addressing the flakiness issues in rst_ipv*, that affect
> > netdev dashboard.
>
> Thank you! :)

Jakub, you are very welcome :) I'll keep an eye on the dashboard, but I very much encourage you to ping me in case of any other issues with tcp_ao selftests.
I currently have a v2 of the tcp-ao tracepoints, but I'm delaying it while I work on a reproducer/selftest for an issue I think I have a patch for.
BTW, do you know if these were addressed, or if anyone is looking into them? (They come from other tcp-ao hits and don't seem to be related to tcp-ao itself):
1.
[ 240.001391][ T833]  Possible interrupt unsafe locking scenario:
[ 240.001391][ T833]
[ 240.001635][ T833]        CPU0                    CPU1
[ 240.001797][ T833]        ----                    ----
[ 240.001958][ T833]   lock(&p->alloc_lock);
[ 240.002083][ T833]                                local_irq_disable();
[ 240.002284][ T833]                                lock(&ndev->lock);
[ 240.002490][ T833]                                lock(&p->alloc_lock);
[ 240.002709][ T833]   <Interrupt>
[ 240.002819][ T833]     lock(&ndev->lock);
[ 240.002937][ T833]
[ 240.002937][ T833]  *** DEADLOCK ***
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537021/14-self-con...
2.
[ 251.411647][ T71] WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
[ 251.411986][ T71] 6.9.0-rc1-virtme #1 Not tainted
[ 251.412214][ T71] -----------------------------------------------------
[ 251.412533][ T71] kworker/u16:1/71 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire:
[ 251.412837][ T71] ffff888005182c28 (&p->alloc_lock){+.+.}-{2:2}, at: __get_task_comm+0x27/0x70
[ 251.413214][ T71]
[ 251.413214][ T71] and this task is already holding:
[ 251.413527][ T71] ffff88802f83efd8 (&ul->lock){+.-.}-{2:2}, at: rt6_uncached_list_flush_dev+0x138/0x840
[ 251.413887][ T71] which would create a new lock dependency:
[ 251.414153][ T71]  (&ul->lock){+.-.}-{2:2} -> (&p->alloc_lock){+.+.}-{2:2}
[ 251.414464][ T71]
[ 251.414464][ T71] but this new dependency connects a SOFTIRQ-irq-safe lock:
[ 251.414808][ T71]  (&ul->lock){+.-.}-{2:2}
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/537201/17-icmps-di...
3.
[ 264.280734][ C3]  Possible unsafe locking scenario:
[ 264.280734][ C3]
[ 264.280968][ C3]        CPU0                    CPU1
[ 264.281117][ C3]        ----                    ----
[ 264.281263][ C3]   lock((&tw->tw_timer));
[ 264.281427][ C3]                                lock(&hashinfo->ehash_locks[i]);
[ 264.281647][ C3]                                lock((&tw->tw_timer));
[ 264.281834][ C3]   lock(&hashinfo->ehash_locks[i]);
https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/547461/19-self-con...
I can spend some time on them once I've verified that my fix for -stable actually fixes the issue I think it fixes. Seems like your automation + my selftests are bearing some fruit, hehe.
Thanks,
Dmitry