Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Linux system crashed
[ 0.000000] Linux version 5.4.0-54-generic (buildd@lcy01-amd64-008) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020 (Ubuntu 5.4.0-54.60~18.04.1-generic 5.4.65) [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-54-generic root=UUID=11885fd3-b840-4c9b-a500-532c73ac952a ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=512M-:192M
… [ 156.321147] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. [ 177.519159] general protection fault: 0000 [#1] SMP PTI [ 177.519737] CPU: 5 PID: 18484 Comm: worker-1 Kdump: loaded Not tainted 5.4.0-54-generic #60~18.04.1-Ubuntu [ 177.519742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 177.519814] RIP: 0010:dev_hard_start_xmit+0x38/0x200 [ 177.519827] Code: 55 41 54 53 48 83 ec 20 48 85 ff 48 89 55 c8 48 89 4d b8 0f 84 c1 01 00 00 48 8d 86 90 00 00 00 48 89 fb 49 89 f4 48 89 45 c0 <4c> 8b 2b 48 c7 c0 d0 f2 04 8f 48 c7 03 00 00 00 00 48 8b 00 4d 85 [ 177.519829] RSP: 0018:ffffbc6d0609b5e8 EFLAGS: 00010286 [ 177.519833] RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff95cf4bcfe800 [ 177.519835] RDX: 0000000000000000 RSI: ffff95cf4bcfe800 RDI: 0000000000000286 [ 177.519837] RBP: ffffbc6d0609b630 R08: ffff95cf6a190ec8 R09: ffff95cf4a2f7438 [ 177.519839] R10: ffffbc6d0609b6d0 R11: ffff95cf49d4d180 R12: ffff95cf51a5f000 [ 177.519841] R13: dead000000000100 R14: 000000000000009c R15: ffff95d02996b400 [ 177.519844] FS: 00007ff394cdfb20(0000) GS:ffff95d035d40000(0000) knlGS:0000000000000000 [ 177.519846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 177.519848] CR2: 00007fb4a9c2d000 CR3: 00000001049fa004 CR4: 00000000003606e0 [ 177.519908] Call Trace: [ 177.519917] __dev_queue_xmit+0x719/0x920 [ 177.519930] ? ctnetlink_conntrack_event+0x8c/0x5e0 [nf_conntrack_netlink]
Can you reproduce this on 5.7 or later, or with following patches backported to 5.4.y?
dd3cc111f2e3220ddc9c4ab17f13dc97759b5163 119e52e664c57d5f7c0174dc2b3a296b1e40591d af370ab36fcd19f04e3408c402608e7e56e6f188 28f715b9e6dd7cbf07c2aea913fea7c87a56a3b5
The series fixed nfqueue reference counting.
Seems that upgrade to Linux 5.7 solved the problem, we will run more tests. Thank you, Yuri
On Nov 30, 2020, at 2:58 PM, Florian Westphal fw@strlen.de wrote:
Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Linux system crashed
[ 0.000000] Linux version 5.4.0-54-generic (buildd@lcy01-amd64-008) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020 (Ubuntu 5.4.0-54.60~18.04.1-generic 5.4.65) [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-54-generic root=UUID=11885fd3-b840-4c9b-a500-532c73ac952a ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=512M-:192M
… [ 156.321147] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. [ 177.519159] general protection fault: 0000 [#1] SMP PTI [ 177.519737] CPU: 5 PID: 18484 Comm: worker-1 Kdump: loaded Not tainted 5.4.0-54-generic #60~18.04.1-Ubuntu [ 177.519742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 177.519814] RIP: 0010:dev_hard_start_xmit+0x38/0x200 [ 177.519827] Code: 55 41 54 53 48 83 ec 20 48 85 ff 48 89 55 c8 48 89 4d b8 0f 84 c1 01 00 00 48 8d 86 90 00 00 00 48 89 fb 49 89 f4 48 89 45 c0 <4c> 8b 2b 48 c7 c0 d0 f2 04 8f 48 c7 03 00 00 00 00 48 8b 00 4d 85 [ 177.519829] RSP: 0018:ffffbc6d0609b5e8 EFLAGS: 00010286 [ 177.519833] RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff95cf4bcfe800 [ 177.519835] RDX: 0000000000000000 RSI: ffff95cf4bcfe800 RDI: 0000000000000286 [ 177.519837] RBP: ffffbc6d0609b630 R08: ffff95cf6a190ec8 R09: ffff95cf4a2f7438 [ 177.519839] R10: ffffbc6d0609b6d0 R11: ffff95cf49d4d180 R12: ffff95cf51a5f000 [ 177.519841] R13: dead000000000100 R14: 000000000000009c R15: ffff95d02996b400 [ 177.519844] FS: 00007ff394cdfb20(0000) GS:ffff95d035d40000(0000) knlGS:0000000000000000 [ 177.519846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 177.519848] CR2: 00007fb4a9c2d000 CR3: 00000001049fa004 CR4: 00000000003606e0 [ 177.519908] Call Trace: [ 177.519917] __dev_queue_xmit+0x719/0x920 [ 177.519930] ? ctnetlink_conntrack_event+0x8c/0x5e0 [nf_conntrack_netlink]
Can you reproduce this on 5.7 or later, or with following patches backported to 5.4.y?
dd3cc111f2e3220ddc9c4ab17f13dc97759b5163 119e52e664c57d5f7c0174dc2b3a296b1e40591d af370ab36fcd19f04e3408c402608e7e56e6f188 28f715b9e6dd7cbf07c2aea913fea7c87a56a3b5
The series fixed nfqueue reference counting.
Hello Florian,
I need assistance on this one. Our customer system 5.10.25-flatcar crashed with following trace
Aug 26 10:26:32.686733 amc-k8sdevsl01-worker-lx13 kernel: ------------[ cut here ]------------ Aug 26 10:26:32.686855 amc-k8sdevsl01-worker-lx13 kernel: refcount_t: underflow; use-after-free. Aug 26 10:26:32.686877 amc-k8sdevsl01-worker-lx13 kernel: WARNING: CPU: 4 PID: 2422635 at lib/refcount.c:28 refcount_warn_saturat> Aug 26 10:26:32.686930 amc-k8sdevsl01-worker-lx13 kernel: Modules linked in: binfmt_misc nfnetlink_queue xt_NFQUEUE xt_multiport > Aug 26 10:26:32.689906 amc-k8sdevsl01-worker-lx13 kernel: dm_region_hash dm_log dm_mod Aug 26 10:26:32.690398 amc-k8sdevsl01-worker-lx13 kernel: CPU: 4 PID: 2422635 Comm: worker-1 Not tainted 5.10.25-flatcar #1 Aug 26 10:26:32.690526 amc-k8sdevsl01-worker-lx13 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Refer> Aug 26 10:26:32.691653 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0010:refcount_warn_saturate+0xa6/0xf0 Aug 26 10:26:32.691720 amc-k8sdevsl01-worker-lx13 kernel: Code: 05 3c 1d 40 01 01 e8 81 46 38 00 0f 0b c3 80 3d 2a 1d 40 01 00 75> Aug 26 10:26:32.691747 amc-k8sdevsl01-worker-lx13 kernel: RSP: 0018:ffffa3a0c3627938 EFLAGS: 00010282 Aug 26 10:26:32.692385 amc-k8sdevsl01-worker-lx13 kernel: RAX: 0000000000000000 RBX: ffff8c011b14fa00 RCX: 0000000000000027 Aug 26 10:26:32.692422 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff8c045d918b08 Aug 26 10:26:32.692446 amc-k8sdevsl01-worker-lx13 kernel: RBP: ffff8c011b14fa00 R08: ffff8c045d918b00 R09: ffffa3a0c3627750 Aug 26 10:26:32.693526 amc-k8sdevsl01-worker-lx13 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff8c011b14fa30 Aug 26 10:26:32.693584 amc-k8sdevsl01-worker-lx13 kernel: R13: 0000000000000002 R14: ffff8bfda3b43180 R15: ffff8c00cddb3a00 Aug 26 10:26:32.693615 amc-k8sdevsl01-worker-lx13 kernel: FS: 00007ff7a2331b38(0000) GS:ffff8c045d900000(0000) knlGS:00000000000> Aug 26 10:26:32.693649 amc-k8sdevsl01-worker-lx13 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 26 10:26:32.694304 amc-k8sdevsl01-worker-lx13 kernel: CR2: 00007ff79ac17a28 CR3: 00000001ee34e003 CR4: 00000000007706e0 Aug 26 10:26:32.694334 amc-k8sdevsl01-worker-lx13 kernel: PKRU: 55555554 Aug 26 10:26:32.694351 amc-k8sdevsl01-worker-lx13 kernel: Call Trace: Aug 26 10:26:32.694370 amc-k8sdevsl01-worker-lx13 kernel: nf_queue_entry_release_refs+0x82/0xa0 Aug 26 10:26:32.695381 amc-k8sdevsl01-worker-lx13 kernel: nf_reinject+0x6f/0x1a0 Aug 26 10:26:32.695404 amc-k8sdevsl01-worker-lx13 kernel: 0xffffffffc0857980 Aug 26 10:26:32.695425 amc-k8sdevsl01-worker-lx13 kernel: nfnetlink_unicast+0x1f1/0x420 [nfnetlink] Aug 26 10:26:32.695441 amc-k8sdevsl01-worker-lx13 kernel: ? cred_has_capability+0x7f/0x120 Aug 26 10:26:32.695457 amc-k8sdevsl01-worker-lx13 kernel: ? nfnetlink_unicast+0xa0/0x420 [nfnetlink] Aug 26 10:26:32.695475 amc-k8sdevsl01-worker-lx13 kernel: netlink_rcv_skb+0x50/0x100 Aug 26 10:26:32.696440 amc-k8sdevsl01-worker-lx13 kernel: nfnetlink_subsys_register+0x789/0x869 [nfnetlink] Aug 26 10:26:32.696465 amc-k8sdevsl01-worker-lx13 kernel: netlink_unicast+0x191/0x230 Aug 26 10:26:32.696492 amc-k8sdevsl01-worker-lx13 kernel: netlink_sendmsg+0x243/0x480 Aug 26 10:26:32.696513 amc-k8sdevsl01-worker-lx13 kernel: sock_sendmsg+0x5e/0x60 Aug 26 10:26:32.696529 amc-k8sdevsl01-worker-lx13 kernel: ____sys_sendmsg+0x1f3/0x260 Aug 26 10:26:32.697288 amc-k8sdevsl01-worker-lx13 kernel: ? copy_msghdr_from_user+0x5c/0x90 Aug 26 10:26:32.697309 amc-k8sdevsl01-worker-lx13 kernel: ? _cond_resched+0x15/0x30 Aug 26 10:26:32.697329 amc-k8sdevsl01-worker-lx13 kernel: ___sys_sendmsg+0x81/0xc0 Aug 26 10:26:32.697348 amc-k8sdevsl01-worker-lx13 kernel: ? do_lock_file_wait+0x6e/0xe0 Aug 26 10:26:32.697370 amc-k8sdevsl01-worker-lx13 kernel: ? _cond_resched+0x15/0x30 Aug 26 10:26:32.698946 amc-k8sdevsl01-worker-lx13 kernel: ? fcntl_setlk+0x1a5/0x2d0 Aug 26 10:26:32.698988 amc-k8sdevsl01-worker-lx13 kernel: __sys_sendmsg+0x59/0xa0 Aug 26 10:26:32.699005 amc-k8sdevsl01-worker-lx13 kernel: do_syscall_64+0x33/0x40 Aug 26 10:26:32.699020 amc-k8sdevsl01-worker-lx13 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Aug 26 10:26:32.699039 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0033:0x7ff7ab1283ad Aug 26 10:26:32.699071 amc-k8sdevsl01-worker-lx13 kernel: Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2> Aug 26 10:26:32.699090 amc-k8sdevsl01-worker-lx13 kernel: RSP: 002b:00007ff7a232f9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e Aug 26 10:26:32.699505 amc-k8sdevsl01-worker-lx13 kernel: RAX: ffffffffffffffda RBX: 00007ff7a2331b38 RCX: 00007ff7ab1283ad Aug 26 10:26:32.699534 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000000 RSI: 00007ff7a232fa48 RDI: 0000000000000078 Aug 26 10:26:09.088408 amc-k8sdevsl01-worker-lx13 kernel: SELinux: Class xdp_socket not defined in policy.
Is there a fix available for that crash?
Thank you, Yuri
On Dec 3, 2020, at 12:00 PM, Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Seems that upgrade to Linux 5.7 solved the problem, we will run more tests. Thank you, Yuri
On Nov 30, 2020, at 2:58 PM, Florian Westphal fw@strlen.de wrote:
Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Linux system crashed
[ 0.000000] Linux version 5.4.0-54-generic (buildd@lcy01-amd64-008) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020 (Ubuntu 5.4.0-54.60~18.04.1-generic 5.4.65) [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-54-generic root=UUID=11885fd3-b840-4c9b-a500-532c73ac952a ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=512M-:192M
… [ 156.321147] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised. [ 177.519159] general protection fault: 0000 [#1] SMP PTI [ 177.519737] CPU: 5 PID: 18484 Comm: worker-1 Kdump: loaded Not tainted 5.4.0-54-generic #60~18.04.1-Ubuntu [ 177.519742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020 [ 177.519814] RIP: 0010:dev_hard_start_xmit+0x38/0x200 [ 177.519827] Code: 55 41 54 53 48 83 ec 20 48 85 ff 48 89 55 c8 48 89 4d b8 0f 84 c1 01 00 00 48 8d 86 90 00 00 00 48 89 fb 49 89 f4 48 89 45 c0 <4c> 8b 2b 48 c7 c0 d0 f2 04 8f 48 c7 03 00 00 00 00 48 8b 00 4d 85 [ 177.519829] RSP: 0018:ffffbc6d0609b5e8 EFLAGS: 00010286 [ 177.519833] RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff95cf4bcfe800 [ 177.519835] RDX: 0000000000000000 RSI: ffff95cf4bcfe800 RDI: 0000000000000286 [ 177.519837] RBP: ffffbc6d0609b630 R08: ffff95cf6a190ec8 R09: ffff95cf4a2f7438 [ 177.519839] R10: ffffbc6d0609b6d0 R11: ffff95cf49d4d180 R12: ffff95cf51a5f000 [ 177.519841] R13: dead000000000100 R14: 000000000000009c R15: ffff95d02996b400 [ 177.519844] FS: 00007ff394cdfb20(0000) GS:ffff95d035d40000(0000) knlGS:0000000000000000 [ 177.519846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 177.519848] CR2: 00007fb4a9c2d000 CR3: 00000001049fa004 CR4: 00000000003606e0 [ 177.519908] Call Trace: [ 177.519917] __dev_queue_xmit+0x719/0x920 [ 177.519930] ? ctnetlink_conntrack_event+0x8c/0x5e0 [nf_conntrack_netlink]
Can you reproduce this on 5.7 or later, or with following patches backported to 5.4.y?
dd3cc111f2e3220ddc9c4ab17f13dc97759b5163 119e52e664c57d5f7c0174dc2b3a296b1e40591d af370ab36fcd19f04e3408c402608e7e56e6f188 28f715b9e6dd7cbf07c2aea913fea7c87a56a3b5
The series fixed nfqueue reference counting.
Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Hello Florian,
I need assistance on this one. Our customer system 5.10.25-flatcar crashed with following trace
Aug 26 10:26:32.686733 amc-k8sdevsl01-worker-lx13 kernel: ------------[ cut here ]------------ Aug 26 10:26:32.686855 amc-k8sdevsl01-worker-lx13 kernel: refcount_t: underflow; use-after-free. Aug 26 10:26:32.686877 amc-k8sdevsl01-worker-lx13 kernel: WARNING: CPU: 4 PID: 2422635 at lib/refcount.c:28 refcount_warn_saturat> Aug 26 10:26:32.686930 amc-k8sdevsl01-worker-lx13 kernel: Modules linked in: binfmt_misc nfnetlink_queue xt_NFQUEUE xt_multiport > Aug 26 10:26:32.689906 amc-k8sdevsl01-worker-lx13 kernel: dm_region_hash dm_log dm_mod Aug 26 10:26:32.690398 amc-k8sdevsl01-worker-lx13 kernel: CPU: 4 PID: 2422635 Comm: worker-1 Not tainted 5.10.25-flatcar #1 Aug 26 10:26:32.690526 amc-k8sdevsl01-worker-lx13 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Refer> Aug 26 10:26:32.691653 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0010:refcount_warn_saturate+0xa6/0xf0 Aug 26 10:26:32.691720 amc-k8sdevsl01-worker-lx13 kernel: Code: 05 3c 1d 40 01 01 e8 81 46 38 00 0f 0b c3 80 3d 2a 1d 40 01 00 75> Aug 26 10:26:32.691747 amc-k8sdevsl01-worker-lx13 kernel: RSP: 0018:ffffa3a0c3627938 EFLAGS: 00010282 Aug 26 10:26:32.692385 amc-k8sdevsl01-worker-lx13 kernel: RAX: 0000000000000000 RBX: ffff8c011b14fa00 RCX: 0000000000000027 Aug 26 10:26:32.692422 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff8c045d918b08 Aug 26 10:26:32.692446 amc-k8sdevsl01-worker-lx13 kernel: RBP: ffff8c011b14fa00 R08: ffff8c045d918b00 R09: ffffa3a0c3627750 Aug 26 10:26:32.693526 amc-k8sdevsl01-worker-lx13 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff8c011b14fa30 Aug 26 10:26:32.693584 amc-k8sdevsl01-worker-lx13 kernel: R13: 0000000000000002 R14: ffff8bfda3b43180 R15: ffff8c00cddb3a00 Aug 26 10:26:32.693615 amc-k8sdevsl01-worker-lx13 kernel: FS: 00007ff7a2331b38(0000) GS:ffff8c045d900000(0000) knlGS:00000000000> Aug 26 10:26:32.693649 amc-k8sdevsl01-worker-lx13 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Aug 26 10:26:32.694304 amc-k8sdevsl01-worker-lx13 kernel: CR2: 00007ff79ac17a28 CR3: 00000001ee34e003 CR4: 00000000007706e0 Aug 26 10:26:32.694334 amc-k8sdevsl01-worker-lx13 kernel: PKRU: 55555554 Aug 26 10:26:32.694351 amc-k8sdevsl01-worker-lx13 kernel: Call Trace: Aug 26 10:26:32.694370 amc-k8sdevsl01-worker-lx13 kernel: nf_queue_entry_release_refs+0x82/0xa0
Is that sock_put()?
If so, I don't understand this backtrace. When refcount_t debugging is on, sock_hold() would also generate a backtrace in case we try to incrase refcount on a socket that already has a zero refcount.
So, looks like something else decremented sk refcount while packet was queued. No idea how that could happen.
linux-stable-mirror@lists.linaro.org