Hello Florian,
I need assistance with this one. Our customer's system, running 5.10.25-flatcar, crashed with the following trace:
Aug 26 10:26:32.686733 amc-k8sdevsl01-worker-lx13 kernel: ------------[ cut here ]------------
Aug 26 10:26:32.686855 amc-k8sdevsl01-worker-lx13 kernel: refcount_t: underflow; use-after-free.
Aug 26 10:26:32.686877 amc-k8sdevsl01-worker-lx13 kernel: WARNING: CPU: 4 PID: 2422635 at lib/refcount.c:28 refcount_warn_saturat>
Aug 26 10:26:32.686930 amc-k8sdevsl01-worker-lx13 kernel: Modules linked in: binfmt_misc nfnetlink_queue xt_NFQUEUE xt_multiport >
Aug 26 10:26:32.689906 amc-k8sdevsl01-worker-lx13 kernel: dm_region_hash dm_log dm_mod
Aug 26 10:26:32.690398 amc-k8sdevsl01-worker-lx13 kernel: CPU: 4 PID: 2422635 Comm: worker-1 Not tainted 5.10.25-flatcar #1
Aug 26 10:26:32.690526 amc-k8sdevsl01-worker-lx13 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Refer>
Aug 26 10:26:32.691653 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0010:refcount_warn_saturate+0xa6/0xf0
Aug 26 10:26:32.691720 amc-k8sdevsl01-worker-lx13 kernel: Code: 05 3c 1d 40 01 01 e8 81 46 38 00 0f 0b c3 80 3d 2a 1d 40 01 00 75>
Aug 26 10:26:32.691747 amc-k8sdevsl01-worker-lx13 kernel: RSP: 0018:ffffa3a0c3627938 EFLAGS: 00010282
Aug 26 10:26:32.692385 amc-k8sdevsl01-worker-lx13 kernel: RAX: 0000000000000000 RBX: ffff8c011b14fa00 RCX: 0000000000000027
Aug 26 10:26:32.692422 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff8c045d918b08
Aug 26 10:26:32.692446 amc-k8sdevsl01-worker-lx13 kernel: RBP: ffff8c011b14fa00 R08: ffff8c045d918b00 R09: ffffa3a0c3627750
Aug 26 10:26:32.693526 amc-k8sdevsl01-worker-lx13 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff8c011b14fa30
Aug 26 10:26:32.693584 amc-k8sdevsl01-worker-lx13 kernel: R13: 0000000000000002 R14: ffff8bfda3b43180 R15: ffff8c00cddb3a00
Aug 26 10:26:32.693615 amc-k8sdevsl01-worker-lx13 kernel: FS: 00007ff7a2331b38(0000) GS:ffff8c045d900000(0000) knlGS:00000000000>
Aug 26 10:26:32.693649 amc-k8sdevsl01-worker-lx13 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 26 10:26:32.694304 amc-k8sdevsl01-worker-lx13 kernel: CR2: 00007ff79ac17a28 CR3: 00000001ee34e003 CR4: 00000000007706e0
Aug 26 10:26:32.694334 amc-k8sdevsl01-worker-lx13 kernel: PKRU: 55555554
Aug 26 10:26:32.694351 amc-k8sdevsl01-worker-lx13 kernel: Call Trace:
Aug 26 10:26:32.694370 amc-k8sdevsl01-worker-lx13 kernel: nf_queue_entry_release_refs+0x82/0xa0
Aug 26 10:26:32.695381 amc-k8sdevsl01-worker-lx13 kernel: nf_reinject+0x6f/0x1a0
Aug 26 10:26:32.695404 amc-k8sdevsl01-worker-lx13 kernel: 0xffffffffc0857980
Aug 26 10:26:32.695425 amc-k8sdevsl01-worker-lx13 kernel: nfnetlink_unicast+0x1f1/0x420 [nfnetlink]
Aug 26 10:26:32.695441 amc-k8sdevsl01-worker-lx13 kernel: ? cred_has_capability+0x7f/0x120
Aug 26 10:26:32.695457 amc-k8sdevsl01-worker-lx13 kernel: ? nfnetlink_unicast+0xa0/0x420 [nfnetlink]
Aug 26 10:26:32.695475 amc-k8sdevsl01-worker-lx13 kernel: netlink_rcv_skb+0x50/0x100
Aug 26 10:26:32.696440 amc-k8sdevsl01-worker-lx13 kernel: nfnetlink_subsys_register+0x789/0x869 [nfnetlink]
Aug 26 10:26:32.696465 amc-k8sdevsl01-worker-lx13 kernel: netlink_unicast+0x191/0x230
Aug 26 10:26:32.696492 amc-k8sdevsl01-worker-lx13 kernel: netlink_sendmsg+0x243/0x480
Aug 26 10:26:32.696513 amc-k8sdevsl01-worker-lx13 kernel: sock_sendmsg+0x5e/0x60
Aug 26 10:26:32.696529 amc-k8sdevsl01-worker-lx13 kernel: ____sys_sendmsg+0x1f3/0x260
Aug 26 10:26:32.697288 amc-k8sdevsl01-worker-lx13 kernel: ? copy_msghdr_from_user+0x5c/0x90
Aug 26 10:26:32.697309 amc-k8sdevsl01-worker-lx13 kernel: ? _cond_resched+0x15/0x30
Aug 26 10:26:32.697329 amc-k8sdevsl01-worker-lx13 kernel: ___sys_sendmsg+0x81/0xc0
Aug 26 10:26:32.697348 amc-k8sdevsl01-worker-lx13 kernel: ? do_lock_file_wait+0x6e/0xe0
Aug 26 10:26:32.697370 amc-k8sdevsl01-worker-lx13 kernel: ? _cond_resched+0x15/0x30
Aug 26 10:26:32.698946 amc-k8sdevsl01-worker-lx13 kernel: ? fcntl_setlk+0x1a5/0x2d0
Aug 26 10:26:32.698988 amc-k8sdevsl01-worker-lx13 kernel: __sys_sendmsg+0x59/0xa0
Aug 26 10:26:32.699005 amc-k8sdevsl01-worker-lx13 kernel: do_syscall_64+0x33/0x40
Aug 26 10:26:32.699020 amc-k8sdevsl01-worker-lx13 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 26 10:26:32.699039 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0033:0x7ff7ab1283ad
Aug 26 10:26:32.699071 amc-k8sdevsl01-worker-lx13 kernel: Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2>
Aug 26 10:26:32.699090 amc-k8sdevsl01-worker-lx13 kernel: RSP: 002b:00007ff7a232f9f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
Aug 26 10:26:32.699505 amc-k8sdevsl01-worker-lx13 kernel: RAX: ffffffffffffffda RBX: 00007ff7a2331b38 RCX: 00007ff7ab1283ad
Aug 26 10:26:32.699534 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000000 RSI: 00007ff7a232fa48 RDI: 0000000000000078
Aug 26 10:26:09.088408 amc-k8sdevsl01-worker-lx13 kernel: SELinux: Class xdp_socket not defined in policy.
Is there a fix available for that crash?
Thank you, Yuri
On Dec 3, 2020, at 12:00 PM, Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
It seems that upgrading to Linux 5.7 solved the problem; we will run more tests. Thank you, Yuri
On Nov 30, 2020, at 2:58 PM, Florian Westphal fw@strlen.de wrote:
Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
The Linux system crashed:
[ 0.000000] Linux version 5.4.0-54-generic (buildd@lcy01-amd64-008) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #60~18.04.1-Ubuntu SMP Fri Nov 6 17:25:16 UTC 2020 (Ubuntu 5.4.0-54.60~18.04.1-generic 5.4.65)
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.4.0-54-generic root=UUID=11885fd3-b840-4c9b-a500-532c73ac952a ro find_preseed=/preseed.cfg auto noprompt priority=critical locale=en_US quiet crashkernel=512M-:192M
…
[ 156.321147] TCP: eth0: Driver has suspect GRO implementation, TCP performance may be compromised.
[ 177.519159] general protection fault: 0000 [#1] SMP PTI
[ 177.519737] CPU: 5 PID: 18484 Comm: worker-1 Kdump: loaded Not tainted 5.4.0-54-generic #60~18.04.1-Ubuntu
[ 177.519742] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
[ 177.519814] RIP: 0010:dev_hard_start_xmit+0x38/0x200
[ 177.519827] Code: 55 41 54 53 48 83 ec 20 48 85 ff 48 89 55 c8 48 89 4d b8 0f 84 c1 01 00 00 48 8d 86 90 00 00 00 48 89 fb 49 89 f4 48 89 45 c0 <4c> 8b 2b 48 c7 c0 d0 f2 04 8f 48 c7 03 00 00 00 00 48 8b 00 4d 85
[ 177.519829] RSP: 0018:ffffbc6d0609b5e8 EFLAGS: 00010286
[ 177.519833] RAX: 0000000000000000 RBX: dead000000000100 RCX: ffff95cf4bcfe800
[ 177.519835] RDX: 0000000000000000 RSI: ffff95cf4bcfe800 RDI: 0000000000000286
[ 177.519837] RBP: ffffbc6d0609b630 R08: ffff95cf6a190ec8 R09: ffff95cf4a2f7438
[ 177.519839] R10: ffffbc6d0609b6d0 R11: ffff95cf49d4d180 R12: ffff95cf51a5f000
[ 177.519841] R13: dead000000000100 R14: 000000000000009c R15: ffff95d02996b400
[ 177.519844] FS: 00007ff394cdfb20(0000) GS:ffff95d035d40000(0000) knlGS:0000000000000000
[ 177.519846] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 177.519848] CR2: 00007fb4a9c2d000 CR3: 00000001049fa004 CR4: 00000000003606e0
[ 177.519908] Call Trace:
[ 177.519917] __dev_queue_xmit+0x719/0x920
[ 177.519930] ? ctnetlink_conntrack_event+0x8c/0x5e0 [nf_conntrack_netlink]
Can you reproduce this on 5.7 or later, or with the following patches backported to 5.4.y?
dd3cc111f2e3220ddc9c4ab17f13dc97759b5163
119e52e664c57d5f7c0174dc2b3a296b1e40591d
af370ab36fcd19f04e3408c402608e7e56e6f188
28f715b9e6dd7cbf07c2aea913fea7c87a56a3b5
The series fixed nfqueue reference counting.
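For context on the 5.10 warning at the top of the thread: "refcount_t: underflow; use-after-free." is what the kernel prints when a reference count is decremented past zero. This usually means some path released a reference it did not hold, for example a double put. The sketch below is a simplified user-space Python model, not kernel code. The `RefCount` class, its method names and the `entry_ref` variable are hypothetical. The model only approximates the saturate-and-warn behaviour of lib/refcount.c for `refcount_inc()` and `refcount_dec_and_test()`. It illustrates the general failure mode. It does not reproduce the specific nfqueue bug, and it does not identify which reference underflowed in the trace.

```python
# Simplified user-space model of the kernel's refcount_t saturation
# semantics (lib/refcount.c). Hypothetical names; not kernel code.
# 32-bit overflow is not modelled, and every warning is recorded
# (the kernel prints each kind only once via WARN_ONCE).

INT_MIN = -(2**31)
# C's INT_MIN / 2; the division is exact, so // gives the same value.
REFCOUNT_SATURATED = INT_MIN // 2


class RefCount:
    """Hypothetical stand-in for a refcount_t field."""

    def __init__(self, initial=1):
        self.refs = initial
        self.warnings = []  # messages the kernel would have printed

    def _warn_saturate(self, message):
        # Pin the counter at a saturated negative value so the object
        # is leaked rather than freed a second time, then warn.
        self.refs = REFCOUNT_SATURATED
        self.warnings.append(message)

    def inc(self):
        """Take a reference."""
        old = self.refs
        self.refs = old + 1
        if old == 0:
            # Object already released: taking a reference is a use-after-free.
            self._warn_saturate("refcount_t: addition on 0; use-after-free.")
        elif old < 0:
            # Counter already saturated.
            self._warn_saturate("refcount_t: saturated; leaking memory.")

    def dec_and_test(self):
        """Drop one reference; return True if it was the last one."""
        old = self.refs
        self.refs = old - 1
        if old == 1:
            return True  # the caller would now free the object
        if old - 1 < 0:
            # Decrement from 0 (or from a saturated value): a double put.
            self._warn_saturate("refcount_t: underflow; use-after-free.")
        return False


# Hypothetical scenario: one reference, released by two different paths.
entry_ref = RefCount(1)
print(entry_ref.dec_and_test())  # True: first release frees the object
print(entry_ref.dec_and_test())  # False: second release underflows
print(entry_ref.warnings)        # ['refcount_t: underflow; use-after-free.']
```

Saturating instead of wrapping is a deliberate design choice. Once the counter is pinned at a large negative value, later gets and puts cannot bring it back to exactly zero. The object is therefore leaked rather than freed again, which turns a potential memory corruption into a warning plus a leak.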