Yuri Lipnesh yuri.lipnesh@gmail.com wrote:
Hello Florian,
I need assistance with this one. Our customer's system, running 5.10.25-flatcar, crashed with the following trace:
Aug 26 10:26:32.686733 amc-k8sdevsl01-worker-lx13 kernel: ------------[ cut here ]------------
Aug 26 10:26:32.686855 amc-k8sdevsl01-worker-lx13 kernel: refcount_t: underflow; use-after-free.
Aug 26 10:26:32.686877 amc-k8sdevsl01-worker-lx13 kernel: WARNING: CPU: 4 PID: 2422635 at lib/refcount.c:28 refcount_warn_saturat>
Aug 26 10:26:32.686930 amc-k8sdevsl01-worker-lx13 kernel: Modules linked in: binfmt_misc nfnetlink_queue xt_NFQUEUE xt_multiport >
Aug 26 10:26:32.689906 amc-k8sdevsl01-worker-lx13 kernel: dm_region_hash dm_log dm_mod
Aug 26 10:26:32.690398 amc-k8sdevsl01-worker-lx13 kernel: CPU: 4 PID: 2422635 Comm: worker-1 Not tainted 5.10.25-flatcar #1
Aug 26 10:26:32.690526 amc-k8sdevsl01-worker-lx13 kernel: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Refer>
Aug 26 10:26:32.691653 amc-k8sdevsl01-worker-lx13 kernel: RIP: 0010:refcount_warn_saturate+0xa6/0xf0
Aug 26 10:26:32.691720 amc-k8sdevsl01-worker-lx13 kernel: Code: 05 3c 1d 40 01 01 e8 81 46 38 00 0f 0b c3 80 3d 2a 1d 40 01 00 75>
Aug 26 10:26:32.691747 amc-k8sdevsl01-worker-lx13 kernel: RSP: 0018:ffffa3a0c3627938 EFLAGS: 00010282
Aug 26 10:26:32.692385 amc-k8sdevsl01-worker-lx13 kernel: RAX: 0000000000000000 RBX: ffff8c011b14fa00 RCX: 0000000000000027
Aug 26 10:26:32.692422 amc-k8sdevsl01-worker-lx13 kernel: RDX: 0000000000000027 RSI: 00000000ffffdfff RDI: ffff8c045d918b08
Aug 26 10:26:32.692446 amc-k8sdevsl01-worker-lx13 kernel: RBP: ffff8c011b14fa00 R08: ffff8c045d918b00 R09: ffffa3a0c3627750
Aug 26 10:26:32.693526 amc-k8sdevsl01-worker-lx13 kernel: R10: 0000000000000001 R11: 0000000000000001 R12: ffff8c011b14fa30
Aug 26 10:26:32.693584 amc-k8sdevsl01-worker-lx13 kernel: R13: 0000000000000002 R14: ffff8bfda3b43180 R15: ffff8c00cddb3a00
Aug 26 10:26:32.693615 amc-k8sdevsl01-worker-lx13 kernel: FS: 00007ff7a2331b38(0000) GS:ffff8c045d900000(0000) knlGS:00000000000>
Aug 26 10:26:32.693649 amc-k8sdevsl01-worker-lx13 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 26 10:26:32.694304 amc-k8sdevsl01-worker-lx13 kernel: CR2: 00007ff79ac17a28 CR3: 00000001ee34e003 CR4: 00000000007706e0
Aug 26 10:26:32.694334 amc-k8sdevsl01-worker-lx13 kernel: PKRU: 55555554
Aug 26 10:26:32.694351 amc-k8sdevsl01-worker-lx13 kernel: Call Trace:
Aug 26 10:26:32.694370 amc-k8sdevsl01-worker-lx13 kernel: nf_queue_entry_release_refs+0x82/0xa0
Is that sock_put()?
If so, I don't understand this backtrace. When refcount_t debugging is on, sock_hold() would also generate a backtrace if we tried to increase the refcount on a socket whose refcount is already zero.
So it looks like something else decremented the sk refcount while the packet was queued. No idea how that could happen.