 
On Fri, 2024-10-04 at 09:39 +0200, Sebastian Andrzej Siewior wrote:
> On 2024-09-27 15:01:00 [-0400], Joseph Salisbury wrote:
> > Is it needed in all stable release patch sets, including v5.15?
>
> Yes. I would appreciate backporting it all the way where the code is available. The dependencies
>	1eacdd71b3436 ("netfilter: nft_counter: Disable BH in nft_counter_offload_stats().")
>	a0b39e2dc7017 ("netfilter: nft_counter: Synchronize nft_counter_reset() against reader.")
> were already routed via stable. The problem is that the seqcount has no lock associated so a reader could preempt a writer and then lockup spinning.

Hi,
this needs to be backported to all stable RT trees (I just checked 4.19 and 6.1; 5.15 already has it). We observed the reader live-lock in nft_counter_fetch() on 6.1.120-rt47 (leading to a system stall) and were also able to reproduce it with lockdep (see the stack trace below).

I'm wondering whether this patch could be applied to linux-stable as well, even though on non-RT kernels it is only a performance optimization (not a fix).

The patch "netfilter: nft_counter: Use u64_stats_t for statistic" applies cleanly on 6.1.y and 6.1.127-rt48.

Stacktrace from lockdep:

[ 33.643632] ------------[ cut here ]------------
[ 33.643637] WARNING: CPU: 0 PID: 972 at include/linux/seqlock.h:269 nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643657] Modules linked in: br_netfilter bridge stp llc xt_comment xt_recent xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink rfkill intel_rapl_msr intel_rapl_common ccp binfmt_misc kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 ppdev snd_pcm snd_timer aesni_intel snd crypto_simd cryptd soundcore pcspkr parport_pc iTCO_wdt bochs parport drm_vram_helper intel_pmc_bxt drm_ttm_helper iTCO_vendor_support button ttm drm_kms_helper watchdog sg joydev evdev serio_raw drm fuse loop efi_pstore configfs qemu_fw_cfg ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc32c_generic crc16 mbcache jbd2 xts ecb squashfs dm_verity dm_bufio reed_solomon dm_mod sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic virtio_net net_failover ahci failover libahci crct10dif_pclmul
[ 33.643727] crct10dif_common libata virtio_pci i2c_i801 crc32_pclmul scsi_mod crc32c_intel virtio_pci_legacy_dev i2c_smbus psmouse virtio_pci_modern_dev virtio scsi_common virtio_ring lpc_ich
[ 33.643739] CPU: 0 PID: 972 Comm: onboardservice Not tainted 6.1.120-rt47 #1
[ 33.643742] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 33.643744] RIP: 0010:nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643759] Code: 52 3f 85 d2 74 26 65 8b 05 ba bd 52 3f 85 c0 75 1b 65 8b 05 e7 b3 52 3f a9 ff ff ff 7f 75 0d 65 8b 05 dd ba 52 3f 85 c0 74 02 <0f> 0b ff 74 24 20 4c 8d 6d 08 45 31 c9 31 c9 41 b8 01 00 00 00 31
[ 33.643776] RSP: 0018:ffffa045007736a0 EFLAGS: 00010202
[ 33.643778] RAX: 0000000000000001 RBX: ffffc044ffc2ae80 RCX: 00000000000026af
[ 33.643780] RDX: 0000000000000001 RSI: ffff8d29050db388 RDI: ffffffffc0af49a4
[ 33.643781] RBP: ffff8d293f638060 R08: 0000000000000000 R09: 0000000000000000
[ 33.643782] R10: 0000000000000001 R11: 000000009bb77572 R12: ffffa04500773920
[ 33.643783] R13: ffff8d29011db358 R14: ffff8d29011db208 R15: ffff8d29011db240
[ 33.643807] FS: 000000c000047c90(0000) GS:ffff8d293f600000(0000) knlGS:0000000000000000
[ 33.643811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.643813] CR2: 000000c0005fe000 CR3: 000000003a212000 CR4: 00000000003506f0
[ 33.643836] Call Trace:
[ 33.643840] <TASK>
[ 33.643844] ? __warn+0x82/0xe0
[ 33.643852] ? nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643877] ? report_bug+0x10e/0x180
[ 33.643889] ? handle_bug+0x41/0x70
[ 33.643895] ? exc_invalid_op+0x13/0x60
[ 33.643899] ? asm_exc_invalid_op+0x16/0x20
[ 33.643912] ? nft_counter_eval+0x24/0xd0 [nf_tables]
[ 33.643931] ? nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643962] nft_do_chain+0x45b/0x690 [nf_tables]
[ 33.644025] nft_do_chain_ipv4+0x78/0xa0 [nf_tables]
[ 33.644046] nf_hook_slow+0x41/0xc0
[ 33.644054] __ip_local_out+0x14c/0x300
[ 33.644062] ? ip_output+0xb0/0xb0
[ 33.644074] __ip_queue_xmit+0x1c0/0x7f0
[ 33.644086] __tcp_transmit_skb+0xabe/0xcb0
[ 33.644107] tcp_write_xmit+0x521/0x14a0
[ 33.644117] __tcp_push_pending_frames+0x32/0xf0
[ 33.644120] tcp_sendmsg_locked+0x4cd/0xc20
[ 33.644133] tcp_sendmsg+0x27/0x40
[ 33.644137] __sock_sendmsg+0x58/0x70
[ 33.644142] sock_write_iter+0x9a/0x100
[ 33.644151] vfs_write+0x2c8/0x330
[ 33.644164] ksys_write+0xc3/0xf0
[ 33.644169] do_syscall_64+0x55/0xb0
[ 33.644173] ? lock_acquire+0xc4/0x2d0
[ 33.644178] ? find_held_lock+0x2b/0x80
[ 33.644182] ? finish_task_switch.isra.0+0xca/0x380
[ 33.644186] ? lock_release+0xd0/0x2d0
[ 33.644191] ? lockdep_hardirqs_on_prepare+0xdc/0x190
[ 33.644196] ? finish_task_switch.isra.0+0xcf/0x380
[ 33.644201] ? __schedule+0x3f8/0xd20
[ 33.644206] ? restore_fpregs_from_fpstate+0x38/0x90
[ 33.644211] ? trace_x86_fpu_regs_activated+0x1f/0xb0
[ 33.644213] ? switch_fpu_return+0x58/0x90
[ 33.644218] ? exit_to_user_mode_prepare+0x1af/0x250
[ 33.644223] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 33.644227] RIP: 0033:0x40720e
[ 33.644230] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 33.644232] RSP: 002b:000000c000069980 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
[ 33.644234] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000000040720e
[ 33.644236] RDX: 000000000000008c RSI: 000000c0001746c0 RDI: 0000000000000009
[ 33.644237] RBP: 000000c0000699c0 R08: 0000000000000000 R09: 0000000000000000
[ 33.644238] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000069b00
[ 33.644239] R13: 000000000000000e R14: 000000c00016ed00 R15: 0000000000a88360
[ 33.644250] </TASK>
[ 33.644250] irq event stamp: 10266
[ 33.644251] hardirqs last enabled at (10268): [<ffffffff96339836>] vprintk_store+0x326/0x550
[ 33.644256] hardirqs last disabled at (10269): [<ffffffff9633987c>] vprintk_store+0x36c/0x550
[ 33.644259] softirqs last enabled at (9900): [<ffffffff962af77e>] __local_bh_enable_ip+0xfe/0x140
[ 33.644264] softirqs last disabled at (9904): [<ffffffffc0af49a4>] nft_counter_eval+0x24/0xd0 [nf_tables]
[ 33.644277] ---[ end trace 0000000000000000 ]---

Best regards,
Felix
Sebastian