On Fri, 2024-10-04 at 09:39 +0200, Sebastian Andrzej Siewior wrote:
> On 2024-09-27 15:01:00 [-0400], Joseph Salisbury wrote:
> > Is it needed in all stable release patch sets, including v5.15?
>
> Yes. I would appreciate backporting it all the way back to wherever the code is available. The dependencies
>   1eacdd71b3436 ("netfilter: nft_counter: Disable BH in nft_counter_offload_stats().")
>   a0b39e2dc7017 ("netfilter: nft_counter: Synchronize nft_counter_reset() against reader.")
> were already routed via stable. The problem is that the seqcount has no lock associated with it, so a reader could preempt a writer and then lock up, spinning.
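The seqcount problem described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: it is single-threaded, it is not the kernel's seqcount_t API, and every name in it (seq_model, model_write_begin(), model_write_end(), model_read(), max_spins) is hypothetical. The spin limit exists only so that the model terminates; a real seqcount reader has no such limit.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical single-threaded model of a sequence counter.
 * This is NOT the kernel's seqcount_t; all names are made up for illustration.
 */
struct seq_model {
    unsigned int sequence; /* odd while an update is in progress */
    uint64_t packets;
    uint64_t bytes;
};

/* Writer enters its update section: the sequence becomes odd. */
static void model_write_begin(struct seq_model *s)
{
    assert((s->sequence & 1) == 0);
    s->sequence++;
}

/* Writer leaves its update section: the sequence becomes even again. */
static void model_write_end(struct seq_model *s)
{
    assert((s->sequence & 1) == 1);
    s->sequence++;
}

/*
 * Reader: retry until a stable snapshot is seen, but give up after
 * max_spins attempts so that the model terminates.
 * Returns the number of attempts used, or -1 if no stable snapshot was seen.
 */
static int model_read(const struct seq_model *s, int max_spins,
                      uint64_t *packets, uint64_t *bytes)
{
    for (int spin = 1; spin <= max_spins; spin++) {
        unsigned int start = s->sequence;

        if (start & 1)
            continue; /* update in progress, try again */

        *packets = s->packets;
        *bytes = s->bytes;

        if (s->sequence == start)
            return spin; /* nothing changed while we were reading */
    }
    return -1;
}
```

If model_read() is called after model_write_begin() but before model_write_end(), which is what happens when a reader preempts the writer and the writer cannot run again, every attempt sees an odd sequence and the call returns -1. Once model_write_end() has run, the same call succeeds on the first attempt. Without the artificial spin limit the first case would never return, which matches the lockup described in the thread.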
Hi,
this needs to be backported to all stable RT trees (I just checked 4.19 and 6.1; 5.15 already has it). We observed the reader live-lock issue in "nft_counter_fetch" on 6.1.120-rt47 (leading to a system stall) and were also able to find it with lockdep (see the stack trace below).
I'm wondering if this patch could be applied to linux-stable, even if it is just a performance optimization on non-rt kernels (not a fix).
The patch "netfilter: nft_counter: Use u64_stats_t for statistic" cleanly applies on 6.1.y and 6.1.127-rt48.
Stacktrace from lockdep:
[ 33.643632] ------------[ cut here ]------------
[ 33.643637] WARNING: CPU: 0 PID: 972 at include/linux/seqlock.h:269 nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643657] Modules linked in: br_netfilter bridge stp llc xt_comment xt_recent xt_hl ip6_tables ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat nf_tables libcrc32c nfnetlink rfkill intel_rapl_msr intel_rapl_common ccp binfmt_misc kvm irqbypass ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 ppdev snd_pcm snd_timer aesni_intel snd crypto_simd cryptd soundcore pcspkr parport_pc iTCO_wdt bochs parport drm_vram_helper intel_pmc_bxt drm_ttm_helper iTCO_vendor_support button ttm drm_kms_helper watchdog sg joydev evdev serio_raw drm fuse loop efi_pstore configfs qemu_fw_cfg ip_tables x_tables autofs4 overlay nls_ascii nls_cp437 vfat fat ext4 crc32c_generic crc16 mbcache jbd2 xts ecb squashfs dm_verity dm_bufio reed_solomon dm_mod sd_mod t10_pi crc64_rocksoft crc64 crc_t10dif crct10dif_generic virtio_net net_failover ahci failover libahci crct10dif_pclmul
[ 33.643727] crct10dif_common libata virtio_pci i2c_i801 crc32_pclmul scsi_mod crc32c_intel virtio_pci_legacy_dev i2c_smbus psmouse virtio_pci_modern_dev virtio scsi_common virtio_ring lpc_ich
[ 33.643739] CPU: 0 PID: 972 Comm: onboardservice Not tainted 6.1.120-rt47 #1
[ 33.643742] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[ 33.643744] RIP: 0010:nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643759] Code: 52 3f 85 d2 74 26 65 8b 05 ba bd 52 3f 85 c0 75 1b 65 8b 05 e7 b3 52 3f a9 ff ff ff 7f 75 0d 65 8b 05 dd ba 52 3f 85 c0 74 02 <0f> 0b ff 74 24 20 4c 8d 6d 08 45 31 c9 31 c9 41 b8 01 00 00 00 31
[ 33.643776] RSP: 0018:ffffa045007736a0 EFLAGS: 00010202
[ 33.643778] RAX: 0000000000000001 RBX: ffffc044ffc2ae80 RCX: 00000000000026af
[ 33.643780] RDX: 0000000000000001 RSI: ffff8d29050db388 RDI: ffffffffc0af49a4
[ 33.643781] RBP: ffff8d293f638060 R08: 0000000000000000 R09: 0000000000000000
[ 33.643782] R10: 0000000000000001 R11: 000000009bb77572 R12: ffffa04500773920
[ 33.643783] R13: ffff8d29011db358 R14: ffff8d29011db208 R15: ffff8d29011db240
[ 33.643807] FS: 000000c000047c90(0000) GS:ffff8d293f600000(0000) knlGS:0000000000000000
[ 33.643811] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 33.643813] CR2: 000000c0005fe000 CR3: 000000003a212000 CR4: 00000000003506f0
[ 33.643836] Call Trace:
[ 33.643840] <TASK>
[ 33.643844] ? __warn+0x82/0xe0
[ 33.643852] ? nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643877] ? report_bug+0x10e/0x180
[ 33.643889] ? handle_bug+0x41/0x70
[ 33.643895] ? exc_invalid_op+0x13/0x60
[ 33.643899] ? asm_exc_invalid_op+0x16/0x20
[ 33.643912] ? nft_counter_eval+0x24/0xd0 [nf_tables]
[ 33.643931] ? nft_counter_eval+0x6b/0xd0 [nf_tables]
[ 33.643962] nft_do_chain+0x45b/0x690 [nf_tables]
[ 33.644025] nft_do_chain_ipv4+0x78/0xa0 [nf_tables]
[ 33.644046] nf_hook_slow+0x41/0xc0
[ 33.644054] __ip_local_out+0x14c/0x300
[ 33.644062] ? ip_output+0xb0/0xb0
[ 33.644074] __ip_queue_xmit+0x1c0/0x7f0
[ 33.644086] __tcp_transmit_skb+0xabe/0xcb0
[ 33.644107] tcp_write_xmit+0x521/0x14a0
[ 33.644117] __tcp_push_pending_frames+0x32/0xf0
[ 33.644120] tcp_sendmsg_locked+0x4cd/0xc20
[ 33.644133] tcp_sendmsg+0x27/0x40
[ 33.644137] __sock_sendmsg+0x58/0x70
[ 33.644142] sock_write_iter+0x9a/0x100
[ 33.644151] vfs_write+0x2c8/0x330
[ 33.644164] ksys_write+0xc3/0xf0
[ 33.644169] do_syscall_64+0x55/0xb0
[ 33.644173] ? lock_acquire+0xc4/0x2d0
[ 33.644178] ? find_held_lock+0x2b/0x80
[ 33.644182] ? finish_task_switch.isra.0+0xca/0x380
[ 33.644186] ? lock_release+0xd0/0x2d0
[ 33.644191] ? lockdep_hardirqs_on_prepare+0xdc/0x190
[ 33.644196] ? finish_task_switch.isra.0+0xcf/0x380
[ 33.644201] ? __schedule+0x3f8/0xd20
[ 33.644206] ? restore_fpregs_from_fpstate+0x38/0x90
[ 33.644211] ? trace_x86_fpu_regs_activated+0x1f/0xb0
[ 33.644213] ? switch_fpu_return+0x58/0x90
[ 33.644218] ? exit_to_user_mode_prepare+0x1af/0x250
[ 33.644223] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 33.644227] RIP: 0033:0x40720e
[ 33.644230] Code: 48 83 ec 38 e8 13 00 00 00 48 83 c4 38 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 49 89 f2 48 89 fa 48 89 ce 48 89 df 0f 05 <48> 3d 01 f0 ff ff 76 15 48 f7 d8 48 89 c1 48 c7 c0 ff ff ff ff 48
[ 33.644232] RSP: 002b:000000c000069980 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
[ 33.644234] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 000000000040720e
[ 33.644236] RDX: 000000000000008c RSI: 000000c0001746c0 RDI: 0000000000000009
[ 33.644237] RBP: 000000c0000699c0 R08: 0000000000000000 R09: 0000000000000000
[ 33.644238] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c000069b00
[ 33.644239] R13: 000000000000000e R14: 000000c00016ed00 R15: 0000000000a88360
[ 33.644250] </TASK>
[ 33.644250] irq event stamp: 10266
[ 33.644251] hardirqs last enabled at (10268): [<ffffffff96339836>] vprintk_store+0x326/0x550
[ 33.644256] hardirqs last disabled at (10269): [<ffffffff9633987c>] vprintk_store+0x36c/0x550
[ 33.644259] softirqs last enabled at (9900): [<ffffffff962af77e>] __local_bh_enable_ip+0xfe/0x140
[ 33.644264] softirqs last disabled at (9904): [<ffffffffc0af49a4>] nft_counter_eval+0x24/0xd0 [nf_tables]
[ 33.644277] ---[ end trace 0000000000000000 ]---
Best regards,
Felix

> Sebastian
On 2025-02-19 09:24:37 [+0000], MOESSBAUER, Felix wrote:
> On Fri, 2024-10-04 at 09:39 +0200, Sebastian Andrzej Siewior wrote:
> > On 2024-09-27 15:01:00 [-0400], Joseph Salisbury wrote:
> > > Is it needed in all stable release patch sets, including v5.15?
> >
> > Yes. I would appreciate backporting it all the way back to wherever the code is available. The dependencies
> >   1eacdd71b3436 ("netfilter: nft_counter: Disable BH in nft_counter_offload_stats().")
> >   a0b39e2dc7017 ("netfilter: nft_counter: Synchronize nft_counter_reset() against reader.")
> > were already routed via stable. The problem is that the seqcount has no lock associated with it, so a reader could preempt a writer and then lock up, spinning.
>
> Hi,
>
> this needs to be backported to all stable RT trees (I just checked 4.19 and 6.1; 5.15 already has it). We observed the reader live-lock issue in "nft_counter_fetch" on 6.1.120-rt47 (leading to a system stall) and were also able to find it with lockdep (see the stack trace below).
>
> I'm wondering if this patch could be applied to linux-stable, even if it is just a performance optimization on non-rt kernels (not a fix).
>
> The patch "netfilter: nft_counter: Use u64_stats_t for statistic" cleanly applies on 6.1.y and 6.1.127-rt48.
I assumed the backport had already happened. So at least 4.19 and 6.1 are missing it, you say. 4.19 will remain without it because it is EOL. 6.1 would be Clark's department. Could everyone please report the status of their backports?
If you want to pull the performance card and route it via the stable tree, I suggest asking the netfilter people whether they object. Then it might work.
Sebastian
linux-stable-mirror@lists.linaro.org