April 2022 - Linux-stable-mirror

[PATCH v2] arm64: paravirt: Use RCU read locks to guard stolen_time

by Elliot Berman

From: Prakruthi Deepak Heragu <quic_pheragu(a)quicinc.com>

During hotplug, the stolen time data structure is unmapped and memset.
There is a possibility of the timer IRQ being triggered before memset
and stolen time is getting updated as part of this timer IRQ handler. This
causes the below crash in timer handler -

[ 3457.473139][ C5] Unable to handle kernel paging request at virtual address ffffffc03df05148
...
[ 3458.154398][ C5] Call trace:
[ 3458.157648][ C5] para_steal_clock+0x30/0x50
[ 3458.162319][ C5] irqtime_account_process_tick+0x30/0x194
[ 3458.168148][ C5] account_process_tick+0x3c/0x280
[ 3458.173274][ C5] update_process_times+0x5c/0xf4
[ 3458.178311][ C5] tick_sched_timer+0x180/0x384
[ 3458.183164][ C5] __run_hrtimer+0x160/0x57c
[ 3458.187744][ C5] hrtimer_interrupt+0x258/0x684
[ 3458.192698][ C5] arch_timer_handler_virt+0x5c/0xa0
[ 3458.198002][ C5] handle_percpu_devid_irq+0xdc/0x414
[ 3458.203385][ C5] handle_domain_irq+0xa8/0x168
[ 3458.208241][ C5] gic_handle_irq.34493+0x54/0x244
[ 3458.213359][ C5] call_on_irq_stack+0x40/0x70
[ 3458.218125][ C5] do_interrupt_handler+0x60/0x9c
[ 3458.223156][ C5] el1_interrupt+0x34/0x64
[ 3458.227560][ C5] el1h_64_irq_handler+0x1c/0x2c
[ 3458.232503][ C5] el1h_64_irq+0x7c/0x80
[ 3458.236736][ C5] free_vmap_area_noflush+0x108/0x39c
[ 3458.242126][ C5] remove_vm_area+0xbc/0x118
[ 3458.246714][ C5] vm_remove_mappings+0x48/0x2a4
[ 3458.251656][ C5] __vunmap+0x154/0x278
[ 3458.255796][ C5] stolen_time_cpu_down_prepare+0xc0/0xd8
[ 3458.261542][ C5] cpuhp_invoke_callback+0x248/0xc34
[ 3458.266842][ C5] cpuhp_thread_fun+0x1c4/0x248
[ 3458.271696][ C5] smpboot_thread_fn+0x1b0/0x400
[ 3458.276638][ C5] kthread+0x17c/0x1e0
[ 3458.280691][ C5] ret_from_fork+0x10/0x20

As a fix, introduce rcu lock to update stolen time structure.

Fixes: 75df529bec91 ("arm64: paravirt: Initialize steal time when cpu is online")
Cc: stable(a)vger.kernel.org
Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu(a)quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman(a)quicinc.com>
---
Changes since v1: https://lore.kernel.org/all/20220420204417.155194-1-quic_eberman@quicinc.co…
 - Use RCU instead of disabling interrupts

 arch/arm64/kernel/paravirt.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/paravirt.c b/arch/arm64/kernel/paravirt.c
index 75fed4460407..e724ea3d86f0 100644
--- a/arch/arm64/kernel/paravirt.c
+++ b/arch/arm64/kernel/paravirt.c
@@ -52,7 +52,9 @@ early_param("no-steal-acc", parse_no_stealacc);
 /* return stolen time in ns by asking the hypervisor */
 static u64 para_steal_clock(int cpu)
 {
+	struct pvclock_vcpu_stolen_time *kaddr = NULL;
 	struct pv_time_stolen_time_region *reg;
+	u64 ret = 0;

 	reg = per_cpu_ptr(&stolen_time_region, cpu);

@@ -61,28 +63,38 @@ static u64 para_steal_clock(int cpu)
 	 * online notification callback runs. Until the callback
 	 * has run we just return zero.
 	 */
-	if (!reg->kaddr)
+	rcu_read_lock();
+	kaddr = rcu_dereference(reg->kaddr);
+	if (!kaddr) {
+		rcu_read_unlock();
 		return 0;
+	}

-	return le64_to_cpu(READ_ONCE(reg->kaddr->stolen_time));
+	ret = le64_to_cpu(READ_ONCE(kaddr->stolen_time));
+	rcu_read_unlock();
+	return ret;
 }

 static int stolen_time_cpu_down_prepare(unsigned int cpu)
 {
+	struct pvclock_vcpu_stolen_time *kaddr = NULL;
 	struct pv_time_stolen_time_region *reg;

 	reg = this_cpu_ptr(&stolen_time_region);
 	if (!reg->kaddr)
 		return 0;

-	memunmap(reg->kaddr);
-	memset(reg, 0, sizeof(*reg));
+	kaddr = reg->kaddr;
+	rcu_assign_pointer(reg->kaddr, NULL);
+	synchronize_rcu();
+	memunmap(kaddr);

 	return 0;
 }

 static int stolen_time_cpu_online(unsigned int cpu)
 {
+	struct pvclock_vcpu_stolen_time *kaddr = NULL;
 	struct pv_time_stolen_time_region *reg;
 	struct arm_smccc_res res;

@@ -93,10 +105,12 @@ static int stolen_time_cpu_online(unsigned int cpu)
 	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
 		return -EINVAL;

-	reg->kaddr = memremap(res.a0,
+	kaddr = memremap(res.a0,
 			      sizeof(struct pvclock_vcpu_stolen_time),
 			      MEMREMAP_WB);

+	rcu_assign_pointer(reg->kaddr, kaddr);
+
 	if (!reg->kaddr) {
 		pr_warn("Failed to map stolen time data structure\n");
 		return -ENOMEM;
-- 
2.25.1
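
The teardown ordering this patch relies on (unpublish the pointer, wait for readers that may still hold it, only then release the mapping) can be sketched in userspace. This is a minimal sketch and not kernel code: every name below is hypothetical, and a C11 atomic reader counter stands in for rcu_read_lock()/synchronize_rcu(), which are kernel-only and work differently internally.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pvclock_vcpu_stolen_time. */
struct stolen_time_rec {
	uint64_t stolen_time;
};

/* Published pointer; NULL means "not mapped". */
static _Atomic(struct stolen_time_rec *) published;
/* Number of readers currently inside their read-side section. */
static atomic_int active_readers;

/* Reader, analogous to para_steal_clock(). Returns 0 when unmapped. */
static uint64_t read_stolen_time(void)
{
	struct stolen_time_rec *rec;
	uint64_t ret = 0;

	atomic_fetch_add(&active_readers, 1);	/* ~ rcu_read_lock() */
	rec = atomic_load(&published);		/* ~ rcu_dereference() */
	if (rec)
		ret = rec->stolen_time;
	atomic_fetch_sub(&active_readers, 1);	/* ~ rcu_read_unlock() */
	return ret;
}

/* Writer, analogous to stolen_time_cpu_online(). */
static int publish_record(uint64_t initial)
{
	struct stolen_time_rec *rec;

	/* Sketch precondition: nothing is published yet. */
	assert(atomic_load(&published) == NULL);

	rec = malloc(sizeof(*rec));
	if (!rec)
		return -1;
	rec->stolen_time = initial;
	atomic_store(&published, rec);		/* ~ rcu_assign_pointer() */
	return 0;
}

/* Writer, analogous to stolen_time_cpu_down_prepare(). */
static void unpublish_record(void)
{
	/* 1. Hide the record from new readers. */
	struct stolen_time_rec *old = atomic_exchange(&published, NULL);

	/* 2. Wait for readers that may still hold the old pointer. */
	while (atomic_load(&active_readers) != 0)
		;				/* ~ synchronize_rcu() */

	/* 3. Only now is it safe to release the memory. */
	free(old);				/* ~ memunmap() */
}
```

The point of the sketch is step ordering: the crash in the trace corresponds to releasing the memory (step 3) while a reader that started before step 1 is still dereferencing it.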


[PATCH v3 0/2] mm: fix cma allocation fail sometimes

by Dong Aisheng

We observed an issue with the NXP 5.15 LTS kernel where dma_alloc_coherent()
may sometimes fail when multiple processes are trying to allocate CMA
memory.

This issue can be reproduced very easily on the MX6Q SDB board with the
latest linux-next kernel by writing a test module that creates 16 or 32
threads allocating random sizes of CMA memory in parallel in the
background. Or simply enable CONFIG_CMA_DEBUG and you will see endless
CMA alloc retries during boot:
[ 1.452124] cma: cma_alloc(): memory range at (ptrval) is busy,retrying
.... (thousands of retries)

The root cause of this issue is that since commit a4efc174b382
("mm/cma.c: remove redundant cma_mutex lock"), CMA supports concurrent
memory allocation. It is possible that the memory range process A tries
to allocate has already been isolated by the allocation of process B
during memory migration.

The problem here is that the memory range isolated during one allocation
by start_isolate_page_range() can be much bigger than the size we
actually want to allocate, because the range is aligned to
MAX_ORDER_NR_PAGES.

Taking an ARMv7 platform with 1G memory as an example, when
MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and CMA memory is
relatively small (e.g. 128M), there are only 4 MAX_ORDER slots, so it is
very easy for all CMA memory to have already been isolated by other
processes when one tries to allocate memory using dma_alloc_coherent().
Since the current CMA code scans the whole available CMA memory only
once, dma_alloc_coherent() may easily fail due to contention with other
processes.

This patchset introduces a retry mechanism to rescan the CMA bitmap on
-EBUSY, in case the target pageblock has been temporarily isolated by
others and is released later. It also improves CMA allocation
performance by trying the next MAX_ORDER_NR_PAGES range during retries
rather than looping within the same isolated range in small steps, which
wastes CPU cycles.

The following test is based on linux-next: next-20211213.

Without the fix, it easily fails.
# insmod cma_alloc.ko pnum=16
[ 274.322369] CMA alloc test enter: thread number: 16
[ 274.329948] cpu: 0, pid: 692, index 4 pages 144
[ 274.330143] cpu: 1, pid: 694, index 2 pages 44
[ 274.330359] cpu: 2, pid: 695, index 7 pages 757
[ 274.330760] cpu: 2, pid: 696, index 4 pages 144
[ 274.330974] cpu: 2, pid: 697, index 6 pages 512
[ 274.331223] cpu: 2, pid: 698, index 6 pages 512
[ 274.331499] cpu: 2, pid: 699, index 2 pages 44
[ 274.332228] cpu: 2, pid: 700, index 0 pages 7
[ 274.337421] cpu: 0, pid: 701, index 1 pages 38
[ 274.337618] cpu: 2, pid: 702, index 0 pages 7
[ 274.344669] cpu: 1, pid: 703, index 0 pages 7
[ 274.344807] cpu: 3, pid: 704, index 6 pages 512
[ 274.348269] cpu: 2, pid: 705, index 5 pages 148
[ 274.349490] cma: cma_alloc: reserved: alloc failed, req-size: 38 pages, ret: -16
[ 274.366292] cpu: 1, pid: 706, index 4 pages 144
[ 274.366562] cpu: 0, pid: 707, index 3 pages 128
[ 274.367356] cma: cma_alloc: reserved: alloc failed, req-size: 128 pages, ret: -16
[ 274.367370] cpu: 0, pid: 707, index 3 pages 128 failed
[ 274.371148] cma: cma_alloc: reserved: alloc failed, req-size: 148 pages, ret: -16
[ 274.375348] cma: cma_alloc: reserved: alloc failed, req-size: 144 pages, ret: -16
[ 274.384256] cpu: 2, pid: 708, index 0 pages 7
....

With the fix, 32 threads allocating in parallel can pass an overnight
stress test.
root@imx6qpdlsolox:~# insmod cma_alloc.ko pnum=32
[ 112.976809] cma_alloc: loading out-of-tree module taints kernel.
[ 112.984128] CMA alloc test enter: thread number: 32
[ 112.989748] cpu: 2, pid: 707, index 6 pages 512
[ 112.994342] cpu: 1, pid: 708, index 6 pages 512
[ 112.995162] cpu: 0, pid: 709, index 3 pages 128
[ 112.995867] cpu: 2, pid: 710, index 0 pages 7
[ 112.995910] cpu: 3, pid: 711, index 2 pages 44
[ 112.996005] cpu: 3, pid: 712, index 7 pages 757
[ 112.996098] cpu: 3, pid: 713, index 7 pages 757
...
[41877.368163] cpu: 1, pid: 737, index 2 pages 44
[41877.369388] cpu: 1, pid: 736, index 3 pages 128
[41878.486516] cpu: 0, pid: 737, index 2 pages 44
[41878.486515] cpu: 2, pid: 739, index 4 pages 144
[41878.486622] cpu: 1, pid: 736, index 3 pages 128
[41878.486948] cpu: 2, pid: 735, index 7 pages 757
[41878.487279] cpu: 2, pid: 738, index 4 pages 144
[41879.526603] cpu: 1, pid: 739, index 3 pages 128
[41879.606491] cpu: 2, pid: 737, index 3 pages 128
[41879.606550] cpu: 0, pid: 736, index 0 pages 7
[41879.612271] cpu: 2, pid: 738, index 4 pages 144
...

v1: https://patchwork.kernel.org/project/linux-mm/cover/20211215080242.3034856-…
v2: https://patchwork.kernel.org/project/linux-mm/cover/20220112131552.3329380-…

Dong Aisheng (2):
  mm: cma: fix allocation may fail sometimes
  mm: cma: try next MAX_ORDER_NR_PAGES during retry

 mm/cma.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

-- 
2.25.1
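
The arithmetic behind the "only 4 MAX_ORDER slots" observation, and the idea of stepping to the next MAX_ORDER_NR_PAGES block on retry, can be sketched as follows. This is an illustration only, not the code from mm/cma.c: all function names are hypothetical, 4 KiB pages are assumed, and MAX_ORDER_NR_PAGES is assumed to be 1 << (max_order - 1), which matches the 32M figure quoted for max_order 14.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u	/* assumption: 4 KiB pages */

/* Half-open page frame number range [start, end). */
struct pfn_range {
	uint64_t start;
	uint64_t end;
};

/* Pages per MAX_ORDER block, assuming 1 << (max_order - 1). */
static uint64_t max_order_nr_pages(unsigned int max_order)
{
	assert(max_order >= 1 && max_order <= 32);
	return (uint64_t)1 << (max_order - 1);
}

/* Number of blocks that can be isolated independently in a CMA area. */
static uint64_t cma_isolation_slots(uint64_t cma_bytes, unsigned int max_order)
{
	uint64_t block_bytes = max_order_nr_pages(max_order) * PAGE_SIZE_BYTES;

	return cma_bytes / block_bytes;
}

/* Range that ends up isolated for a request of `count` pages at `pfn`. */
static struct pfn_range isolated_range(uint64_t pfn, uint64_t count,
				       unsigned int max_order)
{
	uint64_t align = max_order_nr_pages(max_order);
	struct pfn_range r;

	r.start = pfn & ~(align - 1);				/* align down */
	r.end = (pfn + count + align - 1) & ~(align - 1);	/* align up */
	return r;
}

/* First pfn of the following block: a retry candidate that skips the
 * whole busy block instead of probing inside it in small steps. */
static uint64_t next_block_start(uint64_t pfn, unsigned int max_order)
{
	uint64_t align = max_order_nr_pages(max_order);

	return (pfn & ~(align - 1)) + align;
}
```

With these assumptions a 7-page request inside the first block isolates all 8192 pages of that block, which is why a handful of concurrent allocators can make a 128M area look entirely busy.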


[PATCH] irqchip: irq-xtensa-mx: fix initial IRQ affinity

by Max Filippov

When irq-xtensa-mx chip is used in non-SMP configuration its
irq_set_affinity callback is not called leaving IRQ affinity set empty.
As a result IRQ delivery does not work in that configuration.
Initialize IRQ affinity of the xtensa MX interrupt distributor to CPU 0
for all external IRQ lines.

Cc: stable(a)vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc(a)gmail.com>
---
 drivers/irqchip/irq-xtensa-mx.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/irqchip/irq-xtensa-mx.c b/drivers/irqchip/irq-xtensa-mx.c
index 27933338f7b3..8c581c985aa7 100644
--- a/drivers/irqchip/irq-xtensa-mx.c
+++ b/drivers/irqchip/irq-xtensa-mx.c
@@ -151,14 +151,25 @@ static struct irq_chip xtensa_mx_irq_chip = {
 	.irq_set_affinity = xtensa_mx_irq_set_affinity,
 };

+static void __init xtensa_mx_init_common(struct irq_domain *root_domain)
+{
+	unsigned int i;
+
+	irq_set_default_host(root_domain);
+	secondary_init_irq();
+
+	/* Initialize default IRQ routing to CPU 0 */
+	for (i = 0; i < XCHAL_NUM_EXTINTERRUPTS; ++i)
+		set_er(1, MIROUT(i));
+}
+
 int __init xtensa_mx_init_legacy(struct device_node *interrupt_parent)
 {
 	struct irq_domain *root_domain =
 		irq_domain_add_legacy(NULL, NR_IRQS - 1, 1, 0,
 				&xtensa_mx_irq_domain_ops,
 				&xtensa_mx_irq_chip);
-	irq_set_default_host(root_domain);
-	secondary_init_irq();
+	xtensa_mx_init_common(root_domain);
 	return 0;
 }

@@ -168,8 +179,7 @@ static int __init xtensa_mx_init(struct device_node *np,
 	struct irq_domain *root_domain =
 		irq_domain_add_linear(np, NR_IRQS, &xtensa_mx_irq_domain_ops,
 				&xtensa_mx_irq_chip);
-	irq_set_default_host(root_domain);
-	secondary_init_irq();
+	xtensa_mx_init_common(root_domain);
 	return 0;
 }
 IRQCHIP_DECLARE(xtensa_mx_irq_chip, "cdns,xtensa-mx", xtensa_mx_init);
-- 
2.30.2


Apply d799769188529abc6cbf035a10087a51f7832b6b to 5.17 and 5.15?

by Nathan Chancellor

Hi Greg, Sasha, and Michael,

Commit d79976918852 ("powerpc/64: Add UADDR64 relocation support") fixes a
boot failure with CONFIG_RELOCATABLE=y kernels linked with recent versions
of ld.lld [1]. Additionally, it resolves a separate boot failure that Paul
Menzel reported [2] with ld.lld 13.0.0. Is this a reasonable backport for
5.17 and 5.15? It applies cleanly, resolves both problems, and does not
appear to cause any other issues in my testing for both trees but I was
curious what Michael's opinion was, as I am far from a PowerPC expert.

This change does apply cleanly to 5.10 (I did not try earlier branches)
but there are other changes needed for ld.lld to link CONFIG_RELOCATABLE
kernels in that branch so to avoid any regressions, I think it is safe
to just focus on 5.15 and 5.17.

Paul, it would not hurt to confirm the results of my testing with your
setup, just to make sure I did not miss anything :)

[1]: https://github.com/ClangBuiltLinux/linux/issues/1581
[2]: https://lore.kernel.org/Yg2h2Q2vXFkkLGTh@dev-arch.archlinux-ax161/

Cheers,
Nathan


[PATCH v2] powerpc/rtas: Keep MSR[RI] set when calling RTAS

by Laurent Dufour

RTAS runs in real mode (MSR[DR] and MSR[IR] unset) and in 32bits
mode (MSR[SF] unset).

The change in MSR is done in enter_rtas() in a relatively complex way,
since the MSR value could be hardcoded.

Furthermore, a panic has been reported when hitting the watchdog interrupt
while running in RTAS, this leads to the following stack trace:

[69244.027433][ C24] watchdog: CPU 24 Hard LOCKUP
[69244.027442][ C24] watchdog: CPU 24 TB:997512652051031, last heartbeat TB:997504470175378 (15980ms ago)
[69244.027451][ C24] Modules linked in: chacha_generic(E) libchacha(E) xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) sd_mod(E) t10_pi(E)
[69244.027555][ C24] ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
[69244.027587][ C24] Supported: No, Unreleased kernel
[69244.027600][ C24] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
[69244.027609][ C24] NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
[69244.027612][ C24] REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
[69244.027615][ C24] MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
[69244.027625][ C24] CFAR: 000000000000011c IRQMASK: 1
[69244.027625][ C24] GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
[69244.027625][ C24] GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
[69244.027625][ C24] GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
[69244.027625][ C24] GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
[69244.027625][ C24] GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
[69244.027625][ C24] GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
[69244.027625][ C24] GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
[69244.027625][ C24] GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
[69244.027663][ C24] NIP [000000001fb41050] 0x1fb41050
[69244.027696][ C24] LR [000000001fb4104c] 0x1fb4104c
[69244.027699][ C24] Call Trace:
[69244.027701][ C24] Instruction dump:
[69244.027723][ C24] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[69244.027728][ C24] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[69244.027762][T87504] Oops: Unrecoverable System Reset, sig: 6 [#1]
[69244.028044][T87504] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[69244.028089][T87504] Modules linked in: chacha_generic(E) libchacha(E) xxhash_generic(E) wp512(E) sha3_generic(E) rmd160(E) poly1305_generic(E) libpoly1305(E) michael_mic(E) md4(E) crc32_generic(E) cmac(E) ccm(E) algif_rng(E) twofish_generic(E) twofish_common(E) serpent_generic(E) fcrypt(E) des_generic(E) libdes(E) cast6_generic(E) cast5_generic(E) cast_common(E) camellia_generic(E) blowfish_generic(E) blowfish_common(E) algif_skcipher(E) algif_hash(E) gcm(E) algif_aead(E) af_alg(E) tun(E) rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) rpadlpar_io(EX) rpaphp(EX) xsk_diag(E) tcp_diag(E) udp_diag(E) raw_diag(E) inet_diag(E) unix_diag(E) af_packet_diag(E) netlink_diag(E) nfsv3(E) nfs_acl(E) nfs(E) lockd(E) grace(E) sunrpc(E) fscache(E) netfs(E) af_packet(E) rfkill(E) bonding(E) tls(E) ibmveth(EX) crct10dif_vpmsum(E) rtc_generic(E) drm(E) drm_panel_orientation_quirks(E) fuse(E) configfs(E) backlight(E) ip_tables(E) x_tables(E) dm_service_time(E) sd_mod(E) t10_pi(E)
[69244.028171][T87504] ibmvfc(EX) scsi_transport_fc(E) vmx_crypto(E) gf128mul(E) btrfs(E) blake2b_generic(E) libcrc32c(E) crc32c_vpmsum(E) xor(E) raid6_pq(E) dm_mirror(E) dm_region_hash(E) dm_log(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E)
[69244.028307][T87504] Supported: No, Unreleased kernel
[69244.028385][T87504] CPU: 24 PID: 87504 Comm: drmgr Kdump: loaded Tainted: G E X 5.14.21-150400.71.1.bz196362_2-default #1 SLE15-SP4 (unreleased) 0d821077ef4faa8dfaf370efb5fdca1fa35f4e2c
[69244.028408][T87504] NIP: 000000001fb41050 LR: 000000001fb4104c CTR: 0000000000000000
[69244.028418][T87504] REGS: c00000000fc33d60 TRAP: 0100 Tainted: G E X (5.14.21-150400.71.1.bz196362_2-default)
[69244.028429][T87504] MSR: 8000000002981000 <SF,VEC,VSX,ME> CR: 48800002 XER: 20040020
[69244.028444][T87504] CFAR: 000000000000011c IRQMASK: 1
[69244.028444][T87504] GPR00: 0000000000000003 ffffffffffffffff 0000000000000001 00000000000050dc
[69244.028444][T87504] GPR04: 000000001ffb6100 0000000000000020 0000000000000001 000000001fb09010
[69244.028444][T87504] GPR08: 0000000020000000 0000000000000000 0000000000000000 0000000000000000
[69244.028444][T87504] GPR12: 80040000072a40a8 c00000000ff8b680 0000000000000007 0000000000000034
[69244.028444][T87504] GPR16: 000000001fbf6e94 000000001fbf6d84 000000001fbd1db0 000000001fb3f008
[69244.028444][T87504] GPR20: 000000001fb41018 ffffffffffffffff 000000000000017f fffffffffffff68f
[69244.028444][T87504] GPR24: 000000001fb18fe8 000000001fb3e000 000000001fb1adc0 000000001fb1cf40
[69244.028444][T87504] GPR28: 000000001fb26000 000000001fb460f0 000000001fb17f18 000000001fb17000
[69244.028534][T87504] NIP [000000001fb41050] 0x1fb41050
[69244.028543][T87504] LR [000000001fb4104c] 0x1fb4104c
[69244.028549][T87504] Call Trace:
[69244.028554][T87504] Instruction dump:
[69244.028561][T87504] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[69244.028575][T87504] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[69244.028607][T87504] ---[ end trace 3ddec07f638c34a2 ]---

This happens because MSR[RI] is unset when entering RTAS but there is no
valid reason to not set it here.

RTAS is expected to be called with MSR[RI] as specified in PAPR+ section
"7.2.1 Machine State":

  R1–7.2.1–9. If called with MSR[RI] equal to 1, then RTAS must protect
  its own critical regions from recursion by setting the MSR[RI] bit to
  0 when in the critical regions.

Fix this by reviewing the way MSR is computed before calling RTAS. Now a
hardcoded value meaning real mode, 32 bits and Recoverable Interrupt is
loaded.

In addition a check is added in do_enter_rtas() to detect calls made with
MSR[RI] unset, as we are forcing it on later.

This patch has been tested on the following machines:
Power KVM Guest
  P8 S822L (host Ubuntu kernel 5.11.0-49-generic)
PowerVM LPAR
  P8 9119-MME (FW860.A1)
  p9 9008-22L (FW950.00)
  P10 9080-HEX (FW1010.00)

Changes in V2:
 - Change comment in code to indicate NMI (Nick's comment)
 - Add reference to PAPR+ in the change log (Michael's comment)

Cc: stable(a)vger.kernel.org
Suggested-by: Nicholas Piggin <npiggin(a)gmail.com>
Signed-off-by: Laurent Dufour <ldufour(a)linux.ibm.com>
---
 arch/powerpc/kernel/entry_64.S | 20 ++++++++------------
 arch/powerpc/kernel/rtas.c     |  5 +++++
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 9581906b5ee9..65cb14b56f8d 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -330,22 +330,18 @@ _GLOBAL(enter_rtas)
 	clrldi	r4,r4,2			/* convert to realmode address */
 	mtlr	r4

-	li	r0,0
-	ori	r0,r0,MSR_EE|MSR_SE|MSR_BE|MSR_RI
-	andc	r0,r6,r0
-
-	li	r9,1
-	rldicr	r9,r9,MSR_SF_LG,(63-MSR_SF_LG)
-	ori	r9,r9,MSR_IR|MSR_DR|MSR_FE0|MSR_FE1|MSR_FP|MSR_RI|MSR_LE
-	andc	r6,r0,r9
-
 __enter_rtas:
-	sync				/* disable interrupts so SRR0/1 */
-	mtmsrd	r0			/* don't get trashed */
-
 	LOAD_REG_ADDR(r4, rtas)
 	ld	r5,RTASENTRY(r4)	/* get the rtas->entry value */
 	ld	r4,RTASBASE(r4)		/* get the rtas->base value */
+
+	/* RTAS runs in 32bits real mode but let MSR[]RI on as we may hit
+	 * NMI (SRESET or MCE). RTAS should disable RI in its critical
+	 * regions (as specified in PAPR+ section 7.2.1). */
+	LOAD_REG_IMMEDIATE(r6, MSR_ME|MSR_RI)
+
+	li	r0,0
+	mtmsrd	r0,1			/* disable RI before using SRR0/1 */

 	mtspr	SPRN_SRR0,r5
 	mtspr	SPRN_SRR1,r6
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 1f42aabbbab3..d7775b8c8853 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -49,6 +49,11 @@ void enter_rtas(unsigned long);

 static inline void do_enter_rtas(unsigned long args)
 {
+	unsigned long msr;
+
+	msr = mfmsr();
+	BUG_ON(!(msr & MSR_RI));
+
 	enter_rtas(args);

 	srr_regs_clobbered(); /* rtas uses SRRs, invalidate */
-- 
2.35.1
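
To make the MSR reasoning concrete, the bit tests involved can be sketched in plain C. This is a userspace illustration only: the helper names are hypothetical, and the bit positions are written out by hand to mirror the MSR_*_LG definitions in arch/powerpc/include/asm/reg.h rather than included from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirroring arch/powerpc/include/asm/reg.h. */
#define MSR_RI_LG	1	/* Recoverable Interrupt */
#define MSR_DR_LG	4	/* Data Relocate */
#define MSR_IR_LG	5	/* Instruction Relocate */
#define MSR_ME_LG	12	/* Machine Check Enable */
#define MSR_SF_LG	63	/* 64-bit mode */

#define MSR_BIT(lg)	((uint64_t)1 << (lg))
#define MSR_RI		MSR_BIT(MSR_RI_LG)
#define MSR_DR		MSR_BIT(MSR_DR_LG)
#define MSR_IR		MSR_BIT(MSR_IR_LG)
#define MSR_ME		MSR_BIT(MSR_ME_LG)
#define MSR_SF		MSR_BIT(MSR_SF_LG)

/* True when an interrupt taken with this MSR is marked recoverable. */
static bool msr_is_recoverable(uint64_t msr)
{
	return (msr & MSR_RI) != 0;
}

/* True when the MSR describes 32-bit real mode (SF, IR and DR clear). */
static bool msr_is_32bit_real_mode(uint64_t msr)
{
	return (msr & (MSR_SF | MSR_IR | MSR_DR)) == 0;
}

/* The constant the patch loads into r6 before entering RTAS. */
static uint64_t rtas_entry_msr(void)
{
	uint64_t msr = MSR_ME | MSR_RI;

	assert(msr_is_32bit_real_mode(msr));
	return msr;
}
```

Applied to the MSR value printed in the trace above (8000000002981000), msr_is_recoverable() returns false, which is consistent with the "Unrecoverable System Reset" oops, while the new entry value has RI set.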


[PATCH] f2fs: fix to do sanity check on total_data_blocks

by Chao Yu

As Yanming reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=215916

The kernel message is shown below:

kernel BUG at fs/f2fs/segment.c:2560!
Call Trace:
 allocate_segment_by_default+0x228/0x440
 f2fs_allocate_data_block+0x13d1/0x31f0
 do_write_page+0x18d/0x710
 f2fs_outplace_write_data+0x151/0x250
 f2fs_do_write_data_page+0xef9/0x1980
 move_data_page+0x6af/0xbc0
 do_garbage_collect+0x312f/0x46f0
 f2fs_gc+0x6b0/0x3bc0
 f2fs_balance_fs+0x921/0x2260
 f2fs_write_single_data_page+0x16be/0x2370
 f2fs_write_cache_pages+0x428/0xd00
 f2fs_write_data_pages+0x96e/0xd50
 do_writepages+0x168/0x550
 __writeback_single_inode+0x9f/0x870
 writeback_sb_inodes+0x47d/0xb20
 __writeback_inodes_wb+0xb2/0x200
 wb_writeback+0x4bd/0x660
 wb_workfn+0x5f3/0xab0
 process_one_work+0x79f/0x13e0
 worker_thread+0x89/0xf60
 kthread+0x26a/0x300
 ret_from_fork+0x22/0x30
RIP: 0010:new_curseg+0xe8d/0x15f0

The root cause is: ckpt.valid_block_count is inconsistent with the SIT
table. The stat info indicates the filesystem has free blocks, but the
SIT table indicates the filesystem has no free segment.

So during garbage collection, it triggers a panic when the LFS allocator
fails to find a free segment.

This patch tries to fix this issue by checking consistency between
ckpt.valid_block_count and the blocks accounted from SIT.

Cc: stable(a)vger.kernel.org
Reported-by: Ming Yan <yanming(a)tju.edu.cn>
Signed-off-by: Chao Yu <chao.yu(a)oppo.com>
---
 fs/f2fs/segment.c | 24 +++++++++++++++++++++---
 1 file changed, 21 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 8c17fed8987e..eddaf3b45b25 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4462,6 +4462,7 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 	unsigned int readed, start_blk = 0;
 	int err = 0;
 	block_t total_node_blocks = 0;
+	block_t total_data_blocks = 0;

 	do {
 		readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_VECS,
@@ -4488,6 +4489,8 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 			seg_info_from_raw_sit(se, &sit);
 			if (IS_NODESEG(se->type))
 				total_node_blocks += se->valid_blocks;
+			else
+				total_data_blocks += se->valid_blocks;

 			if (f2fs_block_unit_discard(sbi)) {
 				/* build discard map only one time */
@@ -4529,6 +4532,8 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 		old_valid_blocks = se->valid_blocks;
 		if (IS_NODESEG(se->type))
 			total_node_blocks -= old_valid_blocks;
+		else
+			total_data_blocks -= old_valid_blocks;

 		err = check_block_count(sbi, start, &sit);
 		if (err)
@@ -4536,6 +4541,8 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 		seg_info_from_raw_sit(se, &sit);
 		if (IS_NODESEG(se->type))
 			total_node_blocks += se->valid_blocks;
+		else
+			total_data_blocks += se->valid_blocks;

 		if (f2fs_block_unit_discard(sbi)) {
 			if (is_set_ckpt_flags(sbi, CP_TRIMMED_FLAG)) {
@@ -4557,13 +4564,24 @@ static int build_sit_entries(struct f2fs_sb_info *sbi)
 	}
 	up_read(&curseg->journal_rwsem);

-	if (!err && total_node_blocks != valid_node_count(sbi)) {
+	if (err)
+		return err;
+
+	if (total_node_blocks != valid_node_count(sbi)) {
 		f2fs_err(sbi, "SIT is corrupted node# %u vs %u",
 			 total_node_blocks, valid_node_count(sbi));
-		err = -EFSCORRUPTED;
+		return -EFSCORRUPTED;
 	}

-	return err;
+	if (total_data_blocks + total_node_blocks !=
+				valid_user_blocks(sbi)) {
+		f2fs_err(sbi, "SIT is corrupted data# %u vs %u",
+			 total_data_blocks,
+			 valid_user_blocks(sbi) - total_node_blocks);
+		return -EFSCORRUPTED;
+	}
+
+	return 0;
 }

 static void init_free_segmap(struct f2fs_sb_info *sbi)
-- 
2.25.1
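
The consistency check added at the end of build_sit_entries() boils down to comparing per-segment totals with the checkpoint counters. A minimal sketch with plain data, assuming a hypothetical flattened view of the SIT (the struct, enum and function names here are illustrative and do not exist in f2fs):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t block_t;

/* Hypothetical flattened view of one SIT entry. */
struct seg_summary {
	int is_node;		/* non-zero for node segments */
	block_t valid_blocks;	/* valid blocks recorded in the SIT */
};

enum sit_check {
	SIT_OK = 0,
	SIT_BAD_NODE_COUNT,	/* node total disagrees with checkpoint */
	SIT_BAD_DATA_COUNT,	/* node + data total disagrees with checkpoint */
};

/*
 * Sum valid blocks per segment type and compare against the checkpoint
 * counters, in the same order as the patch: nodes first, then the
 * combined node + data total.
 */
static enum sit_check check_sit_totals(const struct seg_summary *segs, size_t n,
				       block_t valid_node_count,
				       block_t valid_user_blocks)
{
	block_t total_node = 0, total_data = 0;
	size_t i;

	assert(segs != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (segs[i].is_node)
			total_node += segs[i].valid_blocks;
		else
			total_data += segs[i].valid_blocks;
	}

	if (total_node != valid_node_count)
		return SIT_BAD_NODE_COUNT;
	if (total_data + total_node != valid_user_blocks)
		return SIT_BAD_DATA_COUNT;
	return SIT_OK;
}
```

In the reported image the second comparison is the one that would fail: the checkpoint claims fewer used blocks than the SIT accounts for, so the allocator later runs out of free segments unexpectedly.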


[PATCH] f2fs: fix deadloop in foreground GC

by Chao Yu

As Yanming reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=215914

The root cause is: in a very small image, it is very easy to exceed the
threshold of foreground GC. If we calculate free space and dirty data
based on section granularity, then in a corner case
has_not_enough_free_secs() will always return true, resulting in a dead
loop in f2fs_gc().

So this patch refactors has_not_enough_free_secs() as below to fix this
issue:
1. calculate needed space based on block granularity, and separate all
blocks into two parts, a section part and a block part, comparing the
section part to free sections, and comparing the block part to free space
in the opened log.
2. account F2FS_DIRTY_NODES, F2FS_DIRTY_IMETA and F2FS_DIRTY_DENTS
as node block consumer;
3. account F2FS_DIRTY_DENTS as data block consumer;

Cc: stable(a)vger.kernel.org
Reported-by: Ming Yan <yanming(a)tju.edu.cn>
Signed-off-by: Chao Yu <chao.yu(a)oppo.com>
---
 fs/f2fs/segment.h | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 8a591455d796..28f7aa9b40bf 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -575,11 +575,10 @@ static inline int reserved_sections(struct f2fs_sb_info *sbi)
 	return GET_SEC_FROM_SEG(sbi, reserved_segments(sbi));
 }

-static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
+static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi,
+			unsigned int node_blocks, unsigned int dent_blocks)
 {
-	unsigned int node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
-					get_pages(sbi, F2FS_DIRTY_DENTS);
-	unsigned int dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
+
 	unsigned int segno, left_blocks;
 	int i;

@@ -605,19 +604,24 @@ static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
 static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
 					int freed, int needed)
 {
-	int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
-	int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
-	int imeta_secs = get_blocktype_secs(sbi, F2FS_DIRTY_IMETA);
+	unsigned int total_node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
+					get_pages(sbi, F2FS_DIRTY_DENTS) +
+					get_pages(sbi, F2FS_DIRTY_IMETA);
+	unsigned int total_dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
+	unsigned int node_secs = total_node_blocks / BLKS_PER_SEC(sbi);
+	unsigned int dent_secs = total_dent_blocks / BLKS_PER_SEC(sbi);
+	unsigned int node_blocks = total_node_blocks % BLKS_PER_SEC(sbi);
+	unsigned int dent_blocks = total_dent_blocks % BLKS_PER_SEC(sbi);

 	if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
 		return false;

-	if (free_sections(sbi) + freed == reserved_sections(sbi) + needed &&
-			has_curseg_enough_space(sbi))
-		return false;
-	return (free_sections(sbi) + freed) <=
-		(node_secs + 2 * dent_secs + imeta_secs +
-		reserved_sections(sbi) + needed);
+	if (free_sections(sbi) + freed <=
+			node_secs + dent_secs + reserved_sections(sbi) + needed)
+		return true;
+	if (!has_curseg_enough_space(sbi, node_blocks, dent_blocks))
+		return true;
+	return false;
 }

 static inline bool f2fs_is_checkpoint_ready(struct f2fs_sb_info *sbi)
-- 
2.32.0
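
The refactored decision can be followed with plain numbers. The sketch below is a simplified model, not f2fs code: the struct and its field names are hypothetical, and the open-log check is reduced to two "blocks left" values because the body of has_curseg_enough_space() is not part of this diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the counters the check looks at. */
struct gc_inputs {
	unsigned int dirty_nodes;	/* F2FS_DIRTY_NODES, in blocks */
	unsigned int dirty_dents;	/* F2FS_DIRTY_DENTS, in blocks */
	unsigned int dirty_imeta;	/* F2FS_DIRTY_IMETA, in blocks */
	unsigned int blks_per_sec;	/* blocks per section */
	unsigned int free_sections;
	unsigned int reserved_sections;
	unsigned int node_log_left;	/* free blocks left in the open node log */
	unsigned int data_log_left;	/* free blocks left in the open data log */
};

/* Block-granular version of the "not enough free sections" decision. */
static bool not_enough_free_secs(const struct gc_inputs *in,
				 unsigned int freed, unsigned int needed)
{
	unsigned int total_node, total_dent;
	unsigned int node_secs, dent_secs, node_blocks, dent_blocks;

	assert(in->blks_per_sec > 0);

	/* Items 2 and 3 of the list: who consumes node and data blocks. */
	total_node = in->dirty_nodes + in->dirty_dents + in->dirty_imeta;
	total_dent = in->dirty_dents;

	/* Item 1: split each total into whole sections plus a remainder. */
	node_secs = total_node / in->blks_per_sec;
	dent_secs = total_dent / in->blks_per_sec;
	node_blocks = total_node % in->blks_per_sec;
	dent_blocks = total_dent % in->blks_per_sec;

	/* Whole sections are compared against free sections. */
	if (in->free_sections + freed <=
	    node_secs + dent_secs + in->reserved_sections + needed)
		return true;

	/* Remainders are compared against space left in the open logs. */
	if (node_blocks > in->node_log_left || dent_blocks > in->data_log_left)
		return true;

	return false;
}
```

With only a handful of dirty blocks, the section parts are zero, so the result depends on free versus reserved sections and on whether the remainders fit into the open logs.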


[PATCH net v2] ping: fix address binding wrt vrf

by Nicolas Dichtel

When ping_group_range is updated, 'ping' uses the DGRAM ICMP socket,
instead of an IP raw socket. In this case, 'ping' is unable to bind its
socket to a local address owned by a vrflite.

Before the patch:
$ sysctl -w net.ipv4.ping_group_range='0 2147483647'
$ ip link add blue type vrf table 10
$ ip link add foo type dummy
$ ip link set foo master blue
$ ip link set foo up
$ ip addr add 192.168.1.1/24 dev foo
$ ip vrf exec blue ping -c1 -I 192.168.1.1 192.168.1.2
ping: bind: Cannot assign requested address

CC: stable(a)vger.kernel.org
Fixes: 1b69c6d0ae90 ("net: Introduce L3 Master device abstraction")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
---

v1 -> v2:
 add the tag "Cc: stable(a)vger.kernel.org" for correct stable submission

 net/ipv4/ping.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c
index 3ee947557b88..9ea326b50775 100644
--- a/net/ipv4/ping.c
+++ b/net/ipv4/ping.c
@@ -305,6 +305,7 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
 	struct net *net = sock_net(sk);
 	if (sk->sk_family == AF_INET) {
 		struct sockaddr_in *addr = (struct sockaddr_in *) uaddr;
+		u32 tb_id = RT_TABLE_LOCAL;
 		int chk_addr_ret;

 		if (addr_len < sizeof(*addr))
@@ -318,7 +319,8 @@ static int ping_check_bind_addr(struct sock *sk, struct inet_sock *isk,
 		pr_debug("ping_check_bind_addr(sk=%p,addr=%pI4,port=%d)\n",
 			 sk, &addr->sin_addr.s_addr, ntohs(addr->sin_port));

-		chk_addr_ret = inet_addr_type(net, addr->sin_addr.s_addr);
+		tb_id = l3mdev_fib_table_by_index(net, sk->sk_bound_dev_if) ? : tb_id;
+		chk_addr_ret = inet_addr_type_table(net, addr->sin_addr.s_addr, tb_id);

 		if (!inet_addr_valid_or_nonlocal(net, inet_sk(sk),
 						 addr->sin_addr.s_addr,
-- 
2.33.0
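
The table selection in this patch uses the GNU `a ?: b` shorthand: take the VRF table when the lookup returns a non-zero id, otherwise fall back to the local table. A portable sketch of that fallback, assuming a hypothetical lookup helper in place of l3mdev_fib_table_by_index() and assuming that a return value of 0 means "no L3 master device":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RT_TABLE_LOCAL 255u	/* as in include/uapi/linux/rtnetlink.h */

/* Hypothetical record: a device index and the FIB table of its VRF. */
struct vrf_binding {
	int ifindex;
	uint32_t table_id;
};

/*
 * Hypothetical stand-in for l3mdev_fib_table_by_index(): returns the
 * table id for ifindex, or 0 when the device is not part of a VRF.
 */
static uint32_t vrf_table_for_ifindex(const struct vrf_binding *b, size_t n,
				      int ifindex)
{
	size_t i;

	assert(b != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (b[i].ifindex == ifindex)
			return b[i].table_id;
	return 0;
}

/* Table to consult when validating a bind address for a socket. */
static uint32_t bind_check_table(const struct vrf_binding *b, size_t n,
				 int bound_ifindex)
{
	uint32_t tb_id = vrf_table_for_ifindex(b, n, bound_ifindex);

	/* Portable spelling of "tb_id ?: RT_TABLE_LOCAL". */
	return tb_id ? tb_id : RT_TABLE_LOCAL;
}
```

In the reproducer above, 192.168.1.1 lives in table 10 rather than in the local table, which is why a lookup restricted to the local table rejects the bind.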


[PATCH v2 06/12] ptrace: Reimplement PTRACE_KILL by always sending SIGKILL

by Eric W. Biederman

Call send_sig_info in PTRACE_KILL instead of ptrace_resume. Calling
ptrace_resume is not safe if the task has not been stopped with
ptrace_freeze_traced.

Cc: stable(a)vger.kernel.org
Reported-by: Al Viro <viro(a)zeniv.linux.org.uk>
Suggested-by: Al Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
---
 kernel/ptrace.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index ccc4b465775b..43da5764b6f3 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -1238,7 +1238,7 @@ int ptrace_request(struct task_struct *child, long request,
 	case PTRACE_KILL:
 		if (child->exit_state)	/* already dead */
 			return 0;
-		return ptrace_resume(child, request, SIGKILL);
+		return send_sig_info(SIGKILL, SEND_SIG_NOINFO, child);

 #ifdef CONFIG_HAVE_ARCH_TRACEHOOK
 	case PTRACE_GETREGSET:
-- 
2.35.3


[PATCH v5.10] dm: fix mempool NULL pointer race when completing IO

by Mikulas Patocka

Hi

This is a backport of patches d208b89401e0 ("dm: fix mempool NULL pointer
race when completing IO") and 9f6dc6337610 ("dm: interlock pending dm_io
and dm_wait_for_bios_completion") for the kernel 5.10.

The bugs fixed by these patches can cause random crashes when reloading
the dm table, so the fix is eligible for stable backport.

This patch is different from the upstream patches because the code
diverged significantly.

Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>

---
 drivers/md/dm.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

Index: linux-stable/drivers/md/dm.c
===================================================================
--- linux-stable.orig/drivers/md/dm.c	2022-04-19 16:17:52.000000000 +0200
+++ linux-stable/drivers/md/dm.c	2022-04-19 16:23:23.000000000 +0200
@@ -607,19 +607,26 @@ static void start_io_acct(struct dm_io *
 				    false, 0, &io->stats_aux);
 }

+static void free_io(struct mapped_device *md, struct dm_io *io);
+
 static void end_io_acct(struct dm_io *io)
 {
 	struct mapped_device *md = io->md;
 	struct bio *bio = io->orig_bio;
-	unsigned long duration = jiffies - io->start_time;
-
-	bio_end_io_acct(bio, io->start_time);
+	unsigned long start_time = io->start_time;
+	unsigned long duration = jiffies - start_time;

 	if (unlikely(dm_stats_used(&md->stats)))
 		dm_stats_account_io(&md->stats, bio_data_dir(bio),
 				    bio->bi_iter.bi_sector, bio_sectors(bio),
 				    true, duration, &io->stats_aux);

+	free_io(md, io);
+
+	smp_wmb();
+
+	bio_end_io_acct(bio, start_time);
+
 	/* nudge anyone waiting on suspend queue */
 	if (unlikely(wq_has_sleeper(&md->wait)))
 		wake_up(&md->wait);
@@ -930,7 +937,6 @@ static void dec_pending(struct dm_io *io
 		io_error = io->status;
 		bio = io->orig_bio;
 		end_io_acct(io);
-		free_io(md, io);

 		if (io_error == BLK_STS_DM_REQUEUE)
 			return;
@@ -2345,6 +2351,8 @@ static int dm_wait_for_bios_completion(s
 	}
 	finish_wait(&md->wait, &wait);

+	smp_rmb();
+
 	return r;
 }


