- Linux-stable-mirror - lists.linaro.org

[PATCH] media: videobuf2-core: copy vb planes unconditionally

by Tudor Ambarus

Copy the relevant data from userspace to the vb->planes unconditionally as it's possible some of the fields may have changed after the buffer has been validated. Keep the dma_buf_put(planes[plane].dbuf) calls in the first `if (!reacquired)` case, in order to be close to the plane validation code where the buffers were got in the first place. Cc: stable(a)vger.kernel.org Fixes: 95af7c00f35b ("media: videobuf2-core: release all planes first in __prepare_dmabuf()") Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org> --- .../media/common/videobuf2/videobuf2-core.c | 28 ++++++++++--------- 1 file changed, 15 insertions(+), 13 deletions(-) diff --git a/drivers/media/common/videobuf2/videobuf2-core.c b/drivers/media/common/videobuf2/videobuf2-core.c index f07dc53a9d06..c0cc441b5164 100644 --- a/drivers/media/common/videobuf2/videobuf2-core.c +++ b/drivers/media/common/videobuf2/videobuf2-core.c @@ -1482,18 +1482,23 @@ static int __prepare_dmabuf(struct vb2_buffer *vb) } vb->planes[plane].dbuf_mapped = 1; } + } else { + for (plane = 0; plane < vb->num_planes; ++plane) + dma_buf_put(planes[plane].dbuf); + } - /* - * Now that everything is in order, copy relevant information - * provided by userspace. - */ - for (plane = 0; plane < vb->num_planes; ++plane) { - vb->planes[plane].bytesused = planes[plane].bytesused; - vb->planes[plane].length = planes[plane].length; - vb->planes[plane].m.fd = planes[plane].m.fd; - vb->planes[plane].data_offset = planes[plane].data_offset; - } + /* + * Now that everything is in order, copy relevant information + * provided by userspace. + */ + for (plane = 0; plane < vb->num_planes; ++plane) { + vb->planes[plane].bytesused = planes[plane].bytesused; + vb->planes[plane].length = planes[plane].length; + vb->planes[plane].m.fd = planes[plane].m.fd; + vb->planes[plane].data_offset = planes[plane].data_offset; + } + if (reacquired) { /* * Call driver-specific initialization on the newly acquired buffer, * if provided. @@ -1503,9 +1508,6 @@ static int __prepare_dmabuf(struct vb2_buffer *vb) dprintk(q, 1, "buffer initialization failed\n"); goto err_put_vb2_buf; } - } else { - for (plane = 0; plane < vb->num_planes; ++plane) - dma_buf_put(planes[plane].dbuf); } ret = call_vb_qop(vb, buf_prepare, vb); -- 2.47.0.199.ga7371fff76-goog

1 year, 1 month

4
3
0 0

[PATCH 4.19 000/350] 4.19.323-rc1 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 4.19.323 release. There are 350 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Fri, 08 Nov 2024 12:02:47 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.323-r… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 4.19.323-rc1 Jeongjun Park <aha310510(a)gmail.com> vt: prevent kernel-infoleak in con_font_get() Jeongjun Park <aha310510(a)gmail.com> mm: shmem: fix data-race in shmem_getattr() Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix kernel bug due to missing clearing of checked flag Edward Adam Davis <eadavis(a)qq.com> ocfs2: pass u64 to ocfs2_truncate_inline maybe overflow Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix potential deadlock with newly created symlinks Ville Syrjälä <ville.syrjala(a)linux.intel.com> wifi: iwlegacy: Clear stale interrupts before resuming device Manikanta Pubbisetty <quic_mpubbise(a)quicinc.com> wifi: ath10k: Fix memory leak in management tx Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: do not pass a stopped vif to the driver in .get_txpower Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Revert "driver core: Fix uevent_show() vs driver detach race" Faisal Hassan <quic_faisalh(a)quicinc.com> xhci: Fix Link TRB DMA in command ring stopped completion event Zijun Hu <quic_zijuhu(a)quicinc.com> usb: phy: Fix API devm_usb_put_phy() can not release the phy Zongmin Zhou <zhouzongmin(a)kylinos.cn> usbip: tools: Fix detach_port() invalid port error path Dimitri Sivanich <sivanich(a)hpe.com> misc: sgi-gru: Don't disable preemption in GRU driver Daniel Palmer <daniel(a)0x0f.com> net: amd: mvme147: Fix probe banner message Xiongfeng Wang <wangxiongfeng2(a)huawei.com> firmware: arm_sdei: Fix the input parameter of cpuhp_remove_state() Pablo Neira Ayuso <pablo(a)netfilter.org> netfilter: nft_payload: sanitize offset and length before calling skb_checksum() Benoît Monin <benoit.monin(a)gmx.fr> net: skip offload for NETIF_F_IPV6_CSUM if ipv6 header contains extension Xin Long <lucien.xin(a)gmail.com> net: support ip generic csum processing in skb_csum_hwoffload_help Byeonguk Jeong <jungbu2855(a)gmail.com> bpf: Fix out-of-bounds write in trie_get_next_key() Pedro Tammela <pctammela(a)mojatatu.com> net/sched: stop qdisc_tree_reduce_backlog on TC_H_ROOT Pablo Neira Ayuso <pablo(a)netfilter.org> gtp: allow -1 to be specified as file description from userspace Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> gtp: simplify error handling code in 'gtp_encap_enable()' Wander Lairson Costa <wander(a)redhat.com> igb: Disable threaded IRQ for igb_msix_other Felix Fietkau <nbd(a)nbd.name> wifi: mac80211: skip non-uploaded keys in ieee80211_iter_keys Xiu Jianfeng <xiujianfeng(a)huawei.com> cgroup: Fix potential overflow issue when checking max_depth Selvarasu Ganesan <selvarasu.g(a)samsung.com> usb: dwc3: core: Stop processing of pending events if controller is halted Yu Chen <chenyu56(a)huawei.com> usb: dwc3: Add splitdisable quirk for Hisilicon Kirin Soc Marek Szyprowski <m.szyprowski(a)samsung.com> usb: dwc3: remove generic PHY calibrate() calls Sabrina Dubroca <sd(a)queasysnail.net> xfrm: validate new SA's prefixlen using SA family when sel.family is unset junhua huang <huang.junhua(a)zte.com.cn> arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning Paul Moore <paul(a)paul-moore.com> selinux: improve error checking in sel_write_load() Haiyang Zhang <haiyangz(a)microsoft.com> hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix kernel bug due to missing clearing of buffer delay flag Shubham Panwar <shubiisp8(a)gmail.com> ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue Mario Limonciello <mario.limonciello(a)amd.com> drm/amd: Guard against bad data for ATIF ACPI method Kailang Yang <kailang(a)realtek.com> ALSA: hda/realtek: Update default depop procedure Jinjie Ruan <ruanjinjie(a)huawei.com> posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() Oliver Neukum <oneukum(a)suse.com> net: usb: usbnet: fix name regression Biju Das <biju.das(a)bp.renesas.com> dt-bindings: power: Add r8a774b1 SYSC power domain definitions Wang Hai <wanghai38(a)huawei.com> be2net: fix potential memory leak in be_xmit() Wang Hai <wanghai38(a)huawei.com> net/sun3_82586: fix potential memory leak in sun3_82586_send_packet() Dave Kleikamp <dave.kleikamp(a)oracle.com> jfs: Fix sanity check in dbMount Gianfranco Trad <gianf.trad(a)gmail.com> udf: fix uninit-value use in udf_get_fileshortad Nico Boehr <nrb(a)linux.ibm.com> KVM: s390: gaccess: Check if guest address is in memslot Janis Schoetterl-Glausch <scgl(a)linux.ibm.com> KVM: s390: gaccess: Cleanup access to guest pages Janis Schoetterl-Glausch <scgl(a)linux.ibm.com> KVM: s390: gaccess: Refactor access address range check Janis Schoetterl-Glausch <scgl(a)linux.ibm.com> KVM: s390: gaccess: Refactor gpa and length calculation Mark Rutland <mark.rutland(a)arm.com> arm64: probes: Fix uprobes for big-endian kernels junhua huang <huang.junhua(a)zte.com.cn> arm64:uprobe fix the uprobe SWBP_INSN in big-endian Ye Bin <yebin10(a)huawei.com> Bluetooth: bnep: fix wild-memory-access in proto_unregister Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> usb: typec: altmode should keep reference to parent Wang Hai <wanghai38(a)huawei.com> net: systemport: fix potential memory leak in bcm_sysport_xmit() Wang Hai <wanghai38(a)huawei.com> net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit() Sabrina Dubroca <sd(a)queasysnail.net> macsec: don't increment counters for an unrelated SA Jonathan Marek <jonathan(a)marek.ca> drm/msm/dsi: fix 32-bit signed integer extension in pclk_rate calculation Kalesh AP <kalesh-anakkur.purayil(a)broadcom.com> RDMA/bnxt_re: Return more meaningful error Anumula Murali Mohan Reddy <anumula(a)chelsio.com> RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP Saravanan Vajravel <saravanan.vajravel(a)broadcom.com> RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Andrey Skvortsov <andrej.skvortzov(a)gmail.com> clk: Fix slab-out-of-bounds error in devm_clk_release() Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de> clk: Fix pointer casting to prevent oops in devm_clk_release() Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: propagate directory read errors from nilfs_find_entry() Zhang Rui <rui.zhang(a)intel.com> x86/apic: Always explicitly disarm TSC-deadline timer Takashi Iwai <tiwai(a)suse.de> parport: Proper fix for array out-of-bounds access Daniele Palmas <dnlplm(a)gmail.com> USB: serial: option: add Telit FN920C04 MBIM compositions Benjamin B. Frost <benjamin(a)geanix.com> USB: serial: option: add support for Quectel EG916Q-GL Mathias Nyman <mathias.nyman(a)linux.intel.com> xhci: Fix incorrect stream context type macro Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Bluetooth: btusb: Fix regression with fake CSR controllers 0a12:0001 Aaron Thompson <dev(a)aaront.org> Bluetooth: Remove debugfs directory on module init failure Emil Gedenryd <emil.gedenryd(a)axis.com> iio: light: opt3001: add missing full-scale range value Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> iio: hid-sensors: Fix an error handling path in _hid_sensor_set_report_latency() Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: adc: ti-ads8688: add missing select IIO_(TRIGGERED_)BUFFER in Kconfig Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: dac: stm32-dac-core: add missing select REGMAP_MMIO in Kconfig Nikolay Kuratov <kniv(a)yandex-team.ru> drm/vmwgfx: Handle surface check failure correctly Jim Mattson <jmattson(a)google.com> x86/cpufeatures: Define X86_FEATURE_AMD_IBPB_RET Michael Mueller <mimu(a)linux.ibm.com> KVM: s390: Change virtual to physical address access in diag 0x258 handler Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> s390/sclp_vt220: Convert newlines to CRLF instead of LFCR Joseph Huang <Joseph.Huang(a)garmin.com> net: dsa: mv88e6xxx: Fix out-of-bound access Breno Leitao <leitao(a)debian.org> KVM: Fix a data race on last_boosted_vcpu in kvm_vcpu_on_spin() OGAWA Hirofumi <hirofumi(a)mail.parknet.co.jp> fat: fix uninitialized variable WangYuli <wangyuli(a)uniontech.com> PCI: Add function 0 DMA alias quirk for Glenfly Arise chip Mark Rutland <mark.rutland(a)arm.com> arm64: probes: Fix simulate_ldr*_literal() Mark Rutland <mark.rutland(a)arm.com> arm64: probes: Remove broken LDR (literal) uprobe support Jinjie Ruan <ruanjinjie(a)huawei.com> posix-clock: Fix missing timespec64 check in pc_clock_settime() Anastasia Kovaleva <a.kovaleva(a)yadro.com> net: Fix an unsafe loop on the list Icenowy Zheng <uwu(a)icenowy.me> usb: storage: ignore bogus device raised by JieLi BR21 USB sound chip Jose Alberto Reguero <jose.alberto.reguero(a)gmail.com> usb: xhci: Fix problem with xhci resume from suspend Oliver Neukum <oneukum(a)suse.com> Revert "usb: yurex: Replace snprintf() with the safer scnprintf() variant" Wade Wang <wade.wang(a)hp.com> HID: plantronics: Workaround for an unexcepted opposite volume key Oliver Neukum <oneukum(a)suse.com> CDC-NCM: avoid overflow in sanity checking j.nixdorf(a)avm.de <j.nixdorf(a)avm.de> net: ipv6: ensure we call ipv6_mc_down() at most once Eric Dumazet <edumazet(a)google.com> ppp: fix ppp_async_encode() illegal access Rosen Penev <rosenp(a)gmail.com> net: ibm: emac: mal: fix wrong goto Mohamed Khalfella <mkhalfella(a)purestorage.com> igb: Do not bring the device up after non-fatal error Billy Tsai <billy_tsai(a)aspeedtech.com> gpio: aspeed: Use devm_clk api to manage clock source Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de> clk: Provide new devm_clk helpers for prepared and enabled clocks Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de> clk: generalize devm_clk_get() a bit Phil Edworthy <phil.edworthy(a)renesas.com> clk: Add (devm_)clk_get_optional() functions Billy Tsai <billy_tsai(a)aspeedtech.com> gpio: aspeed: Add the flush write to ensure the write complete. Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Bluetooth: RFCOMM: FIX possible deadlock in rfcomm_sk_state_change Andy Roulin <aroulin(a)nvidia.com> netfilter: br_netfilter: fix panic with metadata_dst skb Neal Cardwell <ncardwell(a)google.com> tcp: fix tcp_enter_recovery() to zero retrans_stamp when it's safe Dan Carpenter <dan.carpenter(a)linaro.org> SUNRPC: Fix integer overflow in decode_rc_list() Chuck Lever <chuck.lever(a)oracle.com> NFS: Remove print_overflow_msg() Andrey Shumilin <shum.sdl(a)nppct.ru> fbdev: sisfb: Fix strbuf array overflow Zijun Hu <quic_zijuhu(a)quicinc.com> driver core: bus: Return -EIO instead of 0 when show/store invalid bus attribute Zhu Jun <zhujun2(a)cmss.chinamobile.com> tools/iio: Add memory allocation failure check for trigger_name Xu Yang <xu.yang_2(a)nxp.com> usb: chipidea: udc: enable suspend interrupt after usb reset Yunke Cao <yunkec(a)chromium.org> media: videobuf2-core: clear memory related fields in __vb2_plane_dmabuf_put() Alex Williamson <alex.williamson(a)redhat.com> PCI: Mark Creative Labs EMU20k2 INTx masking as broken Hans de Goede <hdegoede(a)redhat.com> i2c: i801: Use a different adapter-name for IDF adapters Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> clk: bcm: bcm53573: fix OF node leak in init Daniel Jordan <daniel.m.jordan(a)oracle.com> ktest.pl: Avoid false positives with grub2 skip regex Thomas Richter <tmricht(a)linux.ibm.com> s390/cpum_sf: Remove WARN_ON_ONCE statements Wojciech Gładysz <wojciech.gladysz(a)infogain.com> ext4: nested locking for xattr inode Gerald Schaefer <gerald.schaefer(a)linux.ibm.com> s390/mm: Add cond_resched() to cmm_alloc/free_pages() Heiko Carstens <hca(a)linux.ibm.com> s390/facility: Disable compile time optimization for decompressor code Tao Chen <chen.dylane(a)gmail.com> bpf: Check percpu map value size first Mathias Krause <minipli(a)grsecurity.net> Input: synaptics-rmi4 - fix UAF of IRQ domain on driver removal Michael S. Tsirkin <mst(a)redhat.com> virtio_console: fix misc probe bugs Rob Clark <robdclark(a)chromium.org> drm/crtc: fix uninitialized variable use even harder Sean Paul <seanpaul(a)chromium.org> drm: Move drm_mode_setcrtc() local re-init to failure path Steven Rostedt (Google) <rostedt(a)goodmis.org> tracing: Remove precision vsnprintf() check from print event Linus Walleij <linus.walleij(a)linaro.org> net: ethernet: cortina: Drop TSO support zhanchengbin <zhanchengbin1(a)huawei.com> ext4: fix inode tree inconsistency caused by ENOMEM Armin Wolf <W_Armin(a)gmx.de> ACPI: battery: Fix possible crash when unregistering a battery hook Armin Wolf <W_Armin(a)gmx.de> ACPI: battery: Simplify battery hook locking Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> rtc: at91sam9: fix OF node leak in probe() error path Alexandre Belloni <alexandre.belloni(a)bootlin.com> rtc: at91sam9: drop platform_data support NeilBrown <neilb(a)suse.de> nfsd: fix delegation_blocked() to block correctly for at least 30 seconds Arnd Bergmann <arnd(a)arndb.de> nfsd: use ktime_get_seconds() for timestamps Oleg Nesterov <oleg(a)redhat.com> uprobes: fix kernel info leak via "[uprobes]" vma Mark Rutland <mark.rutland(a)arm.com> arm64: errata: Expand speculative SSBS workaround once more Mark Rutland <mark.rutland(a)arm.com> arm64: cputype: Add Neoverse-N3 definitions Anshuman Khandual <anshuman.khandual(a)arm.com> arm64: Add Cortex-715 CPU part definition Baokun Li <libaokun1(a)huawei.com> ext4: update orig_path in ext4_find_extent() Baokun Li <libaokun1(a)huawei.com> ext4: fix slab-use-after-free in ext4_split_extent_at() Theodore Ts'o <tytso(a)mit.edu> ext4: avoid ext4_error()'s caused by ENOMEM in the truncate path Emanuele Ghidoli <emanuele.ghidoli(a)toradex.com> gpio: davinci: fix lazy disable Filipe Manana <fdmanana(a)suse.com> btrfs: wait for fixup workers before stopping cleaner kthread during umount Nuno Sa <nuno.sa(a)analog.com> Input: adp5589-keys - fix adp5589_gpio_get_value() Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp> tomoyo: fallback to realpath if symlink's pathname does not exist Barnabás Czémán <barnabas.czeman(a)mainlining.org> iio: magnetometer: ak8975: Fix reading for ak099xx sensors Zheng Wang <zyytlz.wz(a)163.com> media: venus: fix use after free bug in venus_remove due to race condition Hans Verkuil <hverkuil-cisco(a)xs4all.nl> media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags Sebastian Reichel <sebastian.reichel(a)collabora.com> clk: rockchip: fix error for unknown clocks Chun-Yi Lee <joeyli.kernel(a)gmail.com> aoe: fix the potential use-after-free problem in more places Jisheng Zhang <jszhang(a)kernel.org> riscv: define ILLEGAL_POINTER_VALUE for 64bit Lizhi Xu <lizhi.xu(a)windriver.com> ocfs2: fix possible null-ptr-deref in ocfs2_set_buffer_uptodate Julian Sun <sunjunchao2870(a)gmail.com> ocfs2: fix null-ptr-deref when journal load failed. Lizhi Xu <lizhi.xu(a)windriver.com> ocfs2: remove unreasonable unlock in ocfs2_read_blocks Joseph Qi <joseph.qi(a)linux.alibaba.com> ocfs2: cancel dqi_sync_work before freeing oinfo Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com> ocfs2: reserve space for inline xattr before attaching reflink tree Joseph Qi <joseph.qi(a)linux.alibaba.com> ocfs2: fix uninit-value in ocfs2_get_block() Heming Zhao <heming.zhao(a)suse.com> ocfs2: fix the la space leak when unmounting an ocfs2 volume Baokun Li <libaokun1(a)huawei.com> jbd2: stop waiting for space when jbd2_cleanup_journal_tail() returns error Andrew Jones <ajones(a)ventanamicro.com> of/irq: Support #msi-cells=<0> in of_msi_get_domain Helge Deller <deller(a)kernel.org> parisc: Fix 64-bit userspace syscall path Luis Henriques (SUSE) <luis.henriques(a)linux.dev> ext4: fix incorrect tid assumption in ext4_wait_for_tail_page_commit() Baokun Li <libaokun1(a)huawei.com> ext4: fix double brelse() the buffer of the extents path Baokun Li <libaokun1(a)huawei.com> ext4: aovid use-after-free in ext4_ext_insert_extent() Luis Henriques (SUSE) <luis.henriques(a)linux.dev> ext4: fix incorrect tid assumption in __jbd2_log_wait_for_space() Baokun Li <libaokun1(a)huawei.com> ext4: propagate errors from ext4_find_extent() in ext4_insert_range() Edward Adam Davis <eadavis(a)qq.com> ext4: no need to continue when the number of entries is 1 Jaroslav Kysela <perex(a)perex.cz> ALSA: core: add isascii() check to card ID generator Helge Deller <deller(a)gmx.de> parisc: Fix itlb miss handler for 64-bit programs Luo Gengkun <luogengkun(a)huaweicloud.com> perf/core: Fix small negative period being ignored Jinjie Ruan <ruanjinjie(a)huawei.com> spi: bcm63xx: Fix module autoloading Robert Hancock <robert.hancock(a)calian.com> i2c: xiic: Wait for TX empty to avoid missed TX NAKs Christophe Leroy <christophe.leroy(a)csgroup.eu> selftests: vDSO: fix vDSO symbols lookup for powerpc64 Yifei Liu <yifei.l.liu(a)oracle.com> selftests: breakpoints: use remaining time to check if suspend succeed Ben Dooks <ben.dooks(a)codethink.co.uk> spi: s3c64xx: fix timeout counters in flush_fifo Artem Sadovnikov <ancowi69(a)gmail.com> ext4: fix i_data_sem unlock order in ext4_ind_migrate() Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> ext4: ext4_search_dir should return a proper error Geert Uytterhoeven <geert+renesas(a)glider.be> of/irq: Refer to actual buffer size in of_irq_parse_one() Geert Uytterhoeven <geert+renesas(a)glider.be> drm/radeon/r100: Handle unknown family in r100_cp_init_microcode() Kees Cook <kees(a)kernel.org> scsi: aacraid: Rearrange order of struct aac_srb_unit Matthew Brost <matthew.brost(a)intel.com> drm/printer: Allow NULL data in devcoredump printer Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com> drm/amd/display: Fix index out of bounds in degamma hardware format translation Alex Hung <alex.hung(a)amd.com> drm/amd/display: Check stream before comparing them Zhao Mengmeng <zhaomengmeng(a)kylinos.cn> jfs: Fix uninit-value access of new_ea in ea_buffer Edward Adam Davis <eadavis(a)qq.com> jfs: check if leafidx greater than num leaves per dmap tree Edward Adam Davis <eadavis(a)qq.com> jfs: Fix uaf in dbFreeBits Remington Brasga <rbrasga(a)uci.edu> jfs: UBSAN: shift-out-of-bounds in dbFindBits Damien Le Moal <dlemoal(a)kernel.org> ata: sata_sil: Rename sil_blacklist to sil_quirks Andrew Davis <afd(a)ti.com> power: reset: brcmstb: Do not go into infinite loop if reset fails Kaixin Wang <kxwang23(a)m.fudan.edu.cn> fbdev: pxafb: Fix possible use after free in pxafb_task() Takashi Iwai <tiwai(a)suse.de> ALSA: hdsp: Break infinite MIDI input flush loop Takashi Iwai <tiwai(a)suse.de> ALSA: asihpi: Fix potential OOB array access Thomas Gleixner <tglx(a)linutronix.de> signal: Replace BUG_ON()s Gustavo A. R. Silva <gustavoars(a)kernel.org> wifi: mwifiex: Fix memcpy() field-spanning write warning in mwifiex_cmd_802_11_scan_ext() Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com> ACPICA: iasl: handle empty connection_node Jason Xing <kernelxing(a)tencent.com> tcp: avoid reusing FIN_WAIT2 when trying to find port in connect() process Ido Schimmel <idosch(a)nvidia.com> ipv4: Mask upper DSCP bits and ECN bits in NETLINK_FIB_LOOKUP family Kuniyuki Iwashima <kuniyu(a)amazon.com> ipv4: Check !in_dev earlier for ioctl(SIOCSIFADDR). Simon Horman <horms(a)kernel.org> net: mvpp2: Increase size of queue_name buffer Simon Horman <horms(a)kernel.org> tipc: guard against string buffer overrun Pei Xiao <xiaopei01(a)kylinos.cn> ACPICA: check null return of ACPI_ALLOCATE_ZEROED() in acpi_db_convert_to_package() Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> ACPI: EC: Do not release locks during operation region accesses Armin Wolf <W_Armin(a)gmx.de> ACPICA: Fix memory leak if acpi_ps_get_next_field() fails Armin Wolf <W_Armin(a)gmx.de> ACPICA: Fix memory leak if acpi_ps_get_next_namepath() fails Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> net: hisilicon: hns_mdio: fix OF node leak in probe() Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> net: hisilicon: hns_dsaf_mac: fix OF node leak in hns_mac_get_info() Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> net: hisilicon: hip04: fix OF node leak in probe() Toke Høiland-Jørgensen <toke(a)redhat.com> wifi: ath9k_htc: Use __skb_set_length() for resetting urb before resubmit Dmitry Kandybka <d.kandybka(a)gmail.com> wifi: ath9k: fix possible integer overflow in ath9k_get_et_stats() Jann Horn <jannh(a)google.com> f2fs: Require FMODE_WRITE for atomic write ioctls Takashi Iwai <tiwai(a)suse.de> ALSA: hda/conexant: Fix conflicting quirk for System76 Pangolin Takashi Iwai <tiwai(a)suse.de> ALSA: hda/generic: Unconditionally prefer preferred_dacs pairs Xin Long <lucien.xin(a)gmail.com> sctp: set sk_state back to CLOSED if autobind fails in sctp_listen_start Anton Danilov <littlesmilingcloud(a)gmail.com> ipv4: ip_gre: Fix drops of small packets in ipgre_xmit Eric Dumazet <edumazet(a)google.com> net: add more sanity checks to qdisc_pkt_len_init() Eric Dumazet <edumazet(a)google.com> net: avoid potential underflow in qdisc_pkt_len_init() with UFO Aleksander Jan Bajkowski <olek2(a)wp.pl> net: ethernet: lantiq_etop: fix memory disclosure Prashant Malani <pmalani(a)chromium.org> r8152: Factor out OOB link list waits Eric Dumazet <edumazet(a)google.com> netfilter: nf_tables: prevent nf_skb_duplicated corruption Phil Sutter <phil(a)nwl.cc> netfilter: uapi: NFTA_FLOWTABLE_HOOK is NLA_NESTED Xiubo Li <xiubli(a)redhat.com> ceph: remove the incorrect Fw reference check when dirtying pages Stefan Wahren <wahrenst(a)gmx.net> mailbox: bcm2835: Fix timeout during suspend mode Liao Chen <liaochen4(a)huawei.com> mailbox: rockchip: fix a typo in module autoloading Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com> usb: yurex: Fix inconsistent locking bug in yurex_read() Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> i2c: isch: Add missed 'else' Tommy Huang <tommy_huang(a)aspeedtech.com> i2c: aspeed: Update the stop sw state when the bus recovery occurs Ma Ke <make24(a)iscas.ac.cn> pps: add an error check in parport_attach Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> pps: remove usage of the deprecated ida_simple_xx() API Oliver Neukum <oneukum(a)suse.com> USB: misc: yurex: fix race between read and write Lee Jones <lee(a)kernel.org> usb: yurex: Replace snprintf() with the safer scnprintf() variant Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> soc: versatile: realview: fix soc_dev leak during device remove Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> soc: versatile: realview: fix memory leak during device remove Sean Anderson <sean.anderson(a)linux.dev> PCI: xilinx-nwl: Fix off-by-one in INTx IRQ handler Thomas Gleixner <tglx(a)linutronix.de> PCI: xilinx-nwl: Use irq_data_get_irq_chip_data() Li Lingfeng <lilingfeng3(a)huawei.com> nfs: fix memory leak in error path of nfs4_do_reclaim Mickaël Salaün <mic(a)digikod.net> fs: Fix file_set_fowner LSM hook inconsistencies Julian Sun <sunjunchao2870(a)gmail.com> vfs: fix race between evice_inodes() and find_inode()&iput() Nikita Zhandarovich <n.zhandarovich(a)fintech.ru> f2fs: avoid potential int overflow in sanity_check_area_boundary() Nikita Zhandarovich <n.zhandarovich(a)fintech.ru> f2fs: prevent possible int overflow in dir_block_index() Thomas Weißschuh <linux(a)weissschuh.net> ACPI: sysfs: validate return type of _STR method Mikhail Lobanov <m.lobanov(a)rosalinux.ru> drbd: Add NULL check for net_conf to prevent dereference in state validation Qiu-ji Chen <chenqiuji666(a)gmail.com> drbd: Fix atomicity violation in drbd_uuid_set_bm() Florian Fainelli <florian.fainelli(a)broadcom.com> tty: rp2: Fix reset with non forgiving PCIe host bridges Jann Horn <jannh(a)google.com> firmware_loader: Block path traversal Oliver Neukum <oneukum(a)suse.com> USB: misc: cypress_cy7c63: check for short transfer Oliver Neukum <oneukum(a)suse.com> USB: appledisplay: close race between probe and completion handler Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> soc: versatile: integrator: fix OF node leak in probe() error path Laurent Pinchart <laurent.pinchart(a)ideasonboard.com> Remove *.orig pattern from .gitignore Hailey Mothershead <hailmo(a)amazon.com> crypto: aead,cipher - zeroize key buffer after use Simon Horman <horms(a)kernel.org> netfilter: ctnetlink: compile ctnetlink_label_size with CONFIG_NF_CONNTRACK_EVENTS Youssef Samir <quic_yabdulra(a)quicinc.com> net: qrtr: Update packets cloning when broadcasting Josh Hunt <johunt(a)akamai.com> tcp: check skb is non-NULL in tcp_rto_delta_us() Eric Dumazet <edumazet(a)google.com> tcp: introduce tcp_skb_timestamp_us() helper Kaixin Wang <kxwang23(a)m.fudan.edu.cn> net: seeq: Fix use after free vulnerability in ether3 Driver Due to Race Condition Eric Dumazet <edumazet(a)google.com> netfilter: nf_reject_ipv6: fix nf_reject_ip6_tcphdr_put() Suzuki K Poulose <suzuki.poulose(a)arm.com> coresight: tmc: sg: Do not leak sg_table Chao Yu <chao(a)kernel.org> f2fs: reduce expensive checkpoint trigger frequency Chao Yu <chao(a)kernel.org> f2fs: remove unneeded check condition in __f2fs_setxattr() Chao Yu <chao(a)kernel.org> f2fs: fix to update i_ctime in __f2fs_setxattr() Yonggil Song <yonggil.song(a)samsung.com> f2fs: fix typo Chao Yu <yuchao0(a)huawei.com> f2fs: enhance to update i_mode and acl atomically in f2fs_setattr() Guoqing Jiang <guoqing.jiang(a)linux.dev> nfsd: call cache_put if xdr_reserve_space returns NULL Jinjie Ruan <ruanjinjie(a)huawei.com> ntb: intel: Fix the NULL vs IS_ERR() bug for debugfs_create_dir() Mikhail Lobanov <m.lobanov(a)rosalinux.ru> RDMA/cxgb4: Added NULL check for lookup_atid Wang Jianzheng <wangjianzheng(a)vivo.com> pinctrl: mvebu: Fix devinit_dove_pinctrl_probe function Yangtao Li <frank.li(a)vivo.com> pinctrl: mvebu: Use devm_platform_get_and_ioremap_resource() David Lechner <dlechner(a)baylibre.com> clk: ti: dra7-atl: Fix leak of of_nodes Yang Yingliang <yangyingliang(a)huawei.com> pinctrl: single: fix missing error code in pcs_probe() Zhu Yanjun <yanjun.zhu(a)linux.dev> RDMA/iwcm: Fix WARNING:at_kernel/workqueue.c:#check_flush_dependency Sean Anderson <sean.anderson(a)linux.dev> PCI: xilinx-nwl: Fix register misspelling Junlin Li <make24(a)iscas.ac.cn> drivers: media: dvb-frontends/rtl2830: fix an out-of-bounds write error Junlin Li <make24(a)iscas.ac.cn> drivers: media: dvb-frontends/rtl2832: fix an out-of-bounds write error Jonas Karlman <jonas(a)kwiboo.se> clk: rockchip: Set parent rate for DCLK_VOP clock on RK3228 Ian Rogers <irogers(a)google.com> perf time-utils: Fix 32-bit nsec parsing Yang Jihong <yangjihong(a)bytedance.com> perf sched timehist: Fixed timestamp error when unable to confirm event sched_in time Yang Jihong <yangjihong(a)bytedance.com> perf sched timehist: Fix missing free of session in perf_sched__timehist() Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix potential oob read in nilfs_btree_check_delete() Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: determine empty node blocks as corrupted Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix potential null-ptr-deref in nilfs_btree_insert() Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> ext4: avoid OOB when system.data xattr changes underneath the filesystem Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> ext4: return error on ext4_find_inline_entry Kemeng Shi <shikemeng(a)huaweicloud.com> ext4: avoid negative min_clusters in find_group_orlov() Jiawei Ye <jiawei.ye(a)foxmail.com> smackfs: Use rcu_assign_pointer() to ensure safe assignment in smk_set_cipso yangerkun <yangerkun(a)huawei.com> ext4: clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT even mount with discard Mauricio Faria de Oliveira <mfo(a)canonical.com> jbd2: introduce/export functions jbd2_journal_submit|finish_inode_data_buffers() Chen Yu <yu.c.chen(a)intel.com> kthread: fix task state in kthread worker if being frozen Rob Clark <robdclark(a)chromium.org> kthread: add kthread_work tracepoints Lasse Collin <lasse.collin(a)tukaani.org> xz: cleanup CRC32 edits from 2018 Tony Ambardar <tony.ambardar(a)gmail.com> selftests/bpf: Fix error compiling test_lru_map.c Juergen Gross <jgross(a)suse.com> xen/swiotlb: add alignment check for dma buffers Juergen Gross <jgross(a)suse.com> xen/swiotlb: simplify range_straddles_page_boundary() Juergen Gross <jgross(a)suse.com> xen: use correct end address of kernel for conflict checking Sherry Yang <sherry.yang(a)oracle.com> drm/msm: fix %s null argument error Wolfram Sang <wsa+renesas(a)sang-engineering.com> ipmi: docs: don't advertise deprecated sysfs entries Vladimir Lypak <vladimir.lypak(a)gmail.com> drm/msm/a5xx: fix races in preemption evaluation stage Vladimir Lypak <vladimir.lypak(a)gmail.com> drm/msm/a5xx: properly clear preemption records on resume Jeongjun Park <aha310510(a)gmail.com> jfs: fix out-of-bounds in dbNextAG() and diAlloc() Nikita Zhandarovich <n.zhandarovich(a)fintech.ru> drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets Alex Bee <knaerzche(a)gmail.com> drm/rockchip: vop: Allow 4096px width scaling Alex Deucher <alexander.deucher(a)amd.com> drm/radeon: properly handle vbios fake edid sizing Paulo Miguel Almeida <paulo.miguel.almeida.rodenas(a)gmail.com> drm/radeon: Replace one-element array with flexible-array member Alex Deucher <alexander.deucher(a)amd.com> drm/amdgpu: properly handle vbios fake edid sizing Paulo Miguel Almeida <paulo.miguel.almeida.rodenas(a)gmail.com> drm/amdgpu: Replace one-element array with flexible-array member Matteo Croce <mcroce(a)redhat.com> drm/amd: fix typo Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> drm/stm: Fix an error handling path in stm_drm_platform_probe() Christophe JAILLET <christophe.jaillet(a)wanadoo.fr> fbdev: hpfb: Fix an error handling path in hpfb_dio_probe() Artur Weber <aweber.kernel(a)gmail.com> power: supply: max17042_battery: Fix SOC threshold calc w/ no current sense Yuntao Liu <liuyuntao12(a)huawei.com> hwmon: (ntc_thermistor) fix module autoloading Mirsad Todorovac <mtodorovac69(a)gmail.com> mtd: slram: insert break after errors in parsing the map Guenter Roeck <linux(a)roeck-us.net> hwmon: (max16065) Fix overflows seen when writing limits Ankit Agrawal <agrawal.ag.ankit(a)gmail.com> clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init() Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> reset: berlin: fix OF node leak in probe() error path Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> ARM: versatile: fix OF node leak in CPUs prepare Andy Shevchenko <andriy.shevchenko(a)linux.intel.com> spi: ppc4xx: Avoid returning 0 when failed to parse and map IRQ Ma Ke <make24(a)iscas.ac.cn> spi: ppc4xx: handle irq_of_parse_and_map() errors Yu Kuai <yukuai3(a)huawei.com> block, bfq: don't break merge chain in bfq_split_bfqq() Yu Kuai <yukuai3(a)huawei.com> block, bfq: choose the last bfqq from merge chain in bfq_setup_cooperator() Yu Kuai <yukuai3(a)huawei.com> block, bfq: fix possible UAF for bfqq->bic with merge chain Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Bluetooth: btusb: Fix not handling ZPL/short-transfer Kuniyuki Iwashima <kuniyu(a)amazon.com> can: bcm: Clear bo->bcm_proc_read after remove_proc_entry(). Dmitry Antipov <dmantipov(a)yandex.ru> wifi: mac80211: use two-phase skb reclamation in ieee80211_do_stop() Dmitry Antipov <dmantipov(a)yandex.ru> wifi: cfg80211: fix two more possible UBSAN-detected off-by-one errors Dmitry Antipov <dmantipov(a)yandex.ru> wifi: cfg80211: fix UBSAN noise in cfg80211_wext_siwscan() Pablo Neira Ayuso <pablo(a)netfilter.org> netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire Toke Høiland-Jørgensen <toke(a)redhat.com> wifi: ath9k: Remove error checks when creating debugfs entries Minjie Du <duminjie(a)vivo.com> wifi: ath9k: fix parameter check in ath9k_init_debug() Aleksandr Mishin <amishin(a)t-argos.ru> ACPI: PMIC: Remove unneeded check in tps68470_pmic_opregion_probe() Junhao Xie <bigfoot(a)classfun.cn> USB: serial: pl2303: add device id for Macrosilicon MS3020 Hagar Hemdan <hagarhem(a)amazon.com> gpio: prevent potential speculation leaks in gpio_device_get_desc() Ferry Meng <mengferry(a)linux.alibaba.com> ocfs2: strict bound check before memcmp in ocfs2_xattr_find_entry() Ferry Meng <mengferry(a)linux.alibaba.com> ocfs2: add bounds checking to ocfs2_xattr_find_entry() Michael Kelley <mhklinux(a)outlook.com> x86/hyperv: Set X86_FEATURE_TSC_KNOWN_FREQ when Hyper-V provides frequency Liao Chen <liaochen4(a)huawei.com> spi: bcm63xx: Enable module autoloading Liao Chen <liaochen4(a)huawei.com> ASoC: tda7419: fix module autoloading Emmanuel Grumbach <emmanuel.grumbach(a)intel.com> wifi: iwlwifi: mvm: don't wait for tx queues if firmware is dead Daniel Gabay <daniel.gabay(a)intel.com> wifi: iwlwifi: mvm: fix iwl_mvm_max_scan_ie_fw_cmd_room() Jacky Chou <jacky_chou(a)aspeedtech.com> net: ftgmac100: Ensure tx descriptor updates are visible Mike Rapoport <rppt(a)kernel.org> microblaze: don't treat zero reserved memory regions as error Thomas Blocher <thomas.blocher(a)ek-dev.de> pinctrl: at91: make it work with current gpiolib Hongbo Li <lihongbo22(a)huawei.com> ASoC: allow module autoloading for table db1200_pids Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com> selftests/kcmp: remove call to ksft_set_plan() Samasth Norway Ananda <samasth.norway.ananda(a)oracle.com> selftests/vm: remove call to ksft_set_plan() Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> soundwire: stream: Revert "soundwire: stream: fix programming slave ports for non-continous port maps" Sean Anderson <sean.anderson(a)linux.dev> net: dpaa: Pad packets to ETH_ZLEN Jacky Chou <jacky_chou(a)aspeedtech.com> net: ftgmac100: Enable TX interrupt to avoid TX timeout Eran Ben Elisha <eranbe(a)mellanox.com> net/mlx5: Update the list of the PCI supported devices Quentin Schulz <quentin.schulz(a)cherry.de> arm64: dts: rockchip: override BIOS_DISABLE signal via GPIO hog on RK3399 Puma Anders Roxell <anders.roxell(a)linaro.org> scripts: kconfig: merge_config: config files: add a trailing newline Pawel Dembicki <paweldembicki(a)gmail.com> net: phy: vitesse: repair vsc73xx autonegotiation Moon Yeounsu <yyyynoom(a)gmail.com> net: ethernet: use ip_hdrlen() instead of bit shift Foster Snowhill <forst(a)pen.gy> usbnet: ipheth: fix carrier detection in modes 1 and 4 Aleksandr Mishin <amishin(a)t-argos.ru> staging: iio: frequency: ad9834: Validate frequency parameter value Beniamin Bia <biabeniamin(a)gmail.com> staging: iio: frequency: ad9833: Load clock using clock framework Beniamin Bia <biabeniamin(a)gmail.com> staging: iio: frequency: ad9833: Get frequency value statically ------------- Diffstat: .gitignore | 1 - Documentation/IPMI.txt | 2 +- Documentation/arm64/silicon-errata.txt | 2 + Documentation/driver-model/devres.txt | 1 + Makefile | 4 +- arch/arm/mach-realview/platsmp-dt.c | 1 + arch/arm64/Kconfig | 2 + arch/arm64/boot/dts/rockchip/rk3399-puma.dtsi | 23 +- arch/arm64/include/asm/cputype.h | 4 + arch/arm64/include/asm/uprobes.h | 12 +- arch/arm64/kernel/cpu_errata.c | 2 + arch/arm64/kernel/probes/decode-insn.c | 16 +- arch/arm64/kernel/probes/simulate-insn.c | 18 +- arch/arm64/kernel/probes/uprobes.c | 4 +- arch/microblaze/mm/init.c | 5 - arch/parisc/kernel/entry.S | 6 +- arch/parisc/kernel/syscall.S | 14 +- arch/riscv/Kconfig | 5 + arch/s390/include/asm/facility.h | 6 +- arch/s390/kernel/perf_cpum_sf.c | 12 +- arch/s390/kvm/diag.c | 2 +- arch/s390/kvm/gaccess.c | 162 +++++--- arch/s390/kvm/gaccess.h | 14 +- arch/s390/mm/cmm.c | 18 +- arch/x86/include/asm/cpufeatures.h | 3 +- arch/x86/kernel/apic/apic.c | 14 +- arch/x86/kernel/cpu/mshyperv.c | 1 + arch/x86/xen/setup.c | 2 +- block/bfq-iosched.c | 13 +- crypto/aead.c | 3 +- crypto/cipher.c | 3 +- drivers/acpi/acpica/dbconvert.c | 2 + drivers/acpi/acpica/exprep.c | 3 + drivers/acpi/acpica/psargs.c | 47 +++ drivers/acpi/battery.c | 28 +- drivers/acpi/button.c | 11 + drivers/acpi/device_sysfs.c | 5 +- drivers/acpi/ec.c | 55 ++- drivers/acpi/pmic/tps68470_pmic.c | 6 +- drivers/ata/sata_sil.c | 12 +- drivers/base/bus.c | 6 +- drivers/base/core.c | 13 +- drivers/base/firmware_loader/main.c | 30 ++ drivers/base/module.c | 4 - drivers/block/aoe/aoecmd.c | 13 +- drivers/block/drbd/drbd_main.c | 8 +- drivers/block/drbd/drbd_state.c | 2 +- drivers/bluetooth/btusb.c | 10 +- drivers/char/virtio_console.c | 18 +- drivers/clk/bcm/clk-bcm53573-ilp.c | 2 +- drivers/clk/clk-devres.c | 115 +++++- drivers/clk/rockchip/clk-rk3228.c | 2 +- drivers/clk/rockchip/clk.c | 3 +- drivers/clk/ti/clk-dra7-atl.c | 1 + drivers/clocksource/timer-qcom.c | 7 +- drivers/firmware/arm_sdei.c | 2 +- drivers/gpio/gpio-aspeed.c | 4 +- drivers/gpio/gpio-davinci.c | 8 +- drivers/gpio/gpiolib.c | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 15 +- drivers/gpu/drm/amd/amdgpu/atombios_encoders.c | 26 +- drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 2 + .../gpu/drm/amd/display/dc/dcn10/dcn10_cm_common.c | 2 + drivers/gpu/drm/amd/include/atombios.h | 4 +- drivers/gpu/drm/drm_crtc.c | 17 +- drivers/gpu/drm/drm_print.c | 13 +- drivers/gpu/drm/msm/adreno/a5xx_gpu.h | 1 + drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 26 +- drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c | 2 +- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- drivers/gpu/drm/radeon/atombios.h | 2 +- drivers/gpu/drm/radeon/evergreen_cs.c | 62 +-- drivers/gpu/drm/radeon/r100.c | 70 ++-- drivers/gpu/drm/radeon/radeon_atombios.c | 26 +- drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 4 +- drivers/gpu/drm/stm/drv.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 1 + drivers/hid/hid-ids.h | 2 + drivers/hid/hid-plantronics.c | 23 ++ drivers/hwmon/max16065.c | 5 +- drivers/hwmon/ntc_thermistor.c | 1 + drivers/hwtracing/coresight/coresight-tmc-etr.c | 2 +- drivers/i2c/busses/i2c-aspeed.c | 16 +- drivers/i2c/busses/i2c-i801.c | 9 +- drivers/i2c/busses/i2c-isch.c | 3 +- drivers/i2c/busses/i2c-xiic.c | 19 +- drivers/iio/adc/Kconfig | 2 + .../iio/common/hid-sensors/hid-sensor-trigger.c | 2 +- drivers/iio/dac/Kconfig | 1 + drivers/iio/light/opt3001.c | 4 + drivers/iio/magnetometer/ak8975.c | 32 +- drivers/infiniband/core/iwcm.c | 2 +- drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +- drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 2 +- drivers/infiniband/hw/cxgb4/cm.c | 14 +- drivers/input/keyboard/adp5589-keys.c | 13 +- drivers/input/rmi4/rmi_driver.c | 6 +- drivers/mailbox/bcm2835-mailbox.c | 3 +- drivers/mailbox/rockchip-mailbox.c | 2 +- drivers/media/common/videobuf2/videobuf2-core.c | 8 +- drivers/media/dvb-frontends/rtl2830.c | 2 +- drivers/media/dvb-frontends/rtl2832.c | 2 +- drivers/media/platform/qcom/venus/core.c | 1 + drivers/misc/sgi-gru/grukservices.c | 2 - drivers/misc/sgi-gru/grumain.c | 4 - drivers/misc/sgi-gru/grutlbpurge.c | 2 - drivers/mtd/devices/slram.c | 2 + drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 +- drivers/net/ethernet/aeroflex/greth.c | 3 +- drivers/net/ethernet/amd/mvme147.c | 7 +- drivers/net/ethernet/broadcom/bcmsysport.c | 1 + drivers/net/ethernet/cortina/gemini.c | 15 +- drivers/net/ethernet/emulex/benet/be_main.c | 10 +- drivers/net/ethernet/faraday/ftgmac100.c | 26 +- drivers/net/ethernet/faraday/ftgmac100.h | 2 +- drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 9 +- drivers/net/ethernet/hisilicon/hip04_eth.c | 1 + drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 1 + drivers/net/ethernet/hisilicon/hns_mdio.c | 1 + drivers/net/ethernet/i825xx/sun3_82586.c | 1 + drivers/net/ethernet/ibm/emac/mal.c | 2 +- drivers/net/ethernet/intel/igb/igb_main.c | 6 +- drivers/net/ethernet/jme.c | 10 +- drivers/net/ethernet/lantiq_etop.c | 4 +- drivers/net/ethernet/marvell/mvpp2/mvpp2.h | 2 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 + drivers/net/ethernet/seeq/ether3.c | 2 + drivers/net/gtp.c | 27 +- drivers/net/hyperv/netvsc_drv.c | 30 ++ drivers/net/macsec.c | 18 - drivers/net/phy/vitesse.c | 14 - drivers/net/ppp/ppp_async.c | 2 +- drivers/net/usb/cdc_ncm.c | 8 +- drivers/net/usb/ipheth.c | 5 +- drivers/net/usb/r8152.c | 73 +--- drivers/net/usb/usbnet.c | 3 +- drivers/net/wireless/ath/ath10k/wmi-tlv.c | 7 +- drivers/net/wireless/ath/ath10k/wmi.c | 2 + drivers/net/wireless/ath/ath9k/debug.c | 6 +- drivers/net/wireless/ath/ath9k/hif_usb.c | 6 +- drivers/net/wireless/ath/ath9k/htc_drv_debug.c | 2 - drivers/net/wireless/intel/iwlegacy/common.c | 2 + drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 9 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 8 +- drivers/net/wireless/marvell/mwifiex/fw.h | 2 +- drivers/net/wireless/marvell/mwifiex/scan.c | 3 +- drivers/ntb/hw/intel/ntb_hw_gen1.c | 2 +- drivers/of/irq.c | 38 +- drivers/parport/procfs.c | 22 +- drivers/pci/controller/pcie-xilinx-nwl.c | 24 +- drivers/pci/quirks.c | 6 + drivers/pinctrl/mvebu/pinctrl-dove.c | 45 +- drivers/pinctrl/pinctrl-at91.c | 5 +- drivers/pinctrl/pinctrl-single.c | 3 +- drivers/power/reset/brcmstb-reboot.c | 3 - drivers/power/supply/max17042_battery.c | 5 +- drivers/pps/clients/pps_parport.c | 14 +- drivers/reset/reset-berlin.c | 3 +- drivers/rtc/Kconfig | 2 +- drivers/rtc/rtc-at91sam9.c | 45 +- drivers/s390/char/sclp_vt220.c | 4 +- drivers/scsi/aacraid/aacraid.h | 2 +- drivers/soc/versatile/soc-integrator.c | 1 + drivers/soc/versatile/soc-realview.c | 20 +- drivers/soundwire/stream.c | 8 +- drivers/spi/spi-bcm63xx.c | 2 + drivers/spi/spi-ppc4xx.c | 7 +- drivers/spi/spi-s3c64xx.c | 4 +- drivers/staging/iio/frequency/ad9834.c | 54 +-- drivers/staging/iio/frequency/ad9834.h | 28 -- drivers/tty/serial/rp2.c | 2 +- drivers/tty/vt/vt.c | 2 +- drivers/usb/chipidea/udc.c | 8 +- drivers/usb/dwc3/core.c | 49 ++- drivers/usb/dwc3/core.h | 11 +- drivers/usb/dwc3/gadget.c | 11 - drivers/usb/host/xhci-pci.c | 5 + drivers/usb/host/xhci-ring.c | 16 +- drivers/usb/host/xhci.h | 2 +- drivers/usb/misc/appledisplay.c | 15 +- drivers/usb/misc/cypress_cy7c63.c | 4 + drivers/usb/misc/yurex.c | 5 +- drivers/usb/phy/phy.c | 2 +- drivers/usb/serial/option.c | 8 + drivers/usb/serial/pl2303.c | 1 + drivers/usb/serial/pl2303.h | 4 + drivers/usb/storage/unusual_devs.h | 11 + drivers/usb/typec/class.c | 3 + drivers/video/fbdev/hpfb.c | 1 + drivers/video/fbdev/pxafb.c | 1 + drivers/video/fbdev/sis/sis_main.c | 2 +- drivers/xen/swiotlb-xen.c | 40 +- fs/btrfs/disk-io.c | 11 + fs/ceph/addr.c | 1 - fs/ext4/ext4.h | 1 + fs/ext4/extents.c | 70 +++- fs/ext4/ialloc.c | 2 + fs/ext4/inline.c | 35 +- fs/ext4/inode.c | 11 +- fs/ext4/mballoc.c | 10 +- fs/ext4/migrate.c | 2 +- fs/ext4/move_extent.c | 1 - fs/ext4/namei.c | 14 +- fs/ext4/xattr.c | 4 +- fs/f2fs/acl.c | 23 +- fs/f2fs/dir.c | 3 +- fs/f2fs/f2fs.h | 4 +- fs/f2fs/file.c | 24 +- fs/f2fs/super.c | 4 +- fs/f2fs/xattr.c | 29 +- fs/fat/namei_vfat.c | 2 +- fs/fcntl.c | 14 +- fs/inode.c | 4 + fs/jbd2/checkpoint.c | 14 +- fs/jbd2/commit.c | 36 +- fs/jbd2/journal.c | 2 + fs/jfs/jfs_discard.c | 11 +- fs/jfs/jfs_dmap.c | 11 +- fs/jfs/jfs_imap.c | 2 +- fs/jfs/xattr.c | 2 + fs/lockd/clnt4xdr.c | 14 - fs/lockd/clntxdr.c | 14 - fs/nfs/callback_xdr.c | 61 ++- fs/nfs/nfs2xdr.c | 84 ++-- fs/nfs/nfs3xdr.c | 163 +++----- fs/nfs/nfs42xdr.c | 21 +- fs/nfs/nfs4state.c | 1 + fs/nfs/nfs4xdr.c | 451 ++++++--------------- fs/nfsd/nfs4callback.c | 13 - fs/nfsd/nfs4idmap.c | 13 +- fs/nfsd/nfs4state.c | 15 +- fs/nilfs2/btree.c | 12 +- fs/nilfs2/dir.c | 50 +-- fs/nilfs2/namei.c | 42 +- fs/nilfs2/nilfs.h | 2 +- fs/nilfs2/page.c | 7 +- fs/ocfs2/aops.c | 5 +- fs/ocfs2/buffer_head_io.c | 4 +- fs/ocfs2/file.c | 8 + fs/ocfs2/journal.c | 7 +- fs/ocfs2/localalloc.c | 19 + fs/ocfs2/quota_local.c | 8 +- fs/ocfs2/refcounttree.c | 26 +- fs/ocfs2/xattr.c | 38 +- fs/udf/inode.c | 9 +- include/drm/drm_print.h | 54 ++- include/dt-bindings/power/r8a774b1-sysc.h | 26 ++ include/linux/clk.h | 145 +++++++ include/linux/jbd2.h | 4 + include/linux/pci_ids.h | 2 + include/net/sock.h | 2 + include/net/tcp.h | 27 +- include/trace/events/f2fs.h | 3 +- include/trace/events/sched.h | 84 ++++ include/uapi/linux/cec.h | 6 +- include/uapi/linux/netfilter/nf_tables.h | 2 +- kernel/bpf/arraymap.c | 3 + kernel/bpf/hashtab.c | 3 + kernel/bpf/lpm_trie.c | 2 +- kernel/cgroup/cgroup.c | 4 +- kernel/events/core.c | 6 +- kernel/events/uprobes.c | 2 +- kernel/kthread.c | 19 +- kernel/signal.c | 11 +- kernel/time/posix-clock.c | 3 + kernel/trace/trace_output.c | 6 +- lib/xz/xz_crc32.c | 2 +- lib/xz/xz_private.h | 4 - mm/shmem.c | 2 + net/bluetooth/af_bluetooth.c | 1 + net/bluetooth/bnep/core.c | 3 +- net/bluetooth/rfcomm/sock.c | 2 - net/bridge/br_netfilter_hooks.c | 5 + net/can/bcm.c | 4 +- net/core/dev.c | 29 +- net/ipv4/devinet.c | 6 +- net/ipv4/fib_frontend.c | 2 +- net/ipv4/ip_gre.c | 6 +- net/ipv4/netfilter/nf_dup_ipv4.c | 7 +- net/ipv4/tcp_input.c | 24 +- net/ipv4/tcp_ipv4.c | 5 +- net/ipv4/tcp_output.c | 2 +- net/ipv4/tcp_rate.c | 17 +- net/ipv4/tcp_recovery.c | 5 +- net/ipv6/addrconf.c | 8 +- net/ipv6/netfilter/nf_dup_ipv6.c | 7 +- net/ipv6/netfilter/nf_reject_ipv6.c | 14 +- net/mac80211/cfg.c | 3 +- net/mac80211/iface.c | 17 +- net/mac80211/key.c | 42 +- net/netfilter/nf_conntrack_netlink.c | 7 +- net/netfilter/nf_tables_api.c | 2 +- net/netfilter/nft_payload.c | 3 + net/netlink/af_netlink.c | 3 +- net/qrtr/qrtr.c | 2 +- net/sched/sch_api.c | 2 +- net/sctp/socket.c | 4 +- net/tipc/bearer.c | 8 +- net/wireless/nl80211.c | 3 +- net/wireless/scan.c | 6 +- net/wireless/sme.c | 3 +- net/xfrm/xfrm_user.c | 6 +- scripts/kconfig/merge_config.sh | 2 + security/selinux/selinuxfs.c | 31 +- security/smack/smackfs.c | 2 +- security/tomoyo/domain.c | 9 +- sound/core/init.c | 14 +- sound/pci/asihpi/hpimsgx.c | 2 +- sound/pci/hda/hda_generic.c | 4 +- sound/pci/hda/patch_conexant.c | 24 +- sound/pci/hda/patch_realtek.c | 38 +- sound/pci/rme9652/hdsp.c | 6 +- sound/pci/rme9652/hdspm.c | 6 +- sound/soc/au1x/db1200.c | 1 + sound/soc/codecs/tda7419.c | 1 + tools/iio/iio_generic_buffer.c | 4 + tools/perf/builtin-sched.c | 8 +- tools/perf/util/time-utils.c | 4 +- tools/testing/ktest/ktest.pl | 2 +- tools/testing/selftests/bpf/test_lru_map.c | 3 +- .../breakpoints/step_after_suspend_test.c | 5 +- tools/testing/selftests/kcmp/kcmp_test.c | 1 - tools/testing/selftests/vDSO/parse_vdso.c | 3 +- tools/testing/selftests/vm/compaction_test.c | 2 - tools/usb/usbip/src/usbip_detach.c | 1 + virt/kvm/kvm_main.c | 5 +- 326 files changed, 2648 insertions(+), 1791 deletions(-)

1 year, 1 month

5
356
0 0

[PATCH v2 0/2] powerpc: Prepare for clang's per-task stack protector support

by Nathan Chancellor

This series prepares the powerpc Kconfig and Kbuild files for clang's per-task stack protector support. clang requires '-mstack-protector-guard-offset' to always be passed with the other '-mstack-protector-guard' flags, which does not always happen with the powerpc implementation, unlike arm, arm64, and riscv implementations. This series brings powerpc in line with those other architectures, which allows clang's support to work right away when it is merged. Additionally, there is one other fix needed for the Kconfig test to work correctly when targeting 32-bit. I have tested this series in QEMU against LKDTM's REPORT_STACK_CANARY with ppc64le_guest_defconfig and pmac32_defconfig built with a toolchain that contains Keith's in-progress pull request, which should land for LLVM 20: https://github.com/llvm/llvm-project/pull/110928 --- Changes in v2: - Combined patch 1 and 3, as they are fixing the same test for similar reasons; adjust commit message accordingly (Christophe) - Moved stack protector guard flags on one line in Makefile (Christophe) - Add 'Cc: stable' targeting 6.1 and newer for the sake of simplicity, as it is the oldest stable release where this series applies cleanly (folks who want it on earlier releases can request or perform a backport separately). - Pick up Keith's Reviewed-by and Tested-by on both patches. - Add a blurb to commit message of patch 1 explaining why clang's register selection behavior differs from GCC. - Link to v1: https://lore.kernel.org/r/20241007-powerpc-fix-stackprotector-test-clang-v1… --- Nathan Chancellor (2): powerpc: Fix stack protector Kconfig test for clang powerpc: Adjust adding stack protector flags to KBUILD_CLAGS for clang arch/powerpc/Kconfig | 4 ++-- arch/powerpc/Makefile | 13 ++++--------- 2 files changed, 6 insertions(+), 11 deletions(-) --- base-commit: 8cf0b93919e13d1e8d4466eb4080a4c4d9d66d7b change-id: 20241004-powerpc-fix-stackprotector-test-clang-84e67ed82f62 Best regards, -- Nathan Chancellor <nathan(a)kernel.org>

1 year, 1 month

2
3
0 0

[for 6.11 PATCH] RISC-V: disallow gcc + rust builds

by Conor Dooley

From: Conor Dooley <conor.dooley(a)microchip.com> commit 33549fcf37ec461f398f0a41e1c9948be2e5aca4 upstream During the discussion before supporting rust on riscv, it was decided not to support gcc yet, due to differences in extension handling compared to llvm (only the version of libclang matching the c compiler is supported). Recently Jason Montleon reported [1] that building with gcc caused build issues, due to unsupported arguments being passed to libclang. After some discussion between myself and Miguel, it is better to disable gcc + rust builds to match the original intent, and subsequently support it when an appropriate set of extensions can be deduced from the version of libclang. Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1] Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2] Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support") Cc: stable(a)vger.kernel.org Reported-by: Jason Montleon <jmontleo(a)redhat.com> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com> Acked-by: Miguel Ojeda <ojeda(a)kernel.org> Reviewed-by: Nathan Chancellor <nathan(a)kernel.org> Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com> --- Documentation/rust/arch-support.rst | 2 +- arch/riscv/Kconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst index 750ff371570a..54be7ddf3e57 100644 --- a/Documentation/rust/arch-support.rst +++ b/Documentation/rust/arch-support.rst @@ -17,7 +17,7 @@ Architecture Level of support Constraints ============= ================ ============================================== ``arm64`` Maintained Little Endian only. ``loongarch`` Maintained \- -``riscv`` Maintained ``riscv64`` only. +``riscv`` Maintained ``riscv64`` and LLVM/Clang only. ``um`` Maintained \- ``x86`` Maintained ``x86_64`` only. ============= ================ ============================================== diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index d11c2479d8e1..6651a5cbdc27 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -172,7 +172,7 @@ config RISCV select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RETHOOK if !XIP_KERNEL select HAVE_RSEQ - select HAVE_RUST if 64BIT + select HAVE_RUST if 64BIT && CC_IS_CLANG select HAVE_SAMPLE_FTRACE_DIRECT select HAVE_SAMPLE_FTRACE_DIRECT_MULTI select HAVE_STACKPROTECTOR -- 2.45.2

1 year, 1 month

2
1
0 0

[PATCH net 0/6] Make TCP-MD5-diag slightly less broken

by Dmitry Safonov via B4 Relay

My original intent was to replace the last non-upstream Arista's TCP-AO piece. That is per-netns procfs seqfile which lists AO keys. In my view an acceptable upstream alternative would be TCP-AO-diag uAPI. So, I started by looking and reviewing TCP-MD5-diag code. And straight away I saw a bunch of issues: 1. Similarly to TCP_MD5SIG_EXT, which doesn't check tcpm_flags for unknown flags and so being non-extendable setsockopt(), the same way tcp_diag_put_md5sig() dumps md5 keys in an array of tcp_diag_md5sig, which makes it ABI non-extendable structure as userspace can't tolerate any new members in it. 2. Inet-diag allocates netlink message for sockets in inet_diag_dump_one_icsk(), which uses a TCP-diag callback .idiag_get_aux_size(), that pre-calculates the needed space for TCP-diag related information. But as neither socket lock nor rcu_readlock() are held between allocation and the actual TCP info filling, the TCP-related space requirement may change before reaching tcp_diag_put_md5sig(). I.e., the number of TCP-MD5 keys on a socket. Thankfully, TCP-MD5-diag won't overwrite the skb, but will return EMSGSIZE, triggering WARN_ON() in inet_diag_dump_one_icsk(). 3. Inet-diag "do" request* can create skb of any message required size. But "dump" request* the skb size, since d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()") is limited by 32 KB. Having in mind that sizeof(struct tcp_diag_md5sig) = 100 bytes, dumps for sockets that have more than 327 keys are going to fail (not counting other diag infos, which lower this limit futher). That is much lower than the number of TCP-MD5 keys that can be allocated on a socket with the current default optmem_max limit (128Kb). So, then I went and written selftests for TCP-MD5-diag and besides confirming that (2) and (3) are not theoretical issues, I also discovered another issues, that I didn't notice on code inspection: 4. nlattr::nla_len is __u16, which limits the largest netlink attibute by 64Kb or by 655 tcp_diag_md5sig keys in the diag array. What happens de-facto is that the netlink attribute gets u16 overflow, breaking the userspace parsing - RTA_NEXT(), that should point to the next attribute, points into the middle of md5 keys array. In this patch set issues (2) and (4) are addressed. (2) by not returning EMSGSIZE when the dump raced with modifying TCP-MD5 keys on a socket, but mark the dump inconsistent by setting NLM_F_DUMP_INTR nlmsg flag. Which changes uAPI in situations where previously kernel did WARN() and errored the dump. (4) by artificially limiting the maximum attribute size by U16_MAX - 1. In order to remove the new limit from (4) solution, my plan is to convert the dump of TCP-MD5 keys from an array to NL_ATTR_TYPE_NESTED_ARRAY (or alike), which should also address (1). And for (3), it's needed to teach tcp-diag how-to remember not only socket on which previous recvmsg() stopped, but potentially TCP-MD5 key as well. I plan in the next part of patch set address (3), (1) and the new limit for (4), together with adding new TCP-AO-diag. * Terminology from Documentation/userspace-api/netlink/intro.rst Signed-off-by: Dmitry Safonov <0x7f454c46(a)gmail.com> --- Dmitry Safonov (6): net/diag: Do not race on dumping MD5 keys with adding new MD5 keys net/diag: Warn only once on EMSGSIZE net/diag: Pre-allocate optional info only if requested net/diag: Always pre-allocate tcp_ulp info net/diag: Limit TCP-MD5-diag array by max attribute length net/netlink: Correct the comment on netlink message max cap include/linux/inet_diag.h | 3 +- include/net/tcp.h | 1 - net/ipv4/inet_diag.c | 87 ++++++++++++++++++++++++++++++++++++++--------- net/ipv4/tcp_diag.c | 69 ++++++++++++++++++------------------- net/mptcp/diag.c | 20 ----------- net/netlink/af_netlink.c | 4 +-- net/tls/tls_main.c | 17 --------- 7 files changed, 109 insertions(+), 92 deletions(-) --- base-commit: 2e1b3cc9d7f790145a80cb705b168f05dab65df2 change-id: 20241106-tcp-md5-diag-prep-2f0dcf371d90 Best regards, -- Dmitry Safonov <0x7f454c46(a)gmail.com>

1 year, 1 month

2
2
0 0

[PATCH 6.1 0/1] cpufreq: amd-pstate: add check for cpufreq_cpu_get's return value

by Anastasia Belova

NULL-dereference is possible in amd_pstate_adjust_perf in 6.1 stable release. The problem has been fixed by the following upstream patch that was adapted to 6.1. The patch couldn't be applied clearly but the changes made are minor. Found by Linux Verification Center (linuxtesting.org) with SVACE.

1 year, 1 month

2
3
0 0

[PATCH] irqchip/gic-v3: Force propagation of the active state with a read-back

by Marc Zyngier

Christoffer reports that on some implementations, writing to GICR_ISACTIVER0 (and similar GICD registers) can race badly with a guest issuing a deactivation of that interrupt via the system register interface. There are multiple reasons to this: - we use an early write-acknoledgement memory type (nGnRE), meaning that the write may only have made it as far as some interconnect by the time the store is considered "done" - the GIC itself is allowed to buffer the write until it decides to take it into account (as long as it is in finite time) The effects are that the activation may not have taken effect by the time we enter the guest, forcing an immediate exit, or that a guest deactivation occurs before the interrupt is active, doing nothing. In order to guarantee that the write to the ISACTIVER register has taken effect, read back from it, forcing the interconnect to propagate the write, and the GIC to process the write before returning the read. Reported-by: Christoffer Dall <christoffer.dall(a)arm.com> Acked-by: Christoffer Dall <christoffer.dall(a)arm.com> Signed-off-by: Marc Zyngier <maz(a)kernel.org> Cc: stable(a)vger.kernel.org --- drivers/irqchip/irq-gic-v3.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c index ce87205e3e823..8b6159f4cdafa 100644 --- a/drivers/irqchip/irq-gic-v3.c +++ b/drivers/irqchip/irq-gic-v3.c @@ -524,6 +524,13 @@ static int gic_irq_set_irqchip_state(struct irq_data *d, } gic_poke_irq(d, reg); + + /* + * Force read-back to guarantee that the active state has taken + * effect, and won't race with a guest-driven deactivation. + */ + if (reg == GICD_ISACTIVER) + gic_peek_irq(d, reg); return 0; } -- 2.39.2

1 year, 1 month

2
1
0 0

[PATCH] arm64/ptrace: Zero FPMR on streaming mode entry/exit

by Mark Brown

When FPMR and SME are both present then entering and exiting streaming mode clears FPMR in the same manner as it clears the V/Z and P registers. Since entering and exiting streaming mode via ptrace is expected to have the same effect as doing so via SMSTART/SMSTOP it should clear FPMR too but this was missed when FPMR support was added. Add the required reset of FPMR. Since changing the vector length resets SVCR a SME vector length change implemented via a write to ZA can trigger an exit of streaming mode and we need to check when writing to ZA as well. Fixes: 4035c22ef7d4 ("arm64/ptrace: Expose FPMR via ptrace") Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org --- arch/arm64/kernel/ptrace.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c index b756578aeaeea1d3250276734520e3eaae8a671d..f242df53de2992bf8a3fd51710d6653fe82f7779 100644 --- a/arch/arm64/kernel/ptrace.c +++ b/arch/arm64/kernel/ptrace.c @@ -876,6 +876,7 @@ static int sve_set_common(struct task_struct *target, const void *kbuf, const void __user *ubuf, enum vec_type type) { + u64 old_svcr = target->thread.svcr; int ret; struct user_sve_header header; unsigned int vq; @@ -903,8 +904,6 @@ static int sve_set_common(struct task_struct *target, /* Enter/exit streaming mode */ if (system_supports_sme()) { - u64 old_svcr = target->thread.svcr; - switch (type) { case ARM64_VEC_SVE: target->thread.svcr &= ~SVCR_SM_MASK; @@ -1003,6 +1002,10 @@ static int sve_set_common(struct task_struct *target, start, end); out: + /* If we entered or exited streaming mode then reset FPMR */ + if ((target->thread.svcr & SVCR_SM) != (old_svcr & SVCR_SM)) + target->thread.uw.fpmr = 0; + fpsimd_flush_task_state(target); return ret; } @@ -1099,6 +1102,7 @@ static int za_set(struct task_struct *target, unsigned int pos, unsigned int count, const void *kbuf, const void __user *ubuf) { + u64 old_svcr = target->thread.svcr; int ret; struct user_za_header header; unsigned int vq; @@ -1175,6 +1179,10 @@ static int za_set(struct task_struct *target, target->thread.svcr |= SVCR_ZA_MASK; out: + /* If we entered or exited streaming mode then reset FPMR */ + if ((target->thread.svcr & SVCR_SM) != (old_svcr & SVCR_SM)) + target->thread.uw.fpmr = 0; + fpsimd_flush_task_state(target); return ret; } --- base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354 change-id: 20241106-arm64-ptrace-fpmr-sm-45390f592574 Best regards, -- Mark Brown <broonie(a)kernel.org>

1 year, 1 month

1
0
0 0

+ ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: ocfs2: fix UBSAN warning in ocfs2_verify_volume() has been added to the -mm mm-hotfixes-unstable branch. Its filename is ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Dmitry Antipov <dmantipov(a)yandex.ru> Subject: ocfs2: fix UBSAN warning in ocfs2_verify_volume() Date: Wed, 6 Nov 2024 12:21:00 +0300 Syzbot has reported the following splat triggered by UBSAN: UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10 shift exponent 32768 is too large for 32-bit type 'int' CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x241/0x360 ? __pfx_dump_stack_lvl+0x10/0x10 ? __pfx__printk+0x10/0x10 ? __asan_memset+0x23/0x50 ? lockdep_init_map_type+0xa1/0x910 __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 ocfs2_fill_super+0xf9c/0x5750 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? __pfx_validate_chain+0x10/0x10 ? validate_chain+0x11e/0x5920 ? __lock_acquire+0x1384/0x2050 ? __pfx_validate_chain+0x10/0x10 ? string+0x26a/0x2b0 ? widen_string+0x3a/0x310 ? string+0x26a/0x2b0 ? bdev_name+0x2b1/0x3c0 ? pointer+0x703/0x1210 ? __pfx_pointer+0x10/0x10 ? __pfx_format_decode+0x10/0x10 ? __lock_acquire+0x1384/0x2050 ? vsnprintf+0x1ccd/0x1da0 ? snprintf+0xda/0x120 ? __pfx_lock_release+0x10/0x10 ? do_raw_spin_lock+0x14f/0x370 ? __pfx_snprintf+0x10/0x10 ? set_blocksize+0x1f9/0x360 ? sb_set_blocksize+0x98/0xf0 ? setup_bdev_super+0x4e6/0x5d0 mount_bdev+0x20c/0x2d0 ? __pfx_ocfs2_fill_super+0x10/0x10 ? __pfx_mount_bdev+0x10/0x10 ? vfs_parse_fs_string+0x190/0x230 ? __pfx_vfs_parse_fs_string+0x10/0x10 legacy_get_tree+0xf0/0x190 ? __pfx_ocfs2_mount+0x10/0x10 vfs_get_tree+0x92/0x2b0 do_new_mount+0x2be/0xb40 ? __pfx_do_new_mount+0x10/0x10 __se_sys_mount+0x2d6/0x3c0 ? __pfx___se_sys_mount+0x10/0x10 ? do_syscall_64+0x100/0x230 ? __x64_sys_mount+0x20/0xc0 do_syscall_64+0xf3/0x230 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f37cae96fda Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240 RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000 R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0 R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000 </TASK> For a really damaged superblock, the value of 'i_super.s_blocksize_bits' may exceed the maximum possible shift for an underlying 'int'. So add an extra check whether the aforementioned field represents the valid block size, which is 512 bytes, 1K, 2K, or 4K. Link: https://lkml.kernel.org/r/20241106092100.2661330-1-dmantipov@yandex.ru Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Dmitry Antipov <dmantipov(a)yandex.ru> Reported-by: syzbot+56f7cd1abe4b8e475180(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180 Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Mark Fasheh <mark(a)fasheh.com> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Changwei Ge <gechangwei(a)live.cn> Cc: Jun Piao <piaojun(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/ocfs2/super.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) --- a/fs/ocfs2/super.c~ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume +++ a/fs/ocfs2/super.c @@ -2319,6 +2319,7 @@ static int ocfs2_verify_volume(struct oc struct ocfs2_blockcheck_stats *stats) { int status = -EAGAIN; + u32 blksz_bits; if (memcmp(di->i_signature, OCFS2_SUPER_BLOCK_SIGNATURE, strlen(OCFS2_SUPER_BLOCK_SIGNATURE)) == 0) { @@ -2333,11 +2334,15 @@ static int ocfs2_verify_volume(struct oc goto out; } status = -EINVAL; - if ((1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits)) != blksz) { + /* Acceptable block sizes are 512 bytes, 1K, 2K and 4K. */ + blksz_bits = le32_to_cpu(di->id2.i_super.s_blocksize_bits); + if (blksz_bits < 9 || blksz_bits > 12) { mlog(ML_ERROR, "found superblock with incorrect block " - "size: found %u, should be %u\n", - 1 << le32_to_cpu(di->id2.i_super.s_blocksize_bits), - blksz); + "size bits: found %u, should be 9, 10, 11, or 12\n", + blksz_bits); + } else if ((1 << le32_to_cpu(blksz_bits)) != blksz) { + mlog(ML_ERROR, "found superblock with incorrect block " + "size: found %u, should be %u\n", 1 << blksz_bits, blksz); } else if (le16_to_cpu(di->id2.i_super.s_major_rev_level) != OCFS2_MAJOR_REV_LEVEL || le16_to_cpu(di->id2.i_super.s_minor_rev_level) != _ Patches currently in -mm which might be from dmantipov(a)yandex.ru are ocfs2-fix-ubsan-warning-in-ocfs2_verify_volume.patch ocfs2-fix-uninitialized-value-in-ocfs2_file_read_iter.patch

1 year, 1 month

1
0
0 0

+ nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint has been added to the -mm mm-hotfixes-unstable branch. Its filename is nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint Date: Thu, 7 Nov 2024 01:07:33 +0900 When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty() may cause a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because, since the tracepoint was added in mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, nilfs_grab_buffer(), which grabs a buffer to read (or create) a block of metadata, including b-tree node blocks, does not set the block device, but instead does so only if the buffer is not in the "uptodate" state for each of its caller block reading functions. However, if the uptodate flag is set on a folio/page, and the buffer heads are detached from it by try_to_free_buffers(), and new buffer heads are then attached by create_empty_buffers(), the uptodate flag may be restored to each buffer without the block device being set to bh->b_bdev, and mark_buffer_dirty() may be called later in that state, resulting in the bug mentioned above. Fix this issue by making nilfs_grab_buffer() always set the block device of the super block structure to the buffer head, regardless of the state of the buffer's uptodate flag. Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Cc: Tejun Heo <tj(a)kernel.org> Cc: Ubisectech Sirius <bugreport(a)valiantsec.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/btnode.c | 2 -- fs/nilfs2/gcinode.c | 4 +--- fs/nilfs2/mdt.c | 1 - fs/nilfs2/page.c | 1 + 4 files changed, 2 insertions(+), 6 deletions(-) --- a/fs/nilfs2/btnode.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/btnode.c @@ -68,7 +68,6 @@ nilfs_btnode_create_block(struct address goto failed; } memset(bh->b_data, 0, i_blocksize(inode)); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = blocknr; set_buffer_mapped(bh); set_buffer_uptodate(bh); @@ -133,7 +132,6 @@ int nilfs_btnode_submit_block(struct add goto found; } set_buffer_mapped(bh); - bh->b_bdev = inode->i_sb->s_bdev; bh->b_blocknr = pblocknr; /* set block address for read */ bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/gcinode.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/gcinode.c @@ -83,10 +83,8 @@ int nilfs_gccache_submit_read_data(struc goto out; } - if (!buffer_mapped(bh)) { - bh->b_bdev = inode->i_sb->s_bdev; + if (!buffer_mapped(bh)) set_buffer_mapped(bh); - } bh->b_blocknr = pbn; bh->b_end_io = end_buffer_read_sync; get_bh(bh); --- a/fs/nilfs2/mdt.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/mdt.c @@ -89,7 +89,6 @@ static int nilfs_mdt_create_block(struct if (buffer_uptodate(bh)) goto failed_bh; - bh->b_bdev = sb->s_bdev; err = nilfs_mdt_insert_new_block(inode, block, bh, init_block); if (likely(!err)) { get_bh(bh); --- a/fs/nilfs2/page.c~nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint +++ a/fs/nilfs2/page.c @@ -63,6 +63,7 @@ struct buffer_head *nilfs_grab_buffer(st folio_put(folio); return NULL; } + bh->b_bdev = inode->i_sb->s_bdev; return bh; } _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch

1 year, 1 month

1
0
0 0

+ nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint has been added to the -mm mm-hotfixes-unstable branch. Its filename is nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Subject: nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint Date: Thu, 7 Nov 2024 01:07:32 +0900 Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints". This series fixes null pointer dereference bugs that occur when using nilfs2 and two block-related tracepoints. This patch (of 2): It has been reported that when using "block:block_touch_buffer" tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because since the tracepoint was added in touch_buffer(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, the block_device structure is set after the function returns to the caller. Here, touch_buffer() is used to mark the folio/page that owns the buffer head as accessed, but the common search helper for folio/page used by the caller function was optimized to mark the folio/page as accessed when it was reimplemented a long time ago, eliminating the need to call touch_buffer() here in the first place. So this solves the issue by eliminating the touch_buffer() call itself. Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: Ubisectech Sirius <bugreport(a)valiantsec.com> Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@va… Reported-by: syzbot+9982fb8d18eba905abe2(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2 Tested-by: syzbot+9982fb8d18eba905abe2(a)syzkaller.appspotmail.com Cc: Tejun Heo <tj(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 - 1 file changed, 1 deletion(-) --- a/fs/nilfs2/page.c~nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint +++ a/fs/nilfs2/page.c @@ -39,7 +39,6 @@ static struct buffer_head *__nilfs_get_f first_block = (unsigned long)index << (PAGE_SHIFT - blkbits); bh = get_nth_bh(bh, block - first_block); - touch_buffer(bh); wait_on_buffer(bh); return bh; } _ Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are nilfs2-fix-null-ptr-deref-in-block_touch_buffer-tracepoint.patch nilfs2-fix-null-ptr-deref-in-block_dirty_buffer-tracepoint.patch

1 year, 1 month

1
0
0 0

+ mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm: page_alloc: move mlocked flag clearance into free_pages_prepare() has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Roman Gushchin <roman.gushchin(a)linux.dev> Subject: mm: page_alloc: move mlocked flag clearance into free_pages_prepare() Date: Wed, 6 Nov 2024 19:53:54 +0000 Syzbot reported a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage: BUG: Bad page state in process syz.5.504 pfn:61f45 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45 flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline] kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline] kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530 __do_compat_sys_ioctl fs/ioctl.c:1007 [inline] __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e page last free pid 8399 tgid 8399 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686 folios_put_refs+0x76c/0x860 mm/swap.c:1007 free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373 tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465 exit_mmap+0x496/0xc40 mm/mmap.c:1926 __mmput+0x115/0x390 kernel/fork.c:1348 exit_mm+0x220/0x310 kernel/exit.c:571 do_exit+0x9b2/0x28e0 kernel/exit.c:926 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 bad_page+0x176/0x1d0 mm/page_alloc.c:501 free_page_is_bad mm/page_alloc.c:918 [inline] free_pages_prepare mm/page_alloc.c:1100 [inline] free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638 kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline] kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386 kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:939 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e RIP: 0023:0xf745d579 Code: Unable to access opcode bytes at 0xf745d54f. RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4 RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was focused on handling pagecache and anonymous memory and wasn't suitable for lower level get_page()/free_page() API's used for example by KVM, as with this reproducer. Fix it by moving the mlocked flag clearance down to free_page_prepare(). The bug itself if fairly old and harmless (aside from generating these warnings), aside from a small memory leak - "bad" pages are stopped from being allocated again. Link: https://lkml.kernel.org/r/20241106195354.270757-1-roman.gushchin@linux.dev Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Reported-by: syzbot+e985d3026c4fd041578e(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com Acked-by: Hugh Dickins <hughd(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 15 +++++++++++++++ mm/swap.c | 14 -------------- 2 files changed, 15 insertions(+), 14 deletions(-) --- a/mm/page_alloc.c~mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare +++ a/mm/page_alloc.c @@ -1048,6 +1048,7 @@ __always_inline bool free_pages_prepare( bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page); + struct folio *folio = page_folio(page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -1057,6 +1058,20 @@ __always_inline bool free_pages_prepare( if (memcg_kmem_online() && PageMemcgKmem(page)) __memcg_kmem_uncharge_page(page, order); + /* + * In rare cases, when truncation or holepunching raced with + * munlock after VM_LOCKED was cleared, Mlocked may still be + * found set here. This does not indicate a problem, unless + * "unevictable_pgs_cleared" appears worryingly large. + */ + if (unlikely(folio_test_mlocked(folio))) { + long nr_pages = folio_nr_pages(folio); + + __folio_clear_mlocked(folio); + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); + } + if (unlikely(PageHWPoison(page)) && !order) { /* Do not let hwpoison pages hit pcplists/buddy */ reset_page_owner(page, order); --- a/mm/swap.c~mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare +++ a/mm/swap.c @@ -78,20 +78,6 @@ static void __page_cache_release(struct lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } - - /* - * In rare cases, when truncation or holepunching raced with - * munlock after VM_LOCKED was cleared, Mlocked may still be - * found set here. This does not indicate a problem, unless - * "unevictable_pgs_cleared" appears worryingly large. - */ - if (unlikely(folio_test_mlocked(folio))) { - long nr_pages = folio_nr_pages(folio); - - __folio_clear_mlocked(folio); - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); - } } /* _ Patches currently in -mm which might be from roman.gushchin(a)linux.dev are signal-restore-the-override_rlimit-logic.patch mm-page_alloc-move-mlocked-flag-clearance-into-free_pages_prepare.patch

1 year, 1 month

1
0
0 0

[PATCH v3] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()

by Roman Gushchin

Syzbot reported a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage: BUG: Bad page state in process syz.5.504 pfn:61f45 page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x61f45 flags: 0xfff00000080204(referenced|workingset|mlocked|node=0|zone=1|lastcpupid=0x7ff) raw: 00fff00000080204 0000000000000000 dead000000000122 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 8443, tgid 8442 (syz.5.504), ts 201884660643, free_ts 201499827394 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x303f/0x3190 mm/page_alloc.c:3457 __alloc_pages_noprof+0x292/0x710 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x680 mm/mempolicy.c:2265 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline] kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5488 [inline] kvm_dev_ioctl+0x12dc/0x2240 virt/kvm/kvm_main.c:5530 __do_compat_sys_ioctl fs/ioctl.c:1007 [inline] __se_compat_sys_ioctl+0x510/0xc90 fs/ioctl.c:950 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e page last free pid 8399 tgid 8399 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_folios+0xf12/0x18d0 mm/page_alloc.c:2686 folios_put_refs+0x76c/0x860 mm/swap.c:1007 free_pages_and_swap_cache+0x5c8/0x690 mm/swap_state.c:335 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline] tlb_batch_pages_flush mm/mmu_gather.c:149 [inline] tlb_flush_mmu_free mm/mmu_gather.c:366 [inline] tlb_flush_mmu+0x3a3/0x680 mm/mmu_gather.c:373 tlb_finish_mmu+0xd4/0x200 mm/mmu_gather.c:465 exit_mmap+0x496/0xc40 mm/mmap.c:1926 __mmput+0x115/0x390 kernel/fork.c:1348 exit_mm+0x220/0x310 kernel/exit.c:571 do_exit+0x9b2/0x28e0 kernel/exit.c:926 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Modules linked in: CPU: 0 UID: 0 PID: 8442 Comm: syz.5.504 Not tainted 6.12.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 bad_page+0x176/0x1d0 mm/page_alloc.c:501 free_page_is_bad mm/page_alloc.c:918 [inline] free_pages_prepare mm/page_alloc.c:1100 [inline] free_unref_page+0xed0/0xf20 mm/page_alloc.c:2638 kvm_destroy_vm virt/kvm/kvm_main.c:1327 [inline] kvm_put_kvm+0xc75/0x1350 virt/kvm/kvm_main.c:1386 kvm_vcpu_release+0x54/0x60 virt/kvm/kvm_main.c:4143 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:939 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __ia32_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 ia32_sys_call+0x2624/0x2630 arch/x86/include/generated/asm/syscalls_32.h:253 do_syscall_32_irqs_on arch/x86/entry/common.c:165 [inline] __do_fast_syscall_32+0xb4/0x110 arch/x86/entry/common.c:386 do_fast_syscall_32+0x34/0x80 arch/x86/entry/common.c:411 entry_SYSENTER_compat_after_hwframe+0x84/0x8e RIP: 0023:0xf745d579 Code: Unable to access opcode bytes at 0xf745d54f. RSP: 002b:00000000f75afd6c EFLAGS: 00000206 ORIG_RAX: 00000000000000fc RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 00000000ffffff9c RDI: 00000000f744cff4 RBP: 00000000f717ae61 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK> The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was handling focused on handling pagecache and anonymous memory and wasn't suitable for lower level get_page()/free_page() API's used for example by KVM, as with this reproducer. Fix it by moving the mlocked flag clearance down to free_page_prepare(). The bug itself if fairly old and harmless (aside from generating these warnings), aside from a small memory leak - "bad" pages are stopped from being allocated again. Reported-by: syzbot+e985d3026c4fd041578e(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6729f475.050a0220.701a.0019.GAE@google.com Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Acked-by: Hugh Dickins <hughd(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Vlastimil Babka <vbabka(a)suse.cz> --- mm/page_alloc.c | 15 +++++++++++++++ mm/swap.c | 14 -------------- 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 47048b39b8ca..371d1c6c1fc7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1048,6 +1048,7 @@ __always_inline bool free_pages_prepare(struct page *page, bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page); + struct folio *folio = page_folio(page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -1057,6 +1058,20 @@ __always_inline bool free_pages_prepare(struct page *page, if (memcg_kmem_online() && PageMemcgKmem(page)) __memcg_kmem_uncharge_page(page, order); + /* + * In rare cases, when truncation or holepunching raced with + * munlock after VM_LOCKED was cleared, Mlocked may still be + * found set here. This does not indicate a problem, unless + * "unevictable_pgs_cleared" appears worryingly large. + */ + if (unlikely(folio_test_mlocked(folio))) { + long nr_pages = folio_nr_pages(folio); + + __folio_clear_mlocked(folio); + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); + } + if (unlikely(PageHWPoison(page)) && !order) { /* Do not let hwpoison pages hit pcplists/buddy */ reset_page_owner(page, order); diff --git a/mm/swap.c b/mm/swap.c index 638a3f001676..10decd9dffa1 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } - - /* - * In rare cases, when truncation or holepunching raced with - * munlock after VM_LOCKED was cleared, Mlocked may still be - * found set here. This does not indicate a problem, unless - * "unevictable_pgs_cleared" appears worryingly large. - */ - if (unlikely(folio_test_mlocked(folio))) { - long nr_pages = folio_nr_pages(folio); - - __folio_clear_mlocked(folio); - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); - } } /* -- 2.47.0.199.ga7371fff76-goog

1 year, 1 month

1
0
0 0

[PATCH v3] arm64/signal: Avoid corruption of SME state when entering signal handler

by Mark Brown

We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}. The logic for this in setup_return() manipulates the saved state and live CPU state in an unsafe manner, and consequently, when a task enters a signal handler: * The task entering the signal handler might not have its PSTATE.{SM,ZA} bits cleared, and other register state that is affected by changes to PSTATE.{SM,ZA} might not be zeroed as expected. * An unrelated task might have its PSTATE.{SM,ZA} bits cleared unexpectedly, potentially zeroing other register state that is affected by changes to PSTATE.{SM,ZA}. Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain FPSIMD or non-streaming SVE) are not affected, as there is no resulting change to PSTATE.{SM,ZA}. Consider for example two tasks on one CPU: A: Begins signal entry in kernel mode, is preempted prior to SMSTOP. B: Using SM and/or ZA in userspace with register state current on the CPU, is preempted. A: Scheduled in, no register state changes made as in kernel mode. A: Executes SMSTOP, modifying live register state. A: Scheduled out. B: Scheduled in, fpsimd_thread_switch() sees the register state on the CPU is tracked as being that for task B so the state is not reloaded prior to returning to userspace. Task B is now running with SM and ZA incorrectly cleared. Fix this by: * Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live state as appropriate. * Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion against other code which manipulates this state. To allow their use, the logic is moved into a new fpsimd_enter_sighandler() helper in fpsimd.c. This race has been observed intermittently with fp-stress, especially with preempt disabled, commonly but not exclusively reporting "Bad SVCR: 0". While we're at it also fix a discrepancy between in register and in memory entries. When operating on the register state we issue a SMSTOP, exiting streaming mode if we were in it. This clears the V/Z and P register and FPMR but nothing else. The in memory version clears all the user FPSIMD state including FPCR and FPSR but does not clear FPMR. Add the clear of FPMR and limit the existing memset() to only cover the vregs, preserving the state of FPCR and FPSR like SMSTOP does. Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals") Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org --- Changes in v3: - Fix the in memory update to follow the behaviour of the SMSTOP we issue when updating in registers. - Link to v2: https://lore.kernel.org/r/20241030-arm64-fp-sme-sigentry-v2-1-43ce805d1b20@… Changes in v2: - Commit message tweaks. - Flush the task state when updating in memory to ensure we reload. - Link to v1: https://lore.kernel.org/r/20241023-arm64-fp-sme-sigentry-v1-1-249ff7ec3ad0@… --- arch/arm64/include/asm/fpsimd.h | 1 + arch/arm64/kernel/fpsimd.c | 39 +++++++++++++++++++++++++++++++++++++++ arch/arm64/kernel/signal.c | 19 +------------------ 3 files changed, 41 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state); extern void fpsimd_thread_switch(struct task_struct *next); extern void fpsimd_flush_thread(void); +extern void fpsimd_enter_sighandler(void); extern void fpsimd_signal_preserve_current_state(void); extern void fpsimd_preserve_current_state(void); extern void fpsimd_restore_current_state(void); diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 77006df20a75aee7c991cf116b6d06bfe953d1a4..10c8efd1c5ce83f4ea4025b213111ab9519263b1 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1693,6 +1693,45 @@ void fpsimd_signal_preserve_current_state(void) sve_to_fpsimd(current); } +/* + * Called by the signal handling code when preparing current to enter + * a signal handler. Currently this only needs to take care of exiting + * streaming mode and clearing ZA on SME systems. + */ +void fpsimd_enter_sighandler(void) +{ + if (!system_supports_sme()) + return; + + get_cpu_fpsimd_context(); + + if (test_thread_flag(TIF_FOREIGN_FPSTATE)) { + /* + * Exiting streaming mode zeros the V/Z and P + * registers and FPMR. Zero FPMR and the V registers, + * marking the state as FPSIMD only to force a clear + * of the remaining bits during reload if needed. + */ + if (current->thread.svcr & SVCR_SM_MASK) { + memset(&current->thread.uw.fpsimd_state.vregs, 0, + sizeof(current->thread.uw.fpsimd_state.vregs)); + current->thread.uw.fpmr = 0; + current->thread.fp_type = FP_STATE_FPSIMD; + } + + current->thread.svcr &= ~(SVCR_ZA_MASK | + SVCR_SM_MASK); + + /* Ensure any copies on other CPUs aren't reused */ + fpsimd_flush_task_state(current); + } else { + /* The register state is current, just update it. */ + sme_smstop(); + } + + put_cpu_fpsimd_context(); +} + /* * Called by KVM when entering the guest. */ diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 5619869475304776fc005fe24a385bf86bfdd253..fe07d0bd9f7978d73973f07ce38b7bdd7914abb2 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1218,24 +1218,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, /* TCO (Tag Check Override) always cleared for signal handlers */ regs->pstate &= ~PSR_TCO_BIT; - /* Signal handlers are invoked with ZA and streaming mode disabled */ - if (system_supports_sme()) { - /* - * If we were in streaming mode the saved register - * state was SVE but we will exit SM and use the - * FPSIMD register state - flush the saved FPSIMD - * register state in case it gets loaded. - */ - if (current->thread.svcr & SVCR_SM_MASK) { - memset(&current->thread.uw.fpsimd_state, 0, - sizeof(current->thread.uw.fpsimd_state)); - current->thread.fp_type = FP_STATE_FPSIMD; - } - - current->thread.svcr &= ~(SVCR_ZA_MASK | - SVCR_SM_MASK); - sme_smstop(); - } + fpsimd_enter_sighandler(); if (system_supports_poe()) write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); --- base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354 change-id: 20241023-arm64-fp-sme-sigentry-a2bd7187e71b Best regards, -- Mark Brown <broonie(a)kernel.org>

1 year, 1 month

1
0
0 0

[PATCH] perf/x86/intel: Fix bitmask of OCR and FRONTEND events for LNC

by kan.liang＠linux.intel.com

From: Kan Liang <kan.liang(a)linux.intel.com> The released OCR and FRONTEND events utilized more bits on Lunar Lake p-core. The corresponding mask in the extra_regs has to be extended to unblock the extra bits. Add a dedicated intel_lnc_extra_regs. Fixes: a932aa0e868f ("perf/x86: Add Lunar Lake and Arrow Lake support") Reported-by: Andi Kleen <ak(a)linux.intel.com> Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com> Cc: stable(a)vger.kernel.org --- arch/x86/events/intel/core.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index bb284aff7bfd..a347b4323256 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -429,6 +429,16 @@ static struct event_constraint intel_lnc_event_constraints[] = { EVENT_CONSTRAINT_END }; +static struct extra_reg intel_lnc_extra_regs[] __read_mostly = { + INTEL_UEVENT_EXTRA_REG(0x012a, MSR_OFFCORE_RSP_0, 0xfffffffffffull, RSP_0), + INTEL_UEVENT_EXTRA_REG(0x012b, MSR_OFFCORE_RSP_1, 0xfffffffffffull, RSP_1), + INTEL_UEVENT_PEBS_LDLAT_EXTRA_REG(0x01cd), + INTEL_UEVENT_EXTRA_REG(0x02c6, MSR_PEBS_FRONTEND, 0x9, FE), + INTEL_UEVENT_EXTRA_REG(0x03c6, MSR_PEBS_FRONTEND, 0x7fff1f, FE), + INTEL_UEVENT_EXTRA_REG(0x40ad, MSR_PEBS_FRONTEND, 0xf, FE), + INTEL_UEVENT_EXTRA_REG(0x04c2, MSR_PEBS_FRONTEND, 0x8, FE), + EVENT_EXTRA_END +}; EVENT_ATTR_STR(mem-loads, mem_ld_nhm, "event=0x0b,umask=0x10,ldlat=3"); EVENT_ATTR_STR(mem-loads, mem_ld_snb, "event=0xcd,umask=0x1,ldlat=3"); @@ -6422,7 +6432,7 @@ static __always_inline void intel_pmu_init_lnc(struct pmu *pmu) intel_pmu_init_glc(pmu); hybrid(pmu, event_constraints) = intel_lnc_event_constraints; hybrid(pmu, pebs_constraints) = intel_lnc_pebs_event_constraints; - hybrid(pmu, extra_regs) = intel_rwc_extra_regs; + hybrid(pmu, extra_regs) = intel_lnc_extra_regs; } static __always_inline void intel_pmu_init_skt(struct pmu *pmu) -- 2.38.1

1 year, 1 month

1
0
0 0

[PATCH 6.1.y 5.15.y 5.10.y] net: do not delay dst_entries_add() in dst_release()

by Abdelkareem Abdelsaamad

From: Eric Dumazet <edumazet(a)google.com> commit ac888d58869bb99753e7652be19a151df9ecb35d upstream. dst_entries_add() uses per-cpu data that might be freed at netns dismantle from ip6_route_net_exit() calling dst_entries_destroy() Before ip6_route_net_exit() can be called, we release all the dsts associated with this netns, via calls to dst_release(), which waits an rcu grace period before calling dst_destroy() dst_entries_add() use in dst_destroy() is racy, because dst_entries_destroy() could have been called already. Decrementing the number of dsts must happen sooner. Notes: 1) in CONFIG_XFRM case, dst_destroy() can call dst_release_immediate(child), this might also cause UAF if the child does not have DST_NOCOUNT set. IPSEC maintainers might take a look and see how to address this. 2) There is also discussion about removing this count of dst, which might happen in future kernels. Fixes: f88649721268 ("ipv4: fix dst race in sk_dst_get()") Closes: https://lore.kernel.org/lkml/CANn89iLCCGsP7SFn9HKpvnKu96Td4KD08xf7aGtiYgZnk… Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org> Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org> Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org> Signed-off-by: Eric Dumazet <edumazet(a)google.com> Cc: Xin Long <lucien.xin(a)gmail.com> Cc: Steffen Klassert <steffen.klassert(a)secunet.com> Reviewed-by: Xin Long <lucien.xin(a)gmail.com> Link: https://patch.msgid.link/20241008143110.1064899-1-edumazet@google.com Signed-off-by: Paolo Abeni <pabeni(a)redhat.com> [Conflict due to bc9d3a9f2afc ("net: dst: Switch to rcuref_t reference counting") is not in the tree] Signed-off-by: Abdelkareem Abdelsaamad <kareemem(a)amazon.com> --- net/core/dst.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/net/core/dst.c b/net/core/dst.c index 453ec8aafc4a..5bb143857336 100644 --- a/net/core/dst.c +++ b/net/core/dst.c @@ -109,9 +109,6 @@ struct dst_entry *dst_destroy(struct dst_entry * dst) child = xdst->child; } #endif - if (!(dst->flags & DST_NOCOUNT)) - dst_entries_add(dst->ops, -1); - if (dst->ops->destroy) dst->ops->destroy(dst); if (dst->dev) @@ -162,6 +159,12 @@ void dst_dev_put(struct dst_entry *dst) } EXPORT_SYMBOL(dst_dev_put); +static void dst_count_dec(struct dst_entry *dst) +{ + if (!(dst->flags & DST_NOCOUNT)) + dst_entries_add(dst->ops, -1); +} + void dst_release(struct dst_entry *dst) { if (dst) { @@ -171,8 +174,10 @@ void dst_release(struct dst_entry *dst) if (WARN_ONCE(newrefcnt < 0, "dst_release underflow")) net_warn_ratelimited("%s: dst:%p refcnt:%d\n", __func__, dst, newrefcnt); - if (!newrefcnt) + if (!newrefcnt){ + dst_count_dec(dst); call_rcu(&dst->rcu_head, dst_destroy_rcu); + } } } EXPORT_SYMBOL(dst_release); @@ -186,8 +191,10 @@ void dst_release_immediate(struct dst_entry *dst) if (WARN_ONCE(newrefcnt < 0, "dst_release_immediate underflow")) net_warn_ratelimited("%s: dst:%p refcnt:%d\n", __func__, dst, newrefcnt); - if (!newrefcnt) + if (!newrefcnt){ + dst_count_dec(dst); dst_destroy(dst); + } } } EXPORT_SYMBOL(dst_release_immediate); -- 2.40.1

1 year, 1 month

1
0
0 0

[PATCH 22/31] ASoC: da7213: Populate max_register to regmap_config

by Claudiu

From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> On the Renesas RZ/G3S SMARC Carrier II board having a DA7212 codec (using da7213 driver) connected to one SSIF-2 available on the Renesas RZ/G3S SoC it has been discovered that using the runtime PM API for suspend/resume (as will be proposed in the following commits) leads to the codec not being propertly initialized after resume. This is because w/o max_register populated to regmap_config the regcache_rbtree_sync() breaks on base_reg > max condition and the regcache_sync_block() call is skipped. Fixes: ef5c2eba2412 ("ASoC: codecs: Add da7213 codec") Cc: stable(a)vger.kernel.org Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> --- sound/soc/codecs/da7213.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/soc/codecs/da7213.c b/sound/soc/codecs/da7213.c index f3ef6fb55304..486db60bf2dd 100644 --- a/sound/soc/codecs/da7213.c +++ b/sound/soc/codecs/da7213.c @@ -2136,6 +2136,7 @@ static const struct regmap_config da7213_regmap_config = { .reg_bits = 8, .val_bits = 8, + .max_register = DA7213_TONE_GEN_OFF_PER, .reg_defaults = da7213_reg_defaults, .num_reg_defaults = ARRAY_SIZE(da7213_reg_defaults), .volatile_reg = da7213_volatile_register, -- 2.39.2

1 year, 1 month

4
3
0 0

[PATCH 6.11.y 1/2] rcu/kvfree: Add kvfree_rcu_barrier() API

by Suren Baghdasaryan

From: "Uladzislau Rezki (Sony)" <urezki(a)gmail.com> From: Uladzislau Rezki <urezki(a)gmail.com> commit 3c5d61ae919cc377c71118ccc76fa6e8518023f8 upstream. Add a kvfree_rcu_barrier() function. It waits until all in-flight pointers are freed over RCU machinery. It does not wait any GP completion and it is within its right to return immediately if there are no outstanding pointers. This function is useful when there is a need to guarantee that a memory is fully freed before destroying memory caches. For example, during unloading a kernel module. Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> --- include/linux/rcutiny.h | 5 ++ include/linux/rcutree.h | 1 + kernel/rcu/tree.c | 109 +++++++++++++++++++++++++++++++++++++--- 3 files changed, 107 insertions(+), 8 deletions(-) diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index d9ac7b136aea..522123050ff8 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -111,6 +111,11 @@ static inline void __kvfree_call_rcu(struct rcu_head *head, void *ptr) kvfree(ptr); } +static inline void kvfree_rcu_barrier(void) +{ + rcu_barrier(); +} + #ifdef CONFIG_KASAN_GENERIC void kvfree_call_rcu(struct rcu_head *head, void *ptr); #else diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 254244202ea9..58e7db80f3a8 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -35,6 +35,7 @@ static inline void rcu_virt_note_context_switch(void) void synchronize_rcu_expedited(void); void kvfree_call_rcu(struct rcu_head *head, void *ptr); +void kvfree_rcu_barrier(void); void rcu_barrier(void); void rcu_momentary_dyntick_idle(void); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index e641cc681901..be00aac5f4e7 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -3584,18 +3584,15 @@ kvfree_rcu_drain_ready(struct kfree_rcu_cpu *krcp) } /* - * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. + * Return: %true if a work is queued, %false otherwise. */ -static void kfree_rcu_monitor(struct work_struct *work) +static bool +kvfree_rcu_queue_batch(struct kfree_rcu_cpu *krcp) { - struct kfree_rcu_cpu *krcp = container_of(work, - struct kfree_rcu_cpu, monitor_work.work); unsigned long flags; + bool queued = false; int i, j; - // Drain ready for reclaim. - kvfree_rcu_drain_ready(krcp); - raw_spin_lock_irqsave(&krcp->lock, flags); // Attempt to start a new batch. @@ -3634,11 +3631,27 @@ static void kfree_rcu_monitor(struct work_struct *work) // be that the work is in the pending state when // channels have been detached following by each // other. - queue_rcu_work(system_wq, &krwp->rcu_work); + queued = queue_rcu_work(system_wq, &krwp->rcu_work); } } raw_spin_unlock_irqrestore(&krcp->lock, flags); + return queued; +} + +/* + * This function is invoked after the KFREE_DRAIN_JIFFIES timeout. + */ +static void kfree_rcu_monitor(struct work_struct *work) +{ + struct kfree_rcu_cpu *krcp = container_of(work, + struct kfree_rcu_cpu, monitor_work.work); + + // Drain ready for reclaim. + kvfree_rcu_drain_ready(krcp); + + // Queue a batch for a rest. + kvfree_rcu_queue_batch(krcp); // If there is nothing to detach, it means that our job is // successfully done here. In case of having at least one @@ -3859,6 +3872,86 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr) } EXPORT_SYMBOL_GPL(kvfree_call_rcu); +/** + * kvfree_rcu_barrier - Wait until all in-flight kvfree_rcu() complete. + * + * Note that a single argument of kvfree_rcu() call has a slow path that + * triggers synchronize_rcu() following by freeing a pointer. It is done + * before the return from the function. Therefore for any single-argument + * call that will result in a kfree() to a cache that is to be destroyed + * during module exit, it is developer's responsibility to ensure that all + * such calls have returned before the call to kmem_cache_destroy(). + */ +void kvfree_rcu_barrier(void) +{ + struct kfree_rcu_cpu_work *krwp; + struct kfree_rcu_cpu *krcp; + bool queued; + int i, cpu; + + /* + * Firstly we detach objects and queue them over an RCU-batch + * for all CPUs. Finally queued works are flushed for each CPU. + * + * Please note. If there are outstanding batches for a particular + * CPU, those have to be finished first following by queuing a new. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * Check if this CPU has any objects which have been queued for a + * new GP completion. If not(means nothing to detach), we are done + * with it. If any batch is pending/running for this "krcp", below + * per-cpu flush_rcu_work() waits its completion(see last step). + */ + if (!need_offload_krc(krcp)) + continue; + + while (1) { + /* + * If we are not able to queue a new RCU work it means: + * - batches for this CPU are still in flight which should + * be flushed first and then repeat; + * - no objects to detach, because of concurrency. + */ + queued = kvfree_rcu_queue_batch(krcp); + + /* + * Bail out, if there is no need to offload this "krcp" + * anymore. As noted earlier it can run concurrently. + */ + if (queued || !need_offload_krc(krcp)) + break; + + /* There are ongoing batches. */ + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } + } + + /* + * Now we guarantee that all objects are flushed. + */ + for_each_possible_cpu(cpu) { + krcp = per_cpu_ptr(&krc, cpu); + + /* + * A monitor work can drain ready to reclaim objects + * directly. Wait its completion if running or pending. + */ + cancel_delayed_work_sync(&krcp->monitor_work); + + for (i = 0; i < KFREE_N_BATCHES; i++) { + krwp = &(krcp->krw_arr[i]); + flush_rcu_work(&krwp->rcu_work); + } + } +} +EXPORT_SYMBOL_GPL(kvfree_rcu_barrier); + static unsigned long kfree_rcu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { base-commit: 17365d66f1c6aa6bf4f4cb9842f5edeac027bcfb -- 2.47.0.105.g07ac214952-goog

1 year, 1 month

2
6
0 0

hid-pidff.c: null-pointer deref if optional HID reports are not present

by Nolan Nicholson

Hello, (This is my first time reporting a Linux bug; please accept my apologies for any mistakes in the process.) When initializing a HID PID device, hid-pidff.c checks for eight required HID reports and five optional reports. If the eight required reports are present, the hid_pidff_init() function then attempts to find the necessary fields in each required or optional report, using the pidff_find_fields() function. However, if any of the five optional reports is not present, pidff_find_fields() will trigger a null-pointer dereference. I recently implemented the descriptors for a USB HID device with PID force-feedback capability. After implementing the required report descriptors but not the optional ones, I got an OOPS from the pidff_find_fields function. I saved the OOPS from my Ubuntu installation, and have attached it here. I later reproduced the issue on 6.11.6. I was able to work around the issue by having my device present all of the optional report descriptors as well as all of the required ones. Thank you, Nolan Nicholson

1 year, 1 month

3
2
0 0

[PATCH 0/2] arm64/fp: Fix missing invalidation when working with in memory FP state

by Mark Brown

Mark Rutland identified a repeated pattern where we update the in memory floating point state for tasks but do not invalidate the tracking of the last CPU that the task's state was loaded on, meaning that we can incorrectly fail to load the state from memory due to the checking in fpsimd_thread_switch(). When we change the in-memory state we need to also invalidate the last CPU information so that the state is corretly identified as needing to be reloaded from memory. This series adds the missing invalidations. Signed-off-by: Mark Brown <broonie(a)kernel.org> --- Mark Brown (2): arm64/sve: Flush foreign register state in sve_init_regs() arm64/sme: Flush foreign register state in do_sme_acc() arch/arm64/kernel/fpsimd.c | 3 +++ 1 file changed, 3 insertions(+) --- base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354 change-id: 20241030-arm64-fpsimd-foreign-flush-6913aa24cd9b Best regards, -- Mark Brown <broonie(a)kernel.org>

1 year, 1 month

3
4
0 0

[PATCH v5 0/6] phy: core: Fix bugs for several APIs and simplify an API

by Zijun Hu

This patch series is to fix bugs for below APIs: devm_phy_put() devm_of_phy_provider_unregister() devm_phy_destroy() phy_get() of_phy_get() devm_phy_get() devm_of_phy_get() devm_of_phy_get_by_index() And simplify below API: of_phy_simple_xlate(). Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Changes in v5: - s/Fixed/Fix s/case/cause for commit message based on Johan's reminder - Remove unrelated change about code style for patch 4/6 suggested by Johan - Link to v4: https://lore.kernel.org/r/20241102-phy_core_fix-v4-0-4f06439f61b1@quicinc.c… Changes in v4: - Correct commit message for patch 6/6 - Link to v3: https://lore.kernel.org/r/20241030-phy_core_fix-v3-0-19b97c3ec917@quicinc.c… Changes in v3: - Correct commit message based on Johan's suggestions for patches 1/6-3/6. - Use goto label solution suggested by Johan for patch 4/6, also correct commit message and remove the inline comment for it. - Link to v2: https://lore.kernel.org/r/20241024-phy_core_fix-v2-0-fc0c63dbfcf3@quicinc.c… Changes in v2: - Correct title, commit message, and inline comments. - Link to v1: https://lore.kernel.org/r/20241020-phy_core_fix-v1-0-078062f7da71@quicinc.c… --- Zijun Hu (6): phy: core: Fix that API devm_phy_put() fails to release the phy phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider phy: core: Fix that API devm_phy_destroy() fails to destroy the phy phy: core: Fix an OF node refcount leakage in _of_phy_get() phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup() phy: core: Simplify API of_phy_simple_xlate() implementation drivers/phy/phy-core.c | 41 ++++++++++++++++++++--------------------- 1 file changed, 20 insertions(+), 21 deletions(-) --- base-commit: e70d2677ef4088d59158739d72b67ac36d1b132b change-id: 20241020-phy_core_fix-e3ad65db98f7 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

1 year, 1 month

2
7
0 0

[PATCH v4 0/6] phy: core: Fix bugs for several APIs and simplify an API

by Zijun Hu

This patch series is to fix bugs for below APIs: devm_phy_put() devm_of_phy_provider_unregister() devm_phy_destroy() phy_get() of_phy_get() devm_phy_get() devm_of_phy_get() devm_of_phy_get_by_index() And simplify below API: of_phy_simple_xlate(). Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Changes in v4: - Correct commit message for patch 6/6 - Link to v3: https://lore.kernel.org/r/20241030-phy_core_fix-v3-0-19b97c3ec917@quicinc.c… Changes in v3: - Correct commit message based on Johan's suggestions for patches 1/6-3/6. - Use goto label solution suggested by Johan for patch 4/6, also correct commit message and remove the inline comment for it. - Link to v2: https://lore.kernel.org/r/20241024-phy_core_fix-v2-0-fc0c63dbfcf3@quicinc.c… Changes in v2: - Correct title, commit message, and inline comments. - Link to v1: https://lore.kernel.org/r/20241020-phy_core_fix-v1-0-078062f7da71@quicinc.c… --- Zijun Hu (6): phy: core: Fix that API devm_phy_put() fails to release the phy phy: core: Fix that API devm_of_phy_provider_unregister() fails to unregister the phy provider phy: core: Fix that API devm_phy_destroy() fails to destroy the phy phy: core: Fix an OF node refcount leakage in _of_phy_get() phy: core: Fix an OF node refcount leakage in of_phy_provider_lookup() phy: core: Simplify API of_phy_simple_xlate() implementation drivers/phy/phy-core.c | 43 +++++++++++++++++++++---------------------- 1 file changed, 21 insertions(+), 22 deletions(-) --- base-commit: e70d2677ef4088d59158739d72b67ac36d1b132b change-id: 20241020-phy_core_fix-e3ad65db98f7 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

1 year, 1 month

3
12
0 0

Re: [PATCH 6.11 000/245] 6.11.7-rc1 review

by Ronald Warsow

Hi Greg no regressions here on x86_64 (RKL, Intel 11th Gen. CPU) Thanks Tested-by: Ronald Warsow <rwarsow(a)gmx.de>

1 year, 1 month

1
0
0 0

[PATCH can v3] can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation

by Marc Kleine-Budde

Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") introduced mcp251xfd_get_tef_len() to get the number of unhandled transmit events from the Transmit Event FIFO (TEF). As the TEF has no head pointer, the driver uses the TX FIFO's tail pointer instead, assuming that send frames are completed. However the check for the TEF being full was not correct. This leads to the driver stop working if the TEF is full. Fix the TEF full check by assuming that if, from the driver's point of view, there are no free TX buffers in the chip and the TX FIFO is empty, all messages must have been sent and the TEF must therefore be full. Reported-by: Sven Schuchmann <schuchmann(a)schleissheimer.de> Closes: https://patch.msgid.link/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB15… Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") Tested-by: Sven Schuchmann <schuchmann(a)schleissheimer.de> Cc: stable(a)vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- Changes in v3: - add proper patch description - added Sven's Tested-by - Link to v2: https://patch.msgid.link/20241025-mcp251xfd-fix-length-calculation-v2-1-ea5… Changes in v2: - mcp251xfd_tx_fifo_sta_empty(): fix check if TX-FIFO is empty - Link to RFC: https://patch.msgid.link/20241001-mcp251xfd-fix-length-calculation-v1-1-598… --- drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index f732556d233a7be3b43f6f08e0b8f25732190104..d3ac865933fdf6c4ecdd80ad4d7accbff51eb0f8 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -16,9 +16,9 @@ #include "mcp251xfd.h" -static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta) +static inline bool mcp251xfd_tx_fifo_sta_empty(u32 fifo_sta) { - return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF); + return fifo_sta & MCP251XFD_REG_FIFOSTA_TFERFFIF; } static inline int @@ -122,7 +122,11 @@ mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p) if (err) return err; - if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) { + /* If the chip says the TX-FIFO is empty, but there are no TX + * buffers free in the ring, we assume all have been sent. + */ + if (mcp251xfd_tx_fifo_sta_empty(fifo_sta) && + mcp251xfd_get_tx_free(tx_ring) == 0) { *len_p = tx_ring->obj_num; return 0; } --- base-commit: 5ccdcdf186aec6b9111845fd37e1757e9b413e2f change-id: 20241001-mcp251xfd-fix-length-calculation-09b6cc10aeb0 Best regards, -- Marc Kleine-Budde <mkl(a)pengutronix.de>

1 year, 1 month

2
1
0 0

[PATCH] wacom: Interpret tilt data from Intuos Pro BT as signed values

by Gerecke, Jason

From: Jason Gerecke <jason.gerecke(a)wacom.com> The tilt data contained in the Bluetooth packets of an Intuos Pro are supposed to be interpreted as signed values. Simply casting the values to type `char` is not guaranteed to work since it is implementation- defined whether it is signed or unsigned. At least one user has noticed the data being reported incorrectly on their system. To ensure that the data is interpreted properly, we specifically cast to `signed char` instead. Link: https://github.com/linuxwacom/input-wacom/issues/445 Fixes: 4922cd26f03c ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface") CC: stable(a)vger.kernel.org # 4.11+ Signed-off-by: Jason Gerecke <jason.gerecke(a)wacom.com> --- drivers/hid/wacom_wac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hid/wacom_wac.c b/drivers/hid/wacom_wac.c index 413606bdf476..5a599c90e7a2 100644 --- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -1353,9 +1353,9 @@ static void wacom_intuos_pro2_bt_pen(struct wacom_wac *wacom) rotation -= 1800; input_report_abs(pen_input, ABS_TILT_X, - (char)frame[7]); + (signed char)frame[7]); input_report_abs(pen_input, ABS_TILT_Y, - (char)frame[8]); + (signed char)frame[8]); input_report_abs(pen_input, ABS_Z, rotation); input_report_abs(pen_input, ABS_WHEEL, get_unaligned_le16(&frame[11])); -- 2.47.0

1 year, 1 month

2
1
0 0

FAILED: Patch "io_uring/rw: fix missing NOWAIT check for O_DIRECT start write" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1d60d74e852647255bd8e76f5a22dc42531e4389 Mon Sep 17 00:00:00 2001 From: Jens Axboe <axboe(a)kernel.dk> Date: Thu, 31 Oct 2024 08:05:44 -0600 Subject: [PATCH] io_uring/rw: fix missing NOWAIT check for O_DIRECT start write When io_uring starts a write, it'll call kiocb_start_write() to bump the super block rwsem, preventing any freezes from happening while that write is in-flight. The freeze side will grab that rwsem for writing, excluding any new writers from happening and waiting for existing writes to finish. But io_uring unconditionally uses kiocb_start_write(), which will block if someone is currently attempting to freeze the mount point. This causes a deadlock where freeze is waiting for previous writes to complete, but the previous writes cannot complete, as the task that is supposed to complete them is blocked waiting on starting a new write. This results in the following stuck trace showing that dependency with the write blocked starting a new write: task:fio state:D stack:0 pid:886 tgid:886 ppid:876 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_rwsem_wait+0x1e8/0x3f8 __percpu_down_read+0xe8/0x500 io_write+0xbb8/0xff8 io_issue_sqe+0x10c/0x1020 io_submit_sqes+0x614/0x2110 __arm64_sys_io_uring_enter+0x524/0x1038 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 INFO: task fsfreeze:7364 blocked for more than 15 seconds. Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963 with the attempting freezer stuck trying to grab the rwsem: task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995 Call trace: __switch_to+0x1d8/0x348 __schedule+0x8e8/0x2248 schedule+0x110/0x3f0 percpu_down_write+0x2b0/0x680 freeze_super+0x248/0x8a8 do_vfs_ioctl+0x149c/0x1b18 __arm64_sys_ioctl+0xd0/0x1a0 invoke_syscall+0x74/0x268 el0_svc_common.constprop.0+0x160/0x238 do_el0_svc+0x44/0x60 el0_svc+0x44/0xb0 el0t_64_sync_handler+0x118/0x128 el0t_64_sync+0x168/0x170 Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a blocking grab of the super block rwsem if it isn't set. For normal issue where IOCB_NOWAIT would always be set, this returns -EAGAIN which will have io_uring core issue a blocking attempt of the write. That will in turn also get completions run, ensuring forward progress. Since freezing requires CAP_SYS_ADMIN in the first place, this isn't something that can be triggered by a regular user. Cc: stable(a)vger.kernel.org # 5.10+ Reported-by: Peter Mann <peter.mann(a)sh.cz> Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz Signed-off-by: Jens Axboe <axboe(a)kernel.dk> --- io_uring/rw.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/io_uring/rw.c b/io_uring/rw.c index 354c4e175654c..155938f100931 100644 --- a/io_uring/rw.c +++ b/io_uring/rw.c @@ -1014,6 +1014,25 @@ int io_read_mshot(struct io_kiocb *req, unsigned int issue_flags) return IOU_OK; } +static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb) +{ + struct inode *inode; + bool ret; + + if (!(req->flags & REQ_F_ISREG)) + return true; + if (!(kiocb->ki_flags & IOCB_NOWAIT)) { + kiocb_start_write(kiocb); + return true; + } + + inode = file_inode(kiocb->ki_filp); + ret = sb_start_write_trylock(inode->i_sb); + if (ret) + __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE); + return ret; +} + int io_write(struct io_kiocb *req, unsigned int issue_flags) { bool force_nonblock = issue_flags & IO_URING_F_NONBLOCK; @@ -1051,8 +1070,8 @@ int io_write(struct io_kiocb *req, unsigned int issue_flags) if (unlikely(ret)) return ret; - if (req->flags & REQ_F_ISREG) - kiocb_start_write(kiocb); + if (unlikely(!io_kiocb_start_write(req, kiocb))) + return -EAGAIN; kiocb->ki_flags |= IOCB_WRITE; if (likely(req->file->f_op->write_iter)) -- 2.43.0

1 year, 1 month

2
1
0 0

Stable backport (was "Re: PROBLEM: io_uring hang causing uninterruptible sleep state on 6.6.59")

by Jens Axboe

On 11/3/24 5:06 PM, Jens Axboe wrote: > On 11/3/24 5:01 PM, Keith Busch wrote: >> On Sun, Nov 03, 2024 at 04:53:27PM -0700, Jens Axboe wrote: >>> On 11/3/24 4:47 PM, Andrew Marshall wrote: >>>> I identified f4ce3b5d26ce149e77e6b8e8f2058aa80e5b034e as the likely >>>> problematic commit simply by browsing git log. As indicated above; >>>> reverting that atop 6.6.59 results in success. Since it is passing on >>>> 6.11.6, I suspect there is some missing backport to 6.6.x, or some >>>> other semantic merge conflict. Unfortunately I do not have a compact, >>>> minimal reproducer, but can provide my large one (it is testing a >>>> larger build process in a VM) if needed?there are some additional >>>> details in the above-linked downstream bug report, though. I hope that >>>> having identified the problematic commit is enough for someone with >>>> more context to go off of. Happy to provide more information if >>>> needed. >>> >>> Don't worry about not having a reproducer, having the backport commit >>> pin pointed will do just fine. I'll take a look at this. >> >> I think stable is missing: >> >> 6b231248e97fc3 ("io_uring: consolidate overflow flushing") > > I think you need to go back further than that, this one already > unconditionally holds ->uring_lock around overflow flushing... Took a look, it's this one: commit 8d09a88ef9d3cb7d21d45c39b7b7c31298d23998 Author: Pavel Begunkov <asml.silence(a)gmail.com> Date: Wed Apr 10 02:26:54 2024 +0100 io_uring: always lock __io_cqring_overflow_flush Greg/stable, can you pick this one for 6.6-stable? It picks cleanly. For 6.1, which is the other stable of that age that has the backport, the attached patch will do the trick. With that, I believe it should be sorted. Hopefully that can make 6.6.60 and 6.1.116. -- Jens Axboe

1 year, 1 month

3
5
0 0

FAILED: Patch "RISC-V: disallow gcc + rust builds" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 33549fcf37ec461f398f0a41e1c9948be2e5aca4 Mon Sep 17 00:00:00 2001 From: Conor Dooley <conor.dooley(a)microchip.com> Date: Tue, 1 Oct 2024 12:28:13 +0100 Subject: [PATCH] RISC-V: disallow gcc + rust builds During the discussion before supporting rust on riscv, it was decided not to support gcc yet, due to differences in extension handling compared to llvm (only the version of libclang matching the c compiler is supported). Recently Jason Montleon reported [1] that building with gcc caused build issues, due to unsupported arguments being passed to libclang. After some discussion between myself and Miguel, it is better to disable gcc + rust builds to match the original intent, and subsequently support it when an appropriate set of extensions can be deduced from the version of libclang. Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1] Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2] Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support") Cc: stable(a)vger.kernel.org Reported-by: Jason Montleon <jmontleo(a)redhat.com> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com> Acked-by: Miguel Ojeda <ojeda(a)kernel.org> Reviewed-by: Nathan Chancellor <nathan(a)kernel.org> Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- Documentation/rust/arch-support.rst | 2 +- arch/riscv/Kconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst index 750ff371570a0..54be7ddf3e57a 100644 --- a/Documentation/rust/arch-support.rst +++ b/Documentation/rust/arch-support.rst @@ -17,7 +17,7 @@ Architecture Level of support Constraints ============= ================ ============================================== ``arm64`` Maintained Little Endian only. ``loongarch`` Maintained \- -``riscv`` Maintained ``riscv64`` only. +``riscv`` Maintained ``riscv64`` and LLVM/Clang only. ``um`` Maintained \- ``x86`` Maintained ``x86_64`` only. ============= ================ ============================================== diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 62545946ecf43..f4c570538d55b 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -177,7 +177,7 @@ config RISCV select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RETHOOK if !XIP_KERNEL select HAVE_RSEQ - select HAVE_RUST if RUSTC_SUPPORTS_RISCV + select HAVE_RUST if RUSTC_SUPPORTS_RISCV && CC_IS_CLANG select HAVE_SAMPLE_FTRACE_DIRECT select HAVE_SAMPLE_FTRACE_DIRECT_MULTI select HAVE_STACKPROTECTOR -- 2.43.0

1 year, 1 month

2
1
0 0

[PATCH v5] media: uvcvideo: Fix crash during unbind if gpio unit is in use

by Ricardo Ribalda

We used the wrong device for the device managed functions. We used the usb device, when we should be using the interface device. If we unbind the driver from the usb interface, the cleanup functions are never called. In our case, the IRQ is never disabled. If an IRQ is triggered, it will try to access memory sections that are already free, causing an OOPS. We cannot use the function devm_request_threaded_irq here. The devm_* clean functions are called after the main structure is released by uvc_delete. Luckily this bug has small impact, as it is only affected by devices with gpio units and the user has to unbind the device, a disconnect will not trigger this error. Cc: stable(a)vger.kernel.org Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT") Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org> --- Changes in v5: - Revert non refcount, that belongs to a different set - Move cleanup to a different function - Link to v4: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v4-0-410e548f097a@chromiu… Changes in v4: Thanks Laurent. - Remove refcounted cleaup to support devres. - Link to v3: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v3-1-c0959c8906d3@chromiu… Changes in v3: Thanks Sakari. - Rename variable to initialized. - Other CodeStyle. - Link to v2: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v2-1-547ce6a6962e@chromiu… Changes in v2: Thanks to Laurent. - The main structure is not allocated with devres so there is a small period of time where we can get an irq with the structure free. Do not use devres for the IRQ. - Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu… --- drivers/media/usb/uvc/uvc_driver.c | 27 ++++++++++++++++++++------- drivers/media/usb/uvc/uvcvideo.h | 1 + 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index a96f6ca0889f..aa937f07b6b5 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev) struct gpio_desc *gpio_privacy; int irq; - gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy", + gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy", GPIOD_IN); if (IS_ERR_OR_NULL(gpio_privacy)) return PTR_ERR_OR_ZERO(gpio_privacy); irq = gpiod_to_irq(gpio_privacy); if (irq < 0) - return dev_err_probe(&dev->udev->dev, irq, + return dev_err_probe(&dev->intf->dev, irq, "No IRQ for privacy GPIO\n"); unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT, @@ -1329,15 +1329,27 @@ static int uvc_gpio_parse(struct uvc_device *dev) static int uvc_gpio_init_irq(struct uvc_device *dev) { struct uvc_entity *unit = dev->gpio_unit; + int ret; if (!unit || unit->gpio.irq < 0) return 0; - return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL, - uvc_gpio_irq, - IRQF_ONESHOT | IRQF_TRIGGER_FALLING | - IRQF_TRIGGER_RISING, - "uvc_privacy_gpio", dev); + ret = request_threaded_irq(unit->gpio.irq, NULL, uvc_gpio_irq, + IRQF_ONESHOT | IRQF_TRIGGER_FALLING | + IRQF_TRIGGER_RISING, + "uvc_privacy_gpio", dev); + + unit->gpio.initialized = !ret; + + return ret; +} + +static void uvc_gpio_cleanup(struct uvc_device *dev) +{ + if (!dev->gpio_unit || !dev->gpio_unit->gpio.initialized) + return; + + free_irq(dev->gpio_unit->gpio.irq, dev); } /* ------------------------------------------------------------------------ @@ -1982,6 +1994,7 @@ static void uvc_unregister_video(struct uvc_device *dev) if (media_devnode_is_registered(dev->mdev.devnode)) media_device_unregister(&dev->mdev); #endif + uvc_gpio_cleanup(dev); } int uvc_register_video_device(struct uvc_device *dev, diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index 07f9921d83f2..965a789ed03e 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -234,6 +234,7 @@ struct uvc_entity { u8 *bmControls; struct gpio_desc *gpio_privacy; int irq; + bool initialized; } gpio; }; --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241031-uvc-crashrmmod-666de3fc9141 Best regards, -- Ricardo Ribalda <ribalda(a)chromium.org>

1 year, 1 month

2
1
0 0

[PATCH 6.1] kselftest/arm64: Initialise current at build time in signal tests

by Mahmoud Adam

From: Mark Brown <broonie(a)kernel.org> upstream 6e4b4f0eca88e47def703f90a403fef5b96730d5 commit. When building with clang the toolchain refuses to link the signals testcases since the assembly code has a reference to current which has no initialiser so is placed in the BSS: /tmp/signals-af2042.o: in function `fake_sigreturn': <unknown>:51:(.text+0x40): relocation truncated to fit: R_AARCH64_LD_PREL_LO19 against symbol `current' defined in .bss section in /tmp/test_signals-ec1160.o Since the first statement in main() initialises current we may as well fix this by moving the initialisation to build time so the variable doesn't end up in the BSS. Signed-off-by: Mark Brown <broonie(a)kernel.org> Reviewed-by: Nick Desaulniers <ndesaulniers(a)google.com> Link: https://lore.kernel.org/r/20230111-arm64-kselftest-clang-v1-4-89c69d377727@… Signed-off-by: Catalin Marinas <catalin.marinas(a)arm.com> Signed-off-by: Mahmoud Adam <mngyadam(a)amazon.com> --- since 6.1.113 we see these compilations issues reported in the patch description, this upstream patch fixes the issue, and it's a clean backport. tools/testing/selftests/arm64/signal/test_signals.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/tools/testing/selftests/arm64/signal/test_signals.c b/tools/testing/selftests/arm64/signal/test_signals.c index 416b1ff431998..00051b40d71ea 100644 --- a/tools/testing/selftests/arm64/signal/test_signals.c +++ b/tools/testing/selftests/arm64/signal/test_signals.c @@ -12,12 +12,10 @@ #include "test_signals.h" #include "test_signals_utils.h" -struct tdescr *current; +struct tdescr *current = &tde; int main(int argc, char *argv[]) { - current = &tde; - ksft_print_msg("%s :: %s\n", current->name, current->descr); if (test_setup(current) && test_init(current)) { test_run(current); -- 2.40.1

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 0b04fbe886b4274c8e5855011233aaa69fec6e75 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:52 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7f4926194e50f..e06a6fdc0bab7 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10750,6 +10750,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x1404, "Clevo N150CU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

2
4
0 0

FAILED: patch "[PATCH] mm: migrate: fix getting incorrect page mapping during page" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x d1adb25df7111de83b64655a80b5a135adbded61 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110617-blame-taekwondo-ba51@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d1adb25df7111de83b64655a80b5a135adbded61 Mon Sep 17 00:00:00 2001 From: Baolin Wang <baolin.wang(a)linux.alibaba.com> Date: Fri, 15 Dec 2023 20:07:52 +0800 Subject: [PATCH] mm: migrate: fix getting incorrect page mapping during page migration When running stress-ng testing, we found below kernel crash after a few hours: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : dentry_name+0xd8/0x224 lr : pointer+0x22c/0x370 sp : ffff800025f134c0 ...... Call trace: dentry_name+0xd8/0x224 pointer+0x22c/0x370 vsnprintf+0x1ec/0x730 vscnprintf+0x2c/0x60 vprintk_store+0x70/0x234 vprintk_emit+0xe0/0x24c vprintk_default+0x3c/0x44 vprintk_func+0x84/0x2d0 printk+0x64/0x88 __dump_page+0x52c/0x530 dump_page+0x14/0x20 set_migratetype_isolate+0x110/0x224 start_isolate_page_range+0xc4/0x20c offline_pages+0x124/0x474 memory_block_offline+0x44/0xf4 memory_subsys_offline+0x3c/0x70 device_offline+0xf0/0x120 ...... After analyzing the vmcore, I found this issue is caused by page migration. The scenario is that, one thread is doing page migration, and we will use the target page's ->mapping field to save 'anon_vma' pointer between page unmap and page move, and now the target page is locked and refcount is 1. Currently, there is another stress-ng thread performing memory hotplug, attempting to offline the target page that is being migrated. It discovers that the refcount of this target page is 1, preventing the offline operation, thus proceeding to dump the page. However, page_mapping() of the target page may return an incorrect file mapping to crash the system in dump_mapping(), since the target page->mapping only saves 'anon_vma' pointer without setting PAGE_MAPPING_ANON flag. There are seveval ways to fix this issue: (1) Setting the PAGE_MAPPING_ANON flag for target page's ->mapping when saving 'anon_vma', but this can confuse PageAnon() for PFN walkers, since the target page has not built mappings yet. (2) Getting the page lock to call page_mapping() in __dump_page() to avoid crashing the system, however, there are still some PFN walkers that call page_mapping() without holding the page lock, such as compaction. (3) Using target page->private field to save the 'anon_vma' pointer and 2 bits page state, just as page->mapping records an anonymous page, which can remove the page_mapping() impact for PFN walkers and also seems a simple way. So I choose option 3 to fix this issue, and this can also fix other potential issues for PFN walkers, such as compaction. Link: https://lkml.kernel.org/r/e60b17a88afc38cb32f84c3e30837ec70b343d2b.17026417… Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()") Signed-off-by: Baolin Wang <baolin.wang(a)linux.alibaba.com> Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: David Hildenbrand <david(a)redhat.com> Cc: Xu Yu <xuyu(a)linux.alibaba.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/migrate.c b/mm/migrate.c index 397f2a6e34cb..bad3039d165e 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1025,38 +1025,31 @@ static int move_to_new_folio(struct folio *dst, struct folio *src, } /* - * To record some information during migration, we use some unused - * fields (mapping and private) of struct folio of the newly allocated - * destination folio. This is safe because nobody is using them - * except us. + * To record some information during migration, we use unused private + * field of struct folio of the newly allocated destination folio. + * This is safe because nobody is using it except us. */ -union migration_ptr { - struct anon_vma *anon_vma; - struct address_space *mapping; -}; - enum { PAGE_WAS_MAPPED = BIT(0), PAGE_WAS_MLOCKED = BIT(1), + PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED, }; static void __migrate_folio_record(struct folio *dst, - unsigned long old_page_state, + int old_page_state, struct anon_vma *anon_vma) { - union migration_ptr ptr = { .anon_vma = anon_vma }; - dst->mapping = ptr.mapping; - dst->private = (void *)old_page_state; + dst->private = (void *)anon_vma + old_page_state; } static void __migrate_folio_extract(struct folio *dst, int *old_page_state, struct anon_vma **anon_vmap) { - union migration_ptr ptr = { .mapping = dst->mapping }; - *anon_vmap = ptr.anon_vma; - *old_page_state = (unsigned long)dst->private; - dst->mapping = NULL; + unsigned long private = (unsigned long)dst->private; + + *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES); + *old_page_state = private & PAGE_OLD_STATES; dst->private = NULL; }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] block: fix queue limits checks in blk_rq_map_user_bvec for" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x be0e822bb3f5259c7f9424ba97e8175211288813 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110618-voicing-yield-48ff@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From be0e822bb3f5259c7f9424ba97e8175211288813 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig <hch(a)lst.de> Date: Mon, 28 Oct 2024 10:07:48 +0100 Subject: [PATCH] block: fix queue limits checks in blk_rq_map_user_bvec for real blk_rq_map_user_bvec currently only has ad-hoc checks for queue limits, and the last fix to it enabled valid NVMe I/O to pass, but also allowed invalid one for drivers that set a max_segment_size or seg_boundary limit. Fix it once for all by using the bio_split_rw_at helper from the I/O path that indicates if and where a bio would be have to be split to adhere to the queue limits, and it returns a positive value, turn that into -EREMOTEIO to retry using the copy path. Fixes: 2ff949441802 ("block: fix sanity checks in blk_rq_map_user_bvec") Signed-off-by: Christoph Hellwig <hch(a)lst.de> Reviewed-by: John Garry <john.g.garry(a)oracle.com> Link: https://lore.kernel.org/r/20241028090840.446180-1-hch@lst.de Signed-off-by: Jens Axboe <axboe(a)kernel.dk> diff --git a/block/blk-map.c b/block/blk-map.c index 6ef2ec1f7d78..b5fd1d857461 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -561,55 +561,33 @@ EXPORT_SYMBOL(blk_rq_append_bio); /* Prepare bio for passthrough IO given ITER_BVEC iter */ static int blk_rq_map_user_bvec(struct request *rq, const struct iov_iter *iter) { - struct request_queue *q = rq->q; - size_t nr_iter = iov_iter_count(iter); - size_t nr_segs = iter->nr_segs; - struct bio_vec *bvecs, *bvprvp = NULL; - const struct queue_limits *lim = &q->limits; - unsigned int nsegs = 0, bytes = 0; + const struct queue_limits *lim = &rq->q->limits; + unsigned int max_bytes = lim->max_hw_sectors << SECTOR_SHIFT; + unsigned int nsegs; struct bio *bio; - size_t i; + int ret; - if (!nr_iter || (nr_iter >> SECTOR_SHIFT) > queue_max_hw_sectors(q)) - return -EINVAL; - if (nr_segs > queue_max_segments(q)) + if (!iov_iter_count(iter) || iov_iter_count(iter) > max_bytes) return -EINVAL; - /* no iovecs to alloc, as we already have a BVEC iterator */ + /* reuse the bvecs from the iterator instead of allocating new ones */ bio = blk_rq_map_bio_alloc(rq, 0, GFP_KERNEL); - if (bio == NULL) + if (!bio) return -ENOMEM; - bio_iov_bvec_set(bio, (struct iov_iter *)iter); - blk_rq_bio_prep(rq, bio, nr_segs); - /* loop to perform a bunch of sanity checks */ - bvecs = (struct bio_vec *)iter->bvec; - for (i = 0; i < nr_segs; i++) { - struct bio_vec *bv = &bvecs[i]; - - /* - * If the queue doesn't support SG gaps and adding this - * offset would create a gap, fallback to copy. - */ - if (bvprvp && bvec_gap_to_prev(lim, bvprvp, bv->bv_offset)) { - blk_mq_map_bio_put(bio); - return -EREMOTEIO; - } - /* check full condition */ - if (nsegs >= nr_segs || bytes > UINT_MAX - bv->bv_len) - goto put_bio; - if (bytes + bv->bv_len > nr_iter) - break; - - nsegs++; - bytes += bv->bv_len; - bvprvp = bv; + /* check that the data layout matches the hardware restrictions */ + ret = bio_split_rw_at(bio, lim, &nsegs, max_bytes); + if (ret) { + /* if we would have to split the bio, copy instead */ + if (ret > 0) + ret = -EREMOTEIO; + blk_mq_map_bio_put(bio); + return ret; } + + blk_rq_bio_prep(rq, bio, nsegs); return 0; -put_bio: - blk_mq_map_bio_put(bio); - return -EINVAL; } /**

1 year, 1 month

1
0
0 0

[tip: perf/core] perf/x86/intel/pt: Fix buffer full but size is 0 case

by tip-bot2 for Adrian Hunter

The following commit has been merged into the perf/core branch of tip: Commit-ID: 5b590160d2cf776b304eb054afafea2bd55e3620 Gitweb: https://git.kernel.org/tip/5b590160d2cf776b304eb054afafea2bd55e3620 Author: Adrian Hunter <adrian.hunter(a)intel.com> AuthorDate: Tue, 22 Oct 2024 18:59:07 +03:00 Committer: Peter Zijlstra <peterz(a)infradead.org> CommitterDate: Tue, 05 Nov 2024 12:55:43 +01:00 perf/x86/intel/pt: Fix buffer full but size is 0 case If the trace data buffer becomes full, a truncated flag [T] is reported in PERF_RECORD_AUX. In some cases, the size reported is 0, even though data must have been added to make the buffer full. That happens when the buffer fills up from empty to full before the Intel PT driver has updated the buffer position. Then the driver calculates the new buffer position before calculating the data size. If the old and new positions are the same, the data size is reported as 0, even though it is really the whole buffer size. Fix by detecting when the buffer position is wrapped, and adjust the data size calculation accordingly. Example Use a very small buffer size (8K) and observe the size of truncated [T] data. Before the fix, it is possible to see records of 0 size. Before: $ perf record -m,8K -e intel_pt// uname Linux [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.105 MB perf.data ] $ perf script -D --no-itrace | grep AUX | grep -F '[T]' Warning: AUX data lost 2 times out of 3! 5 19462712368111 0x19710 [0x40]: PERF_RECORD_AUX offset: 0 size: 0 flags: 0x1 [T] 5 19462712700046 0x19ba8 [0x40]: PERF_RECORD_AUX offset: 0x170 size: 0xe90 flags: 0x1 [T] After: $ perf record -m,8K -e intel_pt// uname Linux [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.040 MB perf.data ] $ perf script -D --no-itrace | grep AUX | grep -F '[T]' Warning: AUX data lost 2 times out of 3! 1 113720802995 0x4948 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x2000 flags: 0x1 [T] 1 113720979812 0x6b10 [0x40]: PERF_RECORD_AUX offset: 0x2000 size: 0x2000 flags: 0x1 [T] Fixes: 52ca9ced3f70 ("perf/x86/intel/pt: Add Intel PT PMU driver") Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Cc: stable(a)vger.kernel.org Link: https://lkml.kernel.org/r/20241022155920.17511-2-adrian.hunter@intel.com --- arch/x86/events/intel/pt.c | 11 ++++++++--- arch/x86/events/intel/pt.h | 2 ++ 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c index fd4670a..a087bc0 100644 --- a/arch/x86/events/intel/pt.c +++ b/arch/x86/events/intel/pt.c @@ -828,11 +828,13 @@ static void pt_buffer_advance(struct pt_buffer *buf) buf->cur_idx++; if (buf->cur_idx == buf->cur->last) { - if (buf->cur == buf->last) + if (buf->cur == buf->last) { buf->cur = buf->first; - else + buf->wrapped = true; + } else { buf->cur = list_entry(buf->cur->list.next, struct topa, list); + } buf->cur_idx = 0; } } @@ -846,8 +848,11 @@ static void pt_buffer_advance(struct pt_buffer *buf) static void pt_update_head(struct pt *pt) { struct pt_buffer *buf = perf_get_aux(&pt->handle); + bool wrapped = buf->wrapped; u64 topa_idx, base, old; + buf->wrapped = false; + if (buf->single) { local_set(&buf->data_size, buf->output_off); return; @@ -865,7 +870,7 @@ static void pt_update_head(struct pt *pt) } else { old = (local64_xchg(&buf->head, base) & ((buf->nr_pages << PAGE_SHIFT) - 1)); - if (base < old) + if (base < old || (base == old && wrapped)) base += buf->nr_pages << PAGE_SHIFT; local_add(base - old, &buf->data_size); diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h index f5e46c0..a1b6c04 100644 --- a/arch/x86/events/intel/pt.h +++ b/arch/x86/events/intel/pt.h @@ -65,6 +65,7 @@ struct pt_pmu { * @head: logical write offset inside the buffer * @snapshot: if this is for a snapshot/overwrite counter * @single: use Single Range Output instead of ToPA + * @wrapped: buffer advance wrapped back to the first topa table * @stop_pos: STOP topa entry index * @intr_pos: INT topa entry index * @stop_te: STOP topa entry pointer @@ -82,6 +83,7 @@ struct pt_buffer { local64_t head; bool snapshot; bool single; + bool wrapped; long stop_pos, intr_pos; struct topa_entry *stop_te, *intr_te; void **data_pages;

1 year, 1 month

1
0
0 0

[PATCH 33/33] usb: xhci: Avoid queuing redundant Stop Endpoint commands

by Mathias Nyman

From: Michal Pecio <michal.pecio(a)gmail.com> Stop Endpoint command on an already stopped endpoint fails and may be misinterpreted as a known hardware bug by the completion handler. This results in an unnecessary delay with repeated retries of the command. Avoid queuing this command when endpoint state flags indicate that it's stopped or halted and the command will fail. If commands are pending on the endpoint, their completion handlers will process cancelled TDs so it's done. In case of waiting for external operations like clearing TT buffer, the endpoint is stopped and cancelled TDs can be processed now. This eliminates practically all unnecessary retries because an endpoint with pending URBs is maintained in Running state by the driver, unless aforementioned commands or other operations are pending on it. This is guaranteed by xhci_ring_ep_doorbell() and by the fact that it is called every time any of those operations completes. The only known exceptions are hardware bugs (the endpoint never starts at all) and Stream Protocol errors not associated with any TRB, which cause an endpoint reset not followed by restart. Sounds like a bug. Generally, these retries are only expected to happen when the endpoint fails to start for unknown/no reason, which is a worse problem itself, and fixing the bug eliminates the retries too. All cases were tested and found to work as expected. SET_DEQ_PENDING was produced by patching uvcvideo to unlink URBs in 100us intervals, which then runs into this case very often. EP_HALTED was produced by restarting 'cat /dev/ttyUSB0' on a serial dongle with broken cable. EP_CLEARING_TT by the same, with the dongle on an external hub. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable(a)vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-ring.c | 13 +++++++++++++ drivers/usb/host/xhci.c | 19 +++++++++++++++---- drivers/usb/host/xhci.h | 1 + 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 55be03be2374..4cf5363875c7 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1076,6 +1076,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep) return 0; } +/* + * Erase queued TDs from transfer ring(s) and give back those the xHC didn't + * stop on. If necessary, queue commands to move the xHC off cancelled TDs it + * stopped on. Those will be given back later when the commands complete. + * + * Call under xhci->lock on a stopped endpoint. + */ +void xhci_process_cancelled_tds(struct xhci_virt_ep *ep) +{ + xhci_invalidate_cancelled_tds(ep); + xhci_giveback_invalidated_tds(ep); +} + /* * Returns the TD the endpoint ring halted on. * Only call for non-running rings without streams. diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 4977ada0a19e..5ebde8cae4fc 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1756,10 +1756,21 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) } } - /* Queue a stop endpoint command, but only if this is - * the first cancellation to be handled. - */ - if (!(ep->ep_state & EP_STOP_CMD_PENDING)) { + /* These completion handlers will sort out cancelled TDs for us */ + if (ep->ep_state & (EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING)) { + xhci_dbg(xhci, "Not queuing Stop Endpoint on slot %d ep %d in state 0x%x\n", + urb->dev->slot_id, ep_index, ep->ep_state); + goto done; + } + + /* In this case no commands are pending but the endpoint is stopped */ + if (ep->ep_state & EP_CLEARING_TT) { + /* and cancelled TDs can be given back right away */ + xhci_dbg(xhci, "Invalidating TDs instantly on slot %d ep %d in state 0x%x\n", + urb->dev->slot_id, ep_index, ep->ep_state); + xhci_process_cancelled_tds(ep); + } else { + /* Otherwise, queue a new Stop Endpoint command */ command = xhci_alloc_command(xhci, false, GFP_ATOMIC); if (!command) { ret = -ENOMEM; diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 6dd3138b2380..4914f0a10cff 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1922,6 +1922,7 @@ void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring); unsigned int count_trbs(u64 addr, u64 len); int xhci_stop_endpoint_sync(struct xhci_hcd *xhci, struct xhci_virt_ep *ep, int suspend, gfp_t gfp_flags); +void xhci_process_cancelled_tds(struct xhci_virt_ep *ep); /* xHCI roothub code */ void xhci_set_link_state(struct xhci_hcd *xhci, struct xhci_port *port, -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 32/33] usb: xhci: Fix TD invalidation under pending Set TR Dequeue

by Mathias Nyman

From: Michal Pecio <michal.pecio(a)gmail.com> xhci_invalidate_cancelled_tds() may not work correctly if the hardware is modifying endpoint or stream contexts at the same time by executing a Set TR Dequeue command. And even if it worked, it would be unable to queue Set TR Dequeue for the next stream, failing to clear xHC cache. On stream endpoints, a chain of Set TR Dequeue commands may take some time to execute and we may want to cancel more TDs during this time. Currently this leads to Stop Endpoint completion handler calling this function without testing for SET_DEQ_PENDING, which will trigger the aforementioned problems when it happens. On all endpoints, a halt condition causes Reset Endpoint to be queued and an error status given to the class driver, which may unlink more URBs in response. Stop Endpoint is queued and its handler may execute concurrently with Set TR Dequeue queued by Reset Endpoint handler. (Reset Endpoint handler calls this function too, but there seems to be no possibility of it running concurrently with Set TR Dequeue). Fix xhci_invalidate_cancelled_tds() to work correctly under a pending Set TR Dequeue. Bail out of the function when SET_DEQ_PENDING is set, then make the completion handler call the function again and also call xhci_giveback_invalidated_tds(), which needs to be called next. This seems to fix another potential bug, where the handler would call xhci_invalidate_cancelled_tds(), which may clear some deferred TDs if a sanity check fails, and the TDs wouldn't be given back promptly. Said sanity check seems to be wrong and prone to false positives when the endpoint halts, but fixing it is beyond the scope of this change, besides ensuring that cleared TDs are given back properly. Fixes: 5ceac4402f5d ("xhci: Handle TD clearing for multiple streams case") CC: stable(a)vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-ring.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index dd23596ccd84..55be03be2374 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -980,6 +980,13 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep) unsigned int slot_id = ep->vdev->slot_id; int err; + /* + * This is not going to work if the hardware is changing its dequeue + * pointers as we look at them. Completion handler will call us later. + */ + if (ep->ep_state & SET_DEQ_PENDING) + return 0; + xhci = ep->xhci; list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) { @@ -1367,7 +1374,6 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id, struct xhci_slot_ctx *slot_ctx; struct xhci_stream_ctx *stream_ctx; struct xhci_td *td, *tmp_td; - bool deferred = false; ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3])); stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2])); @@ -1471,8 +1477,6 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id, xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n", __func__, td->urb); xhci_td_cleanup(ep->xhci, td, ep_ring, td->status); - } else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) { - deferred = true; } else { xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n", __func__, td->urb, td->cancel_status); @@ -1483,11 +1487,15 @@ static void xhci_handle_cmd_set_deq(struct xhci_hcd *xhci, int slot_id, ep->queued_deq_seg = NULL; ep->queued_deq_ptr = NULL; - if (deferred) { - /* We have more streams to clear */ + /* Check for deferred or newly cancelled TDs */ + if (!list_empty(&ep->cancelled_td_list)) { xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n", __func__); xhci_invalidate_cancelled_tds(ep); + /* Try to restart the endpoint if all is done */ + ring_doorbell_for_active_rings(xhci, slot_id, ep_index); + /* Start giving back any TDs invalidated above */ + xhci_giveback_invalidated_tds(ep); } else { /* Restart any rings with pending URBs */ xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__); -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 31/33] usb: xhci: Limit Stop Endpoint retries

by Mathias Nyman

From: Michal Pecio <michal.pecio(a)gmail.com> Some host controllers fail to atomically transition an endpoint to the Running state on a doorbell ring and enter a hidden "Restarting" state, which looks very much like Stopped, with the important difference that it will spontaneously transition to Running anytime soon. A Stop Endpoint command queued in the Restarting state typically fails with Context State Error and the completion handler sees the Endpoint Context State as either still Stopped or already Running. Even a case of Halted was observed, when an error occurred right after the restart. The Halted state is already recovered from by resetting the endpoint. The Running state is handled by retrying Stop Endpoint. The Stopped state was recognized as a problem on NEC controllers and worked around also by retrying, because the endpoint soon restarts and then stops for good. But there is a risk: the command may fail if the endpoint is "stopped for good" already, and retries will fail forever. The possibility of this was not realized at the time, but a number of cases were discovered later and reproduced. Some proved difficult to deal with, and it is outright impossible to predict if an endpoint may fail to ever start at all due to a hardware bug. One such bug (albeit on ASM3142, not on NEC) was found to be reliably triggered simply by toggling an AX88179 NIC up/down in a tight loop for a few seconds. An endless retries storm is quite nasty. Besides putting needless load on the xHC and CPU, it causes URBs never to be given back, paralyzing the device and connection/disconnection logic for the whole bus if the device is unplugged. User processes waiting for URBs become unkillable, drivers and kworker threads lock up and xhci_hcd cannot be reloaded. For peace of mind, impose a timeout on Stop Endpoint retries in this case. If they don't succeed in 100ms, consider the endpoint stopped permanently for some reason and just give back the unlinked URBs. This failure case is rare already and work is under way to make it rarer. Start this work today by also handling one simple case of race with Reset Endpoint, because it costs just two lines to implement. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable(a)vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-ring.c | 28 ++++++++++++++++++++++++---- drivers/usb/host/xhci.c | 2 ++ drivers/usb/host/xhci.h | 1 + 3 files changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index c9c0c4a7588a..dd23596ccd84 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -52,6 +52,7 @@ * endpoint rings; it generates events on the event ring for these. */ +#include <linux/jiffies.h> #include <linux/scatterlist.h> #include <linux/slab.h> #include <linux/dma-mapping.h> @@ -1158,16 +1159,35 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, return; case EP_STATE_STOPPED: /* - * NEC uPD720200 sometimes sets this state and fails with - * Context Error while continuing to process TRBs. - * Be conservative and trust EP_CTX_STATE on other chips. + * Per xHCI 4.6.9, Stop Endpoint command on a Stopped + * EP is a Context State Error, and EP stays Stopped. + * + * But maybe it failed on Halted, and somebody ran Reset + * Endpoint later. EP state is now Stopped and EP_HALTED + * still set because Reset EP handler will run after us. + */ + if (ep->ep_state & EP_HALTED) + break; + /* + * On some HCs EP state remains Stopped for some tens of + * us to a few ms or more after a doorbell ring, and any + * new Stop Endpoint fails without aborting the restart. + * This handler may run quickly enough to still see this + * Stopped state, but it will soon change to Running. + * + * Assume this bug on unexpected Stop Endpoint failures. + * Keep retrying until the EP starts and stops again, on + * chips where this is known to help. Wait for 100ms. */ if (!(xhci->quirks & XHCI_NEC_HOST)) break; + if (time_is_before_jiffies(ep->stop_time + msecs_to_jiffies(100))) + break; fallthrough; case EP_STATE_RUNNING: /* Race, HW handled stop ep cmd before ep was running */ - xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n"); + xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n", + GET_EP_CTX_STATE(ep_ctx)); command = xhci_alloc_command(xhci, false, GFP_ATOMIC); if (!command) { diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index bc477cf99805..4977ada0a19e 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -8,6 +8,7 @@ * Some code borrowed from the Linux EHCI driver. */ +#include <linux/jiffies.h> #include <linux/pci.h> #include <linux/iommu.h> #include <linux/iopoll.h> @@ -1764,6 +1765,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) ret = -ENOMEM; goto done; } + ep->stop_time = jiffies; ep->ep_state |= EP_STOP_CMD_PENDING; xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id, ep_index, 0); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index a0e992c3db0d..6dd3138b2380 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -691,6 +691,7 @@ struct xhci_virt_ep { /* Bandwidth checking storage */ struct xhci_bw_info bw_info; struct list_head bw_endpoint_list; + unsigned long stop_time; /* Isoch Frame ID checking storage */ int next_frame_id; /* Use new Isoch TRB layout needed for extended TBC support */ -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 20/33] xhci: Don't perform Soft Retry for Etron xHCI host

by Mathias Nyman

From: Kuangyi Chiang <ki.chiang65(a)gmail.com> Since commit f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors"), unplugging USB device while enumeration results in errors like this: [ 364.855321] xhci_hcd 0000:0b:00.0: ERROR Transfer event for disabled endpoint slot 5 ep 2 [ 364.864622] xhci_hcd 0000:0b:00.0: @0000002167656d70 67f03000 00000021 0c000000 05038001 [ 374.934793] xhci_hcd 0000:0b:00.0: Abort failed to stop command ring: -110 [ 374.958793] xhci_hcd 0000:0b:00.0: xHCI host controller not responding, assume dead [ 374.967590] xhci_hcd 0000:0b:00.0: HC died; cleaning up [ 374.973984] xhci_hcd 0000:0b:00.0: Timeout while waiting for configure endpoint command Seems that Etorn xHCI host can not perform Soft Retry correctly, apply XHCI_NO_SOFT_RETRY quirk to disable Soft Retry and then issue is gone. This patch depends on commit a4a251f8c235 ("usb: xhci: do not perform Soft Retry for some xHCI hosts"). Fixes: f8f80be501aa ("xhci: Use soft retry to recover faster from transaction errors") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-pci.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 4b8c93e59d6d..0d49d9c390c1 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -399,6 +399,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_ETRON_HOST; xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; + xhci->quirks |= XHCI_NO_SOFT_RETRY; } if (pdev->vendor == PCI_VENDOR_ID_RENESAS && -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 19/33] xhci: Fix control transfer error on Etron xHCI host

by Mathias Nyman

From: Kuangyi Chiang <ki.chiang65(a)gmail.com> Performing a stability stress test on a USB3.0 2.5G ethernet adapter results in errors like this: [ 91.441469] r8152 2-3:1.0 eth3: get_registers -71 [ 91.458659] r8152 2-3:1.0 eth3: get_registers -71 [ 91.475911] r8152 2-3:1.0 eth3: get_registers -71 [ 91.493203] r8152 2-3:1.0 eth3: get_registers -71 [ 91.510421] r8152 2-3:1.0 eth3: get_registers -71 The r8152 driver will periodically issue lots of control-IN requests to access the status of ethernet adapter hardware registers during the test. This happens when the xHCI driver enqueue a control TD (which cross over the Link TRB between two ring segments, as shown) in the endpoint zero's transfer ring. Seems the Etron xHCI host can not perform this TD correctly, causing the USB transfer error occurred, maybe the upper driver retry that control-IN request can solve problem, but not all drivers do this. | | ------- | TRB | Setup Stage ------- | TRB | Link ------- ------- | TRB | Data Stage ------- | TRB | Status Stage ------- | | To work around this, the xHCI driver should enqueue a No Op TRB if next available TRB is the Link TRB in the ring segment, this can prevent the Setup and Data Stage TRB to be breaked by the Link TRB. Check if the XHCI_ETRON_HOST quirk flag is set before invoking the workaround in xhci_queue_ctrl_tx(). Fixes: d0e96f5a71a0 ("USB: xhci: Control transfer support.") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-ring.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index f62b243d0fc4..517df97ef496 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3733,6 +3733,20 @@ int xhci_queue_ctrl_tx(struct xhci_hcd *xhci, gfp_t mem_flags, if (!urb->setup_packet) return -EINVAL; + if ((xhci->quirks & XHCI_ETRON_HOST) && + urb->dev->speed >= USB_SPEED_SUPER) { + /* + * If next available TRB is the Link TRB in the ring segment then + * enqueue a No Op TRB, this can prevent the Setup and Data Stage + * TRB to be breaked by the Link TRB. + */ + if (trb_is_link(ep_ring->enqueue + 1)) { + field = TRB_TYPE(TRB_TR_NOOP) | ep_ring->cycle_state; + queue_trb(xhci, ep_ring, false, 0, 0, + TRB_INTR_TARGET(0), field); + } + } + /* 1 TRB for setup, 1 for status */ num_trbs = 2; /* -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 18/33] xhci: Don't issue Reset Device command to Etron xHCI host

by Mathias Nyman

From: Kuangyi Chiang <ki.chiang65(a)gmail.com> Sometimes the hub driver does not recognize the USB device connected to the external USB2.0 hub when the system resumes from S4. After the SetPortFeature(PORT_RESET) request is completed, the hub driver calls the HCD reset_device callback, which will issue a Reset Device command and free all structures associated with endpoints that were disabled. This happens when the xHCI driver issue a Reset Device command to inform the Etron xHCI host that the USB device associated with a device slot has been reset. Seems that the Etron xHCI host can not perform this command correctly, affecting the USB device. To work around this, the xHCI driver should obtain a new device slot with reference to commit 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command"), which is another way to inform the Etron xHCI host that the USB device has been reset. Add a new XHCI_ETRON_HOST quirk flag to invoke the workaround in xhci_discover_or_reset_device(). Fixes: 2a8f82c4ceaf ("USB: xhci: Notify the xHC when a device is reset.") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-pci.c | 1 + drivers/usb/host/xhci.c | 19 +++++++++++++++++++ drivers/usb/host/xhci.h | 1 + 3 files changed, 21 insertions(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index db3c7e738213..4b8c93e59d6d 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -396,6 +396,7 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) if (pdev->vendor == PCI_VENDOR_ID_ETRON && (pdev->device == PCI_DEVICE_ID_EJ168 || pdev->device == PCI_DEVICE_ID_EJ188)) { + xhci->quirks |= XHCI_ETRON_HOST; xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; } diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index aa8c877f47ac..ae16253b53fb 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -3733,6 +3733,8 @@ void xhci_free_device_endpoint_resources(struct xhci_hcd *xhci, xhci->num_active_eps); } +static void xhci_free_dev(struct usb_hcd *hcd, struct usb_device *udev); + /* * This submits a Reset Device Command, which will set the device state to 0, * set the device address to 0, and disable all the endpoints except the default @@ -3803,6 +3805,23 @@ static int xhci_discover_or_reset_device(struct usb_hcd *hcd, SLOT_STATE_DISABLED) return 0; + if (xhci->quirks & XHCI_ETRON_HOST) { + /* + * Obtaining a new device slot to inform the xHCI host that + * the USB device has been reset. + */ + ret = xhci_disable_slot(xhci, udev->slot_id); + xhci_free_virt_device(xhci, udev->slot_id); + if (!ret) { + ret = xhci_alloc_dev(hcd, udev); + if (ret == 1) + ret = 0; + else + ret = -EINVAL; + } + return ret; + } + trace_xhci_discover_or_reset_device(slot_ctx); xhci_dbg(xhci, "Resetting device with slot ID %u\n", slot_id); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index d3b250c736b8..a0204e10486d 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1631,6 +1631,7 @@ struct xhci_hcd { #define XHCI_ZHAOXIN_HOST BIT_ULL(46) #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) +#define XHCI_ETRON_HOST BIT_ULL(49) unsigned int num_active_eps; unsigned int limit_active_eps; -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH 17/33] xhci: Combine two if statements for Etron xHCI host

by Mathias Nyman

From: Kuangyi Chiang <ki.chiang65(a)gmail.com> Combine two if statements, because these hosts have the same quirk flags applied. [Mathias: has stable tag because other fixes in series depend on this] Fixes: 91f7a1524a92 ("xhci: Apply broken streams quirk to Etron EJ188 xHCI host") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/host/xhci-pci.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 7803ff1f1c9f..db3c7e738213 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -394,12 +394,8 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_DEFAULT_PM_RUNTIME_ALLOW; if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ168) { - xhci->quirks |= XHCI_RESET_ON_RESUME; - xhci->quirks |= XHCI_BROKEN_STREAMS; - } - if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ188) { + (pdev->device == PCI_DEVICE_ID_EJ168 || + pdev->device == PCI_DEVICE_ID_EJ188)) { xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; } -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH] drm/tegra: fix a possible null pointer dereference

by Qiu-ji Chen

In tegra_crtc_reset(), new memory is allocated with kzalloc(), but no check is performed. Before calling __drm_atomic_helper_crtc_reset, state should be checked to prevent possible null pointer dereference. Fixes: b7e0b04ae450 ("drm/tegra: Convert to using __drm_atomic_helper_crtc_reset() for reset.") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/gpu/drm/tegra/dc.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c index be61c9d1a4f0..1ed30853bd9e 100644 --- a/drivers/gpu/drm/tegra/dc.c +++ b/drivers/gpu/drm/tegra/dc.c @@ -1388,7 +1388,10 @@ static void tegra_crtc_reset(struct drm_crtc *crtc) if (crtc->state) tegra_crtc_atomic_destroy_state(crtc, crtc->state); - __drm_atomic_helper_crtc_reset(crtc, &state->base); + if (state) + __drm_atomic_helper_crtc_reset(crtc, &state->base); + else + __drm_atomic_helper_crtc_reset(crtc, NULL); } static struct drm_crtc_state * -- 2.34.1

1 year, 1 month

1
0
0 0

[PATCH 06/31] ASoC: sh: rz-ssi: Terminate all the DMA transactions

by Claudiu

From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> In case of full duplex the 1st closed stream doesn't benefit from the dmaengine_terminate_async(). Call it after the companion stream is closed. Fixes: 26ac471c5354 ("ASoC: sh: rz-ssi: Add SSI DMAC support") Cc: stable(a)vger.kernel.org Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> --- sound/soc/renesas/rz-ssi.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/sound/soc/renesas/rz-ssi.c b/sound/soc/renesas/rz-ssi.c index 6efd017aaa7f..2d8721156099 100644 --- a/sound/soc/renesas/rz-ssi.c +++ b/sound/soc/renesas/rz-ssi.c @@ -415,8 +415,12 @@ static int rz_ssi_stop(struct rz_ssi_priv *ssi, struct rz_ssi_stream *strm) rz_ssi_reg_mask_setl(ssi, SSICR, SSICR_TEN | SSICR_REN, 0); /* Cancel all remaining DMA transactions */ - if (rz_ssi_is_dma_enabled(ssi)) - dmaengine_terminate_async(strm->dma_ch); + if (rz_ssi_is_dma_enabled(ssi)) { + if (ssi->playback.dma_ch) + dmaengine_terminate_async(ssi->playback.dma_ch); + if (ssi->capture.dma_ch) + dmaengine_terminate_async(ssi->capture.dma_ch); + } rz_ssi_set_idle(ssi); -- 2.39.2

1 year, 1 month

1
0
0 0

Default profile performance issue on Navi 3x

by Mario Limonciello

Hi, A performance issue on AMD Navi3x dGPUs in 6.10.y and 6.11.y has been improved by patches that landed in 6.12-rc. Can you please bring these 3 patches back to 6.11.y to help performance there as well? commit b932d5ad9257 ("drm/amdgpu/swsmu: fix ordering for setting workload_mask") commit ec1aab7816b0 ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs") commit 7c210ca5a2d7 ("drm/amdgpu: handle default profile on on devices without fullscreen 3D") Thanks!

1 year, 1 month

2
1
0 0

Request to backport "ASoC: Intel: mtl-match: Add cs42l43_l0 cs35l56_l23 for MTL"

by Liao, Bard

Hi, commit 84b22af29ff6 ("ASoC: Intel: mtl-match: Add cs42l43_l0 cs35l56_l23 for MTL") upstream. The commit added cs42l43 on SoundWire link 0 and cs35l56 on SoundWire link 2 and 3 configuration support on Intel Meteor Lake support. Audio will not work without this commit if the laptop use the given audio configuration. I wish this commit can be applied to kernel 6.8. It can be applied and built cleanly on it. Thanks, Bard

1 year, 1 month

2
2
0 0

[PATCH 6.6 v2] SUNRPC: Remove BUG_ON call sites

by Dominique Martinet

From: Chuck Lever <chuck.lever(a)oracle.com> [ Upstream commit 789ce196a31dd13276076762204bee87df893e53 ] There is no need to take down the whole system for these assertions. I'd rather not attempt a heroic save here, as some bug has occurred that has left the transport data structures in an unknown state. Just warn and then leak the left-over resources. Acked-by: Christian Brauner <brauner(a)kernel.org> Reviewed-by: NeilBrown <neilb(a)suse.de> Reviewed-by: Jeff Layton <jlayton(a)kernel.org> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> Signed-off-by: Dominique Martinet <asmadeus(a)codewreck.org> --- v2: resend with signoff properly set as requested v1: https://lkml.kernel.org/r/20241102065203.13291-1-asmadeus@codewreck.org net/sunrpc/svc.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 029c49065016..b43dc8409b1f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -577,11 +577,12 @@ svc_destroy(struct kref *ref) timer_shutdown_sync(&serv->sv_temptimer); /* - * The last user is gone and thus all sockets have to be destroyed to - * the point. Check this. + * Remaining transports at this point are not expected. */ - BUG_ON(!list_empty(&serv->sv_permsocks)); - BUG_ON(!list_empty(&serv->sv_tempsocks)); + WARN_ONCE(!list_empty(&serv->sv_permsocks), + "SVC: permsocks remain for %s\n", serv->sv_program->pg_name); + WARN_ONCE(!list_empty(&serv->sv_tempsocks), + "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name); cache_clean_deferred(serv); -- 2.46.1

1 year, 1 month

2
1
0 0

[PATCH v2] mtd: spi-nor: winbond: fix w25q128 regression

by Linus Walleij

From: Michael Walle <mwalle(a)kernel.org> Upstream commit d35df77707bf5ae1221b5ba1c8a88cf4fcdd4901 ("mtd: spi-nor: winbond: fix w25q128 regression") however the code has changed a lot after v6.6 so the patch did not apply to v6.6 or v6.1 which still has the problem. This patch fixes the issue in the way of the old API and has been tested on hardware. Please apply it for v6.1 and v6.6. Commit 83e824a4a595 ("mtd: spi-nor: Correct flags for Winbond w25q128") removed the flags for non-SFDP devices. It was assumed that it wasn't in use anymore. This wasn't true. Add the no_sfdp_flags as well as the size again. We add the additional flags for dual and quad read because they have been reported to work properly by Hartmut using both older and newer versions of this flash, the similar flashes with 64Mbit and 256Mbit already have these flags and because it will (luckily) trigger our legacy SFDP parsing, so newer versions with SFDP support will still get the parameters from the SFDP tables. Reported-by: Hartmut Birr <e9hack(a)gmail.com> Closes: https://lore.kernel.org/r/CALxbwRo_-9CaJmt7r7ELgu+vOcgk=xZcGHobnKf=oT2=u4d4… Fixes: 83e824a4a595 ("mtd: spi-nor: Correct flags for Winbond w25q128") Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org> Signed-off-by: Michael Walle <mwalle(a)kernel.org> Acked-by: Tudor Ambarus <tudor.ambarus(a)linaro.org> Reviewed-by: Esben Haabendal <esben(a)geanix.com> Reviewed-by: Pratyush Yadav <pratyush(a)kernel.org> Signed-off-by: Pratyush Yadav <pratyush(a)kernel.org> Link: https://lore.kernel.org/r/20240621120929.2670185-1-mwalle@kernel.org [Backported to v6.6 - vastly different due to upstream changes] Reviewed-by: Tudor Ambarus <tudor.ambarus(a)linaro.org> Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org> --- This fix backported to stable v6.6 and v6.1 after reports from OpenWrt users: https://github.com/openwrt/openwrt/issues/16796 --- Changes in v2: - Use the right upstream committ ID (dunno what happened) - Put the commit ID on top on the desired format. - Link to v1: https://lore.kernel.org/r/20241028-v6-6-v1-1-991446d71bb7@linaro.org --- drivers/mtd/spi-nor/winbond.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/mtd/spi-nor/winbond.c b/drivers/mtd/spi-nor/winbond.c index cd99c9a1c568..95dd28b9bf14 100644 --- a/drivers/mtd/spi-nor/winbond.c +++ b/drivers/mtd/spi-nor/winbond.c @@ -120,9 +120,10 @@ static const struct flash_info winbond_nor_parts[] = { NO_SFDP_FLAGS(SECT_4K) }, { "w25q80bl", INFO(0xef4014, 0, 64 * 1024, 16) NO_SFDP_FLAGS(SECT_4K) }, - { "w25q128", INFO(0xef4018, 0, 0, 0) - PARSE_SFDP - FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB) }, + { "w25q128", INFO(0xef4018, 0, 64 * 1024, 256) + FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB) + NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | + SPI_NOR_QUAD_READ) }, { "w25q256", INFO(0xef4019, 0, 64 * 1024, 512) NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ) .fixups = &w25q256_fixups }, --- base-commit: ffc253263a1375a65fa6c9f62a893e9767fbebfa change-id: 20241027-v6-6-7ed05eaccb22 Best regards, -- Linus Walleij <linus.walleij(a)linaro.org>

1 year, 1 month

2
1
0 0

[PATCH 6.1.y] LoongArch: Fix build errors due to backported TIMENS

by Huacai Chen

Commit eb3710efffce1dcff83761db4615f91d93aabfcb ("LoongArch: Add support to clone a time namespace") backports the TIMENS support for LoongArch (corresponding upstream commit aa5e65dc0818bbf676bf06927368ec46867778fd) but causes build errors: CC arch/loongarch/kernel/vdso.o arch/loongarch/kernel/vdso.c: In function ‘vvar_fault’: arch/loongarch/kernel/vdso.c:54:36: error: implicit declaration of function ‘find_timens_vvar_page’ [-Werror=implicit-function-declaration] 54 | struct page *timens_page = find_timens_vvar_page(vma); | ^~~~~~~~~~~~~~~~~~~~~ arch/loongarch/kernel/vdso.c:54:36: warning: initialization of ‘struct page *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] arch/loongarch/kernel/vdso.c: In function ‘vdso_join_timens’: arch/loongarch/kernel/vdso.c:143:25: error: implicit declaration of function ‘zap_vma_pages’; did you mean ‘zap_vma_ptes’? [-Werror=implicit-function-declaration] 143 | zap_vma_pages(vma); | ^~~~~~~~~~~~~ | zap_vma_ptes cc1: some warnings being treated as errors Because in 6.1.y we should define find_timens_vvar_page() by ourselves and use zap_page_range() instead of zap_vma_pages(), so fix it. Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn> --- arch/loongarch/kernel/vdso.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c index 59aa9dd466e8..64eb5386e7b2 100644 --- a/arch/loongarch/kernel/vdso.c +++ b/arch/loongarch/kernel/vdso.c @@ -40,6 +40,8 @@ static struct page *vdso_pages[] = { NULL }; struct vdso_data *vdso_data = generic_vdso_data.data; struct vdso_pcpu_data *vdso_pdata = loongarch_vdso_data.vdata.pdata; +static struct page *find_timens_vvar_page(struct vm_area_struct *vma); + static int vdso_mremap(const struct vm_special_mapping *sm, struct vm_area_struct *new_vma) { current->mm->context.vdso = (void *)(new_vma->vm_start); @@ -139,13 +141,37 @@ int vdso_join_timens(struct task_struct *task, struct time_namespace *ns) mmap_read_lock(mm); for_each_vma(vmi, vma) { + unsigned long size = vma->vm_end - vma->vm_start; + if (vma_is_special_mapping(vma, &vdso_info.data_mapping)) - zap_vma_pages(vma); + zap_page_range(vma, vma->vm_start, size); } mmap_read_unlock(mm); return 0; } + +static struct page *find_timens_vvar_page(struct vm_area_struct *vma) +{ + if (likely(vma->vm_mm == current->mm)) + return current->nsproxy->time_ns->vvar_page; + + /* + * VM_PFNMAP | VM_IO protect .fault() handler from being called + * through interfaces like /proc/$pid/mem or + * process_vm_{readv,writev}() as long as there's no .access() + * in special_mapping_vmops. + * For more details check_vma_flags() and __access_remote_vm() + */ + WARN(1, "vvar_page accessed remotely"); + + return NULL; +} +#else +static struct page *find_timens_vvar_page(struct vm_area_struct *vma) +{ + return NULL; +} #endif static unsigned long vdso_base(void) -- 2.43.5

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: don't install PMD mappings when THPs are disabled by the" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 2b0f922323ccfa76219bcaacd35cd50aeaa13592 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024101842-empty-espresso-c8a3@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 2b0f922323ccfa76219bcaacd35cd50aeaa13592 Mon Sep 17 00:00:00 2001 From: David Hildenbrand <david(a)redhat.com> Date: Fri, 11 Oct 2024 12:24:45 +0200 Subject: [PATCH] mm: don't install PMD mappings when THPs are disabled by the hw/process/vma We (or rather, readahead logic :) ) might be allocating a THP in the pagecache and then try mapping it into a process that explicitly disabled THP: we might end up installing PMD mappings. This is a problem for s390x KVM, which explicitly remaps all PMD-mapped THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before starting the VM. For example, starting a VM backed on a file system with large folios supported makes the VM crash when the VM tries accessing such a mapping using KVM. Is it also a problem when the HW disabled THP using TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case without X86_FEATURE_PSE. In the future, we might be able to do better on s390x and only disallow PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED really wants. For now, fix it by essentially performing the same check as would be done in __thp_vma_allowable_orders() or in shmem code, where this works as expected, and disallow PMD mappings, making us fallback to PTE mappings. Link: https://lkml.kernel.org/r/20241011102445.934409-3-david@redhat.com Fixes: 793917d997df ("mm/readahead: Add large folio readahead") Signed-off-by: David Hildenbrand <david(a)redhat.com> Reported-by: Leo Fu <bfu(a)redhat.com> Tested-by: Thomas Huth <thuth(a)redhat.com> Cc: Thomas Huth <thuth(a)redhat.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com> Cc: Janosch Frank <frankja(a)linux.ibm.com> Cc: Claudio Imbrenda <imbrenda(a)linux.ibm.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index c0869a962ddd..30feedabc932 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4920,6 +4920,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) pmd_t entry; vm_fault_t ret = VM_FAULT_FALLBACK; + /* + * It is too late to allocate a small folio, we already have a large + * folio in the pagecache: especially s390 KVM cannot tolerate any + * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any + * PMD mappings if THPs are disabled. + */ + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags)) + return ret; + if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER)) return ret;

1 year, 1 month

4
10
0 0

[PATCH V4] wifi: rtlwifi: Drastically reduce the attempts to read efuse in case of failures

by Guilherme G. Piccoli

Syzkaller reported a hung task with uevent_show() on stack trace. That specific issue was addressed by another commit [0], but even with that fix applied (for example, running v6.12-rc5) we face another type of hung task that comes from the same reproducer [1]. By investigating that, we could narrow it to the following path: (a) Syzkaller emulates a Realtek USB WiFi adapter using raw-gadget and dummy_hcd infrastructure. (b) During the probe of rtl8192cu, the driver ends-up performing an efuse read procedure (which is related to EEPROM load IIUC), and here lies the issue: the function read_efuse() calls read_efuse_byte() many times, as loop iterations depending on the efuse size (in our example, 512 in total). This procedure for reading efuse bytes relies in a loop that performs an I/O read up to *10k* times in case of failures. We measured the time of the loop inside read_efuse_byte() alone, and in this reproducer (which involves the dummy_hcd emulation layer), it takes 15 seconds each. As a consequence, we have the driver stuck in its probe routine for big time, exposing a stack trace like below if we attempt to reboot the system, for example: task:kworker/0:3 state:D stack:0 pid:662 tgid:662 ppid:2 flags:0x00004000 Workqueue: usb_hub_wq hub_event Call Trace: __schedule+0xe22/0xeb6 schedule_timeout+0xe7/0x132 __wait_for_common+0xb5/0x12e usb_start_wait_urb+0xc5/0x1ef ? usb_alloc_urb+0x95/0xa4 usb_control_msg+0xff/0x184 _usbctrl_vendorreq_sync+0xa0/0x161 _usb_read_sync+0xb3/0xc5 read_efuse_byte+0x13c/0x146 read_efuse+0x351/0x5f0 efuse_read_all_map+0x42/0x52 rtl_efuse_shadow_map_update+0x60/0xef rtl_get_hwinfo+0x5d/0x1c2 rtl92cu_read_eeprom_info+0x10a/0x8d5 ? rtl92c_read_chip_version+0x14f/0x17e rtl_usb_probe+0x323/0x851 usb_probe_interface+0x278/0x34b really_probe+0x202/0x4a4 __driver_probe_device+0x166/0x1b2 driver_probe_device+0x2f/0xd8 [...] We propose hereby to drastically reduce the attempts of doing the I/O reads in case of failures, restricted to USB devices (given that they're inherently slower than PCIe ones). By retrying up to 10 times (instead of 10000), we got reponsiveness in the reproducer, while seems reasonable to believe that there's no sane USB device implementation in the field requiring this amount of retries at every I/O read in order to properly work. Based on that assumption, it'd be good to have it backported to stable but maybe not since driver implementation (the 10k number comes from day 0), perhaps up to 6.x series makes sense. [0] Commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race") [1] A note about that: this syzkaller report presents multiple reproducers that differs by the type of emulated USB device. For this specific case, check the entry from 2024/08/08 06:23 in the list of crashes; the C repro is available at https://syzkaller.appspot.com/text?tag=ReproC&x=1521fc83980000. Cc: stable(a)vger.kernel.org # v6.1+ Reported-by: syzbot+edd9fe0d3a65b14588d5(a)syzkaller.appspotmail.com Tested-by: Bitterblue Smith <rtl8821cerfe2(a)gmail.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> --- V4: - Changed the if conditional to check if device is of USB type, instead of PCI type - thanks Ping-Ke Shih. - Re-addded the Bitterblue Smith's Tested-by tag, which I forgot in V3 xD V3 link: https://lore.kernel.org/r/20241031155731.1253259-1-gpiccoli@igalia.com/ drivers/net/wireless/realtek/rtlwifi/efuse.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/efuse.c b/drivers/net/wireless/realtek/rtlwifi/efuse.c index 82cf5fb5175f..6518e77b89f5 100644 --- a/drivers/net/wireless/realtek/rtlwifi/efuse.c +++ b/drivers/net/wireless/realtek/rtlwifi/efuse.c @@ -162,10 +162,19 @@ void efuse_write_1byte(struct ieee80211_hw *hw, u16 address, u8 value) void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf) { struct rtl_priv *rtlpriv = rtl_priv(hw); + u16 max_attempts = 10000; u32 value32; u8 readbyte; u16 retry; + /* + * In case of USB devices, transfer speeds are limited, hence + * efuse I/O reads could be (way) slower. So, decrease (a lot) + * the read attempts in case of failures. + */ + if (rtlpriv->rtlhal.interface == INTF_USB) + max_attempts = 10; + rtl_write_byte(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL] + 1, (_offset & 0xff)); readbyte = rtl_read_byte(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL] + 2); @@ -178,7 +187,7 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf) retry = 0; value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]); - while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10000)) { + while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < max_attempts)) { value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]); retry++; -- 2.46.2

1 year, 1 month

2
1
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

3
2
0 0

[PATCH 6.6] SUNRPC: Remove BUG_ON call sites

by Dominique Martinet

From: Chuck Lever <chuck.lever(a)oracle.com> [ Upstream commit 789ce196a31dd13276076762204bee87df893e53 ] There is no need to take down the whole system for these assertions. I'd rather not attempt a heroic save here, as some bug has occurred that has left the transport data structures in an unknown state. Just warn and then leak the left-over resources. Acked-by: Christian Brauner <brauner(a)kernel.org> Reviewed-by: NeilBrown <neilb(a)suse.de> Reviewed-by: Jeff Layton <jlayton(a)kernel.org> Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com> --- I've hit this BUG at home when restarting the nfs-server service and while that didn't bring the whole system down it did kill a thread with the nfsd_mutex lock held, making exportfs & other related commands all hang in unkillable state trying to grab the lock. So this is purely selfish so that this won't happen again next time I upgrade :-) I'd like to say I have any idea why the bug hit on that 6.6.42 (the sv_permsocks one did) and help with the underlying issue, but I honestly didn't do anything fancy and don't have anything interesting in logs (except the bug itself, happy to forward it if someone cares); would have been possible to debug this if I had a crash dump but it's not setup on this machine and just having this down to WARN if probably good enough... Cheers, net/sunrpc/svc.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c index 029c49065016..b43dc8409b1f 100644 --- a/net/sunrpc/svc.c +++ b/net/sunrpc/svc.c @@ -577,11 +577,12 @@ svc_destroy(struct kref *ref) timer_shutdown_sync(&serv->sv_temptimer); /* - * The last user is gone and thus all sockets have to be destroyed to - * the point. Check this. + * Remaining transports at this point are not expected. */ - BUG_ON(!list_empty(&serv->sv_permsocks)); - BUG_ON(!list_empty(&serv->sv_tempsocks)); + WARN_ONCE(!list_empty(&serv->sv_permsocks), + "SVC: permsocks remain for %s\n", serv->sv_program->pg_name); + WARN_ONCE(!list_empty(&serv->sv_tempsocks), + "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name); cache_clean_deferred(serv); -- 2.46.1

1 year, 1 month

2
2
0 0

FAILED: patch "[PATCH] mm: shmem: fix data-race in shmem_getattr()" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x d949d1d14fa281ace388b1de978e8f2cd52875cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110553-blaming-ammonium-3ec5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23c..4ba1d00fabda 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: shmem: fix data-race in shmem_getattr()" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x d949d1d14fa281ace388b1de978e8f2cd52875cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110551-frosty-diploma-2492@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23c..4ba1d00fabda 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: shmem: fix data-race in shmem_getattr()" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x d949d1d14fa281ace388b1de978e8f2cd52875cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110550-excretory-dig-7bda@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23c..4ba1d00fabda 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: shmem: fix data-race in shmem_getattr()" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x d949d1d14fa281ace388b1de978e8f2cd52875cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110549-slate-undrilled-d94c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23c..4ba1d00fabda 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: shmem: fix data-race in shmem_getattr()" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x d949d1d14fa281ace388b1de978e8f2cd52875cf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110547-snooper-saint-e58c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23c..4ba1d00fabda 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE;

1 year, 1 month

2
1
0 0

[PATCH V3] vp_vdpa: fix id_table array not null terminated error

by Xiaoguang Wang

Allocate one extra virtio_device_id as null terminator, otherwise vdpa_mgmtdev_get_classes() may iterate multiple times and visit undefined memory. Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa") Cc: stable(a)vger.kernel.org Suggested-by: Parav Pandit <parav(a)nvidia.com> Signed-off-by: Angus Chen <angus.chen(a)jaguarmicro.com> Signed-off-by: Xiaoguang Wang <lege.wang(a)jaguarmicro.com> --- V3: Use array assignment style for mdev_id. V2: Use kcalloc() api. --- drivers/vdpa/virtio_pci/vp_vdpa.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index ac4ab22f7d8b..16380764275e 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -612,7 +612,11 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto mdev_err; } - mdev_id = kzalloc(sizeof(struct virtio_device_id), GFP_KERNEL); + /* + * id_table should be a null terminated array, so allocate one additional + * entry here, see vdpa_mgmtdev_get_classes(). + */ + mdev_id = kcalloc(2, sizeof(struct virtio_device_id), GFP_KERNEL); if (!mdev_id) { err = -ENOMEM; goto mdev_id_err; @@ -632,8 +636,8 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto probe_err; } - mdev_id->device = mdev->id.device; - mdev_id->vendor = mdev->id.vendor; + mdev_id[0].device = mdev->id.device; + mdev_id[0].vendor = mdev->id.vendor; mgtdev->id_table = mdev_id; mgtdev->max_supported_vqs = vp_modern_get_num_queues(mdev); mgtdev->supported_features = vp_modern_get_features(mdev); -- 2.40.1

1 year, 1 month

3
2
0 0

[PATCH v2] mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create

by Koichiro Den

Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") reduced ARCH_KMALLOC_MINALIGN to 8 on arm64. However, with KASAN_HW_TAGS enabled, arch_slab_minalign() becomes 16. This causes kmalloc_caches[*][8] to be aliased to kmalloc_caches[*][16], resulting in kmem_buckets_create() attempting to create a kmem_cache for size 16 twice. This duplication triggers warnings on boot: [ 2.325108] ------------[ cut here ]------------ [ 2.325135] kmem_cache of name 'memdup_user-16' already exists [ 2.325783] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0 [ 2.327957] Modules linked in: [ 2.328550] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5mm-unstable-arm64+ #12 [ 2.328683] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024 [ 2.328790] pstate: 61000009 (nZCv daif -PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 2.328911] pc : __kmem_cache_create_args+0xb8/0x3b0 [ 2.328930] lr : __kmem_cache_create_args+0xb8/0x3b0 [ 2.328942] sp : ffff800083d6fc50 [ 2.328961] x29: ffff800083d6fc50 x28: f2ff0000c1674410 x27: ffff8000820b0598 [ 2.329061] x26: 000000007fffffff x25: 0000000000000010 x24: 0000000000002000 [ 2.329101] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388 [ 2.329118] x20: f2ff0000c1674410 x19: f5ff0000c16364c0 x18: ffff800083d80030 [ 2.329135] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 2.329152] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120 [ 2.329169] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000 [ 2.329194] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 2.329210] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.329226] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 2.329291] Call trace: [ 2.329407] __kmem_cache_create_args+0xb8/0x3b0 [ 2.329499] kmem_buckets_create+0xfc/0x320 [ 2.329526] init_user_buckets+0x34/0x78 [ 2.329540] do_one_initcall+0x64/0x3c8 [ 2.329550] kernel_init_freeable+0x26c/0x578 [ 2.329562] kernel_init+0x3c/0x258 [ 2.329574] ret_from_fork+0x10/0x20 [ 2.329698] ---[ end trace 0000000000000000 ]--- [ 2.403704] ------------[ cut here ]------------ [ 2.404716] kmem_cache of name 'msg_msg-16' already exists [ 2.404801] WARNING: CPU: 2 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0 [ 2.404842] Modules linked in: [ 2.404971] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.0-rc5mm-unstable-arm64+ #12 [ 2.405026] Tainted: [W]=WARN [ 2.405043] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024 [ 2.405057] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.405079] pc : __kmem_cache_create_args+0xb8/0x3b0 [ 2.405100] lr : __kmem_cache_create_args+0xb8/0x3b0 [ 2.405111] sp : ffff800083d6fc50 [ 2.405115] x29: ffff800083d6fc50 x28: fbff0000c1674410 x27: ffff8000820b0598 [ 2.405135] x26: 000000000000ffd0 x25: 0000000000000010 x24: 0000000000006000 [ 2.405153] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388 [ 2.405169] x20: fbff0000c1674410 x19: fdff0000c163d6c0 x18: ffff800083d80030 [ 2.405185] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 2.405201] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120 [ 2.405217] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000 [ 2.405233] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 2.405248] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.405271] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 2.405287] Call trace: [ 2.405293] __kmem_cache_create_args+0xb8/0x3b0 [ 2.405305] kmem_buckets_create+0xfc/0x320 [ 2.405315] init_msg_buckets+0x34/0x78 [ 2.405326] do_one_initcall+0x64/0x3c8 [ 2.405337] kernel_init_freeable+0x26c/0x578 [ 2.405348] kernel_init+0x3c/0x258 [ 2.405360] ret_from_fork+0x10/0x20 [ 2.405370] ---[ end trace 0000000000000000 ]--- To address this, alias kmem_cache for sizes smaller than min alignment to the aligned sized kmem_cache, as done with the default system kmalloc bucket. Cc: <stable(a)vger.kernel.org> # 6.11.x Fixes: b32801d1255b ("mm/slab: Introduce kmem_buckets_create() and family") Signed-off-by: Koichiro Den <koichiro.den(a)gmail.com> --- Changes in v2: - Simplify by just reusing calculated aligned size stored in the default kmalloc_caches --- mm/slab_common.c | 36 +++++++++++++++++++++++------------- 1 file changed, 23 insertions(+), 13 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 3d26c257ed8b..db6ffe53c23e 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -380,8 +380,11 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, unsigned int usersize, void (*ctor)(void *)) { + unsigned long mask = 0; + unsigned int idx; kmem_buckets *b; - int idx; + + BUILD_BUG_ON(ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]) > BITS_PER_LONG); /* * When the separate buckets API is not built in, just return @@ -403,7 +406,7 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) { char *short_size, *cache_name; unsigned int cache_useroffset, cache_usersize; - unsigned int size; + unsigned int size, aligned_idx; if (!kmalloc_caches[KMALLOC_NORMAL][idx]) continue; @@ -416,10 +419,6 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, if (WARN_ON(!short_size)) goto fail; - cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1); - if (WARN_ON(!cache_name)) - goto fail; - if (useroffset >= size) { cache_useroffset = 0; cache_usersize = 0; @@ -427,18 +426,29 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, cache_useroffset = useroffset; cache_usersize = min(size - cache_useroffset, usersize); } - (*b)[idx] = kmem_cache_create_usercopy(cache_name, size, - 0, flags, cache_useroffset, - cache_usersize, ctor); - kfree(cache_name); - if (WARN_ON(!(*b)[idx])) - goto fail; + + aligned_idx = __kmalloc_index(size, false); + if (!(*b)[aligned_idx]) { + cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1); + if (WARN_ON(!cache_name)) + goto fail; + (*b)[aligned_idx] = kmem_cache_create_usercopy(cache_name, size, + 0, flags, cache_useroffset, + cache_usersize, ctor); + if (WARN_ON(!(*b)[aligned_idx])) { + kfree(cache_name); + goto fail; + } + set_bit(aligned_idx, &mask); + } + if (idx != aligned_idx) + (*b)[idx] = (*b)[aligned_idx]; } return b; fail: - for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) + for_each_set_bit(idx, &mask, ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL])) kmem_cache_destroy((*b)[idx]); kmem_cache_free(kmem_buckets_cache, b); -- 2.43.0

1 year, 1 month

3
3
0 0

[PATCH net 0/2] mptcp: pm: fix wrong perm and sock kfree

by Matthieu Baerts (NGI0)

Two small fixes related to the MPTCP path-manager: - Patch 1: remove an accidental restriction to admin users to list MPTCP endpoints. A regression from v6.7. - Patch 2: correctly use sock_kfree_s() instead of kfree() in the userspace PM. A fix for another fix introduced in v6.4 and backportable up to v5.19. Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org> --- Geliang Tang (1): mptcp: use sock_kfree_s instead of kfree Matthieu Baerts (NGI0) (1): mptcp: no admin perm to list endpoints Documentation/netlink/specs/mptcp_pm.yaml | 1 - net/mptcp/mptcp_pm_gen.c | 1 - net/mptcp/pm_userspace.c | 3 ++- 3 files changed, 2 insertions(+), 3 deletions(-) --- base-commit: 5ccdcdf186aec6b9111845fd37e1757e9b413e2f change-id: 20241104-net-mptcp-misc-6-12-34ca759f78ee Best regards, -- Matthieu Baerts (NGI0) <matttbe(a)kernel.org>

1 year, 1 month

2
3
0 0

[PATCH v2] mm: page_alloc: move mlocked flag clearance into free_pages_prepare()

by Roman Gushchin

Syzbot reported a bad page state problem caused by a page being freed using free_page() still having a mlocked flag at free_pages_prepare() stage: BUG: Bad page state in process syz.0.15 pfn:1137bb page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff8881137bb870 pfn:0x1137bb flags: 0x400000000080000(mlocked|node=0|zone=1) raw: 0400000000080000 0000000000000000 dead000000000122 0000000000000000 raw: ffff8881137bb870 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set page_owner tracks the page as allocated page last allocated via order 0, migratetype Unmovable, gfp_mask 0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), pid 3005, tgid 3004 (syz.0.15), ts 61546 608067, free_ts 61390082085 set_page_owner include/linux/page_owner.h:32 [inline] post_alloc_hook+0x1f3/0x230 mm/page_alloc.c:1537 prep_new_page mm/page_alloc.c:1545 [inline] get_page_from_freelist+0x3008/0x31f0 mm/page_alloc.c:3457 __alloc_pages_noprof+0x292/0x7b0 mm/page_alloc.c:4733 alloc_pages_mpol_noprof+0x3e8/0x630 mm/mempolicy.c:2265 kvm_coalesced_mmio_init+0x1f/0xf0 virt/kvm/coalesced_mmio.c:99 kvm_create_vm virt/kvm/kvm_main.c:1235 [inline] kvm_dev_ioctl_create_vm virt/kvm/kvm_main.c:5500 [inline] kvm_dev_ioctl+0x13bb/0x2320 virt/kvm/kvm_main.c:5542 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:907 [inline] __se_sys_ioctl+0xf9/0x170 fs/ioctl.c:893 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x69/0x110 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e page last free pid 951 tgid 951 stack trace: reset_page_owner include/linux/page_owner.h:25 [inline] free_pages_prepare mm/page_alloc.c:1108 [inline] free_unref_page+0xcb1/0xf00 mm/page_alloc.c:2638 vfree+0x181/0x2e0 mm/vmalloc.c:3361 delayed_vfree_work+0x56/0x80 mm/vmalloc.c:3282 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0xa5c/0x17a0 kernel/workqueue.c:3310 worker_thread+0xa2b/0xf70 kernel/workqueue.c:3391 kthread+0x2df/0x370 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 A reproducer is available here: https://syzkaller.appspot.com/x/repro.c?x=1437939f980000 The problem was originally introduced by commit b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance"): it was handling focused on handling pagecache and anonymous memory and wasn't suitable for lower level get_page()/free_page() API's used for example by KVM, as with this reproducer. Fix it by moving the mlocked flag clearance down to free_page_prepare(). The bug itself if fairly old and harmless (aside from generating these warnings). Closes: https://syzkaller.appspot.com/x/report.txt?x=169a47d0580000 Fixes: b109b87050df ("mm/munlock: replace clear_page_mlock() by final clearance") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Cc: <stable(a)vger.kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Vlastimil Babka <vbabka(a)suse.cz> --- mm/page_alloc.c | 15 +++++++++++++++ mm/swap.c | 14 -------------- 2 files changed, 15 insertions(+), 14 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index bc55d39eb372..7535d78862ab 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1044,6 +1044,7 @@ __always_inline bool free_pages_prepare(struct page *page, bool skip_kasan_poison = should_skip_kasan_poison(page); bool init = want_init_on_free(); bool compound = PageCompound(page); + struct folio *folio = page_folio(page); VM_BUG_ON_PAGE(PageTail(page), page); @@ -1053,6 +1054,20 @@ __always_inline bool free_pages_prepare(struct page *page, if (memcg_kmem_online() && PageMemcgKmem(page)) __memcg_kmem_uncharge_page(page, order); + /* + * In rare cases, when truncation or holepunching raced with + * munlock after VM_LOCKED was cleared, Mlocked may still be + * found set here. This does not indicate a problem, unless + * "unevictable_pgs_cleared" appears worryingly large. + */ + if (unlikely(folio_test_mlocked(folio))) { + long nr_pages = folio_nr_pages(folio); + + __folio_clear_mlocked(folio); + zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); + count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); + } + if (unlikely(PageHWPoison(page)) && !order) { /* Do not let hwpoison pages hit pcplists/buddy */ reset_page_owner(page, order); diff --git a/mm/swap.c b/mm/swap.c index 835bdf324b76..7cd0f4719423 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -78,20 +78,6 @@ static void __page_cache_release(struct folio *folio, struct lruvec **lruvecp, lruvec_del_folio(*lruvecp, folio); __folio_clear_lru_flags(folio); } - - /* - * In rare cases, when truncation or holepunching raced with - * munlock after VM_LOCKED was cleared, Mlocked may still be - * found set here. This does not indicate a problem, unless - * "unevictable_pgs_cleared" appears worryingly large. - */ - if (unlikely(folio_test_mlocked(folio))) { - long nr_pages = folio_nr_pages(folio); - - __folio_clear_mlocked(folio); - zone_stat_mod_folio(folio, NR_MLOCK, -nr_pages); - count_vm_events(UNEVICTABLE_PGCLEARED, nr_pages); - } } /* -- 2.47.0.105.g07ac214952-goog

1 year, 1 month

6
17
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 0b04fbe886b4274c8e5855011233aaa69fec6e75 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:52 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7f4926194e50f..e06a6fdc0bab7 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10750,6 +10750,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x1404, "Clevo N150CU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg()" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 6bd301819f8f69331a55ae2336c8b111fc933f3d Mon Sep 17 00:00:00 2001 From: Zicheng Qu <quzicheng(a)huawei.com> Date: Tue, 22 Oct 2024 13:43:54 +0000 Subject: [PATCH] staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() In the ad9832_write_frequency() function, clk_get_rate() might return 0. This can lead to a division by zero when calling ad9832_calc_freqreg(). The check if (fout > (clk_get_rate(st->mclk) / 2)) does not protect against the case when fout is 0. The ad9832_write_frequency() function is called from ad9832_write(), and fout is derived from a text buffer, which can contain any value. Link: https://lore.kernel.org/all/2024100904-CVE-2024-47663-9bdc@gregkh/ Fixes: ea707584bac1 ("Staging: IIO: DDS: AD9832 / AD9835 driver") Cc: stable(a)vger.kernel.org Signed-off-by: Zicheng Qu <quzicheng(a)huawei.com> Reviewed-by: Nuno Sa <nuno.sa(a)analog.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://patch.msgid.link/20241022134354.574614-1-quzicheng@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> --- drivers/staging/iio/frequency/ad9832.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/staging/iio/frequency/ad9832.c b/drivers/staging/iio/frequency/ad9832.c index 6c390c4eb26de..492612e8f8bad 100644 --- a/drivers/staging/iio/frequency/ad9832.c +++ b/drivers/staging/iio/frequency/ad9832.c @@ -129,12 +129,15 @@ static unsigned long ad9832_calc_freqreg(unsigned long mclk, unsigned long fout) static int ad9832_write_frequency(struct ad9832_state *st, unsigned int addr, unsigned long fout) { + unsigned long clk_freq; unsigned long regval; - if (fout > (clk_get_rate(st->mclk) / 2)) + clk_freq = clk_get_rate(st->mclk); + + if (!clk_freq || fout > (clk_freq / 2)) return -EINVAL; - regval = ad9832_calc_freqreg(clk_get_rate(st->mclk), fout); + regval = ad9832_calc_freqreg(clk_freq, fout); st->freq_data[0] = cpu_to_be16((AD9832_CMD_FRE8BITSW << CMD_SHIFT) | (addr << ADD_SHIFT) | -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Limit internal Mic boost on Dell platform" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 78e7be018784934081afec77f96d49a2483f9188 Mon Sep 17 00:00:00 2001 From: Kailang Yang <kailang(a)realtek.com> Date: Fri, 18 Oct 2024 13:53:24 +0800 Subject: [PATCH] ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dell want to limit internal Mic boost on all Dell platform. Signed-off-by: Kailang Yang <kailang(a)realtek.com> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/561fc5f5eff04b6cbd79ed173cd1c1db@realtek.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3567b14b52b7c..784ac058418fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7521,6 +7521,7 @@ enum { ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, @@ -7555,6 +7556,7 @@ enum { ALC255_FIXUP_ACER_MIC_NO_PRESENCE, ALC255_FIXUP_ASUS_MIC_NO_PRESENCE, ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, ALC255_FIXUP_HEADSET_MODE, ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC, @@ -8114,6 +8116,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC269_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -8394,6 +8402,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC255_FIXUP_HEADSET_MODE }, + [ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC255_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -11076,6 +11090,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, .name = "dell-headset-dock"}, {.id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, .name = "dell-headset3"}, {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, .name = "dell-headset4"}, + {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, .name = "dell-headset4-quiet"}, {.id = ALC283_FIXUP_CHROME_BOOK, .name = "alc283-dac-wcaps"}, {.id = ALC283_FIXUP_SENSE_COMBO_JACK, .name = "alc283-sense-combo"}, {.id = ALC292_FIXUP_TPT440_DOCK, .name = "tpt440-dock"}, @@ -11630,16 +11645,16 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1b, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, {0x19, 0x40000000}, {0x1b, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: usb-audio: Add quirks for Dell WD19 dock" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4413665dd6c528b31284119e3571c25f371e1c36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Sch=C3=A4r?= <jan(a)jschaer.ch> Date: Tue, 29 Oct 2024 23:12:49 +0100 Subject: [PATCH] ALSA: usb-audio: Add quirks for Dell WD19 dock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan(a)jschaer.ch> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/usb/mixer_quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index 2a9594f34dac6..6456e87e2f397 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -4042,6 +4042,9 @@ int snd_usb_mixer_apply_create_quirk(struct usb_mixer_interface *mixer) break; err = dell_dock_mixer_init(mixer); break; + case USB_ID(0x0bda, 0x402e): /* Dell WD19 dock */ + err = dell_dock_mixer_create(mixer); + break; case USB_ID(0x2a39, 0x3fd2): /* RME ADI-2 Pro */ case USB_ID(0x2a39, 0x3fd3): /* RME ADI-2 DAC */ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: vdso: Prevent the compiler from inserting calls to memset()" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From bf40167d54d55d4b54d0103713d86a8638fb9290 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 16 Oct 2024 10:36:24 +0200 Subject: [PATCH] riscv: vdso: Prevent the compiler from inserting calls to memset() The compiler is smart enough to insert a call to memset() in riscv_vdso_get_cpus(), which generates a dynamic relocation. So prevent this by using -fno-builtin option. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Reviewed-by: Guo Ren <guoren(a)kernel.org> Link: https://lore.kernel.org/r/20241016083625.136311-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/kernel/vdso/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index 960feb1526caa..3f1c4b2d0b064 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -18,6 +18,7 @@ obj-vdso = $(patsubst %, %.o, $(vdso-syms)) note.o ccflags-y := -fno-stack-protector ccflags-y += -DDISABLE_BRANCH_PROFILING +ccflags-y += -fno-builtin ifneq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o += -fPIC -include $(c-gettimeofday-y) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1" failed to apply to v4.19-stable tree

by Sasha Levin

The patch below does not apply to the v4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From e49370d769e71456db3fbd982e95bab8c69f73e8 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:53 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e06a6fdc0bab7..571fa8a6c9e12 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11008,6 +11008,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1d05, 0x1409, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d17, 0x3288, "Haier Boyue G42", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 0b04fbe886b4274c8e5855011233aaa69fec6e75 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:52 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7f4926194e50f..e06a6fdc0bab7 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10750,6 +10750,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x1404, "Clevo N150CU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Limit internal Mic boost on Dell platform" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 78e7be018784934081afec77f96d49a2483f9188 Mon Sep 17 00:00:00 2001 From: Kailang Yang <kailang(a)realtek.com> Date: Fri, 18 Oct 2024 13:53:24 +0800 Subject: [PATCH] ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dell want to limit internal Mic boost on all Dell platform. Signed-off-by: Kailang Yang <kailang(a)realtek.com> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/561fc5f5eff04b6cbd79ed173cd1c1db@realtek.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3567b14b52b7c..784ac058418fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7521,6 +7521,7 @@ enum { ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, @@ -7555,6 +7556,7 @@ enum { ALC255_FIXUP_ACER_MIC_NO_PRESENCE, ALC255_FIXUP_ASUS_MIC_NO_PRESENCE, ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, ALC255_FIXUP_HEADSET_MODE, ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC, @@ -8114,6 +8116,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC269_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -8394,6 +8402,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC255_FIXUP_HEADSET_MODE }, + [ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC255_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -11076,6 +11090,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, .name = "dell-headset-dock"}, {.id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, .name = "dell-headset3"}, {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, .name = "dell-headset4"}, + {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, .name = "dell-headset4-quiet"}, {.id = ALC283_FIXUP_CHROME_BOOK, .name = "alc283-dac-wcaps"}, {.id = ALC283_FIXUP_SENSE_COMBO_JACK, .name = "alc283-sense-combo"}, {.id = ALC292_FIXUP_TPT440_DOCK, .name = "tpt440-dock"}, @@ -11630,16 +11645,16 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1b, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, {0x19, 0x40000000}, {0x1b, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 117932eea99b729ee5d12783601a4f7f5fd58a23 Mon Sep 17 00:00:00 2001 From: Chen Ridong <chenridong(a)huawei.com> Date: Tue, 8 Oct 2024 11:24:56 +0000 Subject: [PATCH] cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction A hung_task problem shown below was found: INFO: task kworker/0:0:8 blocked for more than 327 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Workqueue: events cgroup_bpf_release Call Trace: <TASK> __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This issue can be reproduced by the following pressuse test: 1. A large number of cpuset cgroups are deleted. 2. Set cpu on and off repeatly. 3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature. The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. It can lead to a deadlock through the following steps: 1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are cgroup_bpf_release works, and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram, there are 256 (in the acvtive queue) + n (in the inactive queue) works. 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq, 'sscs.work' is put into inactive queue. 'sscs.work' has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release), so it will be blocked for a while. 3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2. 4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn, it will call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read. However, cpuset_css_offline will be blocked for step 3. 5. At this moment, there are 256 works in active queue that are cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as a result, all of them are blocked. Consequently, sscs.work can not be executed. Ultimately, this situation leads to four processes being blocked, forming a deadlock. system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) ... 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) ...blocking... css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) ...blocking... 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking... To fix the problem, place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn't create a large number of concurrent work items. If something is going to generate >WQ_DFL_ACTIVE(256) concurrent work items, it should use its own dedicated workqueue. Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Cc: stable(a)vger.kernel.org # v5.3+ Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei… Tested-by: Vishal Chourasia <vishalc(a)linux.ibm.com> Signed-off-by: Chen Ridong <chenridong(a)huawei.com> Signed-off-by: Tejun Heo <tj(a)kernel.org> --- kernel/bpf/cgroup.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index e7113d700b878..025d7e2214aeb 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -24,6 +24,23 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE); EXPORT_SYMBOL(cgroup_bpf_enabled_key); +/* + * cgroup bpf destruction makes heavy use of work items and there can be a lot + * of concurrent destructions. Use a separate workqueue so that cgroup bpf + * destruction work items don't end up filling up max_active of system_wq + * which may lead to deadlock. + */ +static struct workqueue_struct *cgroup_bpf_destroy_wq; + +static int __init cgroup_bpf_wq_init(void) +{ + cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1); + if (!cgroup_bpf_destroy_wq) + panic("Failed to alloc workqueue for cgroup bpf destroy.\n"); + return 0; +} +core_initcall(cgroup_bpf_wq_init); + /* __always_inline is necessary to prevent indirect call through run_prog * function pointer. */ @@ -334,7 +351,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref) struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt); INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release); - queue_work(system_wq, &cgrp->bpf.release_work); + queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work); } /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: usb-audio: Add quirks for Dell WD19 dock" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4413665dd6c528b31284119e3571c25f371e1c36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Sch=C3=A4r?= <jan(a)jschaer.ch> Date: Tue, 29 Oct 2024 23:12:49 +0100 Subject: [PATCH] ALSA: usb-audio: Add quirks for Dell WD19 dock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan(a)jschaer.ch> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/usb/mixer_quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index 2a9594f34dac6..6456e87e2f397 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -4042,6 +4042,9 @@ int snd_usb_mixer_apply_create_quirk(struct usb_mixer_interface *mixer) break; err = dell_dock_mixer_init(mixer); break; + case USB_ID(0x0bda, 0x402e): /* Dell WD19 dock */ + err = dell_dock_mixer_create(mixer); + break; case USB_ID(0x2a39, 0x3fd2): /* RME ADI-2 Pro */ case USB_ID(0x2a39, 0x3fd3): /* RME ADI-2 DAC */ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: vdso: Prevent the compiler from inserting calls to memset()" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From bf40167d54d55d4b54d0103713d86a8638fb9290 Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 16 Oct 2024 10:36:24 +0200 Subject: [PATCH] riscv: vdso: Prevent the compiler from inserting calls to memset() The compiler is smart enough to insert a call to memset() in riscv_vdso_get_cpus(), which generates a dynamic relocation. So prevent this by using -fno-builtin option. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Reviewed-by: Guo Ren <guoren(a)kernel.org> Link: https://lore.kernel.org/r/20241016083625.136311-2-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/kernel/vdso/Makefile | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile index 960feb1526caa..3f1c4b2d0b064 100644 --- a/arch/riscv/kernel/vdso/Makefile +++ b/arch/riscv/kernel/vdso/Makefile @@ -18,6 +18,7 @@ obj-vdso = $(patsubst %, %.o, $(vdso-syms)) note.o ccflags-y := -fno-stack-protector ccflags-y += -DDISABLE_BRANCH_PROFILING +ccflags-y += -fno-builtin ifneq ($(c-gettimeofday-y),) CFLAGS_vgettimeofday.o += -fPIC -include $(c-gettimeofday-y) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1" failed to apply to v5.4-stable tree

by Sasha Levin

The patch below does not apply to the v5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From e49370d769e71456db3fbd982e95bab8c69f73e8 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:53 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e06a6fdc0bab7..571fa8a6c9e12 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11008,6 +11008,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1d05, 0x1409, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d17, 0x3288, "Haier Boyue G42", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 0b04fbe886b4274c8e5855011233aaa69fec6e75 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:52 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7f4926194e50f..e06a6fdc0bab7 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10750,6 +10750,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x1404, "Clevo N150CU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Limit internal Mic boost on Dell platform" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 78e7be018784934081afec77f96d49a2483f9188 Mon Sep 17 00:00:00 2001 From: Kailang Yang <kailang(a)realtek.com> Date: Fri, 18 Oct 2024 13:53:24 +0800 Subject: [PATCH] ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dell want to limit internal Mic boost on all Dell platform. Signed-off-by: Kailang Yang <kailang(a)realtek.com> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/561fc5f5eff04b6cbd79ed173cd1c1db@realtek.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3567b14b52b7c..784ac058418fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7521,6 +7521,7 @@ enum { ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, @@ -7555,6 +7556,7 @@ enum { ALC255_FIXUP_ACER_MIC_NO_PRESENCE, ALC255_FIXUP_ASUS_MIC_NO_PRESENCE, ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, ALC255_FIXUP_HEADSET_MODE, ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC, @@ -8114,6 +8116,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC269_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -8394,6 +8402,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC255_FIXUP_HEADSET_MODE }, + [ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC255_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -11076,6 +11090,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, .name = "dell-headset-dock"}, {.id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, .name = "dell-headset3"}, {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, .name = "dell-headset4"}, + {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, .name = "dell-headset4-quiet"}, {.id = ALC283_FIXUP_CHROME_BOOK, .name = "alc283-dac-wcaps"}, {.id = ALC283_FIXUP_SENSE_COMBO_JACK, .name = "alc283-sense-combo"}, {.id = ALC292_FIXUP_TPT440_DOCK, .name = "tpt440-dock"}, @@ -11630,16 +11645,16 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1b, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, {0x19, 0x40000000}, {0x1b, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm/page_alloc: let GFP_ATOMIC order-0 allocs access highatomic reserves" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 281dd25c1a018261a04d1b8bf41a0674000bfe38 Mon Sep 17 00:00:00 2001 From: Matt Fleming <mfleming(a)cloudflare.com> Date: Fri, 11 Oct 2024 13:07:37 +0100 Subject: [PATCH] mm/page_alloc: let GFP_ATOMIC order-0 allocs access highatomic reserves Under memory pressure it's possible for GFP_ATOMIC order-0 allocations to fail even though free pages are available in the highatomic reserves. GFP_ATOMIC allocations cannot trigger unreserve_highatomic_pageblock() since it's only run from reclaim. Given that such allocations will pass the watermarks in __zone_watermark_unusable_free(), it makes sense to fallback to highatomic reserves the same way that ALLOC_OOM can. This fixes order-0 page allocation failures observed on Cloudflare's fleet when handling network packets: kswapd1: page allocation failure: order:0, mode:0x820(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-7 CPU: 10 PID: 696 Comm: kswapd1 Kdump: loaded Tainted: G O 6.6.43-CUSTOM #1 Hardware name: MACHINE Call Trace: <IRQ> dump_stack_lvl+0x3c/0x50 warn_alloc+0x13a/0x1c0 __alloc_pages_slowpath.constprop.0+0xc9d/0xd10 __alloc_pages+0x327/0x340 __napi_alloc_skb+0x16d/0x1f0 bnxt_rx_page_skb+0x96/0x1b0 [bnxt_en] bnxt_rx_pkt+0x201/0x15e0 [bnxt_en] __bnxt_poll_work+0x156/0x2b0 [bnxt_en] bnxt_poll+0xd9/0x1c0 [bnxt_en] __napi_poll+0x2b/0x1b0 bpf_trampoline_6442524138+0x7d/0x1000 __napi_poll+0x5/0x1b0 net_rx_action+0x342/0x740 handle_softirqs+0xcf/0x2b0 irq_exit_rcu+0x6c/0x90 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> [mfleming(a)cloudflare.com: update comment] Link: https://lkml.kernel.org/r/20241015125158.3597702-1-matt@readmodwrite.com Link: https://lkml.kernel.org/r/20241011120737.3300370-1-matt@readmodwrite.com Link: https://lore.kernel.org/all/CAGis_TWzSu=P7QJmjD58WWiu3zjMTVKSzdOwWE8ORaGytz… Fixes: 1d91df85f399 ("mm/page_alloc: handle a missing case for memalloc_nocma_{save/restore} APIs") Signed-off-by: Matt Fleming <mfleming(a)cloudflare.com> Suggested-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Mel Gorman <mgorman(a)techsingularity.net> Cc: Michal Hocko <mhocko(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/page_alloc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 8afab64814dc4..94a2ffe280089 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2893,12 +2893,12 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, page = __rmqueue(zone, order, migratetype, alloc_flags); /* - * If the allocation fails, allow OOM handling access - * to HIGHATOMIC reserves as failing now is worse than - * failing a high-order atomic allocation in the - * future. + * If the allocation fails, allow OOM handling and + * order-0 (atomic) allocs access to HIGHATOMIC + * reserves as failing now is worse than failing a + * high-order atomic allocation in the future. */ - if (!page && (alloc_flags & ALLOC_OOM)) + if (!page && (alloc_flags & (ALLOC_OOM|ALLOC_NON_BLOCK))) page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC); if (!page) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 117932eea99b729ee5d12783601a4f7f5fd58a23 Mon Sep 17 00:00:00 2001 From: Chen Ridong <chenridong(a)huawei.com> Date: Tue, 8 Oct 2024 11:24:56 +0000 Subject: [PATCH] cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction A hung_task problem shown below was found: INFO: task kworker/0:0:8 blocked for more than 327 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Workqueue: events cgroup_bpf_release Call Trace: <TASK> __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This issue can be reproduced by the following pressuse test: 1. A large number of cpuset cgroups are deleted. 2. Set cpu on and off repeatly. 3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature. The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. It can lead to a deadlock through the following steps: 1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are cgroup_bpf_release works, and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram, there are 256 (in the acvtive queue) + n (in the inactive queue) works. 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq, 'sscs.work' is put into inactive queue. 'sscs.work' has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release), so it will be blocked for a while. 3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2. 4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn, it will call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read. However, cpuset_css_offline will be blocked for step 3. 5. At this moment, there are 256 works in active queue that are cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as a result, all of them are blocked. Consequently, sscs.work can not be executed. Ultimately, this situation leads to four processes being blocked, forming a deadlock. system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) ... 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) ...blocking... css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) ...blocking... 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking... To fix the problem, place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn't create a large number of concurrent work items. If something is going to generate >WQ_DFL_ACTIVE(256) concurrent work items, it should use its own dedicated workqueue. Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Cc: stable(a)vger.kernel.org # v5.3+ Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei… Tested-by: Vishal Chourasia <vishalc(a)linux.ibm.com> Signed-off-by: Chen Ridong <chenridong(a)huawei.com> Signed-off-by: Tejun Heo <tj(a)kernel.org> --- kernel/bpf/cgroup.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index e7113d700b878..025d7e2214aeb 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -24,6 +24,23 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE); EXPORT_SYMBOL(cgroup_bpf_enabled_key); +/* + * cgroup bpf destruction makes heavy use of work items and there can be a lot + * of concurrent destructions. Use a separate workqueue so that cgroup bpf + * destruction work items don't end up filling up max_active of system_wq + * which may lead to deadlock. + */ +static struct workqueue_struct *cgroup_bpf_destroy_wq; + +static int __init cgroup_bpf_wq_init(void) +{ + cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1); + if (!cgroup_bpf_destroy_wq) + panic("Failed to alloc workqueue for cgroup bpf destroy.\n"); + return 0; +} +core_initcall(cgroup_bpf_wq_init); + /* __always_inline is necessary to prevent indirect call through run_prog * function pointer. */ @@ -334,7 +351,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref) struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt); INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release); - queue_work(system_wq, &cgrp->bpf.release_work); + queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work); } /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: usb-audio: Add quirks for Dell WD19 dock" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4413665dd6c528b31284119e3571c25f371e1c36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Sch=C3=A4r?= <jan(a)jschaer.ch> Date: Tue, 29 Oct 2024 23:12:49 +0100 Subject: [PATCH] ALSA: usb-audio: Add quirks for Dell WD19 dock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan(a)jschaer.ch> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/usb/mixer_quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index 2a9594f34dac6..6456e87e2f397 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -4042,6 +4042,9 @@ int snd_usb_mixer_apply_create_quirk(struct usb_mixer_interface *mixer) break; err = dell_dock_mixer_init(mixer); break; + case USB_ID(0x0bda, 0x402e): /* Dell WD19 dock */ + err = dell_dock_mixer_create(mixer); + break; case USB_ID(0x2a39, 0x3fd2): /* RME ADI-2 Pro */ case USB_ID(0x2a39, 0x3fd3): /* RME ADI-2 DAC */ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1" failed to apply to v5.10-stable tree

by Sasha Levin

The patch below does not apply to the v5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From e49370d769e71456db3fbd982e95bab8c69f73e8 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:53 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e06a6fdc0bab7..571fa8a6c9e12 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11008,6 +11008,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1d05, 0x1409, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d17, 0x3288, "Haier Boyue G42", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 0b04fbe886b4274c8e5855011233aaa69fec6e75 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:52 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Gemini 17 Gen3 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-1-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7f4926194e50f..e06a6fdc0bab7 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -10750,6 +10750,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1558, 0x1404, "Clevo N150CU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x14a1, "Clevo L141MU", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x2624, "Clevo L240TU", ALC256_FIXUP_SYSTEM76_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x1558, 0x28c1, "Clevo V370VND", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1558, 0x4018, "Clevo NV40M[BE]", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4019, "Clevo NV40MZ", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1558, 0x4020, "Clevo NV40MB", ALC293_FIXUP_SYSTEM76_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "vmscan,migrate: fix page count imbalance on node stats when demoting pages" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 35e41024c4c2b02ef8207f61b9004f6956cf037b Mon Sep 17 00:00:00 2001 From: Gregory Price <gourry(a)gourry.net> Date: Fri, 25 Oct 2024 10:17:24 -0400 Subject: [PATCH] vmscan,migrate: fix page count imbalance on node stats when demoting pages When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry(a)gourry.net> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net> Reviewed-by: Shakeel Butt <shakeel.butt(a)linux.dev> Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 7e520562d421a..fab84a7760889 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1178,7 +1178,7 @@ static void migrate_folio_done(struct folio *src, * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ - if (likely(!__folio_test_movable(src))) + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Limit internal Mic boost on Dell platform" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 78e7be018784934081afec77f96d49a2483f9188 Mon Sep 17 00:00:00 2001 From: Kailang Yang <kailang(a)realtek.com> Date: Fri, 18 Oct 2024 13:53:24 +0800 Subject: [PATCH] ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dell want to limit internal Mic boost on all Dell platform. Signed-off-by: Kailang Yang <kailang(a)realtek.com> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/561fc5f5eff04b6cbd79ed173cd1c1db@realtek.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3567b14b52b7c..784ac058418fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7521,6 +7521,7 @@ enum { ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, @@ -7555,6 +7556,7 @@ enum { ALC255_FIXUP_ACER_MIC_NO_PRESENCE, ALC255_FIXUP_ASUS_MIC_NO_PRESENCE, ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, ALC255_FIXUP_HEADSET_MODE, ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC, @@ -8114,6 +8116,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC269_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -8394,6 +8402,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC255_FIXUP_HEADSET_MODE }, + [ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC255_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -11076,6 +11090,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, .name = "dell-headset-dock"}, {.id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, .name = "dell-headset3"}, {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, .name = "dell-headset4"}, + {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, .name = "dell-headset4-quiet"}, {.id = ALC283_FIXUP_CHROME_BOOK, .name = "alc283-dac-wcaps"}, {.id = ALC283_FIXUP_SENSE_COMBO_JACK, .name = "alc283-sense-combo"}, {.id = ALC292_FIXUP_TPT440_DOCK, .name = "tpt440-dock"}, @@ -11630,16 +11645,16 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1b, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, {0x19, 0x40000000}, {0x1b, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 117932eea99b729ee5d12783601a4f7f5fd58a23 Mon Sep 17 00:00:00 2001 From: Chen Ridong <chenridong(a)huawei.com> Date: Tue, 8 Oct 2024 11:24:56 +0000 Subject: [PATCH] cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction A hung_task problem shown below was found: INFO: task kworker/0:0:8 blocked for more than 327 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Workqueue: events cgroup_bpf_release Call Trace: <TASK> __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This issue can be reproduced by the following pressuse test: 1. A large number of cpuset cgroups are deleted. 2. Set cpu on and off repeatly. 3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature. The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. It can lead to a deadlock through the following steps: 1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are cgroup_bpf_release works, and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram, there are 256 (in the acvtive queue) + n (in the inactive queue) works. 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq, 'sscs.work' is put into inactive queue. 'sscs.work' has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release), so it will be blocked for a while. 3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2. 4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn, it will call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read. However, cpuset_css_offline will be blocked for step 3. 5. At this moment, there are 256 works in active queue that are cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as a result, all of them are blocked. Consequently, sscs.work can not be executed. Ultimately, this situation leads to four processes being blocked, forming a deadlock. system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) ... 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) ...blocking... css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) ...blocking... 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking... To fix the problem, place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn't create a large number of concurrent work items. If something is going to generate >WQ_DFL_ACTIVE(256) concurrent work items, it should use its own dedicated workqueue. Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Cc: stable(a)vger.kernel.org # v5.3+ Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei… Tested-by: Vishal Chourasia <vishalc(a)linux.ibm.com> Signed-off-by: Chen Ridong <chenridong(a)huawei.com> Signed-off-by: Tejun Heo <tj(a)kernel.org> --- kernel/bpf/cgroup.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index e7113d700b878..025d7e2214aeb 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -24,6 +24,23 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE); EXPORT_SYMBOL(cgroup_bpf_enabled_key); +/* + * cgroup bpf destruction makes heavy use of work items and there can be a lot + * of concurrent destructions. Use a separate workqueue so that cgroup bpf + * destruction work items don't end up filling up max_active of system_wq + * which may lead to deadlock. + */ +static struct workqueue_struct *cgroup_bpf_destroy_wq; + +static int __init cgroup_bpf_wq_init(void) +{ + cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1); + if (!cgroup_bpf_destroy_wq) + panic("Failed to alloc workqueue for cgroup bpf destroy.\n"); + return 0; +} +core_initcall(cgroup_bpf_wq_init); + /* __always_inline is necessary to prevent indirect call through run_prog * function pointer. */ @@ -334,7 +351,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref) struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt); INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release); - queue_work(system_wq, &cgrp->bpf.release_work); + queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work); } /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: usb-audio: Add quirks for Dell WD19 dock" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4413665dd6c528b31284119e3571c25f371e1c36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Sch=C3=A4r?= <jan(a)jschaer.ch> Date: Tue, 29 Oct 2024 23:12:49 +0100 Subject: [PATCH] ALSA: usb-audio: Add quirks for Dell WD19 dock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan(a)jschaer.ch> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/usb/mixer_quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index 2a9594f34dac6..6456e87e2f397 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -4042,6 +4042,9 @@ int snd_usb_mixer_apply_create_quirk(struct usb_mixer_interface *mixer) break; err = dell_dock_mixer_init(mixer); break; + case USB_ID(0x0bda, 0x402e): /* Dell WD19 dock */ + err = dell_dock_mixer_create(mixer); + break; case USB_ID(0x2a39, 0x3fd2): /* RME ADI-2 Pro */ case USB_ID(0x2a39, 0x3fd3): /* RME ADI-2 DAC */ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From e49370d769e71456db3fbd982e95bab8c69f73e8 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:53 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e06a6fdc0bab7..571fa8a6c9e12 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11008,6 +11008,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1d05, 0x1409, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d17, 0x3288, "Haier Boyue G42", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 5 ++- mm/rmap.c | 9 ++--- mm/vmscan.c | 88 +++++++++++++++++++++++------------------- 3 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab6..5b1c984daf454 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d496..73d5998677d40 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b3601..ddaaff67642e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /****************************************************************************** -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "x86/traps: move kmsan check after instrumentation_begin" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1db272864ff250b5e607283eaec819e1186c8e26 Mon Sep 17 00:00:00 2001 From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com> Date: Wed, 16 Oct 2024 20:24:07 +0500 Subject: [PATCH] x86/traps: move kmsan check after instrumentation_begin During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following: AR built-in.a AR vmlinux.a LD vmlinux.o vmlinux.o: warning: objtool: handle_bug+0x4: call to kmsan_unpoison_entry_regs() leaves .noinstr.text section OBJCOPY modules.builtin.modinfo GEN modules.builtin MODPOST Module.symvers CC .vmlinux.export.o Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes the warning. There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but it has the return condition and if we include it after instrumentation_begin() it results the warning "return with instrumentation enabled", hence, I'm concerned that regs will not be KMSAN unpoisoned if `ud_type == BUG_NONE` is true. Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()") Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com> Reviewed-by: Alexander Potapenko <glider(a)google.com> Cc: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/x86/kernel/traps.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index d05392db5d0fe..2dbadf347b5f4 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -261,12 +261,6 @@ static noinstr bool handle_bug(struct pt_regs *regs) int ud_type; u32 imm; - /* - * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug() - * is a rare case that uses @regs without passing them to - * irqentry_enter(). - */ - kmsan_unpoison_entry_regs(regs); ud_type = decode_bug(regs->ip, &imm); if (ud_type == BUG_NONE) return handled; @@ -275,6 +269,12 @@ static noinstr bool handle_bug(struct pt_regs *regs) * All lies, just get the WARN/BUG out. */ instrumentation_begin(); + /* + * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug() + * is a rare case that uses @regs without passing them to + * irqentry_enter(). + */ + kmsan_unpoison_entry_regs(regs); /* * Since we're emulating a CALL with exceptions, restore the interrupt * state to what it was at the exception site. -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/acpi: Ensure ports ready at cxl_acpi_probe() return" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 48f62d38a07d464a499fa834638afcfd2b68f852 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:40 -0700 Subject: [PATCH] cxl/acpi: Ensure ports ready at cxl_acpi_probe() return In order to ensure root CXL ports are enabled upon cxl_acpi_probe() when the 'cxl_port' driver is built as a module, arrange for the module to be pre-loaded or built-in. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Tested-by: Gregory Price <gourry(a)gourry.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964781969.81806.17276352414854540808.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/acpi.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 82b78e331d8ed..432b7cfd12a8e 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void) /* load before dax_hmem sees 'Soft Reserved' CXL ranges */ subsys_initcall(cxl_acpi_init); + +/* + * Arrange for host-bridge ports to be active synchronous with + * cxl_acpi_probe() exit. + */ +MODULE_SOFTDEP("pre: cxl_port"); + module_exit(cxl_acpi_exit); MODULE_DESCRIPTION("CXL ACPI: Platform Support"); MODULE_LICENSE("GPL v2"); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix defrag not merging contiguous extents due to merged extent maps" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 77b0d113eec49a7390ff1a08ca1923e89f5f86c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Tue, 29 Oct 2024 15:18:45 +0000 Subject: [PATCH] btrfs: fix defrag not merging contiguous extents due to merged extent maps When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equals to the defrag threshold length. The script below reproduces this scenario: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a 256K file with 4 extents of 64K each. xfs_io -f -c "falloc 0 64K" \ -c "pwrite 0 64K" \ -c "falloc 64K 64K" \ -c "pwrite 64K 64K" \ -c "falloc 128K 64K" \ -c "pwrite 128K 64K" \ -c "falloc 192K 64K" \ -c "pwrite 192K 64K" \ $MNT/foo umount $MNT echo -n "Initial number of file extent items: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 128K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 128K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 256K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 256K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l Running it: $ ./test.sh Initial number of file extent items: 4 Number of file extent items after defrag with 128K threshold: 4 Number of file extent items after defrag with 256K threshold: 4 The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag. Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument). A test case for fstests will be sent along soon. CC: stable(a)vger.kernel.org # 6.1+ Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/defrag.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index b95ef44c326bd..968dae9539482 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -763,12 +763,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start, * We can get a merged extent, in that case, we need to re-search * tree to get the original em for defrag. * - * If @newer_than is 0 or em::generation < newer_than, we can trust - * this em, as either we don't care about the generation, or the - * merged extent map will be rejected anyway. + * This is because even if we have adjacent extents that are contiguous + * and compatible (same type and flags), we still want to defrag them + * so that we use less metadata (extent items in the extent tree and + * file extent items in the inode's subvolume tree). */ - if (em && (em->flags & EXTENT_FLAG_MERGED) && - newer_than && em->generation >= newer_than) { + if (em && (em->flags & EXTENT_FLAG_MERGED)) { free_extent_map(em); em = NULL; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix use-after-free, permit out-of-order decoder shutdown" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:49 -0700 Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable(a)vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Cc: Zijun Hu <quic_zijuhu(a)quicinc.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/base/core.c | 35 +++++++++++++++++++++++++ drivers/cxl/core/hdm.c | 50 ++++++++++++++++++++++++++++++------ drivers/cxl/core/region.c | 48 ++++++++++------------------------ drivers/cxl/cxl.h | 3 ++- include/linux/device.h | 3 +++ tools/testing/cxl/test/cxl.c | 14 ++++------ 6 files changed, 100 insertions(+), 53 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index a4c853411a6b2..e42f1ad730780 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); +/** + * device_for_each_child_reverse_from - device child iterator in reversed order. + * @parent: parent struct device. + * @from: optional starting point in child list + * @fn: function to be called for each device. + * @data: data for the callback. + * + * Iterate over @parent's child devices, starting at @from, and call @fn + * for each, passing it @data. This helper is identical to + * device_for_each_child_reverse() when @from is NULL. + * + * @fn is checked each iteration. If it returns anything other than 0, + * iteration stop and that value is returned to the caller of + * device_for_each_child_reverse_from(); + */ +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)) +{ + struct klist_iter i; + struct device *child; + int error = 0; + + if (!parent->p) + return 0; + + klist_iter_init_node(&parent->p->klist_children, &i, + (from ? &from->p->knode_parent : NULL)); + while ((child = prev_device(&i)) && !error) + error = fn(child, data); + klist_iter_exit(&i); + return error; +} +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); + /** * device_find_child - device iterator for locating a particular device. * @parent: parent struct device diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3df10517a3278..223c273c0cd17 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int cxl_decoder_reset(struct cxl_decoder *cxld) +static int commit_reap(struct device *dev, const void *data) +{ + struct cxl_port *port = to_cxl_port(dev->parent); + struct cxl_decoder *cxld; + + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev)) + return 0; + + cxld = to_cxl_decoder(dev); + if (port->commit_end == cxld->id && + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", + dev_name(&cxld->dev), port->commit_end); + } + + return 0; +} + +void cxl_port_commit_reap(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + lockdep_assert_held_write(&cxl_region_rwsem); + + /* + * Once the highest committed decoder is disabled, free any other + * decoders that were pinned allocated by out-of-order release. + */ + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev), + port->commit_end); + device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL, + commit_reap); +} +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL); + +static void cxl_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) u32 ctrl; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } down_read(&cxl_dpa_rwsem); ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); up_read(&cxl_dpa_rwsem); - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; /* Userspace is now responsible for reconfiguring this decoder */ @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) cxled = to_cxl_endpoint_decoder(&cxld->dev); cxled->state = CXL_DECODER_STATE_MANUAL; } - - return 0; } static int cxl_setup_hdm_decoder_from_dvsec( diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e701e4b040328..3478d20583035 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); return 0; } else { - dev_err(&cxlr->dev, - "Failed to synchronize CPU cache state\n"); + dev_WARN(&cxlr->dev, + "Failed to synchronize CPU cache state\n"); return -ENXIO; } } @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) return 0; } -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count) { struct cxl_region_params *p = &cxlr->params; - int i, rc = 0; + int i; /* - * Before region teardown attempt to flush, and if the flush - * fails cancel the region teardown for data consistency - * concerns + * Before region teardown attempt to flush, evict any data cached for + * this region, or scream loudly about missing arch / platform support + * for CXL teardown. */ - rc = cxl_region_invalidate_memregion(cxlr); - if (rc) - return rc; + cxl_region_invalidate_memregion(cxlr); for (i = count - 1; i >= 0; i--) { struct cxl_endpoint_decoder *cxled = p->targets[i]; @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) cxl_rr = cxl_rr_load(iter, cxlr); cxld = cxl_rr->decoder; if (cxld->reset) - rc = cxld->reset(cxld); - if (rc) - return rc; + cxld->reset(cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } endpoint_reset: - rc = cxled->cxld.reset(&cxled->cxld); - if (rc) - return rc; + cxled->cxld.reset(&cxled->cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } /* all decoders associated with this region have been torn down */ clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); - - return 0; } static int commit_decoder(struct cxl_decoder *cxld) @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, * still pending. */ if (p->state == CXL_CONFIG_RESET_PENDING) { - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - /* - * Revert to committed since there may still be active - * decoders associated with this region, or move forward - * to active to mark the reset successful - */ - if (rc) - p->state = CXL_CONFIG_COMMIT; - else - p->state = CXL_CONFIG_ACTIVE; + cxl_region_decode_reset(cxlr, p->interleave_ways); + p->state = CXL_CONFIG_ACTIVE; } } @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) get_device(&cxlr->dev); if (p->state > CXL_CONFIG_ACTIVE) { - /* - * TODO: tear down all impacted regions if a device is - * removed out of order - */ - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - if (rc) - goto out; + cxl_region_decode_reset(cxlr, p->interleave_ways); p->state = CXL_CONFIG_ACTIVE; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0d8b810a51f04..5406e3ab3d4a4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -359,7 +359,7 @@ struct cxl_decoder { struct cxl_region *region; unsigned long flags; int (*commit)(struct cxl_decoder *cxld); - int (*reset)(struct cxl_decoder *cxld); + void (*reset)(struct cxl_decoder *cxld); }; /* @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port) int cxl_num_decoders_committed(struct cxl_port *port); bool is_cxl_port(const struct device *dev); struct cxl_port *to_cxl_port(const struct device *dev); +void cxl_port_commit_reap(struct cxl_decoder *cxld); struct pci_bus; int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, struct pci_bus *bus); diff --git a/include/linux/device.h b/include/linux/device.h index b4bde8d226979..667cb6db90193 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)); struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 90d5afd52dd06..c5bbd89b31920 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int mock_decoder_reset(struct cxl_decoder *cxld) +static void mock_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); int id = cxld->id; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } - - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; - - return 0; } static void default_mock_decoder(struct cxl_decoder *cxld) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix CXL port initialization order when the subsystem is built-in" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/Kconfig | 1 + drivers/cxl/Makefile | 20 ++++++++++++++------ drivers/cxl/port.c | 17 ++++++++++++++++- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082c..876469e23f7a7 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52e..2caa90fa4bf25 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768fe..9dc394295e1fc 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix extent map merging not happening for adjacent extents" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From a0f0625390858321525c2a8d04e174a546bd19b3 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 28 Oct 2024 16:23:00 +0000 Subject: [PATCH] btrfs: fix extent map merging not happening for adjacent extents If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map. For example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \ -c "pwrite -b 64K 64K 64K" \ -c "pwrite -b 64K 128K 64K" \ -c "pwrite -b 64K 192K 64K" \ /mnt/sdc/foo After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because: 1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned; 2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned. The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set; 3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set. We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned. At this moment we have 3 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 192K). One for file range [192K, 256K) which is still pinned; 4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set. So we end up with 2 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set. Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file. Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps which allows to save memory, and can make defrag not merging extents too (that will be addressed in the next patch). Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/extent_map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 668c617444a50..1d93e1202c339 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -230,7 +230,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma if (extent_map_end(prev) != next->start) return false; - if (prev->flags != next->flags) + /* + * The merged flag is not an on-disk flag, it just indicates we had the + * extent maps of 2 (or more) adjacent extents merged, so factor it out. + */ + if ((prev->flags & ~EXTENT_FLAG_MERGED) != + (next->flags & ~EXTENT_FLAG_MERGED)) return false; if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 2 -- mm/vmscan.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a28355..9342e5692dab6 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c5071..4f1d33e4b3601 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 5 ++- mm/rmap.c | 9 ++--- mm/vmscan.c | 88 +++++++++++++++++++++++------------------- 3 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab6..5b1c984daf454 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d496..73d5998677d40 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b3601..ddaaff67642e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /****************************************************************************** -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "drm/amd/pm: Vangogh: Fix kernel memory out of bounds write" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 Mon Sep 17 00:00:00 2001 From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Date: Fri, 25 Oct 2024 15:56:39 +0100 Subject: [PATCH] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Evan Quan <evan.quan(a)amd.com> Cc: Wenyou Yang <WenYou.Yang(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com> Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: stable(a)vger.kernel.org # v6.6+ --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 22737b11b1bfb..1fe020f1f4dbe 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -242,7 +242,9 @@ static int vangogh_tables_init(struct smu_context *smu) goto err0_out; smu_table->metrics_time = 0; - smu_table->gpu_metrics_table_size = max(sizeof(struct gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2)); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4)); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) goto err1_out; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: allow set/clear page_type again" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/page-flags.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a767104878..cc839e4365c18 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix defrag not merging contiguous extents due to merged extent maps" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 77b0d113eec49a7390ff1a08ca1923e89f5f86c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Tue, 29 Oct 2024 15:18:45 +0000 Subject: [PATCH] btrfs: fix defrag not merging contiguous extents due to merged extent maps When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equals to the defrag threshold length. The script below reproduces this scenario: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a 256K file with 4 extents of 64K each. xfs_io -f -c "falloc 0 64K" \ -c "pwrite 0 64K" \ -c "falloc 64K 64K" \ -c "pwrite 64K 64K" \ -c "falloc 128K 64K" \ -c "pwrite 128K 64K" \ -c "falloc 192K 64K" \ -c "pwrite 192K 64K" \ $MNT/foo umount $MNT echo -n "Initial number of file extent items: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 128K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 128K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 256K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 256K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l Running it: $ ./test.sh Initial number of file extent items: 4 Number of file extent items after defrag with 128K threshold: 4 Number of file extent items after defrag with 256K threshold: 4 The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag. Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument). A test case for fstests will be sent along soon. CC: stable(a)vger.kernel.org # 6.1+ Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/defrag.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index b95ef44c326bd..968dae9539482 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -763,12 +763,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start, * We can get a merged extent, in that case, we need to re-search * tree to get the original em for defrag. * - * If @newer_than is 0 or em::generation < newer_than, we can trust - * this em, as either we don't care about the generation, or the - * merged extent map will be rejected anyway. + * This is because even if we have adjacent extents that are contiguous + * and compatible (same type and flags), we still want to defrag them + * so that we use less metadata (extent items in the extent tree and + * file extent items in the inode's subvolume tree). */ - if (em && (em->flags & EXTENT_FLAG_MERGED) && - newer_than && em->generation >= newer_than) { + if (em && (em->flags & EXTENT_FLAG_MERGED)) { free_extent_map(em); em = NULL; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix CXL port initialization order when the subsystem is built-in" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/Kconfig | 1 + drivers/cxl/Makefile | 20 ++++++++++++++------ drivers/cxl/port.c | 17 ++++++++++++++++- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082c..876469e23f7a7 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52e..2caa90fa4bf25 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768fe..9dc394295e1fc 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix extent map merging not happening for adjacent extents" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From a0f0625390858321525c2a8d04e174a546bd19b3 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 28 Oct 2024 16:23:00 +0000 Subject: [PATCH] btrfs: fix extent map merging not happening for adjacent extents If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map. For example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \ -c "pwrite -b 64K 64K 64K" \ -c "pwrite -b 64K 128K 64K" \ -c "pwrite -b 64K 192K 64K" \ /mnt/sdc/foo After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because: 1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned; 2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned. The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set; 3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set. We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned. At this moment we have 3 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 192K). One for file range [192K, 256K) which is still pinned; 4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set. So we end up with 2 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set. Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file. Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps which allows to save memory, and can make defrag not merging extents too (that will be addressed in the next patch). Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/extent_map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 668c617444a50..1d93e1202c339 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -230,7 +230,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma if (extent_map_end(prev) != next->start) return false; - if (prev->flags != next->flags) + /* + * The merged flag is not an on-disk flag, it just indicates we had the + * extent maps of 2 (or more) adjacent extents merged, so factor it out. + */ + if ((prev->flags & ~EXTENT_FLAG_MERGED) != + (next->flags & ~EXTENT_FLAG_MERGED)) return false; if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix error propagation of split bios" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d48e1dea3931de64c26717adc2b89743c7ab6594 Mon Sep 17 00:00:00 2001 From: Naohiro Aota <naohiro.aota(a)wdc.com> Date: Wed, 9 Oct 2024 22:52:06 +0900 Subject: [PATCH] btrfs: fix error propagation of split bios The purpose of btrfs_bbio_propagate_error() shall be propagating an error of split bio to its original btrfs_bio, and tell the error to the upper layer. However, it's not working well on some cases. * Case 1. Immediate (or quick) end_bio with an error When btrfs sends btrfs_bio to mirrored devices, btrfs calls btrfs_bio_end_io() when all the mirroring bios are completed. If that btrfs_bio was split, it is from btrfs_clone_bioset and its end_io function is btrfs_orig_write_end_io. For this case, btrfs_bbio_propagate_error() accesses the orig_bbio's bio context to increase the error count. That works well in most cases. However, if the end_io is called enough fast, orig_bbio's (remaining part after split) bio context may not be properly set at that time. Since the bio context is set when the orig_bbio (the last btrfs_bio) is sent to devices, that might be too late for earlier split btrfs_bio's completion. That will result in NULL pointer dereference. That bug is easily reproducible by running btrfs/146 on zoned devices [1] and it shows the following trace. [1] You need raid-stripe-tree feature as it create "-d raid0 -m raid1" FS. BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 6.11.0-rc7-BTRFS-ZNS+ #474 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-btrfs-5) RIP: 0010:btrfs_bio_end_io+0xae/0xc0 [btrfs] BTRFS error (device dm-0): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 RSP: 0018:ffffc9000006f248 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888005a7f080 RCX: ffffc9000006f1dc RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff888005a7f080 RBP: ffff888011dfc540 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff82e508e0 R11: 0000000000000005 R12: ffff88800ddfbe58 R13: ffff888005a7f080 R14: ffff888005a7f158 R15: ffff888005a7f158 FS: 0000000000000000(0000) GS:ffff88803ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000002e22006 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body.cold+0x19/0x26 ? page_fault_oops+0x13e/0x2b0 ? _printk+0x58/0x73 ? do_user_addr_fault+0x5f/0x750 ? exc_page_fault+0x76/0x240 ? asm_exc_page_fault+0x22/0x30 ? btrfs_bio_end_io+0xae/0xc0 [btrfs] ? btrfs_log_dev_io_error+0x7f/0x90 [btrfs] btrfs_orig_write_end_io+0x51/0x90 [btrfs] dm_submit_bio+0x5c2/0xa50 [dm_mod] ? find_held_lock+0x2b/0x80 ? blk_try_enter_queue+0x90/0x1e0 __submit_bio+0xe0/0x130 ? ktime_get+0x10a/0x160 ? lockdep_hardirqs_on+0x74/0x100 submit_bio_noacct_nocheck+0x199/0x410 btrfs_submit_bio+0x7d/0x150 [btrfs] btrfs_submit_chunk+0x1a1/0x6d0 [btrfs] ? lockdep_hardirqs_on+0x74/0x100 ? __folio_start_writeback+0x10/0x2c0 btrfs_submit_bbio+0x1c/0x40 [btrfs] submit_one_bio+0x44/0x60 [btrfs] submit_extent_folio+0x13f/0x330 [btrfs] ? btrfs_set_range_writeback+0xa3/0xd0 [btrfs] extent_writepage_io+0x18b/0x360 [btrfs] extent_write_locked_range+0x17c/0x340 [btrfs] ? __pfx_end_bbio_data_write+0x10/0x10 [btrfs] run_delalloc_cow+0x71/0xd0 [btrfs] btrfs_run_delalloc_range+0x176/0x500 [btrfs] ? find_lock_delalloc_range+0x119/0x260 [btrfs] writepage_delalloc+0x2ab/0x480 [btrfs] extent_write_cache_pages+0x236/0x7d0 [btrfs] btrfs_writepages+0x72/0x130 [btrfs] do_writepages+0xd4/0x240 ? find_held_lock+0x2b/0x80 ? wbc_attach_and_unlock_inode+0x12c/0x290 ? wbc_attach_and_unlock_inode+0x12c/0x290 __writeback_single_inode+0x5c/0x4c0 ? do_raw_spin_unlock+0x49/0xb0 writeback_sb_inodes+0x22c/0x560 __writeback_inodes_wb+0x4c/0xe0 wb_writeback+0x1d6/0x3f0 wb_workfn+0x334/0x520 process_one_work+0x1ee/0x570 ? lock_is_held_type+0xc6/0x130 worker_thread+0x1d1/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xee/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: dm_mod btrfs blake2b_generic xor raid6_pq rapl CR2: 0000000000000020 * Case 2. Earlier completion of orig_bbio for mirrored btrfs_bios btrfs_bbio_propagate_error() assumes the end_io function for orig_bbio is called last among split bios. In that case, btrfs_orig_write_end_io() sets the bio->bi_status to BLK_STS_IOERR by seeing the bioc->error [2]. Otherwise, the increased orig_bio's bioc->error is not checked by anyone and return BLK_STS_OK to the upper layer. [2] Actually, this is not true. Because we only increases orig_bioc->errors by max_errors, the condition "atomic_read(&bioc->error) > bioc->max_errors" is still not met if only one split btrfs_bio fails. * Case 3. Later completion of orig_bbio for un-mirrored btrfs_bios In contrast to the above case, btrfs_bbio_propagate_error() is not working well if un-mirrored orig_bbio is completed last. It sets orig_bbio->bio.bi_status to the btrfs_bio's error. But, that is easily over-written by orig_bbio's completion status. If the status is BLK_STS_OK, the upper layer would not know the failure. * Solution Considering the above cases, we can only save the error status in the orig_bbio (remaining part after split) itself as it is always available. Also, the saved error status should be propagated when all the split btrfs_bios are finished (i.e, bbio->pending_ios == 0). This commit introduces "status" to btrfs_bbio and saves the first error of split bios to original btrfs_bio's "status" variable. When all the split bios are finished, the saved status is loaded into original btrfs_bio's status. With this commit, btrfs/146 on zoned devices does not hit the NULL pointer dereference anymore. Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios") CC: stable(a)vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: Christoph Hellwig <hch(a)lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/bio.c | 37 +++++++++++++------------------------ fs/btrfs/bio.h | 3 +++ 2 files changed, 16 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c index ce13416bc10f0..f83ec5a1baa60 100644 --- a/fs/btrfs/bio.c +++ b/fs/btrfs/bio.c @@ -49,6 +49,7 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_fs_info *fs_info, bbio->end_io = end_io; bbio->private = private; atomic_set(&bbio->pending_ios, 1); + WRITE_ONCE(bbio->status, BLK_STS_OK); } /* @@ -120,41 +121,29 @@ static void __btrfs_bio_end_io(struct btrfs_bio *bbio) } } -static void btrfs_orig_write_end_io(struct bio *bio); - -static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio, - struct btrfs_bio *orig_bbio) -{ - /* - * For writes we tolerate nr_mirrors - 1 write failures, so we can't - * just blindly propagate a write failure here. Instead increment the - * error count in the original I/O context so that it is guaranteed to - * be larger than the error tolerance. - */ - if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) { - struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private; - struct btrfs_io_context *orig_bioc = orig_stripe->bioc; - - atomic_add(orig_bioc->max_errors, &orig_bioc->error); - } else { - orig_bbio->bio.bi_status = bbio->bio.bi_status; - } -} - void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status) { bbio->bio.bi_status = status; if (bbio->bio.bi_pool == &btrfs_clone_bioset) { struct btrfs_bio *orig_bbio = bbio->private; - if (bbio->bio.bi_status) - btrfs_bbio_propagate_error(bbio, orig_bbio); btrfs_cleanup_bio(bbio); bbio = orig_bbio; } - if (atomic_dec_and_test(&bbio->pending_ios)) + /* + * At this point, bbio always points to the original btrfs_bio. Save + * the first error in it. + */ + if (status != BLK_STS_OK) + cmpxchg(&bbio->status, BLK_STS_OK, status); + + if (atomic_dec_and_test(&bbio->pending_ios)) { + /* Load split bio's error which might be set above. */ + if (status == BLK_STS_OK) + bbio->bio.bi_status = READ_ONCE(bbio->status); __btrfs_bio_end_io(bbio); + } } static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h index e486123407458..e2fe16074ad65 100644 --- a/fs/btrfs/bio.h +++ b/fs/btrfs/bio.h @@ -79,6 +79,9 @@ struct btrfs_bio { /* File system that this I/O operates on. */ struct btrfs_fs_info *fs_info; + /* Save the first error status of split bio. */ + blk_status_t status; + /* * This member must come last, bio_alloc_bioset will allocate enough * bytes for entire btrfs_bio but relies on bio being last. -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 2 -- mm/vmscan.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a28355..9342e5692dab6 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c5071..4f1d33e4b3601 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: allow set/clear page_type again" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/page-flags.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a767104878..cc839e4365c18 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

[merged mm-nonmm-stable] ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: ipc: fix memleak if msg_init_ns failed in create_ipc_ns has been removed from the -mm tree. Its filename was ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns.patch This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Wupeng Ma <mawupeng1(a)huawei.com> Subject: ipc: fix memleak if msg_init_ns failed in create_ipc_ns Date: Wed, 23 Oct 2024 17:31:29 +0800 From: Ma Wupeng <mawupeng1(a)huawei.com> Percpu memory allocation may failed during create_ipc_ns however this fail is not handled properly since ipc sysctls and mq sysctls is not released properly. Fix this by release these two resource when failure. Here is the kmemleak stack when percpu failed: unreferenced object 0xffff88819de2a600 (size 512): comm "shmem_2nstest", pid 120711, jiffies 4300542254 hex dump (first 32 bytes): 60 aa 9d 84 ff ff ff ff fc 18 48 b2 84 88 ff ff `.........H..... 04 00 00 00 a4 01 00 00 20 e4 56 81 ff ff ff ff ........ .V..... backtrace (crc be7cba35): [<ffffffff81b43f83>] __kmalloc_node_track_caller_noprof+0x333/0x420 [<ffffffff81a52e56>] kmemdup_noprof+0x26/0x50 [<ffffffff821b2f37>] setup_mq_sysctls+0x57/0x1d0 [<ffffffff821b29cc>] copy_ipcs+0x29c/0x3b0 [<ffffffff815d6a10>] create_new_namespaces+0x1d0/0x920 [<ffffffff815d7449>] copy_namespaces+0x2e9/0x3e0 [<ffffffff815458f3>] copy_process+0x29f3/0x7ff0 [<ffffffff8154b080>] kernel_clone+0xc0/0x650 [<ffffffff8154b6b1>] __do_sys_clone+0xa1/0xe0 [<ffffffff843df8ff>] do_syscall_64+0xbf/0x1c0 [<ffffffff846000b0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Link: https://lkml.kernel.org/r/20241023093129.3074301-1-mawupeng1@huawei.com Fixes: 72d1e611082e ("ipc/msg: mitigate the lock contention with percpu counter") Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- ipc/namespace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/ipc/namespace.c~ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns +++ a/ipc/namespace.c @@ -83,13 +83,15 @@ static struct ipc_namespace *create_ipc_ err = msg_init_ns(ns); if (err) - goto fail_put; + goto fail_ipc; sem_init_ns(ns); shm_init_ns(ns); return ns; +fail_ipc: + retire_ipc_sysctls(ns); fail_mq: retire_mq_sysctls(ns); _ Patches currently in -mm which might be from mawupeng1(a)huawei.com are

1 year, 1 month

1
0
0 0

[merged mm-stable] mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/damon/vaddr: fix issue in damon_va_evenly_split_region() has been removed from the -mm tree. Its filename was mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Zheng Yejian <zhengyejian(a)huaweicloud.com> Subject: mm/damon/vaddr: fix issue in damon_va_evenly_split_region() Date: Tue, 22 Oct 2024 16:39:26 +0800 Patch series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()". v2. According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). And add 'nr_piece == 1' check in damon_va_evenly_split_region() for better code readability and add a corresponding kunit testcase. This patch (of 2): According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). After this patch, damon-operations test passed: # ./tools/testing/kunit/kunit.py run damon-operations [...] ============== damon-operations (6 subtests) =============== [PASSED] damon_test_three_regions_in_vmas [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 [PASSED] damon_test_split_evenly ================ [PASSED] damon-operations ================= Link: https://lkml.kernel.org/r/20241022083927.3592237-1-zhengyejian@huaweicloud.… Link: https://lkml.kernel.org/r/20241022083927.3592237-2-zhengyejian@huaweicloud.… Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com> Reviewed-by: SeongJae Park <sj(a)kernel.org> Cc: Fernand Sieber <sieberf(a)amazon.com> Cc: Leonard Foerster <foersleo(a)amazon.de> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Ye Weihua <yeweihua4(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/damon/tests/vaddr-kunit.h | 1 + mm/damon/vaddr.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) --- a/mm/damon/tests/vaddr-kunit.h~mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region +++ a/mm/damon/tests/vaddr-kunit.h @@ -300,6 +300,7 @@ static void damon_test_split_evenly(stru damon_test_split_evenly_fail(test, 0, 100, 0); damon_test_split_evenly_succ(test, 0, 100, 10); damon_test_split_evenly_succ(test, 5, 59, 5); + damon_test_split_evenly_succ(test, 0, 3, 2); damon_test_split_evenly_fail(test, 5, 6, 2); } --- a/mm/damon/vaddr.c~mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region +++ a/mm/damon/vaddr.c @@ -67,6 +67,7 @@ static int damon_va_evenly_split_region( unsigned long sz_orig, sz_piece, orig_end; struct damon_region *n = NULL, *next; unsigned long start; + unsigned int i; if (!r || !nr_pieces) return -EINVAL; @@ -80,8 +81,7 @@ static int damon_va_evenly_split_region( r->ar.end = r->ar.start + sz_piece; next = damon_next_region(r); - for (start = r->ar.end; start + sz_piece <= orig_end; - start += sz_piece) { + for (start = r->ar.end, i = 1; i < nr_pieces; start += sz_piece, i++) { n = damon_new_region(start, start + sz_piece); if (!n) return -ENOMEM; _ Patches currently in -mm which might be from zhengyejian(a)huaweicloud.com are

1 year, 1 month

1
0
0 0

[merged mm-stable] mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation has been removed from the -mm tree. Its filename was mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Adrian Huang <ahuang12(a)lenovo.com> Subject: mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation Date: Sat, 27 Jul 2024 00:52:46 +0800 When compiling kernel source 'make -j $(nproc)' with the up-and-running KASAN-enabled kernel on a 256-core machine, the following soft lockup is shown: watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760] CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95 Workqueue: events drain_vmap_area_work RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0 Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75 RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202 RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949 RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50 RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800 R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39 R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0 Call Trace: <IRQ> ? watchdog_timer_fn+0x2cd/0x390 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x300/0x6d0 ? sched_clock_cpu+0x69/0x4e0 ? __pfx___hrtimer_run_queues+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? ktime_get_update_offsets_now+0x7f/0x2a0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? hrtimer_interrupt+0x2ca/0x760 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0 ? sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? smp_call_function_many_cond+0x1d8/0xbb0 ? __pfx_do_kernel_range_flush+0x10/0x10 on_each_cpu_cond_mask+0x20/0x40 flush_tlb_kernel_range+0x19b/0x250 ? srso_return_thunk+0x5/0x5f ? kasan_release_vmalloc+0xa7/0xc0 purge_vmap_node+0x357/0x820 ? __pfx_purge_vmap_node+0x10/0x10 __purge_vmap_area_lazy+0x5b8/0xa10 drain_vmap_area_work+0x21/0x30 process_one_work+0x661/0x10b0 worker_thread+0x844/0x10e0 ? srso_return_thunk+0x5/0x5f ? __kthread_parkme+0x82/0x140 ? __pfx_worker_thread+0x10/0x10 kthread+0x2a5/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Debugging Analysis: 1. The following ftrace log shows that the lockup CPU spends too much time iterating vmap_nodes and flushing TLB when purging vm_area structures. (Some info is trimmed). kworker: funcgraph_entry: | drain_vmap_area_work() { kworker: funcgraph_entry: | mutex_lock() { kworker: funcgraph_entry: 1.092 us | __cond_resched(); kworker: funcgraph_exit: 3.306 us | } ... ... kworker: funcgraph_entry: | flush_tlb_kernel_range() { ... ... kworker: funcgraph_exit: # 7533.649 us | } ... ... kworker: funcgraph_entry: 2.344 us | mutex_unlock(); kworker: funcgraph_exit: $ 23871554 us | } The drain_vmap_area_work() spends over 23 seconds. There are 2805 flush_tlb_kernel_range() calls in the ftrace log. * One is called in __purge_vmap_area_lazy(). * Others are called by purge_vmap_node->kasan_release_vmalloc. purge_vmap_node() iteratively releases kasan vmalloc allocations and flushes TLB for each vmap_area. - [Rough calculation] Each flush_tlb_kernel_range() runs about 7.5ms. -- 2804 * 7.5ms = 21.03 seconds. -- That's why a soft lock is triggered. 2. Extending the soft lockup time can work around the issue (For example, # echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the above-mentioned speculation: drain_vmap_area_work() spends too much time. If we combine all TLB flush operations of the KASAN shadow virtual address into one operation in the call path 'purge_vmap_node()->kasan_release_vmalloc()', the running time of drain_vmap_area_work() can be saved greatly. The idea is from the flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the soft lockup won't be triggered. Here is the test result based on 6.10: [6.10 wo/ the patch] 1. ftrace latency profiling (record a trace if the latency > 20s). echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function echo function_graph > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on 2. Run `make -j $(nproc)` to compile the kernel source 3. Once the soft lockup is reproduced, check the ftrace log: cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 76) $ 50412985 us | } /* __purge_vmap_area_lazy */ 76) $ 50412997 us | } /* drain_vmap_area_work */ 76) $ 29165911 us | } /* __purge_vmap_area_lazy */ 76) $ 29165926 us | } /* drain_vmap_area_work */ 91) $ 53629423 us | } /* __purge_vmap_area_lazy */ 91) $ 53629434 us | } /* drain_vmap_area_work */ 91) $ 28121014 us | } /* __purge_vmap_area_lazy */ 91) $ 28121026 us | } /* drain_vmap_area_work */ [6.10 w/ the patch] 1. Repeat step 1-2 in "[6.10 wo/ the patch]" 2. The soft lockup is not triggered and ftrace log is empty. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace log. 4. Setting 'tracing_thresh' to 1 second gets ftrace log. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 23) $ 1074942 us | } /* __purge_vmap_area_lazy */ 23) $ 1074950 us | } /* drain_vmap_area_work */ The worst execution time of drain_vmap_area_work() is about 1 second. Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/ Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com Fixes: 282631cb2447 ("mm: vmalloc: remove global purge_vmap_area_root rb-tree") Signed-off-by: Adrian Huang <ahuang12(a)lenovo.com> Co-developed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Tested-by: Jiwei Sun <sunjw10(a)lenovo.com> Reviewed-by: Baoquan He <bhe(a)redhat.com> Cc: Alexander Potapenko <glider(a)google.com> Cc: Andrey Konovalov <andreyknvl(a)gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Dmitry Vyukov <dvyukov(a)google.com> Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/kasan.h | 12 +++++++++--- mm/kasan/shadow.c | 14 ++++++++++---- mm/vmalloc.c | 34 ++++++++++++++++++++++++++-------- 3 files changed, 45 insertions(+), 15 deletions(-) --- a/include/linux/kasan.h~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/include/linux/kasan.h @@ -29,6 +29,9 @@ typedef unsigned int __bitwise kasan_vma #define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u) #define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u) +#define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply exsiting page range */ +#define KASAN_VMALLOC_TLB_FLUSH 0x2 /* TLB flush */ + #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) #include <linux/pgtable.h> @@ -564,7 +567,8 @@ void kasan_populate_early_vm_area_shadow int kasan_populate_vmalloc(unsigned long addr, unsigned long size); void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end); + unsigned long free_region_end, + unsigned long flags); #else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ @@ -579,7 +583,8 @@ static inline int kasan_populate_vmalloc static inline void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) { } + unsigned long free_region_end, + unsigned long flags) { } #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ @@ -614,7 +619,8 @@ static inline int kasan_populate_vmalloc static inline void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) { } + unsigned long free_region_end, + unsigned long flags) { } static inline void *kasan_unpoison_vmalloc(const void *start, unsigned long size, --- a/mm/kasan/shadow.c~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/mm/kasan/shadow.c @@ -489,7 +489,8 @@ static int kasan_depopulate_vmalloc_pte( */ void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) + unsigned long free_region_end, + unsigned long flags) { void *shadow_start, *shadow_end; unsigned long region_start, region_end; @@ -522,12 +523,17 @@ void kasan_release_vmalloc(unsigned long __memset(shadow_start, KASAN_SHADOW_INIT, shadow_end - shadow_start); return; } - apply_to_existing_page_range(&init_mm, + + + if (flags & KASAN_VMALLOC_PAGE_RANGE) + apply_to_existing_page_range(&init_mm, (unsigned long)shadow_start, size, kasan_depopulate_vmalloc_pte, NULL); - flush_tlb_kernel_range((unsigned long)shadow_start, - (unsigned long)shadow_end); + + if (flags & KASAN_VMALLOC_TLB_FLUSH) + flush_tlb_kernel_range((unsigned long)shadow_start, + (unsigned long)shadow_end); } } --- a/mm/vmalloc.c~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/mm/vmalloc.c @@ -2182,6 +2182,25 @@ decay_va_pool_node(struct vmap_node *vn, reclaim_list_global(&decay_list); } +static void +kasan_release_vmalloc_node(struct vmap_node *vn) +{ + struct vmap_area *va; + unsigned long start, end; + + start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start; + end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end; + + list_for_each_entry(va, &vn->purge_list, list) { + if (is_vmalloc_or_module_addr((void *) va->va_start)) + kasan_release_vmalloc(va->va_start, va->va_end, + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE); + } + + kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH); +} + static void purge_vmap_node(struct work_struct *work) { struct vmap_node *vn = container_of(work, @@ -2190,20 +2209,17 @@ static void purge_vmap_node(struct work_ struct vmap_area *va, *n_va; LIST_HEAD(local_list); + if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) + kasan_release_vmalloc_node(vn); + vn->nr_purged = 0; list_for_each_entry_safe(va, n_va, &vn->purge_list, list) { unsigned long nr = va_size(va) >> PAGE_SHIFT; - unsigned long orig_start = va->va_start; - unsigned long orig_end = va->va_end; unsigned int vn_id = decode_vn_id(va->flags); list_del_init(&va->list); - if (is_vmalloc_or_module_addr((void *)orig_start)) - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); - nr_purged_pages += nr; vn->nr_purged++; @@ -4784,7 +4800,8 @@ recovery: &free_vmap_area_list); if (va) kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); vas[area] = NULL; } @@ -4834,7 +4851,8 @@ err_free_shadow: &free_vmap_area_list); if (va) kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); vas[area] = NULL; kfree(vms[area]); } _ Patches currently in -mm which might be from ahuang12(a)lenovo.com are

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-resolve-faulty-mmap_region-error-path-behaviour.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: resolve faulty mmap_region() error path behaviour has been removed from the -mm tree. Its filename was mm-resolve-faulty-mmap_region-error-path-behaviour.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: resolve faulty mmap_region() error path behaviour Date: Tue, 29 Oct 2024 18:11:48 +0000 The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. Taking advantage of previous patches in this series we move a number of checks earlier in the code, simplifying things by moving the core of the logic into a static internal function __mmap_region(). Doing this allows us to perform a number of checks up front before we do any real work, and allows us to unwind the writable unmap check unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE validation unconditionally also. We move a number of things here: 1. We preallocate memory for the iterator before we call the file-backed memory hook, allowing us to exit early and avoid having to perform complicated and error-prone close/free logic. We carefully free iterator state on both success and error paths. 2. The enclosing mmap_region() function handles the mapping_map_writable() logic early. Previously the logic had the mapping_map_writable() at the point of mapping a newly allocated file-backed VMA, and a matching mapping_unmap_writable() on success and error paths. We now do this unconditionally if this is a file-backed, shared writable mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however doing so does not invalidate the seal check we just performed, and we in any case always decrement the counter in the wrapper. We perform a debug assert to ensure a driver does not attempt to do the opposite. 3. We also move arch_validate_flags() up into the mmap_region() function. This is only relevant on arm64 and sparc64, and the check is only meaningful for SPARC with ADI enabled. We explicitly add a warning for this arch if a driver invalidates this check, though the code ought eventually to be fixed to eliminate the need for this. With all of these measures in place, we no longer need to explicitly close the VMA on error paths, as we place all checks which might fail prior to a call to any driver mmap hook. This eliminates an entire class of errors, makes the code easier to reason about and more robust. Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Tested-by: Mark Brown <broonie(a)kernel.org> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmap.c | 121 ++++++++++++++++++++++++++++------------------------ 1 file changed, 66 insertions(+), 55 deletions(-) --- a/mm/mmap.c~mm-resolve-faulty-mmap_region-error-path-behaviour +++ a/mm/mmap.c @@ -1358,20 +1358,18 @@ int do_munmap(struct mm_struct *mm, unsi return do_vmi_munmap(&vmi, mm, start, len, uf, false); } -unsigned long mmap_region(struct file *file, unsigned long addr, +static unsigned long __mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; pgoff_t pglen = PHYS_PFN(len); - struct vm_area_struct *merge; unsigned long charged = 0; struct vma_munmap_struct vms; struct ma_state mas_detach; struct maple_tree mt_detach; unsigned long end = addr + len; - bool writable_file_mapping = false; int error; VMA_ITERATOR(vmi, mm, addr); VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff); @@ -1445,28 +1443,26 @@ unsigned long mmap_region(struct file *f vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); + if (vma_iter_prealloc(&vmi, vma)) { + error = -ENOMEM; + goto free_vma; + } + if (file) { vma->vm_file = get_file(file); error = mmap_file(file, vma); if (error) - goto unmap_and_free_vma; - - if (vma_is_shared_maywrite(vma)) { - error = mapping_map_writable(file->f_mapping); - if (error) - goto close_and_free_vma; - - writable_file_mapping = true; - } + goto unmap_and_free_file_vma; + /* Drivers cannot alter the address of the VMA. */ + WARN_ON_ONCE(addr != vma->vm_start); /* - * Expansion is handled above, merging is handled below. - * Drivers should not alter the address of the VMA. + * Drivers should not permit writability when previously it was + * disallowed. */ - if (WARN_ON((addr != vma->vm_start))) { - error = -EINVAL; - goto close_and_free_vma; - } + VM_WARN_ON_ONCE(vm_flags != vma->vm_flags && + !(vm_flags & VM_MAYWRITE) && + (vma->vm_flags & VM_MAYWRITE)); vma_iter_config(&vmi, addr, end); /* @@ -1474,6 +1470,8 @@ unsigned long mmap_region(struct file *f * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { + struct vm_area_struct *merge; + vmg.flags = vma->vm_flags; /* If this fails, state is reset ready for a reattempt. */ merge = vma_merge_new_range(&vmg); @@ -1491,7 +1489,7 @@ unsigned long mmap_region(struct file *f vma = merge; /* Update vm_flags to pick up the change. */ vm_flags = vma->vm_flags; - goto unmap_writable; + goto file_expanded; } vma_iter_config(&vmi, addr, end); } @@ -1500,26 +1498,15 @@ unsigned long mmap_region(struct file *f } else if (vm_flags & VM_SHARED) { error = shmem_zero_setup(vma); if (error) - goto free_vma; + goto free_iter_vma; } else { vma_set_anonymous(vma); } - if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { - error = -EACCES; - goto close_and_free_vma; - } - - /* Allow architectures to sanity-check the vm_flags */ - if (!arch_validate_flags(vma->vm_flags)) { - error = -EINVAL; - goto close_and_free_vma; - } - - if (vma_iter_prealloc(&vmi, vma)) { - error = -ENOMEM; - goto close_and_free_vma; - } +#ifdef CONFIG_SPARC64 + /* TODO: Fix SPARC ADI! */ + WARN_ON_ONCE(!arch_validate_flags(vm_flags)); +#endif /* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); @@ -1533,10 +1520,7 @@ unsigned long mmap_region(struct file *f */ khugepaged_enter_vma(vma, vma->vm_flags); - /* Once vma denies write, undo our temporary denial count */ -unmap_writable: - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); +file_expanded: file = vma->vm_file; ksm_add_vma(vma); expanded: @@ -1569,23 +1553,17 @@ expanded: vma_set_page_prot(vma); - validate_mm(mm); return addr; -close_and_free_vma: - vma_close(vma); - - if (file || vma->vm_file) { -unmap_and_free_vma: - fput(vma->vm_file); - vma->vm_file = NULL; - - vma_iter_set(&vmi, vma->vm_end); - /* Undo any partial mapping done by a device driver. */ - unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); - } - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); +unmap_and_free_file_vma: + fput(vma->vm_file); + vma->vm_file = NULL; + + vma_iter_set(&vmi, vma->vm_end); + /* Undo any partial mapping done by a device driver. */ + unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); +free_iter_vma: + vma_iter_free(&vmi); free_vma: vm_area_free(vma); unacct_error: @@ -1595,10 +1573,43 @@ unacct_error: abort_munmap: vms_abort_munmap_vmas(&vms, &mas_detach); gather_failed: - validate_mm(mm); return error; } +unsigned long mmap_region(struct file *file, unsigned long addr, + unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, + struct list_head *uf) +{ + unsigned long ret; + bool writable_file_mapping = false; + + /* Check to see if MDWE is applicable. */ + if (map_deny_write_exec(vm_flags, vm_flags)) + return -EACCES; + + /* Allow architectures to sanity-check the vm_flags. */ + if (!arch_validate_flags(vm_flags)) + return -EINVAL; + + /* Map writable and ensure this isn't a sealed memfd. */ + if (file && is_shared_maywrite(vm_flags)) { + int error = mapping_map_writable(file->f_mapping); + + if (error) + return error; + writable_file_mapping = true; + } + + ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf); + + /* Clear our write mapping regardless of error. */ + if (writable_file_mapping) + mapping_unmap_writable(file->f_mapping); + + validate_mm(current->mm); + return ret; +} + static int __vm_munmap(unsigned long start, size_t len, bool unlock) { int ret; _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling has been removed from the -mm tree. Its filename was mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Date: Tue, 29 Oct 2024 18:11:47 +0000 Currently MTE is permitted in two circumstances (desiring to use MTE having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region(). The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags(). Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously. It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else. We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call. This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory. This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping, however this is not too egregious as this is only used by two architectures anyway - arm64 and parisc. So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap(). [akpm(a)linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Suggested-by: Catalin Marinas <catalin.marinas(a)arm.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/arm64/include/asm/mman.h | 10 +++++++--- arch/parisc/include/asm/mman.h | 5 +++-- include/linux/mman.h | 7 ++++--- mm/mmap.c | 2 +- mm/nommu.c | 2 +- mm/shmem.c | 3 --- 6 files changed, 16 insertions(+), 13 deletions(-) --- a/arch/arm64/include/asm/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/arch/arm64/include/asm/mman.h @@ -6,6 +6,8 @@ #ifndef BUILD_VDSO #include <linux/compiler.h> +#include <linux/fs.h> +#include <linux/shmem_fs.h> #include <linux/types.h> static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, @@ -31,19 +33,21 @@ static inline unsigned long arch_calc_vm } #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, + unsigned long flags) { /* * Only allow MTE on anonymous mappings as these are guaranteed to be * backed by tags-capable memory. The vm_flags may be overridden by a * filesystem supporting MTE (RAM-based). */ - if (system_supports_mte() && (flags & MAP_ANONYMOUS)) + if (system_supports_mte() && + ((flags & MAP_ANONYMOUS) || shmem_file(file))) return VM_MTE_ALLOWED; return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) static inline bool arch_validate_prot(unsigned long prot, unsigned long addr __always_unused) --- a/arch/parisc/include/asm/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/arch/parisc/include/asm/mman.h @@ -2,6 +2,7 @@ #ifndef __ASM_MMAN_H__ #define __ASM_MMAN_H__ +#include <linux/fs.h> #include <uapi/asm/mman.h> /* PARISC cannot allow mdwe as it needs writable stacks */ @@ -11,7 +12,7 @@ static inline bool arch_memory_deny_writ } #define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags) { /* * The stack on parisc grows upwards, so if userspace requests memory @@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) #endif /* __ASM_MMAN_H__ */ --- a/include/linux/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/include/linux/mman.h @@ -2,6 +2,7 @@ #ifndef _LINUX_MMAN_H #define _LINUX_MMAN_H +#include <linux/fs.h> #include <linux/mm.h> #include <linux/percpu_counter.h> @@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long #endif #ifndef arch_calc_vm_flag_bits -#define arch_calc_vm_flag_bits(flags) 0 +#define arch_calc_vm_flag_bits(file, flags) 0 #endif #ifndef arch_validate_prot @@ -151,13 +152,13 @@ calc_vm_prot_bits(unsigned long prot, un * Combine the mmap "flags" argument into "vm_flags" used internally. */ static inline unsigned long -calc_vm_flag_bits(unsigned long flags) +calc_vm_flag_bits(struct file *file, unsigned long flags) { return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | - arch_calc_vm_flag_bits(flags); + arch_calc_vm_flag_bits(file, flags); } unsigned long vm_commit_limit(void); --- a/mm/mmap.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/mmap.c @@ -344,7 +344,7 @@ unsigned long do_mmap(struct file *file, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; /* Obtain the address to map to. we verify (or select) it and ensure --- a/mm/nommu.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/nommu.c @@ -842,7 +842,7 @@ static unsigned long determine_vm_flags( { unsigned long vm_flags; - vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags); + vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags); if (!file) { /* --- a/mm/shmem.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/shmem.c @@ -2733,9 +2733,6 @@ static int shmem_mmap(struct file *file, if (ret) return ret; - /* arm64 - allow memory tagging on RAM-based files */ - vm_flags_set(vma, VM_MTE_ALLOWED); - file_accessed(file); /* This is anonymous shared memory if it is unlinked at the time of mmap */ if (inode->i_nlink) _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-refactor-map_deny_write_exec.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: refactor map_deny_write_exec() has been removed from the -mm tree. Its filename was mm-refactor-map_deny_write_exec.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: refactor map_deny_write_exec() Date: Tue, 29 Oct 2024 18:11:46 +0000 Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mman.h | 21 ++++++++++++++++++--- mm/mmap.c | 2 +- mm/mprotect.c | 2 +- mm/vma.h | 2 +- 4 files changed, 21 insertions(+), 6 deletions(-) --- a/include/linux/mman.h~mm-refactor-map_deny_write_exec +++ a/include/linux/mman.h @@ -188,16 +188,31 @@ static inline bool arch_memory_deny_writ * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, &current->mm->flags)) return false; - if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true; - if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true; return false; --- a/mm/mmap.c~mm-refactor-map_deny_write_exec +++ a/mm/mmap.c @@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *f vma_set_anonymous(vma); } - if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } --- a/mm/mprotect.c~mm-refactor-map_deny_write_exec +++ a/mm/mprotect.c @@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned lon break; } - if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } --- a/mm/vma.h~mm-refactor-map_deny_write_exec +++ a/mm/vma.h @@ -42,7 +42,7 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - /* 1 byte hole */ + /* 2 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */ _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-unconditionally-close-vmas-on-error.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: unconditionally close VMAs on error has been removed from the -mm tree. Its filename was mm-unconditionally-close-vmas-on-error.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: unconditionally close VMAs on error Date: Tue, 29 Oct 2024 18:11:45 +0000 Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform. With regards to the important vm_ops->close() callback We have gone to great lengths to try to track whether or not we ought to close VMAs. Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator. We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not. This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent. Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/internal.h | 18 ++++++++++++++++++ mm/mmap.c | 5 ++--- mm/nommu.c | 3 +-- mm/vma.c | 14 +++++--------- mm/vma.h | 4 +--- 5 files changed, 27 insertions(+), 17 deletions(-) --- a/mm/internal.h~mm-unconditionally-close-vmas-on-error +++ a/mm/internal.h @@ -135,6 +135,24 @@ static inline int mmap_file(struct file return err; } +/* + * If the VMA has a close hook then close it, and since closing it might leave + * it in an inconsistent state which makes the use of any hooks suspect, clear + * them down by installing dummy empty hooks. + */ +static inline void vma_close(struct vm_area_struct *vma) +{ + if (vma->vm_ops && vma->vm_ops->close) { + vma->vm_ops->close(vma); + + /* + * The mapping is in an inconsistent state, and no further hooks + * may be invoked upon it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + } +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ --- a/mm/mmap.c~mm-unconditionally-close-vmas-on-error +++ a/mm/mmap.c @@ -1573,8 +1573,7 @@ expanded: return addr; close_and_free_vma: - if (file && !vms.closed_vm_ops && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (file || vma->vm_file) { unmap_and_free_vma: @@ -1934,7 +1933,7 @@ void exit_mmap(struct mm_struct *mm) do { if (vma->vm_flags & VM_ACCOUNT) nr_accounted += vma_pages(vma); - remove_vma(vma, /* unreachable = */ true, /* closed = */ false); + remove_vma(vma, /* unreachable = */ true); count++; cond_resched(); vma = vma_next(&vmi); --- a/mm/nommu.c~mm-unconditionally-close-vmas-on-error +++ a/mm/nommu.c @@ -589,8 +589,7 @@ static int delete_vma_from_mm(struct vm_ */ static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma) { - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); put_nommu_region(vma->vm_region); --- a/mm/vma.c~mm-unconditionally-close-vmas-on-error +++ a/mm/vma.c @@ -323,11 +323,10 @@ static bool can_vma_merge_right(struct v /* * Close a vm structure and free it. */ -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed) +void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); - if (!closed && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); @@ -1115,9 +1114,7 @@ void vms_clean_up_area(struct vma_munmap vms_clear_ptes(vms, mas_detach, true); mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); - vms->closed_vm_ops = true; + vma_close(vma); } /* @@ -1160,7 +1157,7 @@ void vms_complete_munmap_vmas(struct vma /* Remove and clean up vmas */ mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - remove_vma(vma, /* = */ false, vms->closed_vm_ops); + remove_vma(vma, /* unreachable = */ false); vm_unacct_memory(vms->nr_accounted); validate_mm(mm); @@ -1684,8 +1681,7 @@ struct vm_area_struct *copy_vma(struct v return new_vma; out_vma_link: - if (new_vma->vm_ops && new_vma->vm_ops->close) - new_vma->vm_ops->close(new_vma); + vma_close(new_vma); if (new_vma->vm_file) fput(new_vma->vm_file); --- a/mm/vma.h~mm-unconditionally-close-vmas-on-error +++ a/mm/vma.h @@ -42,7 +42,6 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - bool closed_vm_ops; /* call_mmap() was encountered, so vmas may be closed */ /* 1 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ @@ -198,7 +197,6 @@ static inline void init_vma_munmap(struc vms->unmap_start = FIRST_USER_ADDRESS; vms->unmap_end = USER_PGTABLES_CEILING; vms->clear_ptes = false; - vms->closed_vm_ops = false; } #endif @@ -269,7 +267,7 @@ int do_vmi_munmap(struct vma_iterator *v unsigned long start, size_t len, struct list_head *uf, bool unlock); -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed); +void remove_vma(struct vm_area_struct *vma, bool unreachable); void unmap_region(struct ma_state *mas, struct vm_area_struct *vma, struct vm_area_struct *prev, struct vm_area_struct *next); _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: avoid unsafe VMA hook invocation when error arises on mmap hook has been removed from the -mm tree. Its filename was mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: avoid unsafe VMA hook invocation when error arises on mmap hook Date: Tue, 29 Oct 2024 18:11:44 +0000 Patch series "fix error handling in mmap_region() and refactor (hotfixes)", v4. mmap_region() is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state. The patches in this series comprise the minimal changes required to resolve existing issues in mmap_region() error handling, in order that they can be hotfixed and backported. There is additionally a follow up series which goes further, separated out from the v1 series and sent and updated separately. This patch (of 5): After an attempted mmap() fails, we are no longer in a situation where we can safely interact with VMA hooks. This is currently not enforced, meaning that we need complicated handling to ensure we do not incorrectly call these hooks. We can avoid the whole issue by treating the VMA as suspect the moment that the file->f_ops->mmap() function reports an error by replacing whatever VMA operations were installed with a dummy empty set of VMA operations. We do so through a new helper function internal to mm - mmap_file() - which is both more logically named than the existing call_mmap() function and correctly isolates handling of the vm_op reassignment to mm. All the existing invocations of call_mmap() outside of mm are ultimately nested within the call_mmap() from mm, which we now replace. It is therefore safe to leave call_mmap() in place as a convenience function (and to avoid churn). The invokers are: ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap() coda_file_operations -> mmap -> coda_file_mmap() shm_file_operations -> shm_mmap() shm_file_operations_huge -> shm_mmap() dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops -> i915_gem_dmabuf_mmap() None of these callers interact with vm_ops or mappings in a problematic way on error, quickly exiting out. Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/internal.h | 27 +++++++++++++++++++++++++++ mm/mmap.c | 6 +++--- mm/nommu.c | 4 ++-- 3 files changed, 32 insertions(+), 5 deletions(-) --- a/mm/internal.h~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/internal.h @@ -108,6 +108,33 @@ static inline void *folio_raw_mapping(co return (void *)(mapping & ~PAGE_MAPPING_FLAGS); } +/* + * This is a file-backed mapping, and is about to be memory mapped - invoke its + * mmap hook and safely handle error conditions. On error, VMA hooks will be + * mutated. + * + * @file: File which backs the mapping. + * @vma: VMA which we are mapping. + * + * Returns: 0 if success, error otherwise. + */ +static inline int mmap_file(struct file *file, struct vm_area_struct *vma) +{ + int err = call_mmap(file, vma); + + if (likely(!err)) + return 0; + + /* + * OK, we tried to call the file hook for mmap(), but an error + * arose. The mapping is in an inconsistent state and we most not invoke + * any further hooks on it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + + return err; +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ --- a/mm/mmap.c~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/mmap.c @@ -1422,7 +1422,7 @@ unsigned long mmap_region(struct file *f /* * clear PTEs while the vma is still in the tree so that rmap * cannot race with the freeing later in the truncate scenario. - * This is also needed for call_mmap(), which is why vm_ops + * This is also needed for mmap_file(), which is why vm_ops * close function is called. */ vms_clean_up_area(&vms, &mas_detach); @@ -1447,7 +1447,7 @@ unsigned long mmap_region(struct file *f if (file) { vma->vm_file = get_file(file); - error = call_mmap(file, vma); + error = mmap_file(file, vma); if (error) goto unmap_and_free_vma; @@ -1470,7 +1470,7 @@ unsigned long mmap_region(struct file *f vma_iter_config(&vmi, addr, end); /* - * If vm_flags changed after call_mmap(), we should try merge + * If vm_flags changed after mmap_file(), we should try merge * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { --- a/mm/nommu.c~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/nommu.c @@ -885,7 +885,7 @@ static int do_mmap_shared_file(struct vm { int ret; - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); if (ret == 0) { vma->vm_region->vm_top = vma->vm_region->vm_end; return 0; @@ -918,7 +918,7 @@ static int do_mmap_private(struct vm_are * happy. */ if (capabilities & NOMMU_MAP_DIRECT) { - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); /* shouldn't return success if we're not sharing */ if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags))) ret = -ENOSYS; _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/thp: fix deferred split unqueue naming and locking has been removed from the -mm tree. Its filename was mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hugh Dickins <hughd(a)google.com> Subject: mm/thp: fix deferred split unqueue naming and locking Date: Sun, 27 Oct 2024 13:02:13 -0700 (PDT) Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 35 ++++++++++++++++++++++++++--------- mm/internal.h | 10 +++++----- mm/memcontrol-v1.c | 25 +++++++++++++++++++++++++ mm/memcontrol.c | 8 +++++--- mm/migrate.c | 4 ++-- mm/page_alloc.c | 1 - mm/swap.c | 4 ++-- mm/vmscan.c | 4 ++-- 8 files changed, 67 insertions(+), 24 deletions(-) --- a/mm/huge_memory.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *fo return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio * return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; --- a/mm/internal.h~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struc #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmap * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) --- a/mm/memcontrol.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *ol /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *fo VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) --- a/mm/memcontrol-v1.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struc css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_ra enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_ra target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { --- a/mm/migrate.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struc folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struc } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: --- a/mm/page_alloc.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batc unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* --- a/mm/swap.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) --- a/mm/vmscan.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/vmscan.c @@ -1476,7 +1476,7 @@ free_it: */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(s if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); _ Patches currently in -mm which might be from hughd(a)google.com are mm-delete-the-unused-put_pages_list.patch

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110529-clapper-deferred-1146@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

2
1
0 0

[GIT PULL 01/10] xfs: convert perag to use xarrays

by Darrick J. Wong

Hi Carlos, Please pull this branch with changes for xfs for 6.13-rc1. As usual, I did a test-merge with the main upstream branch as of a few minutes ago, and didn't see any conflicts. Please let me know if you encounter any problems. --D The following changes since commit 59b723cd2adbac2a34fc8e12c74ae26ae45bf230: Linux 6.12-rc6 (2024-11-03 14:05:52 -1000) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/perag-xarray-6.13_2024-11-05 for you to fetch changes up to d66496578b2a099ea453f56782f1cd2bf63a8029: xfs: insert the pag structures into the xarray later (2024-11-05 13:38:27 -0800) ---------------------------------------------------------------- xfs: convert perag to use xarrays [v5.5 01/10] Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong(a)kernel.org> ---------------------------------------------------------------- Christoph Hellwig (22): xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev xfs: remove the unused pagb_count field in struct xfs_perag xfs: remove the unused pag_active_wq field in struct xfs_perag xfs: pass a pag to xfs_difree_inode_chunk xfs: remove the agno argument to xfs_free_ag_extent xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers xfs: add a xfs_agino_to_ino helper xfs: pass a pag to xfs_extent_busy_{search,reuse} xfs: keep a reference to the pag for busy extents xfs: remove the mount field from struct xfs_busy_extents xfs: remove the unused trace_xfs_iwalk_ag trace point xfs: remove the unused xrep_bmap_walk_rmap trace point xfs: constify pag arguments to trace points xfs: pass a perag structure to the xfs_ag_resv_init_error trace point xfs: pass objects to the xfs_irec_merge_{pre,post} trace points xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point xfs: pass objects to the xrep_ibt_walk_rmap tracepoint xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points xfs: pass the pag to the xrep_newbt_extent_class tracepoints xfs: convert remaining trace points to pass pag structures xfs: split xfs_initialize_perag xfs: insert the pag structures into the xarray later Darrick J. Wong (1): xfs: fix simplify extent lookup in xfs_can_free_eofblocks fs/xfs/libxfs/xfs_ag.c | 135 ++++++++++++++------------ fs/xfs/libxfs/xfs_ag.h | 30 +++++- fs/xfs/libxfs/xfs_ag_resv.c | 3 +- fs/xfs/libxfs/xfs_alloc.c | 32 +++---- fs/xfs/libxfs/xfs_alloc.h | 5 +- fs/xfs/libxfs/xfs_alloc_btree.c | 2 +- fs/xfs/libxfs/xfs_btree.c | 7 +- fs/xfs/libxfs/xfs_ialloc.c | 67 ++++++------- fs/xfs/libxfs/xfs_ialloc_btree.c | 2 +- fs/xfs/libxfs/xfs_inode_util.c | 4 +- fs/xfs/libxfs/xfs_refcount.c | 11 +-- fs/xfs/libxfs/xfs_refcount_btree.c | 3 +- fs/xfs/libxfs/xfs_rmap_btree.c | 2 +- fs/xfs/scrub/agheader_repair.c | 16 +--- fs/xfs/scrub/alloc_repair.c | 10 +- fs/xfs/scrub/bmap.c | 5 +- fs/xfs/scrub/bmap_repair.c | 4 +- fs/xfs/scrub/common.c | 2 +- fs/xfs/scrub/cow_repair.c | 18 ++-- fs/xfs/scrub/ialloc.c | 8 +- fs/xfs/scrub/ialloc_repair.c | 25 ++--- fs/xfs/scrub/newbt.c | 46 ++++----- fs/xfs/scrub/reap.c | 8 +- fs/xfs/scrub/refcount_repair.c | 5 +- fs/xfs/scrub/repair.c | 13 ++- fs/xfs/scrub/rmap_repair.c | 9 +- fs/xfs/scrub/trace.h | 161 +++++++++++++++---------------- fs/xfs/xfs_bmap_util.c | 8 +- fs/xfs/xfs_buf_item_recover.c | 5 +- fs/xfs/xfs_discard.c | 20 ++-- fs/xfs/xfs_extent_busy.c | 31 +++--- fs/xfs/xfs_extent_busy.h | 14 ++- fs/xfs/xfs_extfree_item.c | 4 +- fs/xfs/xfs_filestream.c | 5 +- fs/xfs/xfs_fsmap.c | 25 ++--- fs/xfs/xfs_health.c | 8 +- fs/xfs/xfs_inode.c | 5 +- fs/xfs/xfs_iunlink_item.c | 13 ++- fs/xfs/xfs_iwalk.c | 17 ++-- fs/xfs/xfs_log_cil.c | 3 +- fs/xfs/xfs_log_recover.c | 5 +- fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 191 ++++++++++++++++--------------------- fs/xfs/xfs_trans.c | 2 +- 44 files changed, 459 insertions(+), 531 deletions(-)

1 year, 1 month

1
0
0 0

+ util_macrosh-fix-rework-find_closest-macros.patch added to mm-nonmm-unstable branch

by Andrew Morton

The patch titled Subject: util_macros.h: fix/rework find_closest() macros has been added to the -mm mm-nonmm-unstable branch. Its filename is util_macrosh-fix-rework-find_closest-macros.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-nonmm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Alexandru Ardelean <aardelean(a)baylibre.com> Subject: util_macros.h: fix/rework find_closest() macros Date: Tue, 5 Nov 2024 16:54:05 +0200 A bug was found in the find_closest() (find_closest_descending() is also affected after some testing), where for certain values with small progressions, the rounding (done by averaging 2 values) causes an incorrect index to be returned. The rounding issues occur for progressions of 1, 2 and 3. It goes away when the progression/interval between two values is 4 or larger. It's particularly bad for progressions of 1. For example if there's an array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0 (the index of '1'), rather than returning 1 (the index of '2'). This means that for exact values (with a progression of 1), find_closest() will misbehave and return the index of the value smaller than the one we're searching for. For progressions of 2 and 3, the exact values are obtained correctly; but values aren't approximated correctly (as one would expect). Starting with progressions of 4, all seems to be good (one gets what one would expect). While one could argue that 'find_closest()' should not be used for arrays with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still behave correctly. The bug was found while testing the 'drivers/iio/adc/ad7606.c', specifically the oversampling feature. For reference, the oversampling values are listed as: static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: 1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine 2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here 3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine 4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the values are as correct (one gets what one would expect.) While writing a kunit test for this bug, a peculiar issue was found for the array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. While running the kunit test (for 'ina226_avg_tab' from these drivers): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value. * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array, but it seems to be a another rounding issue. * And from 4 onwards the 'find_closest'() works fine (one gets what one would expect). This change reworks the find_closest() macros to also check the difference between the left and right elements when 'x'. If the distance to the right is smaller (than the distance to the left), the index is incremented by 1. This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro. In order to accommodate for any mix of negative + positive values, the internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are forced to 'long' type. This also addresses any potential bugs/issues with 'x' being of an unsigned type. In those situations any comparison between signed & unsigned would be promoted to a comparison between 2 unsigned numbers; this is especially annoying when '__fc_left' & '__fc_right' underflow. The find_closest_descending() macro was also reworked and duplicated from the find_closest(), and it is being iterated in reverse. The main reason for this is to get the same indices as 'find_closest()' (but in reverse). The comparison for '__fc_right < __fc_left' favors going the array in ascending order. For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we get: __fc_mid_x = 2 __fc_left = -1 __fc_right = -2 Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7 which is not quite incorrect, but 3 is closer to 4 than to 1. This change has been validated with the kunit from the next patch. Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro") Signed-off-by: Alexandru Ardelean <aardelean(a)baylibre.com> Cc: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/util_macros.h | 56 ++++++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 16 deletions(-) --- a/include/linux/util_macros.h~util_macrosh-fix-rework-find_closest-macros +++ a/include/linux/util_macros.h @@ -4,19 +4,6 @@ #include <linux/math.h> -#define __find_closest(x, a, as, op) \ -({ \ - typeof(as) __fc_i, __fc_as = (as) - 1; \ - typeof(x) __fc_x = (x); \ - typeof(*a) const *__fc_a = (a); \ - for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ - if (__fc_x op DIV_ROUND_CLOSEST(__fc_a[__fc_i] + \ - __fc_a[__fc_i + 1], 2)) \ - break; \ - } \ - (__fc_i); \ -}) - /** * find_closest - locate the closest element in a sorted array * @x: The reference value. @@ -25,8 +12,27 @@ * @as: Size of 'a'. * * Returns the index of the element closest to 'x'. + * Note: If using an array of negative numbers (or mixed positive numbers), + * then be sure that 'x' is of a signed-type to get good results. */ -#define find_closest(x, a, as) __find_closest(x, a, as, <=) +#define find_closest(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i + 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i + 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i++; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * find_closest_descending - locate the closest element in a sorted array @@ -36,9 +42,27 @@ * @as: Size of 'a'. * * Similar to find_closest() but 'a' is expected to be sorted in descending - * order. + * order. The iteration is done in reverse order, so that the comparison + * of '__fc_right' & '__fc_left' also works for unsigned numbers. */ -#define find_closest_descending(x, a, as) __find_closest(x, a, as, >=) +#define find_closest_descending(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = __fc_as; __fc_i >= 1; __fc_i--) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i - 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i - 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i--; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * is_insidevar - check if the @ptr points inside the @var memory range. _ Patches currently in -mm which might be from aardelean(a)baylibre.com are util_macrosh-fix-rework-find_closest-macros.patch lib-util_macros_kunit-add-kunit-test-for-util_macrosh.patch

1 year, 1 month

1
0
0 0

[REGRESSION] The iwl4965 driver broke somewhere between 6.10.10 and 6.11.5 (probably 6.11rc)

by Alf Marius

Hi, I recently installed Arch Linux on an old laptop (Fujitsu-Siemens AMILO Xi 2550) and noticed that: - when booting Linux from the Arch ISO (kernel version 6.10.10) WIFI is working fine - after installing Arch Linux from the ISO and booting (kernel version 6.11.5) WIFI was not working properly By "not working properly" I mean: downloading small files or installing a few small packages was working ok, but when downloading larger files or installing larger packages with lots of dependencies, the connection would gradually slow down and eventually die. I reported this on the Arch Linux forum (https://bbs.archlinux.org/viewtopic.php?pid=2206757) and some helpful memeber suggested that this might be the commit that broke things: https://github.com/torvalds/linux/commit/02b682d54598f61cbb7dbb14d98ec18011… An Arch Linux packet manager (gromit) helped me debug this issue by building a couple of kernels that I tested. - https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.12rc… - https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.12rc… The first one didn't work, but the second (in which he reverted the commit linked above) did fix my problem. So, I guess this commit should be investigated by those in the know. Thats why I also added Andrii and Kalle to CC as they are listed in the commit message. My network controller: Intel corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61) Kernel driver in use: iwl4965 This is my first kernel bug report, hope I did everything right :) I'm ofc willing to help provide more info and debug locally here to help solve this issue. Thanks and good night Alf :) -- "The generation of random numbers is too important to be left to chance."

1 year, 1 month

1
0
0 0

[PATCHSET v5.5 01/10] xfs: convert perag to use xarrays

by Darrick J. Wong

Hi all, Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. With a bit of luck, this should all go splendidly. Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pe… --- Commits in this patchset: * xfs: fix simplify extent lookup in xfs_can_free_eofblocks * xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev * xfs: remove the unused pagb_count field in struct xfs_perag * xfs: remove the unused pag_active_wq field in struct xfs_perag * xfs: pass a pag to xfs_difree_inode_chunk * xfs: remove the agno argument to xfs_free_ag_extent * xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers * xfs: add a xfs_agino_to_ino helper * xfs: pass a pag to xfs_extent_busy_{search,reuse} * xfs: keep a reference to the pag for busy extents * xfs: remove the mount field from struct xfs_busy_extents * xfs: remove the unused trace_xfs_iwalk_ag trace point * xfs: remove the unused xrep_bmap_walk_rmap trace point * xfs: constify pag arguments to trace points * xfs: pass a perag structure to the xfs_ag_resv_init_error trace point * xfs: pass objects to the xfs_irec_merge_{pre,post} trace points * xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point * xfs: pass objects to the xrep_ibt_walk_rmap tracepoint * xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points * xfs: pass the pag to the xrep_newbt_extent_class tracepoints * xfs: convert remaining trace points to pass pag structures * xfs: split xfs_initialize_perag * xfs: insert the pag structures into the xarray later --- fs/xfs/libxfs/xfs_ag.c | 135 ++++++++++++++----------- fs/xfs/libxfs/xfs_ag.h | 30 +++++- fs/xfs/libxfs/xfs_ag_resv.c | 3 - fs/xfs/libxfs/xfs_alloc.c | 32 +++--- fs/xfs/libxfs/xfs_alloc.h | 5 - fs/xfs/libxfs/xfs_alloc_btree.c | 2 fs/xfs/libxfs/xfs_btree.c | 7 + fs/xfs/libxfs/xfs_ialloc.c | 67 ++++++------- fs/xfs/libxfs/xfs_ialloc_btree.c | 2 fs/xfs/libxfs/xfs_inode_util.c | 4 - fs/xfs/libxfs/xfs_refcount.c | 11 +- fs/xfs/libxfs/xfs_refcount_btree.c | 3 - fs/xfs/libxfs/xfs_rmap_btree.c | 2 fs/xfs/scrub/agheader_repair.c | 16 +-- fs/xfs/scrub/alloc_repair.c | 10 +- fs/xfs/scrub/bmap.c | 5 - fs/xfs/scrub/bmap_repair.c | 4 - fs/xfs/scrub/common.c | 2 fs/xfs/scrub/cow_repair.c | 18 +-- fs/xfs/scrub/ialloc.c | 8 +- fs/xfs/scrub/ialloc_repair.c | 25 ++--- fs/xfs/scrub/newbt.c | 46 ++++----- fs/xfs/scrub/reap.c | 8 +- fs/xfs/scrub/refcount_repair.c | 5 - fs/xfs/scrub/repair.c | 13 +- fs/xfs/scrub/rmap_repair.c | 9 +- fs/xfs/scrub/trace.h | 161 ++++++++++++++---------------- fs/xfs/xfs_bmap_util.c | 8 +- fs/xfs/xfs_buf_item_recover.c | 5 - fs/xfs/xfs_discard.c | 20 ++-- fs/xfs/xfs_extent_busy.c | 31 ++---- fs/xfs/xfs_extent_busy.h | 14 +-- fs/xfs/xfs_extfree_item.c | 4 - fs/xfs/xfs_filestream.c | 5 - fs/xfs/xfs_fsmap.c | 25 ++--- fs/xfs/xfs_health.c | 8 +- fs/xfs/xfs_inode.c | 5 - fs/xfs/xfs_iunlink_item.c | 13 +- fs/xfs/xfs_iwalk.c | 17 ++- fs/xfs/xfs_log_cil.c | 3 - fs/xfs/xfs_log_recover.c | 5 - fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 191 +++++++++++++++--------------------- fs/xfs/xfs_trans.c | 2 44 files changed, 459 insertions(+), 531 deletions(-)

1 year, 1 month

1
1
0 0

[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()

by Qiu-ji Chen

This patch addresses a reference count handling issue in the lpfc_bsg_hba_get_event() function. In the branch if (evt->reg_id == event_req->ev_reg_id), the function calls lpfc_bsg_event_ref(), which increments the reference count of the relevant resources. However, in the branch if (evt_dat == NULL), a goto statement directly jumps to the function’s final goto block, skipping the release operations at the end of the function. This means that, if the condition if (evt_dat == NULL) is met, the function fails to correctly release the resources acquired by lpfc_bsg_event_ref(), leading to a reference count leak. To fix this issue, we added a new block job_error_unref before the job_error block. When the condition if (evt_dat == NULL) is met, the function will enter the job_error_unref block, ensuring that the previously allocated resources are properly released, thereby preventing the reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/scsi/lpfc/lpfc_bsg.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c index 85059b83ea6b..832a5a6dd85f 100644 --- a/drivers/scsi/lpfc/lpfc_bsg.c +++ b/drivers/scsi/lpfc/lpfc_bsg.c @@ -1294,7 +1294,7 @@ lpfc_bsg_hba_get_event(struct bsg_job *job) if (evt_dat == NULL) { bsg_reply->reply_payload_rcv_len = 0; rc = -ENOENT; - goto job_error; + goto job_error_unref; } if (evt_dat->len > job->request_payload.payload_len) { @@ -1329,6 +1329,10 @@ lpfc_bsg_hba_get_event(struct bsg_job *job) bsg_reply->reply_payload_rcv_len); return 0; +job_err_unref: + spin_lock_irqsave(&phba->ct_ev_lock, flags); + lpfc_bsg_event_unref(evt); + spin_unlock_irqrestore(&phba->ct_ev_lock, flags); job_error: job->dd_data = NULL; bsg_reply->result = rc; -- 2.34.1

1 year, 1 month

3
3
0 0

[PATCH 13/16] drm/amd/display: update pipe selection policy to check head pipe

by Hamza Mahfooz

From: Yihan Zhu <Yihan.Zhu(a)amd.com> [Why] No check on head pipe during the dml to dc hw mapping will allow illegal pipe usage. This will result in a wrong pipe topology to cause mpcc tree totally mess up then cause a display hang. [How] Avoid to use the pipe is head in all check and avoid ODM slice during preferred pipe check. Cc: stable(a)vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../display/dc/dml2/dml2_dc_resource_mgmt.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c index 6eccf0241d85..9be9ed7e01d3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c @@ -258,12 +258,23 @@ static unsigned int find_preferred_pipe_candidates(const struct dc_state *existi * However this condition comes with a caveat. We need to ignore pipes that will * require a change in OPP but still have the same stream id. For example during * an MPC to ODM transiton. + * + * Adding check to avoid pipe select on the head pipe by utilizing dc resource + * helper function resource_get_primary_dpp_pipe and comparing the pipe index. */ if (existing_state) { for (i = 0; i < pipe_count; i++) { if (existing_state->res_ctx.pipe_ctx[i].stream && existing_state->res_ctx.pipe_ctx[i].stream->stream_id == stream_id) { + struct pipe_ctx *head_pipe = + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]); + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if (existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && - existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) + existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i && + (existing_state->res_ctx.pipe_ctx[i].prev_odm_pipe || + existing_state->res_ctx.pipe_ctx[i].next_odm_pipe)) continue; preferred_pipe_candidates[num_preferred_candidates++] = i; @@ -292,6 +303,12 @@ static unsigned int find_last_resort_pipe_candidates(const struct dc_state *exis */ if (existing_state) { for (i = 0; i < pipe_count; i++) { + struct pipe_ctx *head_pipe = + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]); + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if ((existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) || existing_state->res_ctx.pipe_ctx[i].stream_res.tg) -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 12/16] drm/amd/display: Require minimum VBlank size for stutter optimization

by Hamza Mahfooz

From: Dillon Varone <dillon.varone(a)amd.com> If the nominal VBlank is too small, optimizing for stutter can cause the prefetch bandwidth to increase drasticaly, resulting in higher clock and power requirements. Only optimize if it is >3x the stutter latency. Cc: stable(a)vger.kernel.org Reviewed-by: Austin Zheng <austin.zheng(a)amd.com> Signed-off-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c index 5a09dd298e6f..92269f0e50ed 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c @@ -8,6 +8,7 @@ #include "dml2_pmo_dcn4_fams2.h" static const double MIN_VACTIVE_MARGIN_PCT = 0.25; // We need more than non-zero margin because DET buffer granularity can alter vactive latency hiding +static const double MIN_BLANK_STUTTER_FACTOR = 3.0; static const struct dml2_pmo_pstate_strategy base_strategy_list_1_display[] = { // VActive Preferred @@ -2140,6 +2141,7 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in struct dml2_pmo_instance *pmo = in_out->instance; bool stutter_period_meets_z8_eco = true; bool z8_stutter_optimization_too_expensive = false; + bool stutter_optimization_too_expensive = false; double line_time_us, vblank_nom_time_us; unsigned int i; @@ -2161,10 +2163,15 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in line_time_us = (double)in_out->base_display_config->display_config.stream_descriptors[i].timing.h_total / (in_out->base_display_config->display_config.stream_descriptors[i].timing.pixel_clock_khz * 1000) * 1000000; vblank_nom_time_us = line_time_us * in_out->base_display_config->display_config.stream_descriptors[i].timing.vblank_nom; - if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.z8_stutter_exit_latency_us) { + if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.z8_stutter_exit_latency_us * MIN_BLANK_STUTTER_FACTOR) { z8_stutter_optimization_too_expensive = true; break; } + + if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us * MIN_BLANK_STUTTER_FACTOR) { + stutter_optimization_too_expensive = true; + break; + } } pmo->scratch.pmo_dcn4.num_stutter_candidates = 0; @@ -2180,7 +2187,7 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in pmo->scratch.pmo_dcn4.z8_vblank_optimizable = false; } - if (pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us > 0) { + if (!stutter_optimization_too_expensive && pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us > 0) { pmo->scratch.pmo_dcn4.optimal_vblank_reserved_time_for_stutter_us[pmo->scratch.pmo_dcn4.num_stutter_candidates] = (unsigned int)pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us; pmo->scratch.pmo_dcn4.num_stutter_candidates++; } -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 11/16] drm/amd/display: Handle dml allocation failure to avoid crash

by Hamza Mahfooz

From: Ryan Seto <ryanseto(a)amd.com> [Why] In the case where a dml allocation fails for any reason, the current state's dml contexts would no longer be valid. Then subsequent calls dc_state_copy_internal would shallow copy invalid memory and if the new state was released, a double free would occur. [How] Reset dml pointers in new_state to NULL and avoid invalid pointer Cc: stable(a)vger.kernel.org Reviewed-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Ryan Seto <ryanseto(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c b/drivers/gpu/drm/amd/display/dc/core/dc_state.c index 2597e3fd562b..e006f816ff2f 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c @@ -265,6 +265,9 @@ struct dc_state *dc_state_create_copy(struct dc_state *src_state) dc_state_copy_internal(new_state, src_state); #ifdef CONFIG_DRM_AMD_DC_FP + new_state->bw_ctx.dml2 = NULL; + new_state->bw_ctx.dml2_dc_power_source = NULL; + if (src_state->bw_ctx.dml2 && !dml2_create_copy(&new_state->bw_ctx.dml2, src_state->bw_ctx.dml2)) { dc_state_release(new_state); -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 07/16] drm/amd/display: always blank stream before disable crtc

by Hamza Mahfooz

From: Fudongwang <Fudong.Wang(a)amd.com> Garbage will show due to dig is on. So blank stream needed. Cc: stable(a)vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Fudongwang <Fudong.Wang(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c index 036cb7e9b5bb..59b2e87317e3 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c @@ -519,15 +519,18 @@ static void dcn31_reset_back_end_for_pipe( dc->hwss.set_abm_immediate_disable(pipe_ctx); - if ((!pipe_ctx->stream->dpms_off || pipe_ctx->stream->link->link_status.link_active) - && pipe_ctx->stream->sink && pipe_ctx->stream->sink->edid_caps.panel_patch.blankstream_before_otg_off) { + link = pipe_ctx->stream->link; + + if ((!pipe_ctx->stream->dpms_off || link->link_status.link_active) && + (link->connector_signal == SIGNAL_TYPE_EDP)) dc->hwss.blank_stream(pipe_ctx); - } pipe_ctx->stream_res.tg->funcs->set_dsc_config( pipe_ctx->stream_res.tg, OPTC_DSC_DISABLED, 0, 0); + pipe_ctx->stream_res.tg->funcs->disable_crtc(pipe_ctx->stream_res.tg); + pipe_ctx->stream_res.tg->funcs->enable_optc_clock(pipe_ctx->stream_res.tg, false); if (pipe_ctx->stream_res.tg->funcs->set_odm_bypass) pipe_ctx->stream_res.tg->funcs->set_odm_bypass( @@ -539,7 +542,6 @@ static void dcn31_reset_back_end_for_pipe( pipe_ctx->stream_res.tg->funcs->set_drr( pipe_ctx->stream_res.tg, NULL); - link = pipe_ctx->stream->link; /* DPMS may already disable or */ /* dpms_off status is incorrect due to fastboot * feature. When system resume from S4 with second -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_set_event()

by Qiu-ji Chen

This patch fixes a reference count handling issue in the function lpfc_bsg_hba_set_event(). In the branch if (evt->reg_id == event_req->ev_reg_id), the function calls lpfc_bsg_event_ref(), which increments the reference count of the associated resource. However, in the subsequent branch if (&evt->node == &phba->ct_ev_waiters), a new evt is allocated, but the old evt should be released at this point. Failing to do so could lead to issues. To resolve this issue, we added a release instruction at the beginning of the next branch if (&evt->node == &phba->ct_ev_waiters), ensuring that the resources allocated in the previous branch are properly released, thereby preventing a reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/scsi/lpfc/lpfc_bsg.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c index 85059b83ea6b..3a65270c5584 100644 --- a/drivers/scsi/lpfc/lpfc_bsg.c +++ b/drivers/scsi/lpfc/lpfc_bsg.c @@ -1200,6 +1200,9 @@ lpfc_bsg_hba_set_event(struct bsg_job *job) spin_unlock_irqrestore(&phba->ct_ev_lock, flags); if (&evt->node == &phba->ct_ev_waiters) { + spin_lock_irqsave(&phba->ct_ev_lock, flags); + lpfc_bsg_event_unref(evt); + spin_unlock_irqrestore(&phba->ct_ev_lock, flags); /* no event waiting struct yet - first call */ dd_data = kmalloc(sizeof(struct bsg_job_data), GFP_KERNEL); if (dd_data == NULL) { -- 2.34.1

1 year, 1 month

2
1
0 0

[tip: x86/urgent] x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client

by tip-bot2 for Mario Limonciello

The following commit has been merged into the x86/urgent branch of tip: Commit-ID: a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0 Gitweb: https://git.kernel.org/tip/a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0 Author: Mario Limonciello <mario.limonciello(a)amd.com> AuthorDate: Tue, 05 Nov 2024 10:02:34 -06:00 Committer: Borislav Petkov (AMD) <bp(a)alien8.de> CommitterDate: Tue, 05 Nov 2024 17:48:32 +01:00 x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client A number of Zen4 client SoCs advertise the ability to use virtualized VMLOAD/VMSAVE, but using these instructions is reported to be a cause of a random host reboot. These instructions aren't intended to be advertised on Zen4 client so clear the capability. Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: stable(a)vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009 --- arch/x86/kernel/cpu/amd.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index fab5cae..823f44f 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -924,6 +924,17 @@ static void init_amd_zen4(struct cpuinfo_x86 *c) { if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT); + + /* + * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE + * in some BIOS versions but they can lead to random host reboots. + */ + switch (c->x86_model) { + case 0x18 ... 0x1f: + case 0x60 ... 0x7f: + clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD); + break; + } } static void init_amd_zen5(struct cpuinfo_x86 *c)

1 year, 1 month

1
0
0 0

[PATCH] ACPICA: Fix dereference in acpi_ev_address_space_dispatch()

by George Rurikov

When support for PCC Opregion was added, validation of field_obj was missed. Based on the acpi_ev_address_space_dispatch function description, field_obj can be NULL, and also when acpi_ev_address_space_dispatch is called in the acpi_ex_region_read() NULL is passed as field_obj. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0acf24ad7e10 ("ACPICA: Add support for PCC Opregion special context data") Cc: stable(a)vger.kernel.org Signed-off-by: George Rurikov <grurikov(a)gmail.com> --- drivers/acpi/acpica/evregion.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c index cf53b9535f18..03e8b6f186af 100644 --- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -164,13 +164,17 @@ acpi_ev_address_space_dispatch(union acpi_operand_object *region_obj, } if (region_obj->region.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) { - struct acpi_pcc_info *ctx = - handler_desc->address_space.context; - - ctx->internal_buffer = - field_obj->field.internal_pcc_buffer; - ctx->length = (u16)region_obj->region.length; - ctx->subspace_id = (u8)region_obj->region.address; + if (field_obj != NULL) { + struct acpi_pcc_info *ctx = + handler_desc->address_space.context; + + ctx->internal_buffer = + field_obj->field.internal_pcc_buffer; + ctx->length = (u16)region_obj->region.length; + ctx->subspace_id = (u8)region_obj->region.address; + } else { + return_ACPI_STATUS(AE_ERROR); + } } if (region_obj->region.space_id == -- 2.34.1

1 year, 1 month

3
2
0 0

FAILED: patch "[PATCH] cxl/port: Fix CXL port initialization order when the" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110551-fragrance-deodorant-e7fa@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082..876469e23f7a 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52..2caa90fa4bf2 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768f..9dc394295e1f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL);

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] cxl/port: Fix CXL port initialization order when the" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110550-chamomile-zit-2c77@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082..876469e23f7a 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52..2caa90fa4bf2 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768f..9dc394295e1f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL);

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 101c268bd2f37e965a5468353e62d154db38838e # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110539-swooned-cold-53f5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:49 -0700 Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable(a)vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Cc: Zijun Hu <quic_zijuhu(a)quicinc.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/base/core.c b/drivers/base/core.c index a4c853411a6b..e42f1ad73078 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); +/** + * device_for_each_child_reverse_from - device child iterator in reversed order. + * @parent: parent struct device. + * @from: optional starting point in child list + * @fn: function to be called for each device. + * @data: data for the callback. + * + * Iterate over @parent's child devices, starting at @from, and call @fn + * for each, passing it @data. This helper is identical to + * device_for_each_child_reverse() when @from is NULL. + * + * @fn is checked each iteration. If it returns anything other than 0, + * iteration stop and that value is returned to the caller of + * device_for_each_child_reverse_from(); + */ +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)) +{ + struct klist_iter i; + struct device *child; + int error = 0; + + if (!parent->p) + return 0; + + klist_iter_init_node(&parent->p->klist_children, &i, + (from ? &from->p->knode_parent : NULL)); + while ((child = prev_device(&i)) && !error) + error = fn(child, data); + klist_iter_exit(&i); + return error; +} +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); + /** * device_find_child - device iterator for locating a particular device. * @parent: parent struct device diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3df10517a327..223c273c0cd1 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int cxl_decoder_reset(struct cxl_decoder *cxld) +static int commit_reap(struct device *dev, const void *data) +{ + struct cxl_port *port = to_cxl_port(dev->parent); + struct cxl_decoder *cxld; + + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev)) + return 0; + + cxld = to_cxl_decoder(dev); + if (port->commit_end == cxld->id && + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", + dev_name(&cxld->dev), port->commit_end); + } + + return 0; +} + +void cxl_port_commit_reap(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + lockdep_assert_held_write(&cxl_region_rwsem); + + /* + * Once the highest committed decoder is disabled, free any other + * decoders that were pinned allocated by out-of-order release. + */ + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev), + port->commit_end); + device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL, + commit_reap); +} +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL); + +static void cxl_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) u32 ctrl; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } down_read(&cxl_dpa_rwsem); ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); up_read(&cxl_dpa_rwsem); - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; /* Userspace is now responsible for reconfiguring this decoder */ @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) cxled = to_cxl_endpoint_decoder(&cxld->dev); cxled->state = CXL_DECODER_STATE_MANUAL; } - - return 0; } static int cxl_setup_hdm_decoder_from_dvsec( diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e701e4b04032..3478d2058303 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); return 0; } else { - dev_err(&cxlr->dev, - "Failed to synchronize CPU cache state\n"); + dev_WARN(&cxlr->dev, + "Failed to synchronize CPU cache state\n"); return -ENXIO; } } @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) return 0; } -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count) { struct cxl_region_params *p = &cxlr->params; - int i, rc = 0; + int i; /* - * Before region teardown attempt to flush, and if the flush - * fails cancel the region teardown for data consistency - * concerns + * Before region teardown attempt to flush, evict any data cached for + * this region, or scream loudly about missing arch / platform support + * for CXL teardown. */ - rc = cxl_region_invalidate_memregion(cxlr); - if (rc) - return rc; + cxl_region_invalidate_memregion(cxlr); for (i = count - 1; i >= 0; i--) { struct cxl_endpoint_decoder *cxled = p->targets[i]; @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) cxl_rr = cxl_rr_load(iter, cxlr); cxld = cxl_rr->decoder; if (cxld->reset) - rc = cxld->reset(cxld); - if (rc) - return rc; + cxld->reset(cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } endpoint_reset: - rc = cxled->cxld.reset(&cxled->cxld); - if (rc) - return rc; + cxled->cxld.reset(&cxled->cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } /* all decoders associated with this region have been torn down */ clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); - - return 0; } static int commit_decoder(struct cxl_decoder *cxld) @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, * still pending. */ if (p->state == CXL_CONFIG_RESET_PENDING) { - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - /* - * Revert to committed since there may still be active - * decoders associated with this region, or move forward - * to active to mark the reset successful - */ - if (rc) - p->state = CXL_CONFIG_COMMIT; - else - p->state = CXL_CONFIG_ACTIVE; + cxl_region_decode_reset(cxlr, p->interleave_ways); + p->state = CXL_CONFIG_ACTIVE; } } @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) get_device(&cxlr->dev); if (p->state > CXL_CONFIG_ACTIVE) { - /* - * TODO: tear down all impacted regions if a device is - * removed out of order - */ - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - if (rc) - goto out; + cxl_region_decode_reset(cxlr, p->interleave_ways); p->state = CXL_CONFIG_ACTIVE; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0d8b810a51f0..5406e3ab3d4a 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -359,7 +359,7 @@ struct cxl_decoder { struct cxl_region *region; unsigned long flags; int (*commit)(struct cxl_decoder *cxld); - int (*reset)(struct cxl_decoder *cxld); + void (*reset)(struct cxl_decoder *cxld); }; /* @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port) int cxl_num_decoders_committed(struct cxl_port *port); bool is_cxl_port(const struct device *dev); struct cxl_port *to_cxl_port(const struct device *dev); +void cxl_port_commit_reap(struct cxl_decoder *cxld); struct pci_bus; int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, struct pci_bus *bus); diff --git a/include/linux/device.h b/include/linux/device.h index b4bde8d22697..667cb6db9019 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)); struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 90d5afd52dd0..c5bbd89b3192 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int mock_decoder_reset(struct cxl_decoder *cxld) +static void mock_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); int id = cxld->id; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } - - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; - - return 0; } static void default_mock_decoder(struct cxl_decoder *cxld)

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x f8c879192465d9f328cb0df07208ef077c560bb1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110503-registry-reheat-cd11@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8c879192465d9f328cb0df07208ef077c560bb1 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com> Date: Wed, 23 Oct 2024 17:24:33 +0000 Subject: [PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation rejections Some versions of the pmic_glink firmware does not allow dynamic GLINK intent allocations, attempting to send a message before the firmware has allocated its receive buffers and announced these intent allocations will fail. When this happens something like this showns up in the log: pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125 qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications GLINK has been updated to distinguish between the cases where the remote is going down (-ECANCELED) and the intent allocation being rejected (-EAGAIN). Retry the send until intent buffers becomes available, or an actual error occur. To avoid infinitely waiting for the firmware in the event that this misbehaves and no intents arrive, an arbitrary 5 second timeout is used. This patch was developed with input from Chris Lew. Reported-by: Johan Hovold <johan(a)kernel.org> Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t Cc: stable(a)vger.kernel.org # rpmsg: glink: Handle rejected intent request better Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Tested-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org> Signed-off-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com> Reviewed-by: Chris Lew <quic_clew(a)quicinc.com> Link: https://lore.kernel.org/r/20241023-pmic-glink-ecancelled-v2-2-ebc268129407@… Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9606222993fd..baa4ac6704a9 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -4,6 +4,7 @@ * Copyright (c) 2022, Linaro Ltd */ #include <linux/auxiliary_bus.h> +#include <linux/delay.h> #include <linux/module.h> #include <linux/of.h> #include <linux/platform_device.h> @@ -13,6 +14,8 @@ #include <linux/soc/qcom/pmic_glink.h> #include <linux/spinlock.h> +#define PMIC_GLINK_SEND_TIMEOUT (5 * HZ) + enum { PMIC_GLINK_CLIENT_BATT = 0, PMIC_GLINK_CLIENT_ALTMODE, @@ -112,13 +115,29 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg; + bool timeout_reached = false; + unsigned long start; int ret; mutex_lock(&pg->state_lock); - if (!pg->ept) + if (!pg->ept) { ret = -ECONNRESET; - else - ret = rpmsg_send(pg->ept, data, len); + } else { + start = jiffies; + for (;;) { + ret = rpmsg_send(pg->ept, data, len); + if (ret != -EAGAIN) + break; + + if (timeout_reached) { + ret = -ETIMEDOUT; + break; + } + + usleep_range(1000, 5000); + timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT); + } + } mutex_unlock(&pg->state_lock); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] firmware: qcom: scm: suppress download mode error" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110527-maternal-devious-a112@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Wed, 2 Oct 2024 12:01:21 +0200 Subject: [PATCH] firmware: qcom: scm: suppress download mode error Stop spamming the logs with errors about missing mechanism for setting the so called download (or dump) mode for users that have not requested that feature to be enabled in the first place. This avoids the follow error being logged on boot as well as on shutdown when the feature it not available and download mode has not been enabled on the kernel command line: qcom_scm firmware:scm: No available mechanism for setting download mode Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode") Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot") Cc: Mukesh Ojha <quic_mojha(a)quicinc.com> Cc: stable(a)vger.kernel.org # 6.4 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com> Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c index 10986cb11ec0..e2ac595902ed 100644 --- a/drivers/firmware/qcom/qcom_scm.c +++ b/drivers/firmware/qcom/qcom_scm.c @@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode) } else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT, QCOM_SCM_BOOT_SET_DLOAD_MODE)) { ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode); - } else { + } else if (dload_mode) { dev_err(__scm->dev, "No available mechanism for setting download mode\n"); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] firmware: qcom: scm: suppress download mode error" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110526-driven-small-2a5c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Wed, 2 Oct 2024 12:01:21 +0200 Subject: [PATCH] firmware: qcom: scm: suppress download mode error Stop spamming the logs with errors about missing mechanism for setting the so called download (or dump) mode for users that have not requested that feature to be enabled in the first place. This avoids the follow error being logged on boot as well as on shutdown when the feature it not available and download mode has not been enabled on the kernel command line: qcom_scm firmware:scm: No available mechanism for setting download mode Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode") Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot") Cc: Mukesh Ojha <quic_mojha(a)quicinc.com> Cc: stable(a)vger.kernel.org # 6.4 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com> Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c index 10986cb11ec0..e2ac595902ed 100644 --- a/drivers/firmware/qcom/qcom_scm.c +++ b/drivers/firmware/qcom/qcom_scm.c @@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode) } else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT, QCOM_SCM_BOOT_SET_DLOAD_MODE)) { ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode); - } else { + } else if (dload_mode) { dev_err(__scm->dev, "No available mechanism for setting download mode\n"); }

1 year, 1 month

1
0
0 0

[PATCH] ASoC: codecs: Fix atomicity violation in snd_soc_component_get_drvdata()

by Qiu-ji Chen

An atomicity violation occurs when the validity of the variables da7219->clk_src and da7219->mclk_rate is being assessed. Since the entire assessment is not protected by a lock, the da7219 variable might still be in flux during the assessment, rendering this check invalid. To fix this issue, we recommend adding a lock before the block if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) so that the legitimacy check for da7219->clk_src and da7219->mclk_rate is protected by the lock, ensuring the validity of the check. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 6d817c0e9fd7 ("ASoC: codecs: Add da7219 codec driver") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- sound/soc/codecs/da7219.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sound/soc/codecs/da7219.c b/sound/soc/codecs/da7219.c index 311ea7918b31..e2da3e317b5a 100644 --- a/sound/soc/codecs/da7219.c +++ b/sound/soc/codecs/da7219.c @@ -1167,17 +1167,20 @@ static int da7219_set_dai_sysclk(struct snd_soc_dai *codec_dai, struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); int ret = 0; - if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) + mutex_lock(&da7219->pll_lock); + + if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) { + mutex_unlock(&da7219->pll_lock); return 0; + } if ((freq < 2000000) || (freq > 54000000)) { + mutex_unlock(&da7219->pll_lock); dev_err(codec_dai->dev, "Unsupported MCLK value %d\n", freq); return -EINVAL; } - mutex_lock(&da7219->pll_lock); - switch (clk_id) { case DA7219_CLKSRC_MCLK_SQR: snd_soc_component_update_bits(component, DA7219_PLL_CTRL, -- 2.34.1

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: allow set/clear page_type again" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-patience-rice-79e2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a76710487..cc839e4365c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: allow set/clear page_type again" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-runner-gravity-4288@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a76710487..cc839e4365c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110510-props-clutter-f4ae@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110509-doable-blemish-19d2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110508-poppy-cameo-fa1b@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-flagship-precook-4a47@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110502-removal-babied-f697@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110501-scrubber-eating-d64d@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110506-octane-phosphate-f084@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110505-january-napped-4f1b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-water-overarch-bbef@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

[PATCH 1/2 net V2] net: vertexcom: mse102x: Fix possible double free of TX skb

by Stefan Wahren

The scope of the TX skb is wider than just mse102x_tx_frame_spi(), so in case the TX skb room needs to be expanded, we should free the the temporary skb instead of the original skb. Otherwise the original TX skb pointer would be freed again in mse102x_tx_work(), which leads to crashes: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G D 6.6.23 Hardware name: chargebyte Charge SOM DC-ONE (DT) Workqueue: events mse102x_tx_work [mse102x] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_release_data+0xb8/0x1d8 lr : skb_release_data+0x1ac/0x1d8 sp : ffff8000819a3cc0 x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0 x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50 x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000 x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000 x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8 x8 : fffffc00001bc008 x7 : 0000000000000000 x6 : 0000000000000008 x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009 x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000 Call trace: skb_release_data+0xb8/0x1d8 kfree_skb_reason+0x48/0xb0 mse102x_tx_work+0x164/0x35c [mse102x] process_one_work+0x138/0x260 worker_thread+0x32c/0x438 kthread+0x118/0x11c ret_from_fork+0x10/0x20 Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660) Cc: stable(a)vger.kernel.org Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst(a)gmx.net> --- drivers/net/ethernet/vertexcom/mse102x.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c index a04d4073def9..2c37957478fb 100644 --- a/drivers/net/ethernet/vertexcom/mse102x.c +++ b/drivers/net/ethernet/vertexcom/mse102x.c @@ -222,7 +222,7 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, struct mse102x_net_spi *mses = to_mse102x_spi(mse); struct spi_transfer *xfer = &mses->spi_xfer; struct spi_message *msg = &mses->spi_msg; - struct sk_buff *tskb; + struct sk_buff *tskb = NULL; int ret; netif_dbg(mse, tx_queued, mse->ndev, "%s: skb %p, %d@%p\n", @@ -235,7 +235,6 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, if (!tskb) return -ENOMEM; - dev_kfree_skb(txp); txp = tskb; } @@ -257,6 +256,8 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, mse->stats.xfer_err++; } + dev_kfree_skb(tskb); + return ret; } -- 2.34.1

1 year, 1 month

1
0
0 0

WTF: patch "[PATCH] riscv: Remove duplicated GET_RM" was seriously submitted to be applied to the 6.11-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 6.11-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 164f66de6bb6ef454893f193c898dc8f1da6d18b Mon Sep 17 00:00:00 2001 From: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> Date: Tue, 8 Oct 2024 17:41:39 +0800 Subject: [PATCH] riscv: Remove duplicated GET_RM The macro GET_RM defined twice in this file, one can be removed. Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Signed-off-by: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20241008094141.549248-3-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c index d4fd8af7aaf5..1b9867136b61 100644 --- a/arch/riscv/kernel/traps_misaligned.c +++ b/arch/riscv/kernel/traps_misaligned.c @@ -136,8 +136,6 @@ #define REG_PTR(insn, pos, regs) \ (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos)) -#define GET_RM(insn) (((insn) >> 12) & 7) - #define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs)) #define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs)) #define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110521-antler-uneatable-d873@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a7..ddcbd80a49fb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] RISC-V: disallow gcc + rust builds" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 33549fcf37ec461f398f0a41e1c9948be2e5aca4 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110519-underage-paying-6d38@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 33549fcf37ec461f398f0a41e1c9948be2e5aca4 Mon Sep 17 00:00:00 2001 From: Conor Dooley <conor.dooley(a)microchip.com> Date: Tue, 1 Oct 2024 12:28:13 +0100 Subject: [PATCH] RISC-V: disallow gcc + rust builds During the discussion before supporting rust on riscv, it was decided not to support gcc yet, due to differences in extension handling compared to llvm (only the version of libclang matching the c compiler is supported). Recently Jason Montleon reported [1] that building with gcc caused build issues, due to unsupported arguments being passed to libclang. After some discussion between myself and Miguel, it is better to disable gcc + rust builds to match the original intent, and subsequently support it when an appropriate set of extensions can be deduced from the version of libclang. Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1] Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2] Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support") Cc: stable(a)vger.kernel.org Reported-by: Jason Montleon <jmontleo(a)redhat.com> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com> Acked-by: Miguel Ojeda <ojeda(a)kernel.org> Reviewed-by: Nathan Chancellor <nathan(a)kernel.org> Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst index 750ff371570a..54be7ddf3e57 100644 --- a/Documentation/rust/arch-support.rst +++ b/Documentation/rust/arch-support.rst @@ -17,7 +17,7 @@ Architecture Level of support Constraints ============= ================ ============================================== ``arm64`` Maintained Little Endian only. ``loongarch`` Maintained \- -``riscv`` Maintained ``riscv64`` only. +``riscv`` Maintained ``riscv64`` and LLVM/Clang only. ``um`` Maintained \- ``x86`` Maintained ``x86_64`` only. ============= ================ ============================================== diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 62545946ecf4..f4c570538d55 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -177,7 +177,7 @@ config RISCV select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RETHOOK if !XIP_KERNEL select HAVE_RSEQ - select HAVE_RUST if RUSTC_SUPPORTS_RISCV + select HAVE_RUST if RUSTC_SUPPORTS_RISCV && CC_IS_CLANG select HAVE_SAMPLE_FTRACE_DIRECT select HAVE_SAMPLE_FTRACE_DIRECT_MULTI select HAVE_STACKPROTECTOR

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110533-dexterous-semantic-aa9e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110532-veal-argue-331b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-diligence-author-75bb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110531-wound-perkiness-ed39@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-elastic-reproduce-14c4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] iio: adc: ad7380: fix supplies for ad7380-4" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 05f9c67179c9a8d66dee175fb4b17f380908a26f # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-eloquent-stalling-91df@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 05f9c67179c9a8d66dee175fb4b17f380908a26f Mon Sep 17 00:00:00 2001 From: Julien Stephan <jstephan(a)baylibre.com> Date: Tue, 22 Oct 2024 15:22:39 +0200 Subject: [PATCH] iio: adc: ad7380: fix supplies for ad7380-4 ad7380-4 is the only device in the family that does not have an internal reference. It uses "refin" as a required external reference. All other devices in the family use "refio"" as an optional external reference. Fixes: 737413da8704 ("iio: adc: ad7380: add support for ad738x-4 4 channels variants") Reviewed-by: Nuno Sa <nuno.sa(a)analog.com> Reviewed-by: David Lechner <dlechner(a)baylibre.com> Signed-off-by: Julien Stephan <jstephan(a)baylibre.com> Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-4-f0cefe1b7fa6@bay… Cc: <Stable(a)vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c index b107d8e97ab3..fb728570debe 100644 --- a/drivers/iio/adc/ad7380.c +++ b/drivers/iio/adc/ad7380.c @@ -89,6 +89,7 @@ struct ad7380_chip_info { bool has_mux; const char * const *supplies; unsigned int num_supplies; + bool external_ref_only; const char * const *vcm_supplies; unsigned int num_vcm_supplies; const unsigned long *available_scan_masks; @@ -431,6 +432,7 @@ static const struct ad7380_chip_info ad7380_4_chip_info = { .num_simult_channels = 4, .supplies = ad7380_supplies, .num_supplies = ARRAY_SIZE(ad7380_supplies), + .external_ref_only = true, .available_scan_masks = ad7380_4_channel_scan_masks, .timing_specs = &ad7380_4_timing, }; @@ -1047,17 +1049,31 @@ static int ad7380_probe(struct spi_device *spi) "Failed to enable power supplies\n"); fsleep(T_POWERUP_US); - /* - * If there is no REFIO supply, then it means that we are using - * the internal 2.5V reference, otherwise REFIO is reference voltage. - */ - ret = devm_regulator_get_enable_read_voltage(&spi->dev, "refio"); - if (ret < 0 && ret != -ENODEV) - return dev_err_probe(&spi->dev, ret, - "Failed to get refio regulator\n"); + if (st->chip_info->external_ref_only) { + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refin"); + if (ret < 0) + return dev_err_probe(&spi->dev, ret, + "Failed to get refin regulator\n"); - external_ref_en = ret != -ENODEV; - st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + st->vref_mv = ret / 1000; + + /* these chips don't have a register bit for this */ + external_ref_en = false; + } else { + /* + * If there is no REFIO supply, then it means that we are using + * the internal reference, otherwise REFIO is reference voltage. + */ + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refio"); + if (ret < 0 && ret != -ENODEV) + return dev_err_probe(&spi->dev, ret, + "Failed to get refio regulator\n"); + + external_ref_en = ret != -ENODEV; + st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + } if (st->chip_info->num_vcm_supplies > ARRAY_SIZE(st->vcm_mv)) return dev_err_probe(&spi->dev, -EINVAL,

1 year, 1 month

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror