The quilt patch titled
Subject: crash: fix x86_32 memory reserve dead loop retry bug
has been removed from the -mm tree. Its filename was
crash-fix-x86_32-memory-reserve-dead-loop-retry-bug.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Jinjie Ruan <ruanjinjie(a)huawei.com>
Subject: crash: fix x86_32 memory reserve dead loop retry bug
Date: Thu, 11 Jul 2024 15:31:18 +0800
On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=1G,high"
will cause system stall as below:
ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b]
ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7]
ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f]
ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb]
ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3]
ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b]
143MB HIGHMEM available.
879MB LOWMEM available.
mapped low ram: 0 - 36ffe000
low ram: 0 - 36ffe000
(stall here)
The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX
on x86_32, the first high crash kernel memory reservation will fail, then
go into the "retry" loop and never came out as below.
-> reserve_crashkernel_generic() and high is true
-> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail
-> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly
(because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
Fix it by changing the out check condition.
After this patch, it prints:
cannot allocate crashkernel (size:0x40000000)
Link: https://lkml.kernel.org/r/20240711073118.1289866-1-ruanjinjie@huawei.com
Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code")
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/crash_reserve.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/crash_reserve.c~crash-fix-x86_32-memory-reserve-dead-loop-retry-bug
+++ a/kernel/crash_reserve.c
@@ -420,7 +420,7 @@ retry:
* For crashkernel=size[KMG],high, if the first attempt was
* for high memory, fall back to low memory.
*/
- if (high && search_end == CRASH_ADDR_HIGH_MAX) {
+ if (high && search_base == CRASH_ADDR_LOW_MAX) {
search_end = CRASH_ADDR_LOW_MAX;
search_base = 0;
goto retry;
_
Patches currently in -mm which might be from ruanjinjie(a)huawei.com are
From: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
In the DML math_ceil2 function, there is one ASSERT if the significance
is equal to zero. However, significance might be equal to zero
sometimes, and this is not an issue for a ceil function, but the current
ASSERT will trigger warnings in those cases. This commit removes the
ASSERT if the significance is equal to zero to avoid unnecessary noise.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chaitanya Dhere <chaitanya.dhere(a)amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Signed-off-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
---
.../dml2/dml21/src/dml2_standalone_libraries/lib_float_math.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_standalone_libraries/lib_float_math.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_standalone_libraries/lib_float_math.c
index 4822dbcc86bb..e17b5ceba447 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_standalone_libraries/lib_float_math.c
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_standalone_libraries/lib_float_math.c
@@ -63,8 +63,6 @@ double math_ceil(const double arg)
double math_ceil2(const double arg, const double significance)
{
- ASSERT(significance != 0);
-
return ((int)(arg / significance + 0.99999)) * significance;
}
--
2.39.2
This is the start of the stable review cycle for the 6.6.41 release.
There are 121 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 18 Jul 2024 15:27:21 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.41-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.6.41-rc1
Dan Carpenter <dan.carpenter(a)linaro.org>
i2c: rcar: fix error code in probe()
Nathan Chancellor <nathan(a)kernel.org>
kbuild: Make ld-version.sh more robust against version string changes
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bhi: Avoid warning in #DB handler due to BHI mitigation
Brian Gerst <brgerst(a)gmail.com>
x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
Nikolay Borisov <nik.borisov(a)suse.com>
x86/entry: Rename ignore_sysret()
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: clear NO_RXDMA flag after resetting
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: testunit: avoid re-issued work after read message
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: ensure Gen3+ reset does not disturb local targets
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: introduce Gen4 devices
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: reset controller is mandatory for Gen3+
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: mark HostNotify target address as used
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: bring hardware to known state when probing
Qu Wenruo <wqu(a)suse.com>
btrfs: tree-checker: add type and sequence check for inline backrefs
John Stultz <jstultz(a)google.com>
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
Baokun Li <libaokun1(a)huawei.com>
ext4: avoid ptr null pointer dereference
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix kernel bug on rename operation of broken directory
John Hubbard <jhubbard(a)nvidia.com>
selftests/net: fix gro.c compilation failure due to non-existent opt_ipproto_off
SeongJae Park <sj(a)kernel.org>
mm/damon/core: merge regions aggressively when max_nr_regions is unmet
Gavin Shan <gshan(a)redhat.com>
mm/shmem: disable PMD-sized page cache if needed
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Restrict untrusted app to attach to privileged PD
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Fix ownership reassignment of remote heap
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Fix memory leak in audio daemon attach operation
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Copy the complete capability structure to user
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Avoid updating PD type for capability request
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Fix DSP capabilities request
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: send: annotate intentional data race in checking empty queue
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: queueing: annotate intentional data race in cpu round robin
Helge Deller <deller(a)kernel.org>
wireguard: allowedips: avoid unaligned 64-bit memory accesses
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: selftests: use acpi=off instead of -no-acpi for recent QEMU
Mario Limonciello <mario.limonciello(a)amd.com>
cpufreq: Allow drivers to advertise boost enabled
Mario Limonciello <mario.limonciello(a)amd.com>
cpufreq: ACPI: Mark boost policy as enabled when setting boost
Kuan-Wei Chiu <visitorckw(a)gmail.com>
ACPI: processor_idle: Fix invalid comparison with insertion sort for latency
Ilya Dryomov <idryomov(a)gmail.com>
libceph: fix race between delayed_work() and ceph_monc_stop()
Taniya Das <quic_tdas(a)quicinc.com>
pmdomain: qcom: rpmhpd: Skip retention level for Power Domains
Audra Mitchell <audra(a)redhat.com>
Fix userfaultfd_api to return EINVAL as expected
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
ALSA: hda/realtek: Limit mic boost on VAIO PRO PX
Nazar Bilinskyi <nbilinskyi(a)gmail.com>
ALSA: hda/realtek: Enable Mute LED on HP 250 G7
Michał Kopeć <michal.kopec(a)3mdeb.com>
ALSA: hda/realtek: add quirk for Clevo V5[46]0TU
Jacky Huang <ychuang3(a)nuvoton.com>
tty: serial: ma35d1: Add a NULL check for of_node
Armin Wolf <W_Armin(a)gmx.de>
platform/x86: toshiba_acpi: Fix array out-of-bounds access
Thomas Weißschuh <linux(a)weissschuh.net>
nvmem: core: only change name to fram for current attribute
Joy Chakraborty <joychakr(a)google.com>
nvmem: meson-efuse: Fix return value of nvmem callbacks
Joy Chakraborty <joychakr(a)google.com>
nvmem: rmem: Fix return value of rmem_read()
Johan Hovold <johan+linaro(a)kernel.org>
arm64: dts: qcom: sc8280xp-x13s: fix touchscreen power on
Cong Zhang <quic_congzhan(a)quicinc.com>
arm64: dts: qcom: sa8775p: Correct IRQ number of EL2 non-secure physical timer
João Paulo Gonçalves <joao.goncalves(a)toradex.com>
iio: trigger: Fix condition for own trigger
Hobin Woo <hobin.woo(a)samsung.com>
ksmbd: discard write access to the directory open
Gavin Shan <gshan(a)redhat.com>
mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray
Gavin Shan <gshan(a)redhat.com>
mm/filemap: skip to create PMD-sized page cache if needed
Uladzislau Rezki (Sony) <urezki(a)gmail.com>
mm: vmalloc: check if a hash-index is in cpu_possible_mask
Heiko Carstens <hca(a)linux.ibm.com>
s390/mm: Add NULL pointer check to crst_table_free() base_crst_free()
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: always resume roothubs if xHC was reset during resume
He Zhe <zhe.he(a)windriver.com>
hpet: Support 32-bit userspace
Joy Chakraborty <joychakr(a)google.com>
misc: microchip: pci1xxxx: Fix return value of nvmem callbacks
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
Lee Jones <lee(a)kernel.org>
usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
usb: dwc3: pci: add support for the Intel Panther Lake
WangYuli <wangyuli(a)uniontech.com>
USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
Dmitry Smirnov <d.smirnov(a)inbox.lv>
USB: serial: mos7840: fix crash on resume
Vanillan Wang <vanillanwang(a)163.com>
USB: serial: option: add Rolling RW350-GL variants
Mank Wang <mank.wang(a)netprisma.us>
USB: serial: option: add Netprisma LCUK54 series modules
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for Foxconn T99W651
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Fibocom FM350-GL
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FN912 rmnet compositions
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit generic core-dump composition
Ronald Wahl <ronald.wahl(a)raritan.com>
net: ks8851: Fix potential TX stall after interface reopen
Ronald Wahl <ronald.wahl(a)raritan.com>
net: ks8851: Fix deadlock with the SPI chip variant
Eric Dumazet <edumazet(a)google.com>
tcp: avoid too many retransmit packets
Eric Dumazet <edumazet(a)google.com>
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Josh Don <joshdon(a)google.com>
Revert "sched/fair: Make sure to try to detach at least one movable task"
Steve French <stfrench(a)microsoft.com>
cifs: fix setting SecurityFlags to true
Satheesh Paul <psatheesh(a)marvell.com>
octeontx2-af: fix issue with IPv4 match for RSS
Kiran Kumar K <kirankumark(a)marvell.com>
octeontx2-af: fix issue with IPv6 ext match for RSS
Michal Mazur <mmazur2(a)marvell.com>
octeontx2-af: fix detection of IP layer
Srujana Challa <schalla(a)marvell.com>
octeontx2-af: fix a issue with cpt_lf_alloc mailbox
Nithin Dabilpuram <ndabilpuram(a)marvell.com>
octeontx2-af: replace cpt slot with lf id on reg write
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: fix: remove needless retries of NVM update
Chen Ni <nichen(a)iscas.ac.cn>
ARM: davinci: Convert comma to semicolon
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Use strnlen() on name fields in V1 wmfw files
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ASoC: SOF: Intel: hda: fix null deref on system suspend entry
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Prevent buffer overrun when processing V2 alg headers
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Validate payload length before processing block
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Return error if block header overflows file
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Fix overflow checking of wmfw header
Bjorn Andersson <quic_bjorande(a)quicinc.com>
arm64: dts: qcom: sc8180x: Fix LLCC reg property again
Sven Schnelle <svens(a)linux.ibm.com>
s390: Mark psw in __load_psw_mask() as __unitialized
Daniel Borkmann <daniel(a)iogearbox.net>
net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
Chengen Du <chengen.du(a)canonical.com>
net/sched: Fix UAF when resolving a clash
Kuniyuki Iwashima <kuniyu(a)amazon.com>
udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
Oleksij Rempel <o.rempel(a)pengutronix.de>
ethtool: netlink: do not return SQI value if link is down
Dmitry Antipov <dmantipov(a)yandex.ru>
ppp: reject claimed-as-LCP but actually malformed packets
Jian Hui Lee <jianhui.lee(a)canonical.com>
net: ethernet: mtk-star-emac: set mac_managed_pm when probing
Kumar Kartikeya Dwivedi <memxor(a)gmail.com>
bpf: Fail bpf_timer_cancel when callback is being cancelled
Benjamin Tissoires <bentiss(a)kernel.org>
bpf: replace bpf_timer_init with a generic helper
Benjamin Tissoires <bentiss(a)kernel.org>
bpf: make timer data struct more generic
Mohammad Shehar Yaar Tausif <sheharyaar48(a)gmail.com>
bpf: fix order of args in call to bpf_map_kvcalloc
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: ethernet: lantiq_etop: fix double free in detach
Michal Kubiak <michal.kubiak(a)intel.com>
i40e: Fix XDP program unloading while removing the driver
Hugh Dickins <hughd(a)google.com>
net: fix rc7's __skb_datagram_iter()
Aleksandr Mishin <amishin(a)t-argos.ru>
octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()
Geliang Tang <tanggeliang(a)kylinos.cn>
skmsg: Skip zero length skb in sk_msg_recvmsg
Oleksij Rempel <o.rempel(a)pengutronix.de>
net: phy: microchip: lan87xx: reinit PHY after cable test
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix too early release of tcx_entry
Neal Cardwell <ncardwell(a)google.com>
tcp: fix incorrect undo caused by DSACK of TLP retransmit
Dan Carpenter <dan.carpenter(a)linaro.org>
net: bcmasp: Fix error code in probe()
Brian Foster <bfoster(a)redhat.com>
vfs: don't mod negative dentry count when on shrinker list
linke li <lilinke99(a)qq.com>
fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
Jeff Layton <jlayton(a)kernel.org>
filelock: fix potential use-after-free in posix_lock_inode
Christian Eggers <ceggers(a)arri.de>
dsa: lan9303: Fix mapping between DSA port number and PHY address
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
net: dsa: introduce dsa_phylink_to_port()
Jingbo Xu <jefflexu(a)linux.alibaba.com>
cachefiles: add missing lock protection when polling
Baokun Li <libaokun1(a)huawei.com>
cachefiles: cyclic allocation of msg_id to avoid reuse
Hou Tao <houtao1(a)huawei.com>
cachefiles: wait for ondemand_object_worker to finish when dropping object
Baokun Li <libaokun1(a)huawei.com>
cachefiles: cancel all requests for the object that is being dropped
Baokun Li <libaokun1(a)huawei.com>
cachefiles: stop sending new request when dropping object
Jia Zhu <zhujia.zj(a)bytedance.com>
cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
Baokun Li <libaokun1(a)huawei.com>
cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
Yi Liu <yi.l.liu(a)intel.com>
vfio/pci: Init the count variable in collecting hot-reset devices
Peter Wang <peter.wang(a)mediatek.com>
scsi: ufs: core: Fix ufshcd_abort_one racing issue
Peter Wang <peter.wang(a)mediatek.com>
scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
Waiman Long <longman(a)redhat.com>
mm: prevent derefencing NULL ptr in pfn_section_valid()
-------------
Diffstat:
Documentation/admin-guide/cifs/usage.rst | 34 +--
Makefile | 4 +-
arch/arm/mach-davinci/pm.c | 2 +-
arch/arm64/boot/dts/qcom/sa8775p.dtsi | 2 +-
arch/arm64/boot/dts/qcom/sc8180x.dtsi | 11 +-
.../dts/qcom/sc8280xp-lenovo-thinkpad-x13s.dts | 15 +-
arch/s390/include/asm/processor.h | 2 +-
arch/s390/mm/pgalloc.c | 4 +
arch/x86/entry/entry_64.S | 23 +-
arch/x86/entry/entry_64_compat.S | 14 +-
arch/x86/include/asm/processor.h | 2 +-
arch/x86/kernel/cpu/common.c | 2 +-
drivers/acpi/processor_idle.c | 37 ++--
drivers/char/hpet.c | 34 ++-
drivers/cpufreq/acpi-cpufreq.c | 4 +-
drivers/cpufreq/cpufreq.c | 3 +-
drivers/firmware/cirrus/cs_dsp.c | 231 +++++++++++++++------
drivers/i2c/busses/i2c-rcar.c | 67 +++---
drivers/i2c/i2c-core-base.c | 1 +
drivers/i2c/i2c-slave-testunit.c | 7 +
drivers/iio/industrialio-trigger.c | 2 +-
drivers/misc/fastrpc.c | 41 +++-
drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_otpe2p.c | 4 -
drivers/net/dsa/lan9303-core.c | 23 +-
drivers/net/ethernet/broadcom/asp2/bcmasp.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 -
drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +-
drivers/net/ethernet/lantiq_etop.c | 4 +-
drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 2 +-
drivers/net/ethernet/marvell/octeontx2/af/npc.h | 8 +-
drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 2 +-
.../net/ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 12 +-
drivers/net/ethernet/mediatek/mtk_star_emac.c | 7 +
drivers/net/ethernet/micrel/ks8851_common.c | 10 +-
drivers/net/ethernet/micrel/ks8851_spi.c | 4 +-
drivers/net/phy/microchip_t1.c | 2 +-
drivers/net/ppp/ppp_generic.c | 15 ++
drivers/net/wireguard/allowedips.c | 4 +-
drivers/net/wireguard/queueing.h | 4 +-
drivers/net/wireguard/send.c | 2 +-
drivers/nvmem/core.c | 5 +-
drivers/nvmem/meson-efuse.c | 14 +-
drivers/nvmem/rmem.c | 5 +-
drivers/platform/x86/toshiba_acpi.c | 1 +
drivers/pmdomain/qcom/rpmhpd.c | 7 +
drivers/tty/serial/ma35d1_serial.c | 13 +-
drivers/ufs/core/ufs-mcq.c | 11 +-
drivers/ufs/core/ufshcd.c | 2 +
drivers/usb/core/config.c | 18 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/dwc3/dwc3-pci.c | 8 +
drivers/usb/gadget/configfs.c | 3 +
drivers/usb/host/xhci.c | 16 +-
drivers/usb/serial/mos7840.c | 45 ++++
drivers/usb/serial/option.c | 38 ++++
drivers/vfio/pci/vfio_pci_core.c | 2 +-
fs/btrfs/tree-checker.c | 39 ++++
fs/cachefiles/daemon.c | 14 +-
fs/cachefiles/internal.h | 15 ++
fs/cachefiles/ondemand.c | 52 ++++-
fs/cachefiles/xattr.c | 5 +-
fs/dcache.c | 12 +-
fs/ext4/sysfs.c | 2 +
fs/locks.c | 2 +-
fs/nilfs2/dir.c | 32 ++-
fs/smb/client/cifsglob.h | 4 +-
fs/smb/server/smb2pdu.c | 13 +-
fs/userfaultfd.c | 7 +-
include/linux/mmzone.h | 3 +-
include/linux/pagemap.h | 11 +-
include/net/dsa.h | 6 +
include/net/tcx.h | 13 +-
include/uapi/misc/fastrpc.h | 3 +
kernel/bpf/bpf_local_storage.c | 4 +-
kernel/bpf/helpers.c | 186 ++++++++++++-----
kernel/sched/core.c | 7 +-
kernel/sched/fair.c | 12 +-
kernel/sched/psi.c | 21 +-
kernel/sched/sched.h | 1 +
kernel/sched/stats.h | 11 +-
mm/damon/core.c | 21 +-
mm/filemap.c | 2 +-
mm/shmem.c | 15 +-
mm/vmalloc.c | 10 +-
net/ceph/mon_client.c | 14 +-
net/core/datagram.c | 3 +-
net/core/skmsg.c | 3 +-
net/dsa/port.c | 12 +-
net/ethtool/linkstate.c | 41 ++--
net/ipv4/tcp_input.c | 11 +-
net/ipv4/tcp_timer.c | 31 ++-
net/ipv4/udp.c | 4 +-
net/sched/act_ct.c | 8 +
net/sched/sch_ingress.c | 12 +-
net/sunrpc/xprtsock.c | 7 +
scripts/ld-version.sh | 8 +-
sound/pci/hda/patch_realtek.c | 4 +
sound/soc/sof/intel/hda-dai.c | 12 +-
tools/testing/selftests/net/gro.c | 3 -
tools/testing/selftests/wireguard/qemu/Makefile | 8 +-
101 files changed, 1135 insertions(+), 442 deletions(-)
This is the start of the stable review cycle for the 6.1.100 release.
There are 96 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 18 Jul 2024 15:27:21 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.100-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.100-rc1
Dan Carpenter <dan.carpenter(a)linaro.org>
i2c: rcar: fix error code in probe()
Nathan Chancellor <nathan(a)kernel.org>
kbuild: Make ld-version.sh more robust against version string changes
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bhi: Avoid warning in #DB handler due to BHI mitigation
Brian Gerst <brgerst(a)gmail.com>
x86/entry/64: Remove obsolete comment on tracing vs. SYSRET
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: clear NO_RXDMA flag after resetting
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: testunit: avoid re-issued work after read message
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: ensure Gen3+ reset does not disturb local targets
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: introduce Gen4 devices
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: reset controller is mandatory for Gen3+
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: mark HostNotify target address as used
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: bring hardware to known state when probing
John Stultz <jstultz(a)google.com>
sched: Move psi_account_irqtime() out of update_rq_clock_task() hotpath
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix kernel bug on rename operation of broken directory
Eduard Zingerman <eddyz87(a)gmail.com>
bpf: Allow reads from uninit stack
Paulo Alcantara <pc(a)manguebit.com>
cifs: avoid dup prefix path in dfs_get_automount_devname()
Paulo Alcantara <pc(a)cjr.nz>
cifs: use origin fullpath for automounts
Jim Mattson <jmattson(a)google.com>
x86/retpoline: Move a NOENDBR annotation to the SRSO dummy return thunk
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Copy the complete capability structure to user
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Avoid updating PD type for capability request
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Fix DSP capabilities request
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: send: annotate intentional data race in checking empty queue
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: queueing: annotate intentional data race in cpu round robin
Helge Deller <deller(a)kernel.org>
wireguard: allowedips: avoid unaligned 64-bit memory accesses
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: selftests: use acpi=off instead of -no-acpi for recent QEMU
Kuan-Wei Chiu <visitorckw(a)gmail.com>
ACPI: processor_idle: Fix invalid comparison with insertion sort for latency
Ilya Dryomov <idryomov(a)gmail.com>
libceph: fix race between delayed_work() and ceph_monc_stop()
Audra Mitchell <audra(a)redhat.com>
Fix userfaultfd_api to return EINVAL as expected
Edson Juliano Drosdeck <edson.drosdeck(a)gmail.com>
ALSA: hda/realtek: Limit mic boost on VAIO PRO PX
Nazar Bilinskyi <nbilinskyi(a)gmail.com>
ALSA: hda/realtek: Enable Mute LED on HP 250 G7
Michał Kopeć <michal.kopec(a)3mdeb.com>
ALSA: hda/realtek: add quirk for Clevo V5[46]0TU
Armin Wolf <W_Armin(a)gmx.de>
platform/x86: toshiba_acpi: Fix array out-of-bounds access
Thomas Weißschuh <linux(a)weissschuh.net>
nvmem: core: only change name to fram for current attribute
Joy Chakraborty <joychakr(a)google.com>
nvmem: meson-efuse: Fix return value of nvmem callbacks
Joy Chakraborty <joychakr(a)google.com>
nvmem: rmem: Fix return value of rmem_read()
Hobin Woo <hobin.woo(a)samsung.com>
ksmbd: discard write access to the directory open
Mathias Nyman <mathias.nyman(a)linux.intel.com>
xhci: always resume roothubs if xHC was reset during resume
He Zhe <zhe.he(a)windriver.com>
hpet: Support 32-bit userspace
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
Lee Jones <lee(a)kernel.org>
usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
WangYuli <wangyuli(a)uniontech.com>
USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
Dmitry Smirnov <d.smirnov(a)inbox.lv>
USB: serial: mos7840: fix crash on resume
Vanillan Wang <vanillanwang(a)163.com>
USB: serial: option: add Rolling RW350-GL variants
Mank Wang <mank.wang(a)netprisma.us>
USB: serial: option: add Netprisma LCUK54 series modules
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for Foxconn T99W651
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Fibocom FM350-GL
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FN912 rmnet compositions
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit generic core-dump composition
Ronald Wahl <ronald.wahl(a)raritan.com>
net: ks8851: Fix potential TX stall after interface reopen
Ronald Wahl <ronald.wahl(a)raritan.com>
net: ks8851: Fix deadlock with the SPI chip variant
Eric Dumazet <edumazet(a)google.com>
tcp: avoid too many retransmit packets
Eric Dumazet <edumazet(a)google.com>
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Josh Don <joshdon(a)google.com>
Revert "sched/fair: Make sure to try to detach at least one movable task"
Steve French <stfrench(a)microsoft.com>
cifs: fix setting SecurityFlags to true
Satheesh Paul <psatheesh(a)marvell.com>
octeontx2-af: fix issue with IPv4 match for RSS
Kiran Kumar K <kirankumark(a)marvell.com>
octeontx2-af: fix issue with IPv6 ext match for RSS
Kiran Kumar K <kirankumark(a)marvell.com>
octeontx2-af: extend RSS supported offload types
Michal Mazur <mmazur2(a)marvell.com>
octeontx2-af: fix detection of IP layer
Srujana Challa <schalla(a)marvell.com>
octeontx2-af: fix a issue with cpt_lf_alloc mailbox
Srujana Challa <schalla(a)marvell.com>
octeontx2-af: update cpt lf alloc mailbox
Nithin Dabilpuram <ndabilpuram(a)marvell.com>
octeontx2-af: replace cpt slot with lf id on reg write
Chen Ni <nichen(a)iscas.ac.cn>
ARM: davinci: Convert comma to semicolon
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Use strnlen() on name fields in V1 wmfw files
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Prevent buffer overrun when processing V2 alg headers
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Validate payload length before processing block
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Return error if block header overflows file
Richard Fitzgerald <rf(a)opensource.cirrus.com>
firmware: cs_dsp: Fix overflow checking of wmfw header
Sven Schnelle <svens(a)linux.ibm.com>
s390: Mark psw in __load_psw_mask() as __unitialized
Daniel Borkmann <daniel(a)iogearbox.net>
net, sunrpc: Remap EPERM in case of connection failure in xs_tcp_setup_socket
Chengen Du <chengen.du(a)canonical.com>
net/sched: Fix UAF when resolving a clash
Kuniyuki Iwashima <kuniyu(a)amazon.com>
udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port().
Oleksij Rempel <linux(a)rempel-privat.de>
ethtool: netlink: do not return SQI value if link is down
Dmitry Antipov <dmantipov(a)yandex.ru>
ppp: reject claimed-as-LCP but actually malformed packets
Jian Hui Lee <jianhui.lee(a)canonical.com>
net: ethernet: mtk-star-emac: set mac_managed_pm when probing
Mohammad Shehar Yaar Tausif <sheharyaar48(a)gmail.com>
bpf: fix order of args in call to bpf_map_kvcalloc
Martin KaFai Lau <martin.lau(a)kernel.org>
bpf: Remove __bpf_local_storage_map_alloc
Yafang Shao <laoar.shao(a)gmail.com>
bpf: use bpf_map_kvcalloc in bpf_local_storage
Martin KaFai Lau <martin.lau(a)kernel.org>
bpf: Reduce smap->elem_size
Yonghong Song <yhs(a)fb.com>
bpf: Refactor some inode/task/sk storage functions for reuse
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: ethernet: lantiq_etop: fix double free in detach
Michal Kubiak <michal.kubiak(a)intel.com>
i40e: Fix XDP program unloading while removing the driver
Hugh Dickins <hughd(a)google.com>
net: fix rc7's __skb_datagram_iter()
Aleksandr Mishin <amishin(a)t-argos.ru>
octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability()
Geliang Tang <tanggeliang(a)kylinos.cn>
skmsg: Skip zero length skb in sk_msg_recvmsg
Oleksij Rempel <linux(a)rempel-privat.de>
net: phy: microchip: lan87xx: reinit PHY after cable test
Neal Cardwell <ncardwell(a)google.com>
tcp: fix incorrect undo caused by DSACK of TLP retransmit
Brian Foster <bfoster(a)redhat.com>
vfs: don't mod negative dentry count when on shrinker list
linke li <lilinke99(a)qq.com>
fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
Jeff Layton <jlayton(a)kernel.org>
filelock: fix potential use-after-free in posix_lock_inode
Jingbo Xu <jefflexu(a)linux.alibaba.com>
cachefiles: add missing lock protection when polling
Baokun Li <libaokun1(a)huawei.com>
cachefiles: cyclic allocation of msg_id to avoid reuse
Hou Tao <houtao1(a)huawei.com>
cachefiles: wait for ondemand_object_worker to finish when dropping object
Baokun Li <libaokun1(a)huawei.com>
cachefiles: cancel all requests for the object that is being dropped
Baokun Li <libaokun1(a)huawei.com>
cachefiles: stop sending new request when dropping object
Jia Zhu <zhujia.zj(a)bytedance.com>
cachefiles: narrow the scope of triggering EPOLLIN events in ondemand mode
Baokun Li <libaokun1(a)huawei.com>
cachefiles: propagate errors from vfs_getxattr() to avoid infinite loop
Waiman Long <longman(a)redhat.com>
mm: prevent derefencing NULL ptr in pfn_section_valid()
-------------
Diffstat:
Documentation/admin-guide/cifs/usage.rst | 34 +--
Makefile | 4 +-
arch/arm/mach-davinci/pm.c | 2 +-
arch/s390/include/asm/processor.h | 2 +-
arch/x86/entry/entry_64.S | 19 +-
arch/x86/entry/entry_64_compat.S | 14 +-
arch/x86/lib/retpoline.S | 2 +-
drivers/acpi/processor_idle.c | 37 ++--
drivers/char/hpet.c | 34 ++-
drivers/firmware/cirrus/cs_dsp.c | 231 +++++++++++++++------
drivers/i2c/busses/i2c-rcar.c | 67 +++---
drivers/i2c/i2c-core-base.c | 1 +
drivers/i2c/i2c-slave-testunit.c | 7 +
drivers/misc/fastrpc.c | 14 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 9 +-
drivers/net/ethernet/lantiq_etop.c | 4 +-
drivers/net/ethernet/marvell/octeontx2/af/mbox.h | 10 +-
drivers/net/ethernet/marvell/octeontx2/af/npc.h | 8 +-
drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 2 +-
.../net/ethernet/marvell/octeontx2/af/rvu_cpt.c | 33 ++-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 67 +++++-
drivers/net/ethernet/mediatek/mtk_star_emac.c | 7 +
drivers/net/ethernet/micrel/ks8851_common.c | 10 +-
drivers/net/ethernet/micrel/ks8851_spi.c | 4 +-
drivers/net/phy/microchip_t1.c | 2 +-
drivers/net/ppp/ppp_generic.c | 15 ++
drivers/net/wireguard/allowedips.c | 4 +-
drivers/net/wireguard/queueing.h | 4 +-
drivers/net/wireguard/send.c | 2 +-
drivers/nvmem/core.c | 5 +-
drivers/nvmem/meson-efuse.c | 14 +-
drivers/nvmem/rmem.c | 5 +-
drivers/platform/x86/toshiba_acpi.c | 1 +
drivers/usb/core/config.c | 18 +-
drivers/usb/core/quirks.c | 3 +
drivers/usb/gadget/configfs.c | 3 +
drivers/usb/host/xhci.c | 16 +-
drivers/usb/serial/mos7840.c | 45 ++++
drivers/usb/serial/option.c | 38 ++++
fs/cachefiles/daemon.c | 14 +-
fs/cachefiles/internal.h | 15 ++
fs/cachefiles/ondemand.c | 52 ++++-
fs/cachefiles/xattr.c | 5 +-
fs/dcache.c | 12 +-
fs/locks.c | 2 +-
fs/nilfs2/dir.c | 32 ++-
fs/smb/client/cifs_dfs_ref.c | 36 +++-
fs/smb/client/cifsglob.h | 4 +-
fs/smb/client/cifsproto.h | 36 ++++
fs/smb/client/dir.c | 21 +-
fs/smb/server/smb2pdu.c | 13 +-
fs/userfaultfd.c | 7 +-
include/linux/bpf.h | 8 +
include/linux/bpf_local_storage.h | 17 +-
include/linux/mmzone.h | 3 +-
kernel/bpf/bpf_inode_storage.c | 38 +---
kernel/bpf/bpf_local_storage.c | 199 +++++++++++-------
kernel/bpf/bpf_task_storage.c | 38 +---
kernel/bpf/syscall.c | 15 ++
kernel/bpf/verifier.c | 11 +-
kernel/sched/core.c | 7 +-
kernel/sched/fair.c | 12 +-
kernel/sched/psi.c | 21 +-
kernel/sched/sched.h | 1 +
kernel/sched/stats.h | 11 +-
net/ceph/mon_client.c | 14 +-
net/core/bpf_sk_storage.c | 35 +---
net/core/datagram.c | 3 +-
net/core/skmsg.c | 3 +-
net/ethtool/linkstate.c | 41 ++--
net/ipv4/tcp_input.c | 11 +-
net/ipv4/tcp_timer.c | 31 ++-
net/ipv4/udp.c | 4 +-
net/sched/act_ct.c | 8 +
net/sunrpc/xprtsock.c | 7 +
scripts/ld-version.sh | 8 +-
sound/pci/hda/patch_realtek.c | 4 +
.../selftests/bpf/progs/test_global_func10.c | 9 +-
tools/testing/selftests/bpf/verifier/calls.c | 13 +-
.../selftests/bpf/verifier/helper_access_var_len.c | 104 ++++++----
tools/testing/selftests/bpf/verifier/int_ptr.c | 9 +-
.../selftests/bpf/verifier/search_pruning.c | 13 +-
tools/testing/selftests/bpf/verifier/sock.c | 27 ---
tools/testing/selftests/bpf/verifier/spill_fill.c | 7 +-
tools/testing/selftests/bpf/verifier/var_off.c | 52 -----
tools/testing/selftests/wireguard/qemu/Makefile | 8 +-
86 files changed, 1204 insertions(+), 634 deletions(-)
This is the start of the stable review cycle for the 4.19.318 release.
There are 66 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 18 Jul 2024 15:27:21 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.318-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.318-rc1
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: bring hardware to known state when probing
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix kernel bug on rename operation of broken directory
felix <fuzhen5(a)huawei.com>
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
Eric Dumazet <edumazet(a)google.com>
tcp: avoid too many retransmit packets
Eric Dumazet <edumazet(a)google.com>
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Menglong Dong <imagedong(a)tencent.com>
net: tcp: fix unexcepted socket die when snd_wnd is 0
Eric Dumazet <edumazet(a)google.com>
tcp: refactor tcp_retransmit_timer()
Ilya Dryomov <idryomov(a)gmail.com>
libceph: fix race between delayed_work() and ceph_monc_stop()
He Zhe <zhe.he(a)windriver.com>
hpet: Support 32-bit userspace
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
Lee Jones <lee(a)kernel.org>
usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
WangYuli <wangyuli(a)uniontech.com>
USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
Vanillan Wang <vanillanwang(a)163.com>
USB: serial: option: add Rolling RW350-GL variants
Mank Wang <mank.wang(a)netprisma.us>
USB: serial: option: add Netprisma LCUK54 series modules
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for Foxconn T99W651
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Fibocom FM350-GL
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FN912 rmnet compositions
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit generic core-dump composition
Chen Ni <nichen(a)iscas.ac.cn>
ARM: davinci: Convert comma to semicolon
Sven Schnelle <svens(a)linux.ibm.com>
s390: Mark psw in __load_psw_mask() as __unitialized
Dmitry Antipov <dmantipov(a)yandex.ru>
ppp: reject claimed-as-LCP but actually malformed packets
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: ethernet: lantiq_etop: fix double free in detach
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: lantiq_etop: add blank line after declaration
Neal Cardwell <ncardwell(a)google.com>
tcp: fix incorrect undo caused by DSACK of TLP retransmit
Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com>
drm/i915: make find_fw_domain work on intel_uncore
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix incorrect inode allocation from reserved inodes
Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
Mauro Carvalho Chehab <mchehab(a)kernel.org>
media: dw2102: fix a potential buffer overflow
Ghadi Elie Rahme <ghadi.rahme(a)canonical.com>
bnx2x: Fix multiple UBSAN array-index-out-of-bounds
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/atomfirmware: silence UBSAN warning
Ma Ke <make24(a)iscas.ac.cn>
drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes
Jan Kara <jack(a)suse.cz>
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Jan Kara <jack(a)suse.cz>
fsnotify: Do not generate events for O_PATH file descriptors
Jimmy Assarsson <extja(a)kvaser.com>
can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct
Jaganath Kanakkassery <jaganath.k.os(a)gmail.com>
Bluetooth: Fix incorrect pointer arithmatic in ext_adv_report_evt
Jinliang Zheng <alexjlzheng(a)tencent.com>
mm: optimize the redundant loop of mm_update_owner_next()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: add missing check for inode numbers on directory entries
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix inode number range checks
Shigeru Yoshida <syoshida(a)redhat.com>
inet_diag: Initialize pad field in struct inet_diag_req_v2
Zijian Zhang <zijianzhang(a)bytedance.com>
selftests: make order checking verbose in msg_zerocopy selftest
Zijian Zhang <zijianzhang(a)bytedance.com>
selftests: fix OOM in msg_zerocopy selftest
Sam Sun <samsun1006219(a)gmail.com>
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
Jakub Kicinski <kuba(a)kernel.org>
tcp_metrics: validate source addr length
Neal Cardwell <ncardwell(a)google.com>
UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open()
Yuchung Cheng <ycheng(a)google.com>
net: tcp better handling of reordering then loss cases
Yousuk Seung <ysseung(a)google.com>
tcp: add ece_ack flag to reno sack functions
zhang kai <zhangkaiheb(a)126.com>
tcp: tcp_mark_head_lost is only valid for sack-tcp
Eric Dumazet <edumazet(a)google.com>
tcp: take care of compressed acks in tcp_add_reno_sack()
Holger Dengler <dengler(a)linux.ibm.com>
s390/pkey: Wipe sensitive data on failure
Wang Yong <wang.yong12(a)zte.com.cn>
jffs2: Fix potential illegal address access in jffs2_free_inode
Greg Kurz <groug(a)kaod.org>
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
Mike Marshall <hubcap(a)omnibond.com>
orangefs: fix out-of-bounds fsid access
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n
Heiner Kallweit <hkallweit1(a)gmail.com>
i2c: i801: Annotate apanel_addr as __ro_after_init
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-frontends: tda10048: Fix integer overflow
Ricardo Ribalda <ribalda(a)chromium.org>
media: s2255: Use refcount_t instead of atomic_t for num_channels
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-frontends: tda18271c2dd: Remove casting during div
Simon Horman <horms(a)kernel.org>
net: dsa: mv88e6xxx: Correct check for empty list
Erick Archer <erick.archer(a)outlook.com>
Input: ff-core - prefer struct_size over open coded arithmetic
Jean Delvare <jdelvare(a)suse.de>
firmware: dmi: Stop decoding on broken entry
Erick Archer <erick.archer(a)outlook.com>
sctp: prefer struct_size over open coded arithmetic
Michael Bunk <micha(a)freedict.org>
media: dw2102: Don't translate i2c read into write
Alex Hung <alex.hung(a)amd.com>
drm/amd/display: Skip finding free audio for unknown engine_id
Michael Guralnik <michaelgur(a)nvidia.com>
IB/core: Implement a limit on UMAD receive List
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-usb: dib0700_devices: Add missing release_firmware()
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb: as102-fe: Fix as10x_register_addr packing
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-davinci/pm.c | 2 +-
arch/powerpc/include/asm/io.h | 2 +-
arch/powerpc/xmon/xmon.c | 6 +-
arch/s390/include/asm/processor.h | 2 +-
drivers/char/hpet.c | 34 ++++-
drivers/firmware/dmi_scan.c | 11 ++
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 +
drivers/gpu/drm/amd/include/atomfirmware.h | 2 +-
drivers/gpu/drm/i915/intel_uncore.c | 20 +--
drivers/gpu/drm/nouveau/nouveau_connector.c | 3 +
drivers/i2c/busses/i2c-i801.c | 2 +-
drivers/i2c/busses/i2c-pnx.c | 48 ++-----
drivers/i2c/busses/i2c-rcar.c | 17 ++-
drivers/infiniband/core/user_mad.c | 21 ++-
drivers/input/ff-core.c | 7 +-
drivers/media/dvb-frontends/as102_fe_types.h | 2 +-
drivers/media/dvb-frontends/tda10048.c | 9 +-
drivers/media/dvb-frontends/tda18271c2dd.c | 4 +-
drivers/media/usb/dvb-usb/dib0700_devices.c | 18 ++-
drivers/media/usb/dvb-usb/dw2102.c | 120 +++++++++-------
drivers/media/usb/s2255/s2255drv.c | 20 +--
drivers/net/bonding/bond_options.c | 6 +-
drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +-
drivers/net/ethernet/lantiq_etop.c | 5 +-
drivers/net/ppp/ppp_generic.c | 15 ++
drivers/s390/crypto/pkey_api.c | 4 +-
drivers/usb/core/config.c | 18 ++-
drivers/usb/core/quirks.c | 3 +
drivers/usb/gadget/configfs.c | 3 +
drivers/usb/serial/option.c | 38 ++++++
fs/jffs2/super.c | 1 +
fs/nilfs2/alloc.c | 18 ++-
fs/nilfs2/alloc.h | 4 +-
fs/nilfs2/dat.c | 2 +-
fs/nilfs2/dir.c | 38 +++++-
fs/nilfs2/ifile.c | 7 +-
fs/nilfs2/nilfs.h | 10 +-
fs/nilfs2/the_nilfs.c | 6 +
fs/nilfs2/the_nilfs.h | 2 +-
fs/orangefs/super.c | 3 +-
include/linux/fsnotify.h | 8 +-
include/linux/sunrpc/clnt.h | 1 +
kernel/exit.c | 2 +
mm/page-writeback.c | 2 +-
net/bluetooth/hci_event.c | 2 +-
net/ceph/mon_client.c | 14 +-
net/ipv4/inet_diag.c | 2 +
net/ipv4/tcp_input.c | 158 ++++++++++++----------
net/ipv4/tcp_metrics.c | 1 +
net/ipv4/tcp_timer.c | 45 +++++-
net/sctp/socket.c | 7 +-
net/sunrpc/clnt.c | 5 +-
tools/testing/selftests/net/msg_zerocopy.c | 14 +-
56 files changed, 544 insertions(+), 264 deletions(-)
The DCP trusted key type uses the wrong helper function to store
the blob's payload length which can lead to the wrong byte order
being used in case this would ever run on big endian architectures.
Fix by using correct helper function.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys")
Suggested-by: Richard Weinberger <richard(a)nod.at>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/
Signed-off-by: David Gstir <david(a)sigma-star.at>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v1 -> v2: fix ordering of commit tags, add s-o-b from Jarkko Sakkinen
security/keys/trusted-keys/trusted_dcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/keys/trusted-keys/trusted_dcp.c b/security/keys/trusted-keys/trusted_dcp.c
index b5f81a05be36..b0947f072a98 100644
--- a/security/keys/trusted-keys/trusted_dcp.c
+++ b/security/keys/trusted-keys/trusted_dcp.c
@@ -222,7 +222,7 @@ static int trusted_dcp_seal(struct trusted_key_payload *p, char *datablob)
return ret;
}
- b->payload_len = get_unaligned_le32(&p->key_len);
+ put_unaligned_le32(p->key_len, &b->payload_len);
p->blob_len = blen;
return 0;
}
--
2.35.3
Calling ioctl TIOCSSERIAL with an invalid baud_base can
result in uartclk being zero, which will result in a
divide by zero error in uart_get_divisor(). The check for
uartclk being zero in uart_set_info() needs to be done
before other settings are made as subsequent calls to
ioctl TIOCSSERIAL for the same port would be impacted if
the uartclk check was done where uartclk gets set.
Oops: divide error: 0000 PREEMPT SMP KASAN PTI
RIP: 0010:uart_get_divisor (drivers/tty/serial/serial_core.c:580)
Call Trace:
<TASK>
serial8250_get_divisor (drivers/tty/serial/8250/8250_port.c:2576
drivers/tty/serial/8250/8250_port.c:2589)
serial8250_do_set_termios (drivers/tty/serial/8250/8250_port.c:502
drivers/tty/serial/8250/8250_port.c:2741)
serial8250_set_termios (drivers/tty/serial/8250/8250_port.c:2862)
uart_change_line_settings (./include/linux/spinlock.h:376
./include/linux/serial_core.h:608 drivers/tty/serial/serial_core.c:222)
uart_port_startup (drivers/tty/serial/serial_core.c:342)
uart_startup (drivers/tty/serial/serial_core.c:368)
uart_set_info (drivers/tty/serial/serial_core.c:1034)
uart_set_info_user (drivers/tty/serial/serial_core.c:1059)
tty_set_serial (drivers/tty/tty_io.c:2637)
tty_ioctl (drivers/tty/tty_io.c:2647 drivers/tty/tty_io.c:2791)
__x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:907
fs/ioctl.c:893 fs/ioctl.c:893)
do_syscall_64 (arch/x86/entry/common.c:52
(discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: George Kennedy <george.kennedy(a)oracle.com>
---
serial_struct baud_base=0x30000000 will cause the crash.
drivers/tty/serial/serial_core.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2a8006e3d687..9967444eae10 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -881,6 +881,14 @@ static int uart_set_info(struct tty_struct *tty, struct tty_port *port,
new_flags = (__force upf_t)new_info->flags;
old_custom_divisor = uport->custom_divisor;
+ if (!(uport->flags & UPF_FIXED_PORT)) {
+ unsigned int uartclk = new_info->baud_base * 16;
+ /* check needs to be done here before other settings made */
+ if (uartclk == 0) {
+ retval = -EINVAL;
+ goto exit;
+ }
+ }
if (!capable(CAP_SYS_ADMIN)) {
retval = -EPERM;
if (change_irq || change_port ||
--
2.39.3
tpm_buf_append_name() has the following snippet in the beginning:
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
/* count the number of handles in the upper bits of flags */
buf->handles++;
return;
}
The claim in the comment is wrong, and the comment is in the wrong place
as alignment in this case should not anyway be a concern of the call
site. In essence the comment is lying about the code, and thus needs to
be adressed.
Further, 'handles' was incorrectly place to struct tpm_buf, as tpm-buf.c
does manage its state. It is easy to grep that only piece of code that
actually uses the field is tpm2-sessions.c.
Address the issues by moving the variable to struct tpm_chip.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
v3:
* Reset chip->handles in the beginning of tpm2_start_auth_session()
so that it shows correct value, when TCG_TPM2_HMAC is enabled but
tpm2_sessions_init() has never been called.
v2:
* Was a bit more broken than I first thought, as 'handles' is only
useful for tpm2-sessions.c and has zero relation to tpm-buf.c.
---
drivers/char/tpm/tpm-buf.c | 1 -
drivers/char/tpm/tpm2-cmd.c | 2 +-
drivers/char/tpm/tpm2-sessions.c | 7 ++++---
include/linux/tpm.h | 8 ++++----
4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/char/tpm/tpm-buf.c b/drivers/char/tpm/tpm-buf.c
index cad0048bcc3c..d06e8e063151 100644
--- a/drivers/char/tpm/tpm-buf.c
+++ b/drivers/char/tpm/tpm-buf.c
@@ -44,7 +44,6 @@ void tpm_buf_reset(struct tpm_buf *buf, u16 tag, u32 ordinal)
head->tag = cpu_to_be16(tag);
head->length = cpu_to_be32(sizeof(*head));
head->ordinal = cpu_to_be32(ordinal);
- buf->handles = 0;
}
EXPORT_SYMBOL_GPL(tpm_buf_reset);
diff --git a/drivers/char/tpm/tpm2-cmd.c b/drivers/char/tpm/tpm2-cmd.c
index 1e856259219e..b781e4406fc2 100644
--- a/drivers/char/tpm/tpm2-cmd.c
+++ b/drivers/char/tpm/tpm2-cmd.c
@@ -776,7 +776,7 @@ int tpm2_auto_startup(struct tpm_chip *chip)
if (rc)
goto out;
- rc = tpm2_sessions_init(chip);
+ /* rc = tpm2_sessions_init(chip); */
out:
/*
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index d3521aadd43e..5e7c12d64ba8 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -238,8 +238,7 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf,
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
- /* count the number of handles in the upper bits of flags */
- buf->handles++;
+ chip->handles++;
return;
}
@@ -310,7 +309,7 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf,
if (!tpm2_chip_auth(chip)) {
/* offset tells us where the sessions area begins */
- int offset = buf->handles * 4 + TPM_HEADER_SIZE;
+ int offset = chip->handles * 4 + TPM_HEADER_SIZE;
u32 len = 9 + passphrase_len;
if (tpm_buf_length(buf) != offset) {
@@ -963,6 +962,8 @@ int tpm2_start_auth_session(struct tpm_chip *chip)
int rc;
u32 null_key;
+ chip->handles = 0;
+
if (!auth) {
dev_warn_once(&chip->dev, "auth session is not active\n");
return 0;
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index e93ee8d936a9..b664f7556494 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -202,9 +202,9 @@ struct tpm_chip {
/* active locality */
int locality;
+ /* handle count for session: */
+ u8 handles;
#ifdef CONFIG_TCG_TPM2_HMAC
- /* details for communication security via sessions */
-
/* saved context for NULL seed */
u8 null_key_context[TPM2_MAX_CONTEXT_SIZE];
/* name of NULL seed */
@@ -377,7 +377,6 @@ struct tpm_buf {
u32 flags;
u32 length;
u8 *data;
- u8 handles;
};
enum tpm2_object_attributes {
@@ -517,7 +516,7 @@ static inline void tpm_buf_append_hmac_session_opt(struct tpm_chip *chip,
if (tpm2_chip_auth(chip)) {
tpm_buf_append_hmac_session(chip, buf, attributes, passphrase, passphraselen);
} else {
- offset = buf->handles * 4 + TPM_HEADER_SIZE;
+ offset = chip->handles * 4 + TPM_HEADER_SIZE;
head = (struct tpm_header *)buf->data;
/*
@@ -541,6 +540,7 @@ void tpm2_end_auth_session(struct tpm_chip *chip);
static inline int tpm2_start_auth_session(struct tpm_chip *chip)
{
+ chip->handles = 0;
return 0;
}
static inline void tpm2_end_auth_session(struct tpm_chip *chip)
--
2.45.2
v2:
- Updates commits with Johan's Review/Reported tags
- Adds Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com
- Cc's stable
- Adds in suggested kernel log to allow others to more easily match kernel
log to fixes
- Link to v1: https://lore.kernel.org/r/20240714-linux-next-24-07-13-camss-fixes-v1-0-8f8…
V1:
Dogfooding with SoftISP has uncovered two bugs in this series which I'm
posting fixes for.
- The first error:
A simple race condition which to be honest I'm surprised I haven't found
earlier nor has anybody else. Simply stated the order we typically
end up loading CAMSS on boot has masked out the pm_runtime_enable() race
condition that has been present in CAMSS for a long time.
If you blacklist qcom-camss in modules.d and then modprobe after boot,
the race condition shows up easily.
Moving the pm_runtime_enable prior to subdevice registration fixes the
problem.
The second error:
Nomenclature:
- CSIPHY: CSI Physical layer analogue to digital domain serialiser
- CSID: CSI Decoder
- VFE: Video Front End
- RDI: Raw Data Interface
- VC: Virtual Channel
In order to support streaming multiple virtual-channels on the same RDI a
V4L2 provided use_count variable is used to decide whether or not to actually
terminate streaming and release buffers for 'msm_vfe_rdiX'.
Unfortunately use_count indicates the number of times msm_vfe_rdiX has
been opened by user-space not the number of concurrent streams on
msm_vfe_rdiX.
Simply stated use_count and stream_count are two different things.
The silicon enabling code to select between VCs is valid but, a different
solution needs to be found to support _concurrent_ VC streams.
Right now the upstream use_count as-is is breaking the non concurrent VC
case and I don't believe there are upstream users of concurrent VCs on
CAMSS.
This series implements a revert for the invalid use_count check,
retaining the ability to select which VC is active on the RDI.
Dogfooding with libcamera's SoftISP in Hangouts, Zoom and multiple runs
of libcamera's "qcam" application is a very different test-case to the
simple capture of frames we previously did when validating the
'use_count' change.
A partial revert in expectation of a renewed push to fixup that
concurrent VC issue is included.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (2):
media: qcom: camss: Remove use_count guard in stop_streaming
media: qcom: camss: Fix ordering of pm_runtime_enable
drivers/media/platform/qcom/camss/camss-video.c | 6 ------
drivers/media/platform/qcom/camss/camss.c | 5 +++--
2 files changed, 3 insertions(+), 8 deletions(-)
---
base-commit: c6ce8f9ab92edc9726996a0130bfc1c408132d47
change-id: 20240713-linux-next-24-07-13-camss-fixes-fa98c0965a5d
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
This is the start of the stable review cycle for the 4.19.318 release.
There are 65 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 19 Jul 2024 06:37:32 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.318-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.318-rc2
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
i2c: rcar: bring hardware to known state when probing
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix kernel bug on rename operation of broken directory
felix <fuzhen5(a)huawei.com>
SUNRPC: Fix RPC client cleaned up the freed pipefs dentries
Eric Dumazet <edumazet(a)google.com>
tcp: avoid too many retransmit packets
Eric Dumazet <edumazet(a)google.com>
tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Menglong Dong <imagedong(a)tencent.com>
net: tcp: fix unexcepted socket die when snd_wnd is 0
Eric Dumazet <edumazet(a)google.com>
tcp: refactor tcp_retransmit_timer()
Ilya Dryomov <idryomov(a)gmail.com>
libceph: fix race between delayed_work() and ceph_monc_stop()
He Zhe <zhe.he(a)windriver.com>
hpet: Support 32-bit userspace
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor
Lee Jones <lee(a)kernel.org>
usb: gadget: configfs: Prevent OOB read/write in usb_string_copy()
WangYuli <wangyuli(a)uniontech.com>
USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k
Vanillan Wang <vanillanwang(a)163.com>
USB: serial: option: add Rolling RW350-GL variants
Mank Wang <mank.wang(a)netprisma.us>
USB: serial: option: add Netprisma LCUK54 series modules
Slark Xiao <slark_xiao(a)163.com>
USB: serial: option: add support for Foxconn T99W651
Bjørn Mork <bjorn(a)mork.no>
USB: serial: option: add Fibocom FM350-GL
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit FN912 rmnet compositions
Daniele Palmas <dnlplm(a)gmail.com>
USB: serial: option: add Telit generic core-dump composition
Chen Ni <nichen(a)iscas.ac.cn>
ARM: davinci: Convert comma to semicolon
Dmitry Antipov <dmantipov(a)yandex.ru>
ppp: reject claimed-as-LCP but actually malformed packets
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: ethernet: lantiq_etop: fix double free in detach
Aleksander Jan Bajkowski <olek2(a)wp.pl>
net: lantiq_etop: add blank line after declaration
Neal Cardwell <ncardwell(a)google.com>
tcp: fix incorrect undo caused by DSACK of TLP retransmit
Daniele Ceraolo Spurio <daniele.ceraolospurio(a)intel.com>
drm/i915: make find_fw_domain work on intel_uncore
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix incorrect inode allocation from reserved inodes
Piotr Wojtaszczyk <piotr.wojtaszczyk(a)timesys.com>
i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr
Mauro Carvalho Chehab <mchehab(a)kernel.org>
media: dw2102: fix a potential buffer overflow
Ghadi Elie Rahme <ghadi.rahme(a)canonical.com>
bnx2x: Fix multiple UBSAN array-index-out-of-bounds
Alex Deucher <alexander.deucher(a)amd.com>
drm/amdgpu/atomfirmware: silence UBSAN warning
Ma Ke <make24(a)iscas.ac.cn>
drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes
Jan Kara <jack(a)suse.cz>
Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again"
Jan Kara <jack(a)suse.cz>
fsnotify: Do not generate events for O_PATH file descriptors
Jimmy Assarsson <extja(a)kvaser.com>
can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct
Jaganath Kanakkassery <jaganath.k.os(a)gmail.com>
Bluetooth: Fix incorrect pointer arithmatic in ext_adv_report_evt
Jinliang Zheng <alexjlzheng(a)tencent.com>
mm: optimize the redundant loop of mm_update_owner_next()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: add missing check for inode numbers on directory entries
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix inode number range checks
Shigeru Yoshida <syoshida(a)redhat.com>
inet_diag: Initialize pad field in struct inet_diag_req_v2
Zijian Zhang <zijianzhang(a)bytedance.com>
selftests: make order checking verbose in msg_zerocopy selftest
Zijian Zhang <zijianzhang(a)bytedance.com>
selftests: fix OOM in msg_zerocopy selftest
Sam Sun <samsun1006219(a)gmail.com>
bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set()
Jakub Kicinski <kuba(a)kernel.org>
tcp_metrics: validate source addr length
Neal Cardwell <ncardwell(a)google.com>
UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open()
Yuchung Cheng <ycheng(a)google.com>
net: tcp better handling of reordering then loss cases
Yousuk Seung <ysseung(a)google.com>
tcp: add ece_ack flag to reno sack functions
zhang kai <zhangkaiheb(a)126.com>
tcp: tcp_mark_head_lost is only valid for sack-tcp
Eric Dumazet <edumazet(a)google.com>
tcp: take care of compressed acks in tcp_add_reno_sack()
Holger Dengler <dengler(a)linux.ibm.com>
s390/pkey: Wipe sensitive data on failure
Wang Yong <wang.yong12(a)zte.com.cn>
jffs2: Fix potential illegal address access in jffs2_free_inode
Greg Kurz <groug(a)kaod.org>
powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#"
Mike Marshall <hubcap(a)omnibond.com>
orangefs: fix out-of-bounds fsid access
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n
Heiner Kallweit <hkallweit1(a)gmail.com>
i2c: i801: Annotate apanel_addr as __ro_after_init
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-frontends: tda10048: Fix integer overflow
Ricardo Ribalda <ribalda(a)chromium.org>
media: s2255: Use refcount_t instead of atomic_t for num_channels
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-frontends: tda18271c2dd: Remove casting during div
Simon Horman <horms(a)kernel.org>
net: dsa: mv88e6xxx: Correct check for empty list
Erick Archer <erick.archer(a)outlook.com>
Input: ff-core - prefer struct_size over open coded arithmetic
Jean Delvare <jdelvare(a)suse.de>
firmware: dmi: Stop decoding on broken entry
Erick Archer <erick.archer(a)outlook.com>
sctp: prefer struct_size over open coded arithmetic
Michael Bunk <micha(a)freedict.org>
media: dw2102: Don't translate i2c read into write
Alex Hung <alex.hung(a)amd.com>
drm/amd/display: Skip finding free audio for unknown engine_id
Michael Guralnik <michaelgur(a)nvidia.com>
IB/core: Implement a limit on UMAD receive List
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb-usb: dib0700_devices: Add missing release_firmware()
Ricardo Ribalda <ribalda(a)chromium.org>
media: dvb: as102-fe: Fix as10x_register_addr packing
-------------
Diffstat:
Makefile | 4 +-
arch/arm/mach-davinci/pm.c | 2 +-
arch/powerpc/include/asm/io.h | 2 +-
arch/powerpc/xmon/xmon.c | 6 +-
drivers/char/hpet.c | 34 ++++-
drivers/firmware/dmi_scan.c | 11 ++
drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 3 +
drivers/gpu/drm/amd/include/atomfirmware.h | 2 +-
drivers/gpu/drm/i915/intel_uncore.c | 20 +--
drivers/gpu/drm/nouveau/nouveau_connector.c | 3 +
drivers/i2c/busses/i2c-i801.c | 2 +-
drivers/i2c/busses/i2c-pnx.c | 48 ++-----
drivers/i2c/busses/i2c-rcar.c | 17 ++-
drivers/infiniband/core/user_mad.c | 21 ++-
drivers/input/ff-core.c | 7 +-
drivers/media/dvb-frontends/as102_fe_types.h | 2 +-
drivers/media/dvb-frontends/tda10048.c | 9 +-
drivers/media/dvb-frontends/tda18271c2dd.c | 4 +-
drivers/media/usb/dvb-usb/dib0700_devices.c | 18 ++-
drivers/media/usb/dvb-usb/dw2102.c | 120 +++++++++-------
drivers/media/usb/s2255/s2255drv.c | 20 +--
drivers/net/bonding/bond_options.c | 6 +-
drivers/net/can/usb/kvaser_usb/kvaser_usb_core.c | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x.h | 2 +-
drivers/net/ethernet/lantiq_etop.c | 5 +-
drivers/net/ppp/ppp_generic.c | 15 ++
drivers/s390/crypto/pkey_api.c | 4 +-
drivers/usb/core/config.c | 18 ++-
drivers/usb/core/quirks.c | 3 +
drivers/usb/gadget/configfs.c | 3 +
drivers/usb/serial/option.c | 38 ++++++
fs/jffs2/super.c | 1 +
fs/nilfs2/alloc.c | 18 ++-
fs/nilfs2/alloc.h | 4 +-
fs/nilfs2/dat.c | 2 +-
fs/nilfs2/dir.c | 38 +++++-
fs/nilfs2/ifile.c | 7 +-
fs/nilfs2/nilfs.h | 10 +-
fs/nilfs2/the_nilfs.c | 6 +
fs/nilfs2/the_nilfs.h | 2 +-
fs/orangefs/super.c | 3 +-
include/linux/fsnotify.h | 8 +-
include/linux/sunrpc/clnt.h | 1 +
kernel/exit.c | 2 +
mm/page-writeback.c | 2 +-
net/bluetooth/hci_event.c | 2 +-
net/ceph/mon_client.c | 14 +-
net/ipv4/inet_diag.c | 2 +
net/ipv4/tcp_input.c | 158 ++++++++++++----------
net/ipv4/tcp_metrics.c | 1 +
net/ipv4/tcp_timer.c | 45 +++++-
net/sctp/socket.c | 7 +-
net/sunrpc/clnt.c | 5 +-
tools/testing/selftests/net/msg_zerocopy.c | 14 +-
55 files changed, 543 insertions(+), 263 deletions(-)
Calling ioctl TIOCSSERIAL with an invalid baud_base can
result in uartclk being zero, which will result in a
divide by zero error in uart_get_divisor(). The check for
uartclk being zero in uart_set_info() needs to be done
before other settings are made as subsequent calls to
ioctl TIOCSSERIAL for the same port would be impacted if
the uartclk check was done where uartclk gets set.
Oops: divide error: 0000 PREEMPT SMP KASAN PTI
RIP: 0010:uart_get_divisor (drivers/tty/serial/serial_core.c:580)
Call Trace:
<TASK>
serial8250_get_divisor (drivers/tty/serial/8250/8250_port.c:2576
drivers/tty/serial/8250/8250_port.c:2589)
serial8250_do_set_termios (drivers/tty/serial/8250/8250_port.c:502
drivers/tty/serial/8250/8250_port.c:2741)
serial8250_set_termios (drivers/tty/serial/8250/8250_port.c:2862)
uart_change_line_settings (./include/linux/spinlock.h:376
./include/linux/serial_core.h:608 drivers/tty/serial/serial_core.c:222)
uart_port_startup (drivers/tty/serial/serial_core.c:342)
uart_startup (drivers/tty/serial/serial_core.c:368)
uart_set_info (drivers/tty/serial/serial_core.c:1034)
uart_set_info_user (drivers/tty/serial/serial_core.c:1059)
tty_set_serial (drivers/tty/tty_io.c:2637)
tty_ioctl (drivers/tty/tty_io.c:2647 drivers/tty/tty_io.c:2791)
__x64_sys_ioctl (fs/ioctl.c:52 fs/ioctl.c:907
fs/ioctl.c:893 fs/ioctl.c:893)
do_syscall_64 (arch/x86/entry/common.c:52
(discriminator 1) arch/x86/entry/common.c:83 (discriminator 1))
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Signed-off-by: George Kennedy <george.kennedy(a)oracle.com>
---
serial_struct baud_base=0x30000000 will cause the crash.
drivers/tty/serial/serial_core.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index 2a8006e3d687..9967444eae10 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -881,6 +881,14 @@ static int uart_set_info(struct tty_struct *tty, struct tty_port *port,
new_flags = (__force upf_t)new_info->flags;
old_custom_divisor = uport->custom_divisor;
+ if (!(uport->flags & UPF_FIXED_PORT)) {
+ unsigned int uartclk = new_info->baud_base * 16;
+ /* check needs to be done here before other settings made */
+ if (uartclk == 0) {
+ retval = -EINVAL;
+ goto exit;
+ }
+ }
if (!capable(CAP_SYS_ADMIN)) {
retval = -EPERM;
if (change_irq || change_port ||
--
2.39.3
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 0078cb365d8c2..adcd519c70b19 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -968,7 +968,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_slavemode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 67f31183c1180..8c9bafee58f9f 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -993,7 +993,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_slavemode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 21297cc62571a..8566da12d15e3 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1001,7 +1001,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_slavemode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index f201653931d89..c806ee8070e5a 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1016,7 +1016,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_slavemode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 2c660a95c17e7..93e83fbc3403f 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1040,7 +1040,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_slavemode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index d323b37723929..006860ee03ca0 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1050,7 +1050,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_targetmode = false,
.devtype = IMX35_CSPI,
--
2.43.0
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit ce1dac560a74220f2e53845ec0723b562288aed4 ]
While in commit 2dd33f9cec90 ("spi: imx: support DMA for imx35") it was
claimed that DMA works on i.MX25, i.MX31 and i.MX35 the respective
device trees don't add DMA channels. The Reference manuals of i.MX31 and
i.MX25 also don't mention the CSPI core being DMA capable. (I didn't
check the others.)
Since commit e267a5b3ec59 ("spi: spi-imx: Use dev_err_probe for failed
DMA channel requests") this results in an error message
spi_imx 43fa4000.spi: error -ENODEV: can't get the TX DMA channel!
during boot. However that isn't fatal and the driver gets loaded just
fine, just without using DMA.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://patch.msgid.link/20240508095610.2146640-2-u.kleine-koenig@pengutron…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-imx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 09b6c1b45f1a1..09c676e50fe0e 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -1050,7 +1050,7 @@ static struct spi_imx_devtype_data imx35_cspi_devtype_data = {
.rx_available = mx31_rx_available,
.reset = mx31_reset,
.fifo_size = 8,
- .has_dmamode = true,
+ .has_dmamode = false,
.dynamic_burst = false,
.has_targetmode = false,
.devtype = IMX35_CSPI,
--
2.43.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 310d6c15e9104c99d5d9d0ff8e5383a79da7d5e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071547-slum-anemic-a0cc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
310d6c15e910 ("mm/damon/core: merge regions aggressively when max_nr_regions is unmet")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 310d6c15e9104c99d5d9d0ff8e5383a79da7d5e6 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj(a)kernel.org>
Date: Mon, 24 Jun 2024 10:58:14 -0700
Subject: [PATCH] mm/damon/core: merge regions aggressively when max_nr_regions
is unmet
DAMON keeps the number of regions under max_nr_regions by skipping regions
split operations when doing so can make the number higher than the limit.
It works well for preventing violation of the limit. But, if somehow the
violation happens, it cannot recovery well depending on the situation. In
detail, if the real number of regions having different access pattern is
higher than the limit, the mechanism cannot reduce the number below the
limit. In such a case, the system could suffer from high monitoring
overhead of DAMON.
The violation can actually happen. For an example, the user could reduce
max_nr_regions while DAMON is running, to be lower than the current number
of regions. Fix the problem by repeating the merge operations with
increasing aggressiveness in kdamond_merge_regions() for the case, until
the limit is met.
[sj(a)kernel.org: increase regions merge aggressiveness while respecting min_nr_regions]
Link: https://lkml.kernel.org/r/20240626164753.46270-1-sj@kernel.org
[sj(a)kernel.org: ensure max threshold attempt for max_nr_regions violation]
Link: https://lkml.kernel.org/r/20240627163153.75969-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240624175814.89611-1-sj@kernel.org
Fixes: b9a6ac4e4ede ("mm/damon: adaptively adjust regions")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 6392f1cc97a3..e66823d6b10b 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1358,14 +1358,31 @@ static void damon_merge_regions_of(struct damon_target *t, unsigned int thres,
* access frequencies are similar. This is for minimizing the monitoring
* overhead under the dynamically changeable access pattern. If a merge was
* unnecessarily made, later 'kdamond_split_regions()' will revert it.
+ *
+ * The total number of regions could be higher than the user-defined limit,
+ * max_nr_regions for some cases. For example, the user can update
+ * max_nr_regions to a number that lower than the current number of regions
+ * while DAMON is running. For such a case, repeat merging until the limit is
+ * met while increasing @threshold up to possible maximum level.
*/
static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold,
unsigned long sz_limit)
{
struct damon_target *t;
+ unsigned int nr_regions;
+ unsigned int max_thres;
- damon_for_each_target(t, c)
- damon_merge_regions_of(t, threshold, sz_limit);
+ max_thres = c->attrs.aggr_interval /
+ (c->attrs.sample_interval ? c->attrs.sample_interval : 1);
+ do {
+ nr_regions = 0;
+ damon_for_each_target(t, c) {
+ damon_merge_regions_of(t, threshold, sz_limit);
+ nr_regions += damon_nr_regions(t);
+ }
+ threshold = max(1, threshold * 2);
+ } while (nr_regions > c->attrs.max_nr_regions &&
+ threshold / 2 < max_thres);
}
/*
tpm_buf_append_name() has the following snippet in the beginning:
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
/* count the number of handles in the upper bits of flags */
buf->handles++;
return;
}
The claim in the comment is wrong, and the comment is in the wrong place
as alignment in this case should not anyway be a concern of the call
site. In essence the comment is lying about the code, and thus needs to
be adressed.
Further, 'handles' was incorrectly place to struct tpm_buf, as tpm-buf.c
does manage its state. It is easy to grep that only piece of code that
actually uses the field is tpm2-sessions.c.
Address the issues by moving the variable to struct tpm_chip.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
* Was a bit more broken than I first thought, as 'handles' is only
useful for tpm2-sessions.c and has zero relation to tpm-buf.c.
---
drivers/char/tpm/tpm-buf.c | 1 -
drivers/char/tpm/tpm2-sessions.c | 11 +++++------
include/linux/tpm.h | 8 ++++----
3 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/drivers/char/tpm/tpm-buf.c b/drivers/char/tpm/tpm-buf.c
index cad0048bcc3c..d06e8e063151 100644
--- a/drivers/char/tpm/tpm-buf.c
+++ b/drivers/char/tpm/tpm-buf.c
@@ -44,7 +44,6 @@ void tpm_buf_reset(struct tpm_buf *buf, u16 tag, u32 ordinal)
head->tag = cpu_to_be16(tag);
head->length = cpu_to_be32(sizeof(*head));
head->ordinal = cpu_to_be32(ordinal);
- buf->handles = 0;
}
EXPORT_SYMBOL_GPL(tpm_buf_reset);
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index d3521aadd43e..a4ff5bca61dd 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -238,8 +238,7 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf,
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
- /* count the number of handles in the upper bits of flags */
- buf->handles++;
+ chip->handles++;
return;
}
@@ -310,7 +309,7 @@ void tpm_buf_append_hmac_session(struct tpm_chip *chip, struct tpm_buf *buf,
if (!tpm2_chip_auth(chip)) {
/* offset tells us where the sessions area begins */
- int offset = buf->handles * 4 + TPM_HEADER_SIZE;
+ int offset = chip->handles * 4 + TPM_HEADER_SIZE;
u32 len = 9 + passphrase_len;
if (tpm_buf_length(buf) != offset) {
@@ -1010,10 +1009,10 @@ int tpm2_start_auth_session(struct tpm_chip *chip)
tpm_buf_destroy(&buf);
- if (rc)
- goto out;
+ if (!rc)
+ chip->handles = 0;
- out:
+out:
return rc;
}
EXPORT_SYMBOL(tpm2_start_auth_session);
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index e93ee8d936a9..b664f7556494 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -202,9 +202,9 @@ struct tpm_chip {
/* active locality */
int locality;
+ /* handle count for session: */
+ u8 handles;
#ifdef CONFIG_TCG_TPM2_HMAC
- /* details for communication security via sessions */
-
/* saved context for NULL seed */
u8 null_key_context[TPM2_MAX_CONTEXT_SIZE];
/* name of NULL seed */
@@ -377,7 +377,6 @@ struct tpm_buf {
u32 flags;
u32 length;
u8 *data;
- u8 handles;
};
enum tpm2_object_attributes {
@@ -517,7 +516,7 @@ static inline void tpm_buf_append_hmac_session_opt(struct tpm_chip *chip,
if (tpm2_chip_auth(chip)) {
tpm_buf_append_hmac_session(chip, buf, attributes, passphrase, passphraselen);
} else {
- offset = buf->handles * 4 + TPM_HEADER_SIZE;
+ offset = chip->handles * 4 + TPM_HEADER_SIZE;
head = (struct tpm_header *)buf->data;
/*
@@ -541,6 +540,7 @@ void tpm2_end_auth_session(struct tpm_chip *chip);
static inline int tpm2_start_auth_session(struct tpm_chip *chip)
{
+ chip->handles = 0;
return 0;
}
static inline void tpm2_end_auth_session(struct tpm_chip *chip)
--
2.45.2
From: Miaohe Lin <linmiaohe(a)huawei.com>
commit 35e351780fa9d8240dd6f7e4f245f9ea37e96c19 upstream.
Thorvald reported a WARNING [1]. And the root cause is below race:
CPU 1 CPU 2
fork hugetlbfs_fallocate
dup_mmap hugetlbfs_punch_hole
i_mmap_lock_write(mapping);
vma_interval_tree_insert_after -- Child vma is visible through i_mmap tree.
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private -- Clear vma_lock outside i_mmap_rwsem!
i_mmap_lock_write(mapping);
hugetlb_vmdelete_list
vma_interval_tree_foreach
hugetlb_vma_trylock_write -- Vma_lock is cleared.
tmp->vm_ops->open -- Alloc new vma_lock outside i_mmap_rwsem!
hugetlb_vma_unlock_write -- Vma_lock is assigned!!!
i_mmap_unlock_write(mapping);
hugetlb_dup_vma_private() and hugetlb_vm_op_open() are called outside
i_mmap_rwsem lock while vma lock can be used in the same time. Fix this
by deferring linking file vma until vma is fully initialized. Those vmas
should be initialized first before they can be used.
Backport notes:
The first backport attempt (cec11fa2e) was reverted (dd782da4707). This is
the new backport of the original fix (35e351780fa9).
35e351780f ("fork: defer linking file vma until vma is fully initialized")
fixed a hugetlb locking race by moving a bunch of intialization code to earlier
in the function. The call to open() was included in the move but the call to
copy_page_range was not, effectively inverting their relative ordering. This
created an issue for the vfio code which assumes copy_page_range happens before
the call to open() - vfio's open zaps the vma so that the fault handler is
invoked later, but when we inverted the ordering, copy_page_range can set up
mappings post-zap which would prevent the fault handler from being invoked
later. This patch moves the call to copy_page_range to earlier than the call to
open() to restore the original ordering of the two functions while keeping the
fix for hugetlb intact.
Commit aac6db75a9 made several changes to vfio_pci_core.c, including
removing the vfio-pci custom open function. This resolves the issue on
the main branch and so we only need to apply these changes when
backporting to stable branches.
35e351780f ("fork: defer linking file vma until vma is fully initialized")-> v6.9-rc5
aac6db75a9 ("vfio/pci: Use unmap_mapping_range()") -> v6.10-rc4
Link: https://lkml.kernel.org/r/20240410091441.3539905-1-linmiaohe@huawei.com
Fixes: 8d9bfb260814 ("hugetlb: add vma based lock for pmd sharing")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reported-by: Thorvald Natvig <thorvald(a)google.com>
Closes: https://lore.kernel.org/linux-mm/20240129161735.6gmjsswx62o4pbja@revolver/T/ [1]
Reviewed-by: Jane Chu <jane.chu(a)oracle.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Mateusz Guzik <mjguzik(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Peng Zhang <zhangpeng.00(a)bytedance.com>
Cc: Tycho Andersen <tandersen(a)netflix.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Leah Rumancik <leah.rumancik(a)gmail.com>
---
kernel/fork.c | 27 +++++++++++++--------------
1 file changed, 13 insertions(+), 14 deletions(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 177ce7438db6..122d2cd124d5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -727,6 +727,19 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
} else if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
vm_flags_clear(tmp, VM_LOCKED_MASK);
+ /*
+ * Copy/update hugetlb private vma information.
+ */
+ if (is_vm_hugetlb_page(tmp))
+ hugetlb_dup_vma_private(tmp);
+
+ if (!(tmp->vm_flags & VM_WIPEONFORK) &&
+ copy_page_range(tmp, mpnt))
+ goto fail_nomem_vmi_store;
+
+ if (tmp->vm_ops && tmp->vm_ops->open)
+ tmp->vm_ops->open(tmp);
+
file = tmp->vm_file;
if (file) {
struct address_space *mapping = file->f_mapping;
@@ -743,25 +756,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
i_mmap_unlock_write(mapping);
}
- /*
- * Copy/update hugetlb private vma information.
- */
- if (is_vm_hugetlb_page(tmp))
- hugetlb_dup_vma_private(tmp);
-
/* Link the vma into the MT */
if (vma_iter_bulk_store(&vmi, tmp))
goto fail_nomem_vmi_store;
mm->map_count++;
- if (!(tmp->vm_flags & VM_WIPEONFORK))
- retval = copy_page_range(tmp, mpnt);
-
- if (tmp->vm_ops && tmp->vm_ops->open)
- tmp->vm_ops->open(tmp);
-
- if (retval)
- goto loop_out;
}
/* a new mm has just been created */
retval = arch_dup_mmap(oldmm, mm);
--
2.45.2.803.g4e1b14247a-goog
tpm_buf_append_name() has the following snippet in the beginning:
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
/* count the number of handles in the upper bits of flags */
buf->handles++;
return;
}
The claim in the comment is wrong, and the comment is in the wrong place
as it should not be anyway a concern of the "call site". So in essence
it is lying about the code.
Fix the alignment to be aligned with the claim in the comment and remove
the comment.
Cc: stable(a)vger.kernel.org # v6.10+
Fixes: 699e3efd6c64 ("tpm: Add HMAC session start and end functions")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
drivers/char/tpm/tpm2-sessions.c | 1 -
include/linux/tpm.h | 4 ++--
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/char/tpm/tpm2-sessions.c b/drivers/char/tpm/tpm2-sessions.c
index d3521aadd43e..02fc5d4ff535 100644
--- a/drivers/char/tpm/tpm2-sessions.c
+++ b/drivers/char/tpm/tpm2-sessions.c
@@ -238,7 +238,6 @@ void tpm_buf_append_name(struct tpm_chip *chip, struct tpm_buf *buf,
if (!tpm2_chip_auth(chip)) {
tpm_buf_append_u32(buf, handle);
- /* count the number of handles in the upper bits of flags */
buf->handles++;
return;
}
diff --git a/include/linux/tpm.h b/include/linux/tpm.h
index e93ee8d936a9..4b55298520b5 100644
--- a/include/linux/tpm.h
+++ b/include/linux/tpm.h
@@ -374,10 +374,10 @@ enum tpm_buf_flags {
* A string buffer type for constructing TPM commands.
*/
struct tpm_buf {
- u32 flags;
+ u16 flags;
+ u16 handles;
u32 length;
u8 *data;
- u8 handles;
};
enum tpm2_object_attributes {
--
2.45.2
The acpi_cst_latency_cmp() comparison function currently used for
sorting C-state latencies does not satisfy transitivity, causing
incorrect sorting results.
Specifically, if there are two valid acpi_processor_cx elements A and B
and one invalid element C, it may occur that A < B, A = C, and B = C.
Sorting algorithms assume that if A < B and A = C, then C < B, leading
to incorrect ordering.
Given the small size of the array (<=8), we replace the library sort
function with a simple insertion sort that properly ignores invalid
elements and sorts valid ones based on latency. This change ensures
correct ordering of the C-state latencies.
Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
Reported-by: Julian Sikorski <belegdol(a)gmail.com>
Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Tested-by: Julian Sikorski <belegdol(a)gmail.com>
Cc: All applicable <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
(cherry picked from commit 233323f9b9f828cd7cd5145ad811c1990b692542)
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
---
drivers/acpi/processor_idle.c | 40 ++++++++++++++---------------------
1 file changed, 16 insertions(+), 24 deletions(-)
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 4cb44d80bf52..5289c344de90 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -16,7 +16,6 @@
#include <linux/acpi.h>
#include <linux/dmi.h>
#include <linux/sched.h> /* need_resched() */
-#include <linux/sort.h>
#include <linux/tick.h>
#include <linux/cpuidle.h>
#include <linux/cpu.h>
@@ -385,28 +384,24 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr,
return;
}
-static int acpi_cst_latency_cmp(const void *a, const void *b)
+static void acpi_cst_latency_sort(struct acpi_processor_cx *states, size_t length)
{
- const struct acpi_processor_cx *x = a, *y = b;
+ int i, j, k;
- if (!(x->valid && y->valid))
- return 0;
- if (x->latency > y->latency)
- return 1;
- if (x->latency < y->latency)
- return -1;
- return 0;
-}
-static void acpi_cst_latency_swap(void *a, void *b, int n)
-{
- struct acpi_processor_cx *x = a, *y = b;
- u32 tmp;
+ for (i = 1; i < length; i++) {
+ if (!states[i].valid)
+ continue;
- if (!(x->valid && y->valid))
- return;
- tmp = x->latency;
- x->latency = y->latency;
- y->latency = tmp;
+ for (j = i - 1, k = i; j >= 0; j--) {
+ if (!states[j].valid)
+ continue;
+
+ if (states[j].latency > states[k].latency)
+ swap(states[j].latency, states[k].latency);
+
+ k = j;
+ }
+ }
}
static int acpi_processor_power_verify(struct acpi_processor *pr)
@@ -451,10 +446,7 @@ static int acpi_processor_power_verify(struct acpi_processor *pr)
if (buggy_latency) {
pr_notice("FW issue: working around C-state latencies out of order\n");
- sort(&pr->power.states[1], max_cstate,
- sizeof(struct acpi_processor_cx),
- acpi_cst_latency_cmp,
- acpi_cst_latency_swap);
+ acpi_cst_latency_sort(&pr->power.states[1], max_cstate);
}
lapic_timer_propagate_broadcast(pr);
--
2.34.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 233323f9b9f828cd7cd5145ad811c1990b692542
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071531-underpaid-plop-ac0c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
233323f9b9f8 ("ACPI: processor_idle: Fix invalid comparison with insertion sort for latency")
0e6078c3c673 ("ACPI: processor idle: Use swap() instead of open coding it")
65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 233323f9b9f828cd7cd5145ad811c1990b692542 Mon Sep 17 00:00:00 2001
From: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Date: Tue, 2 Jul 2024 04:56:39 +0800
Subject: [PATCH] ACPI: processor_idle: Fix invalid comparison with insertion
sort for latency
The acpi_cst_latency_cmp() comparison function currently used for
sorting C-state latencies does not satisfy transitivity, causing
incorrect sorting results.
Specifically, if there are two valid acpi_processor_cx elements A and B
and one invalid element C, it may occur that A < B, A = C, and B = C.
Sorting algorithms assume that if A < B and A = C, then C < B, leading
to incorrect ordering.
Given the small size of the array (<=8), we replace the library sort
function with a simple insertion sort that properly ignores invalid
elements and sorts valid ones based on latency. This change ensures
correct ordering of the C-state latencies.
Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
Reported-by: Julian Sikorski <belegdol(a)gmail.com>
Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Tested-by: Julian Sikorski <belegdol(a)gmail.com>
Cc: All applicable <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index bd6a7857ce05..831fa4a12159 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -16,7 +16,6 @@
#include <linux/acpi.h>
#include <linux/dmi.h>
#include <linux/sched.h> /* need_resched() */
-#include <linux/sort.h>
#include <linux/tick.h>
#include <linux/cpuidle.h>
#include <linux/cpu.h>
@@ -386,25 +385,24 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr,
acpi_write_bit_register(ACPI_BITREG_BUS_MASTER_RLD, 1);
}
-static int acpi_cst_latency_cmp(const void *a, const void *b)
+static void acpi_cst_latency_sort(struct acpi_processor_cx *states, size_t length)
{
- const struct acpi_processor_cx *x = a, *y = b;
+ int i, j, k;
- if (!(x->valid && y->valid))
- return 0;
- if (x->latency > y->latency)
- return 1;
- if (x->latency < y->latency)
- return -1;
- return 0;
-}
-static void acpi_cst_latency_swap(void *a, void *b, int n)
-{
- struct acpi_processor_cx *x = a, *y = b;
+ for (i = 1; i < length; i++) {
+ if (!states[i].valid)
+ continue;
- if (!(x->valid && y->valid))
- return;
- swap(x->latency, y->latency);
+ for (j = i - 1, k = i; j >= 0; j--) {
+ if (!states[j].valid)
+ continue;
+
+ if (states[j].latency > states[k].latency)
+ swap(states[j].latency, states[k].latency);
+
+ k = j;
+ }
+ }
}
static int acpi_processor_power_verify(struct acpi_processor *pr)
@@ -449,10 +447,7 @@ static int acpi_processor_power_verify(struct acpi_processor *pr)
if (buggy_latency) {
pr_notice("FW issue: working around C-state latencies out of order\n");
- sort(&pr->power.states[1], max_cstate,
- sizeof(struct acpi_processor_cx),
- acpi_cst_latency_cmp,
- acpi_cst_latency_swap);
+ acpi_cst_latency_sort(&pr->power.states[1], max_cstate);
}
lapic_timer_propagate_broadcast(pr);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 233323f9b9f828cd7cd5145ad811c1990b692542
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071530-rambling-fable-98ea@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
233323f9b9f8 ("ACPI: processor_idle: Fix invalid comparison with insertion sort for latency")
0e6078c3c673 ("ACPI: processor idle: Use swap() instead of open coding it")
65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 233323f9b9f828cd7cd5145ad811c1990b692542 Mon Sep 17 00:00:00 2001
From: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Date: Tue, 2 Jul 2024 04:56:39 +0800
Subject: [PATCH] ACPI: processor_idle: Fix invalid comparison with insertion
sort for latency
The acpi_cst_latency_cmp() comparison function currently used for
sorting C-state latencies does not satisfy transitivity, causing
incorrect sorting results.
Specifically, if there are two valid acpi_processor_cx elements A and B
and one invalid element C, it may occur that A < B, A = C, and B = C.
Sorting algorithms assume that if A < B and A = C, then C < B, leading
to incorrect ordering.
Given the small size of the array (<=8), we replace the library sort
function with a simple insertion sort that properly ignores invalid
elements and sorts valid ones based on latency. This change ensures
correct ordering of the C-state latencies.
Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
Reported-by: Julian Sikorski <belegdol(a)gmail.com>
Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Tested-by: Julian Sikorski <belegdol(a)gmail.com>
Cc: All applicable <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index bd6a7857ce05..831fa4a12159 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -16,7 +16,6 @@
#include <linux/acpi.h>
#include <linux/dmi.h>
#include <linux/sched.h> /* need_resched() */
-#include <linux/sort.h>
#include <linux/tick.h>
#include <linux/cpuidle.h>
#include <linux/cpu.h>
@@ -386,25 +385,24 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr,
acpi_write_bit_register(ACPI_BITREG_BUS_MASTER_RLD, 1);
}
-static int acpi_cst_latency_cmp(const void *a, const void *b)
+static void acpi_cst_latency_sort(struct acpi_processor_cx *states, size_t length)
{
- const struct acpi_processor_cx *x = a, *y = b;
+ int i, j, k;
- if (!(x->valid && y->valid))
- return 0;
- if (x->latency > y->latency)
- return 1;
- if (x->latency < y->latency)
- return -1;
- return 0;
-}
-static void acpi_cst_latency_swap(void *a, void *b, int n)
-{
- struct acpi_processor_cx *x = a, *y = b;
+ for (i = 1; i < length; i++) {
+ if (!states[i].valid)
+ continue;
- if (!(x->valid && y->valid))
- return;
- swap(x->latency, y->latency);
+ for (j = i - 1, k = i; j >= 0; j--) {
+ if (!states[j].valid)
+ continue;
+
+ if (states[j].latency > states[k].latency)
+ swap(states[j].latency, states[k].latency);
+
+ k = j;
+ }
+ }
}
static int acpi_processor_power_verify(struct acpi_processor *pr)
@@ -449,10 +447,7 @@ static int acpi_processor_power_verify(struct acpi_processor *pr)
if (buggy_latency) {
pr_notice("FW issue: working around C-state latencies out of order\n");
- sort(&pr->power.states[1], max_cstate,
- sizeof(struct acpi_processor_cx),
- acpi_cst_latency_cmp,
- acpi_cst_latency_swap);
+ acpi_cst_latency_sort(&pr->power.states[1], max_cstate);
}
lapic_timer_propagate_broadcast(pr);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 233323f9b9f828cd7cd5145ad811c1990b692542
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071529-prognosis-achiness-85fd@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
233323f9b9f8 ("ACPI: processor_idle: Fix invalid comparison with insertion sort for latency")
0e6078c3c673 ("ACPI: processor idle: Use swap() instead of open coding it")
65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 233323f9b9f828cd7cd5145ad811c1990b692542 Mon Sep 17 00:00:00 2001
From: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Date: Tue, 2 Jul 2024 04:56:39 +0800
Subject: [PATCH] ACPI: processor_idle: Fix invalid comparison with insertion
sort for latency
The acpi_cst_latency_cmp() comparison function currently used for
sorting C-state latencies does not satisfy transitivity, causing
incorrect sorting results.
Specifically, if there are two valid acpi_processor_cx elements A and B
and one invalid element C, it may occur that A < B, A = C, and B = C.
Sorting algorithms assume that if A < B and A = C, then C < B, leading
to incorrect ordering.
Given the small size of the array (<=8), we replace the library sort
function with a simple insertion sort that properly ignores invalid
elements and sorts valid ones based on latency. This change ensures
correct ordering of the C-state latencies.
Fixes: 65ea8f2c6e23 ("ACPI: processor idle: Fix up C-state latency if not ordered")
Reported-by: Julian Sikorski <belegdol(a)gmail.com>
Closes: https://lore.kernel.org/lkml/70674dc7-5586-4183-8953-8095567e73df@gmail.com
Signed-off-by: Kuan-Wei Chiu <visitorckw(a)gmail.com>
Tested-by: Julian Sikorski <belegdol(a)gmail.com>
Cc: All applicable <stable(a)vger.kernel.org>
Link: https://patch.msgid.link/20240701205639.117194-1-visitorckw@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index bd6a7857ce05..831fa4a12159 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -16,7 +16,6 @@
#include <linux/acpi.h>
#include <linux/dmi.h>
#include <linux/sched.h> /* need_resched() */
-#include <linux/sort.h>
#include <linux/tick.h>
#include <linux/cpuidle.h>
#include <linux/cpu.h>
@@ -386,25 +385,24 @@ static void acpi_processor_power_verify_c3(struct acpi_processor *pr,
acpi_write_bit_register(ACPI_BITREG_BUS_MASTER_RLD, 1);
}
-static int acpi_cst_latency_cmp(const void *a, const void *b)
+static void acpi_cst_latency_sort(struct acpi_processor_cx *states, size_t length)
{
- const struct acpi_processor_cx *x = a, *y = b;
+ int i, j, k;
- if (!(x->valid && y->valid))
- return 0;
- if (x->latency > y->latency)
- return 1;
- if (x->latency < y->latency)
- return -1;
- return 0;
-}
-static void acpi_cst_latency_swap(void *a, void *b, int n)
-{
- struct acpi_processor_cx *x = a, *y = b;
+ for (i = 1; i < length; i++) {
+ if (!states[i].valid)
+ continue;
- if (!(x->valid && y->valid))
- return;
- swap(x->latency, y->latency);
+ for (j = i - 1, k = i; j >= 0; j--) {
+ if (!states[j].valid)
+ continue;
+
+ if (states[j].latency > states[k].latency)
+ swap(states[j].latency, states[k].latency);
+
+ k = j;
+ }
+ }
}
static int acpi_processor_power_verify(struct acpi_processor *pr)
@@ -449,10 +447,7 @@ static int acpi_processor_power_verify(struct acpi_processor *pr)
if (buggy_latency) {
pr_notice("FW issue: working around C-state latencies out of order\n");
- sort(&pr->power.states[1], max_cstate,
- sizeof(struct acpi_processor_cx),
- acpi_cst_latency_cmp,
- acpi_cst_latency_swap);
+ acpi_cst_latency_sort(&pr->power.states[1], max_cstate);
}
lapic_timer_propagate_broadcast(pr);
commit 1645c283a87c61f84b2bffd81f50724df959b11a upstream.
[BUG]
There is a bug report that ntfs2btrfs had a bug that it can lead to
transaction abort and the filesystem flips to read-only.
[CAUSE]
For inline backref items, kernel has a strict requirement for their
ordered, they must follow the following rules:
- All btrfs_extent_inline_ref::type should be in an ascending order
- Within the same type, the items should follow a descending order by
their sequence number
For EXTENT_DATA_REF type, the sequence number is result from
hash_extent_data_ref().
For other types, their sequence numbers are
btrfs_extent_inline_ref::offset.
Thus if there is any code not following above rules, the resulted
inline backrefs can prevent the kernel to locate the needed inline
backref and lead to transaction abort.
[FIX]
Ntrfs2btrfs has already fixed the problem, and btrfs-progs has added the
ability to detect such problems.
For kernel, let's be more noisy and be more specific about the order, so
that the next time kernel hits such problem we would reject it in the
first place, without leading to transaction abort.
Cc: stable(a)vger.kernel.org # 6.6
Link: https://github.com/kdave/btrfs-progs/pull/622
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
[ Fix a conflict due to header cleanup. ]
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/tree-checker.c | 39 +++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
diff --git a/fs/btrfs/tree-checker.c b/fs/btrfs/tree-checker.c
index cc6bc5985120..5d6cfa618dc4 100644
--- a/fs/btrfs/tree-checker.c
+++ b/fs/btrfs/tree-checker.c
@@ -29,6 +29,7 @@
#include "accessors.h"
#include "file-item.h"
#include "inode-item.h"
+#include "extent-tree.h"
/*
* Error message should follow the following format:
@@ -1274,6 +1275,8 @@ static int check_extent_item(struct extent_buffer *leaf,
unsigned long ptr; /* Current pointer inside inline refs */
unsigned long end; /* Extent item end */
const u32 item_size = btrfs_item_size(leaf, slot);
+ u8 last_type = 0;
+ u64 last_seq = U64_MAX;
u64 flags;
u64 generation;
u64 total_refs; /* Total refs in btrfs_extent_item */
@@ -1320,6 +1323,18 @@ static int check_extent_item(struct extent_buffer *leaf,
* 2.2) Ref type specific data
* Either using btrfs_extent_inline_ref::offset, or specific
* data structure.
+ *
+ * All above inline items should follow the order:
+ *
+ * - All btrfs_extent_inline_ref::type should be in an ascending
+ * order
+ *
+ * - Within the same type, the items should follow a descending
+ * order by their sequence number. The sequence number is
+ * determined by:
+ * * btrfs_extent_inline_ref::offset for all types other than
+ * EXTENT_DATA_REF
+ * * hash_extent_data_ref() for EXTENT_DATA_REF
*/
if (unlikely(item_size < sizeof(*ei))) {
extent_err(leaf, slot,
@@ -1401,6 +1416,7 @@ static int check_extent_item(struct extent_buffer *leaf,
struct btrfs_extent_inline_ref *iref;
struct btrfs_extent_data_ref *dref;
struct btrfs_shared_data_ref *sref;
+ u64 seq;
u64 dref_offset;
u64 inline_offset;
u8 inline_type;
@@ -1414,6 +1430,7 @@ static int check_extent_item(struct extent_buffer *leaf,
iref = (struct btrfs_extent_inline_ref *)ptr;
inline_type = btrfs_extent_inline_ref_type(leaf, iref);
inline_offset = btrfs_extent_inline_ref_offset(leaf, iref);
+ seq = inline_offset;
if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) {
extent_err(leaf, slot,
"inline ref item overflows extent item, ptr %lu iref size %u end %lu",
@@ -1444,6 +1461,10 @@ static int check_extent_item(struct extent_buffer *leaf,
case BTRFS_EXTENT_DATA_REF_KEY:
dref = (struct btrfs_extent_data_ref *)(&iref->offset);
dref_offset = btrfs_extent_data_ref_offset(leaf, dref);
+ seq = hash_extent_data_ref(
+ btrfs_extent_data_ref_root(leaf, dref),
+ btrfs_extent_data_ref_objectid(leaf, dref),
+ btrfs_extent_data_ref_offset(leaf, dref));
if (unlikely(!IS_ALIGNED(dref_offset,
fs_info->sectorsize))) {
extent_err(leaf, slot,
@@ -1470,6 +1491,24 @@ static int check_extent_item(struct extent_buffer *leaf,
inline_type);
return -EUCLEAN;
}
+ if (inline_type < last_type) {
+ extent_err(leaf, slot,
+ "inline ref out-of-order: has type %u, prev type %u",
+ inline_type, last_type);
+ return -EUCLEAN;
+ }
+ /* Type changed, allow the sequence starts from U64_MAX again. */
+ if (inline_type > last_type)
+ last_seq = U64_MAX;
+ if (seq > last_seq) {
+ extent_err(leaf, slot,
+"inline ref out-of-order: has type %u offset %llu seq 0x%llx, prev type %u seq 0x%llx",
+ inline_type, inline_offset, seq,
+ last_type, last_seq);
+ return -EUCLEAN;
+ }
+ last_type = inline_type;
+ last_seq = seq;
ptr += btrfs_extent_inline_ref_size(inline_type);
}
/* No padding is allowed */
--
2.45.2
From: Baokun Li <libaokun1(a)huawei.com>
When commit 13df4d44a3aa ("ext4: fix slab-out-of-bounds in
ext4_mb_find_good_group_avg_frag_lists()") was backported to stable, the
commit f536808adcc3 ("ext4: refactor out ext4_generic_attr_store()") that
uniformly determines if the ptr is null is not merged in, so it needs to
be judged whether ptr is null or not in each case of the switch, otherwise
null pointer dereferencing may occur.
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
fs/ext4/sysfs.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 63cbda3700ea..d65dccb44ed5 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -473,6 +473,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
*((unsigned int *) ptr) = t;
return len;
case attr_clusters_in_group:
+ if (!ptr)
+ return 0;
ret = kstrtouint(skip_spaces(buf), 0, &t);
if (ret)
return ret;
--
2.39.2
commit a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 upstream.
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Please apply this patch to the stable trees indicated by the subject
prefix instead of the patch that failed.
This patch is tailored to take page/folio conversion into account and
can be applied to these stable trees.
Also, all the builds and tests I did on each stable tree passed.
Thanks,
Ryusuke Konishi
fs/nilfs2/dir.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index 51c982ad9608..53e4e63c607e 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -396,11 +396,39 @@ nilfs_find_entry(struct inode *dir, const struct qstr *qstr,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct page **p)
{
- struct nilfs_dir_entry *de = nilfs_get_page(dir, 0, p);
+ struct page *page;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_page(dir, 0, &page);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *p = page;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ nilfs_put_page(page);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
--
2.43.5
We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8").
Gen8 platform has only timeslice and doesn't support a preemption mechanism
as engines do not have a preemption timer and doesn't send an irq if the
preemption timeout expires. So, add a fix to not consider preemption
during dequeuing for gen8 platforms.
v2: Simplify can_preempt() function (Tvrtko Ursulin)
v3:
- Inside need_preempt(), condition of can_preempt() is not required
as simplified can_preempt() is enough. (Chris Wilson)
Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti <andi.shyti(a)intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote(a)intel.com>
Cc: Chris Wilson <chris.p.wilson(a)linux.intel.com>
CC: <stable(a)vger.kernel.org> # v5.2+
---
drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..72090f52fb85 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3315,11 +3315,7 @@ static void remove_from_engine(struct i915_request *rq)
static bool can_preempt(struct intel_engine_cs *engine)
{
- if (GRAPHICS_VER(engine->i915) > 8)
- return true;
-
- /* GPGPU on bdw requires extra w/a; not implemented */
- return engine->class != RENDER_CLASS;
+ return GRAPHICS_VER(engine->i915) > 8;
}
static void kick_execlists(const struct i915_request *rq, int prio)
--
2.25.1
From: Xiubo Li <xiubli(a)redhat.com>
If a client sends out a cap update dropping caps with the prior 'seq'
just before an incoming cap revoke request, then the client may drop
the revoke because it believes it's already released the requested
capabilities.
This causes the MDS to wait indefinitely for the client to respond
to the revoke. It's therefore always a good idea to ack the cap
revoke request with the bumped up 'seq'.
Currently if the cap->issued equals to the newcaps the check_caps()
will do nothing, we should force flush the caps.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/61782
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/caps.c | 16 ++++++++++++----
fs/ceph/super.h | 7 ++++---
2 files changed, 16 insertions(+), 7 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 24c31f795938..ba5809cf8f02 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2024,6 +2024,8 @@ bool __ceph_should_report_size(struct ceph_inode_info *ci)
* CHECK_CAPS_AUTHONLY - we should only check the auth cap
* CHECK_CAPS_FLUSH - we should flush any dirty caps immediately, without
* further delay.
+ * CHECK_CAPS_FLUSH_FORCE - we should flush any caps immediately, without
+ * further delay.
*/
void ceph_check_caps(struct ceph_inode_info *ci, int flags)
{
@@ -2105,7 +2107,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
}
doutc(cl, "%p %llx.%llx file_want %s used %s dirty %s "
- "flushing %s issued %s revoking %s retain %s %s%s%s\n",
+ "flushing %s issued %s revoking %s retain %s %s%s%s%s\n",
inode, ceph_vinop(inode), ceph_cap_string(file_wanted),
ceph_cap_string(used), ceph_cap_string(ci->i_dirty_caps),
ceph_cap_string(ci->i_flushing_caps),
@@ -2113,7 +2115,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
ceph_cap_string(retain),
(flags & CHECK_CAPS_AUTHONLY) ? " AUTHONLY" : "",
(flags & CHECK_CAPS_FLUSH) ? " FLUSH" : "",
- (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : "");
+ (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : "",
+ (flags & CHECK_CAPS_FLUSH_FORCE) ? " FLUSH_FORCE" : "");
/*
* If we no longer need to hold onto old our caps, and we may
@@ -2223,6 +2226,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
goto ack;
}
+ if (flags & CHECK_CAPS_FLUSH_FORCE)
+ goto ack;
+
/* things we might delay */
if ((cap->issued & ~retain) == 0)
continue; /* nope, all good */
@@ -3518,6 +3524,7 @@ static void handle_cap_grant(struct inode *inode,
bool queue_invalidate = false;
bool deleted_inode = false;
bool fill_inline = false;
+ int flags = 0;
/*
* If there is at least one crypto block then we'll trust
@@ -3751,6 +3758,7 @@ static void handle_cap_grant(struct inode *inode,
/* don't let check_caps skip sending a response to MDS for revoke msgs */
if (le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) {
cap->mds_wanted = 0;
+ flags |= CHECK_CAPS_FLUSH_FORCE;
if (cap == ci->i_auth_cap)
check_caps = 1; /* check auth cap only */
else
@@ -3806,9 +3814,9 @@ static void handle_cap_grant(struct inode *inode,
mutex_unlock(&session->s_mutex);
if (check_caps == 1)
- ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL);
+ ceph_check_caps(ci, flags | CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL);
else if (check_caps == 2)
- ceph_check_caps(ci, CHECK_CAPS_NOINVAL);
+ ceph_check_caps(ci, flags | CHECK_CAPS_NOINVAL);
}
/*
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index b0b368ed3018..831e8ec4d5da 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -200,9 +200,10 @@ struct ceph_cap {
struct list_head caps_item;
};
-#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */
-#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */
-#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */
+#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */
+#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */
+#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */
+#define CHECK_CAPS_FLUSH_FORCE 8 /* force flush any caps */
struct ceph_cap_flush {
u64 tid;
--
2.45.1
From: Baokun Li <libaokun1(a)huawei.com>
[ Upstream commit 9e8e819f8f272c4e5dcd0bd6c7450e36481ed139 ]
When setting values of type unsigned int through sysfs, we use kstrtoul()
to parse it and then truncate part of it as the final set value, when the
set value is greater than UINT_MAX, the set value will not match what we
see because of the truncation. As follows:
$ echo 4294967296 > /sys/fs/ext4/sda/mb_max_linear_groups
$ cat /sys/fs/ext4/sda/mb_max_linear_groups
0
So we use kstrtouint() to parse the attr_pointer_ui type to avoid the
inconsistency described above. In addition, a judgment is added to avoid
setting s_resv_clusters less than 0.
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20240319113325.3110393-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Stable-dep-of: 13df4d44a3aa ("ext4: fix slab-out-of-bounds in ext4_mb_find_good_group_avg_frag_lists()")
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
---
fs/ext4/sysfs.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index 6d332dff79dd..ca820620b974 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -104,7 +104,7 @@ static ssize_t reserved_clusters_store(struct ext4_sb_info *sbi,
int ret;
ret = kstrtoull(skip_spaces(buf), 0, &val);
- if (ret || val >= clusters)
+ if (ret || val >= clusters || (s64)val < 0)
return -EINVAL;
atomic64_set(&sbi->s_resv_clusters, val);
@@ -451,7 +451,8 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
s_kobj);
struct ext4_attr *a = container_of(attr, struct ext4_attr, attr);
void *ptr = calc_ptr(a, sbi);
- unsigned long t;
+ unsigned int t;
+ unsigned long lt;
int ret;
switch (a->attr_id) {
@@ -460,7 +461,7 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
case attr_pointer_ui:
if (!ptr)
return 0;
- ret = kstrtoul(skip_spaces(buf), 0, &t);
+ ret = kstrtouint(skip_spaces(buf), 0, &t);
if (ret)
return ret;
if (a->attr_ptr == ptr_ext4_super_block_offset)
@@ -471,10 +472,10 @@ static ssize_t ext4_attr_store(struct kobject *kobj,
case attr_pointer_ul:
if (!ptr)
return 0;
- ret = kstrtoul(skip_spaces(buf), 0, &t);
+ ret = kstrtoul(skip_spaces(buf), 0, <);
if (ret)
return ret;
- *((unsigned long *) ptr) = t;
+ *((unsigned long *) ptr) = lt;
return len;
case attr_inode_readahead:
return inode_readahead_blks_store(sbi, buf, len);
--
2.39.2
V2 main changes:
- Add Rob's reviewed-by in the binding patch.
- Re-name the error out labels and new RXWM macro.
- In #3 patch, add one fix tag, and CC stable kernel.
Based on i.MX8QM HSIO PHY driver, refine i.MX8QM SATA driver by using PHY
interface.
[PATCH v2 1/4] dt-bindings: ata: Add i.MX8QM AHCI compatible string
[PATCH v2 2/4] ata: ahci_imx: Clean up code by using i.MX8Q HSIO PHY
[PATCH v2 3/4] ata: ahci_imx: Enlarge RX water mark for i.MX8QM SATA
[PATCH v2 4/4] ata: ahci_imx: Correct the email address
Documentation/devicetree/bindings/ata/imx-sata.yaml | 47 +++++++++++
drivers/ata/ahci_imx.c | 406 ++++++++++++++++++++++++-----------------------------------------------------------------
2 files changed, 155 insertions(+), 298 deletions(-)
The patch titled
Subject: mm/mglru: fix ineffective protection calculation
has been added to the -mm mm-unstable branch. Its filename is
mm-mglru-fix-ineffective-protection-calculation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix ineffective protection calculation
Date: Fri, 12 Jul 2024 17:29:56 -0600
mem_cgroup_calculate_protection() is not stateless and should only be used
as part of a top-down tree traversal. shrink_one() traverses the per-node
memcg LRU instead of the root_mem_cgroup tree, and therefore it should not
call mem_cgroup_calculate_protection().
The existing misuse in shrink_one() can cause ineffective protection of
sub-trees that are grandchildren of root_mem_cgroup. Fix it by reusing
lru_gen_age_node(), which already traverses the root_mem_cgroup tree, to
calculate the protection.
Previously lru_gen_age_node() opportunistically skips the first pass,
i.e., when scan_control->priority is DEF_PRIORITY. On the second pass,
lruvec_is_sizable() uses appropriate scan_control->priority, set by
set_initial_priority() from lru_gen_shrink_node(), to decide whether a
memcg is too small to reclaim from.
Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup tree.
So it should call set_initial_priority() upfront, to make sure
lruvec_is_sizable() uses appropriate scan_control->priority on the first
pass. Otherwise, lruvec_is_reclaimable() can return false negatives and
result in premature OOM kills when min_ttl_ms is used.
Link: https://lkml.kernel.org/r/20240712232956.1427127-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: T.J. Mercier <tjmercier(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 82 +++++++++++++++++++++++---------------------------
1 file changed, 38 insertions(+), 44 deletions(-)
--- a/mm/vmscan.c~mm-mglru-fix-ineffective-protection-calculation
+++ a/mm/vmscan.c
@@ -3915,6 +3915,32 @@ done:
* working set protection
******************************************************************************/
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int priority;
+ unsigned long reclaimable;
+
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+ * Determine the initial priority based on
+ * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
+ * where reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+ /*
+ * The estimation is based on LRU pages only, so cap it to prevent
+ * overshoots of shrinker objects by large margins.
+ */
+ sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
+}
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
@@ -3948,19 +3974,17 @@ static bool lruvec_is_reclaimable(struct
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
- /* see the comment on lru_gen_folio */
- gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
- birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
- if (time_is_after_jiffies(birth + min_ttl))
+ if (mem_cgroup_below_min(NULL, memcg))
return false;
if (!lruvec_is_sizable(lruvec, sc))
return false;
- mem_cgroup_calculate_protection(NULL, memcg);
+ /* see the comment on lru_gen_folio */
+ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+ birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- return !mem_cgroup_below_min(NULL, memcg);
+ return time_is_before_jiffies(birth + min_ttl);
}
/* to protect the working set of the last N jiffies */
@@ -3970,23 +3994,20 @@ static void lru_gen_age_node(struct pgli
{
struct mem_cgroup *memcg;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
+ bool reclaimable = !min_ttl;
VM_WARN_ON_ONCE(!current_is_kswapd());
- /* check the order to exclude compaction-induced reclaim */
- if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
- return;
+ set_initial_priority(pgdat, sc);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
- mem_cgroup_iter_break(NULL, memcg);
- return;
- }
+ mem_cgroup_calculate_protection(NULL, memcg);
- cond_resched();
+ if (!reclaimable)
+ reclaimable = lruvec_is_reclaimable(lruvec, sc, min_ttl);
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
/*
@@ -3994,7 +4015,7 @@ static void lru_gen_age_node(struct pgli
* younger than min_ttl. However, another possibility is all memcgs are
* either too small or below min.
*/
- if (mutex_trylock(&oom_lock)) {
+ if (!reclaimable && mutex_trylock(&oom_lock)) {
struct oom_control oc = {
.gfp_mask = sc->gfp_mask,
};
@@ -4786,8 +4807,7 @@ static int shrink_one(struct lruvec *lru
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- mem_cgroup_calculate_protection(NULL, memcg);
-
+ /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
if (mem_cgroup_below_min(NULL, memcg))
return MEMCG_LRU_YOUNG;
@@ -4911,32 +4931,6 @@ static void lru_gen_shrink_lruvec(struct
blk_finish_plug(&plug);
}
-static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
-{
- int priority;
- unsigned long reclaimable;
-
- if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
- return;
- /*
- * Determine the initial priority based on
- * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
- * where reclaimed_to_scanned_ratio = inactive / total.
- */
- reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
- if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
- reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
-
- /* round down reclaimable and round up sc->nr_to_reclaim */
- priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
-
- /*
- * The estimation is based on LRU pages only, so cap it to prevent
- * overshoots of shrinker objects by large margins.
- */
- sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
-}
-
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct blk_plug plug;
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-mglru-fix-ineffective-protection-calculation.patch
The patch titled
Subject: mm/mglru: fix ineffective protection calculation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mglru-fix-ineffective-protection-calculation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix ineffective protection calculation
Date: Fri, 12 Jul 2024 17:29:56 -0600
mem_cgroup_calculate_protection() is not stateless and should only be used
as part of a top-down tree traversal. shrink_one() traverses the per-node
memcg LRU instead of the root_mem_cgroup tree, and therefore it should not
call mem_cgroup_calculate_protection().
The existing misuse in shrink_one() can cause ineffective protection of
sub-trees that are grandchildren of root_mem_cgroup. Fix it by reusing
lru_gen_age_node(), which already traverses the root_mem_cgroup tree, to
calculate the protection.
Previously lru_gen_age_node() opportunistically skips the first pass,
i.e., when scan_control->priority is DEF_PRIORITY. On the second pass,
lruvec_is_sizable() uses appropriate scan_control->priority, set by
set_initial_priority() from lru_gen_shrink_node(), to decide whether a
memcg is too small to reclaim from.
Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup tree.
So it should call set_initial_priority() upfront, to make sure
lruvec_is_sizable() uses appropriate scan_control->priority on the first
pass. Otherwise, lruvec_is_reclaimable() can return false negatives and
result in premature OOM kills when min_ttl_ms is used.
Link: https://lkml.kernel.org/r/20240712232956.1427127-1-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: T.J. Mercier <tjmercier(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 30 ++++++++++++------------------
1 file changed, 12 insertions(+), 18 deletions(-)
--- a/mm/vmscan.c~mm-mglru-fix-ineffective-protection-calculation
+++ a/mm/vmscan.c
@@ -3933,19 +3933,17 @@ static bool lruvec_is_reclaimable(struct
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
- /* see the comment on lru_gen_folio */
- gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
- birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
- if (time_is_after_jiffies(birth + min_ttl))
+ if (mem_cgroup_below_min(NULL, memcg))
return false;
if (!lruvec_is_sizable(lruvec, sc))
return false;
- mem_cgroup_calculate_protection(NULL, memcg);
+ /* see the comment on lru_gen_folio */
+ gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+ birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- return !mem_cgroup_below_min(NULL, memcg);
+ return time_is_before_jiffies(birth + min_ttl);
}
/* to protect the working set of the last N jiffies */
@@ -3955,23 +3953,20 @@ static void lru_gen_age_node(struct pgli
{
struct mem_cgroup *memcg;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
+ bool reclaimable = !min_ttl;
VM_WARN_ON_ONCE(!current_is_kswapd());
- /* check the order to exclude compaction-induced reclaim */
- if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
- return;
+ set_initial_priority(pgdat, sc);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
- mem_cgroup_iter_break(NULL, memcg);
- return;
- }
+ mem_cgroup_calculate_protection(NULL, memcg);
- cond_resched();
+ if (!reclaimable)
+ reclaimable = lruvec_is_reclaimable(lruvec, sc, min_ttl);
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
/*
@@ -3979,7 +3974,7 @@ static void lru_gen_age_node(struct pgli
* younger than min_ttl. However, another possibility is all memcgs are
* either too small or below min.
*/
- if (mutex_trylock(&oom_lock)) {
+ if (!reclaimable && mutex_trylock(&oom_lock)) {
struct oom_control oc = {
.gfp_mask = sc->gfp_mask,
};
@@ -4772,8 +4767,7 @@ static int shrink_one(struct lruvec *lru
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- mem_cgroup_calculate_protection(NULL, memcg);
-
+ /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
if (mem_cgroup_below_min(NULL, memcg))
return MEMCG_LRU_YOUNG;
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-mglru-fix-ineffective-protection-calculation.patch
The patch titled
Subject: mm/huge_memory: avoid PMD-size page cache if needed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm/huge_memory: avoid PMD-size page cache if needed
Date: Mon, 15 Jul 2024 10:04:23 +1000
xarray can't support arbitrary page cache size. the largest and supported
page cache size is defined as MAX_PAGECACHE_ORDER by commit 099d90642a71
("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to xarray"). However,
it's possible to have 512MB page cache in the huge memory's collapsing
path on ARM64 system whose base page size is 64KB. 512MB page cache is
breaking the limitation and a warning is raised when the xarray entry is
split as shown in the following example.
[root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize
KernelPageSize: 64 kB
[root@dhcp-10-26-1-207 ~]# cat /tmp/test.c
:
int main(int argc, char **argv)
{
const char *filename = TEST_XFS_FILENAME;
int fd = 0;
void *buf = (void *)-1, *p;
int pgsize = getpagesize();
int ret = 0;
if (pgsize != 0x10000) {
fprintf(stdout, "System with 64KB base page size is required!\n");
return -EPERM;
}
system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb");
system("echo 1 > /proc/sys/vm/drop_caches");
/* Open the xfs file */
fd = open(filename, O_RDONLY);
assert(fd > 0);
/* Create VMA */
buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
assert(buf != (void *)-1);
fprintf(stdout, "mapped buffer at 0x%p\n", buf);
/* Populate VMA */
ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE);
assert(ret == 0);
ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ);
assert(ret == 0);
/* Collapse VMA */
ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
assert(ret == 0);
ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE);
if (ret) {
fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno);
goto out;
}
/* Split xarray entry. Write permission is needed */
munmap(buf, TEST_MEM_SIZE);
buf = (void *)-1;
close(fd);
fd = open(filename, O_RDWR);
assert(fd > 0);
fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
TEST_MEM_SIZE - pgsize, pgsize);
out:
if (buf != (void *)-1)
munmap(buf, TEST_MEM_SIZE);
if (fd > 0)
close(fd);
return ret;
}
[root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test
[root@dhcp-10-26-1-207 ~]# /tmp/test
------------[ cut here ]------------
WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128
Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \
ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse \
xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net \
sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio
CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9
Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : xas_split_alloc+0xf8/0x128
lr : split_huge_page_to_list_to_order+0x1c4/0x780
sp : ffff8000ac32f660
x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0
x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d
x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000
x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000
x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c
x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8
x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40
x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
Call trace:
xas_split_alloc+0xf8/0x128
split_huge_page_to_list_to_order+0x1c4/0x780
truncate_inode_partial_folio+0xdc/0x160
truncate_inode_pages_range+0x1b4/0x4a8
truncate_pagecache_range+0x84/0xa0
xfs_flush_unmap_range+0x70/0x90 [xfs]
xfs_file_fallocate+0xfc/0x4d8 [xfs]
vfs_fallocate+0x124/0x2f0
ksys_fallocate+0x4c/0xa0
__arm64_sys_fallocate+0x24/0x38
invoke_syscall.constprop.0+0x7c/0xd8
do_el0_svc+0xb4/0xd0
el0_svc+0x44/0x1d8
el0t_64_sync_handler+0x134/0x150
el0t_64_sync+0x17c/0x180
Fix it by correcting the supported page cache orders, different sets for
DAX and other files. With it corrected, 512MB page cache becomes
disallowed on all non-DAX files on ARM64 system where the base page size
is 64KB. After this patch is applied, the test program fails with error
-EINVAL returned from __thp_vma_allowable_orders() and the madvise()
system call to collapse the page caches.
Link: https://lkml.kernel.org/r/20240715000423.316491-1-gshan@redhat.com
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: <stable(a)vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 12 +++++++++---
mm/huge_memory.c | 12 ++++++++++--
2 files changed, 19 insertions(+), 5 deletions(-)
--- a/include/linux/huge_mm.h~mm-huge_memory-avoid-pmd-size-page-cache-if-needed
+++ a/include/linux/huge_mm.h
@@ -72,14 +72,20 @@ extern struct kobj_attribute shmem_enabl
#define THP_ORDERS_ALL_ANON ((BIT(PMD_ORDER + 1) - 1) & ~(BIT(0) | BIT(1)))
/*
- * Mask of all large folio orders supported for file THP.
+ * Mask of all large folio orders supported for file THP. Folios in a DAX
+ * file is never split and the MAX_PAGECACHE_ORDER limit does not apply to
+ * it.
*/
-#define THP_ORDERS_ALL_FILE (BIT(PMD_ORDER) | BIT(PUD_ORDER))
+#define THP_ORDERS_ALL_FILE_DAX \
+ (BIT(PMD_ORDER) | BIT(PUD_ORDER))
+#define THP_ORDERS_ALL_FILE_DEFAULT \
+ ((BIT(MAX_PAGECACHE_ORDER + 1) - 1) & ~BIT(0))
/*
* Mask of all large folio orders supported for THP.
*/
-#define THP_ORDERS_ALL (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_FILE)
+#define THP_ORDERS_ALL \
+ (THP_ORDERS_ALL_ANON | THP_ORDERS_ALL_FILE_DAX | THP_ORDERS_ALL_FILE_DEFAULT)
#define TVA_SMAPS (1 << 0) /* Will be used for procfs */
#define TVA_IN_PF (1 << 1) /* Page fault handler */
--- a/mm/huge_memory.c~mm-huge_memory-avoid-pmd-size-page-cache-if-needed
+++ a/mm/huge_memory.c
@@ -88,9 +88,17 @@ unsigned long __thp_vma_allowable_orders
bool smaps = tva_flags & TVA_SMAPS;
bool in_pf = tva_flags & TVA_IN_PF;
bool enforce_sysfs = tva_flags & TVA_ENFORCE_SYSFS;
+ unsigned long supported_orders;
+
/* Check the intersection of requested and supported orders. */
- orders &= vma_is_anonymous(vma) ?
- THP_ORDERS_ALL_ANON : THP_ORDERS_ALL_FILE;
+ if (vma_is_anonymous(vma))
+ supported_orders = THP_ORDERS_ALL_ANON;
+ else if (vma_is_dax(vma))
+ supported_orders = THP_ORDERS_ALL_FILE_DAX;
+ else
+ supported_orders = THP_ORDERS_ALL_FILE_DEFAULT;
+
+ orders &= supported_orders;
if (!orders)
return 0;
_
Patches currently in -mm which might be from gshan(a)redhat.com are
mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch
Hi,
There are two patches for bpf that I think should be backported to
v6.6 kernel to fix a privilege escalation vulnerability in kernelCTF.
bpf: Fix too early release of tcx_entry
1cb6f0bae50441f4b4b32a28315853b279c7404e
selftests/bpf: Extend tcx tests to cover late tcx_entry release
5f1d18de79180deac2822c93e431bbe547f7d3ce
Thanks!
Best regards,
He
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 310d6c15e9104c99d5d9d0ff8e5383a79da7d5e6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071546-swerve-grew-de52@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
310d6c15e910 ("mm/damon/core: merge regions aggressively when max_nr_regions is unmet")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 310d6c15e9104c99d5d9d0ff8e5383a79da7d5e6 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj(a)kernel.org>
Date: Mon, 24 Jun 2024 10:58:14 -0700
Subject: [PATCH] mm/damon/core: merge regions aggressively when max_nr_regions
is unmet
DAMON keeps the number of regions under max_nr_regions by skipping regions
split operations when doing so can make the number higher than the limit.
It works well for preventing violation of the limit. But, if somehow the
violation happens, it cannot recovery well depending on the situation. In
detail, if the real number of regions having different access pattern is
higher than the limit, the mechanism cannot reduce the number below the
limit. In such a case, the system could suffer from high monitoring
overhead of DAMON.
The violation can actually happen. For an example, the user could reduce
max_nr_regions while DAMON is running, to be lower than the current number
of regions. Fix the problem by repeating the merge operations with
increasing aggressiveness in kdamond_merge_regions() for the case, until
the limit is met.
[sj(a)kernel.org: increase regions merge aggressiveness while respecting min_nr_regions]
Link: https://lkml.kernel.org/r/20240626164753.46270-1-sj@kernel.org
[sj(a)kernel.org: ensure max threshold attempt for max_nr_regions violation]
Link: https://lkml.kernel.org/r/20240627163153.75969-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20240624175814.89611-1-sj@kernel.org
Fixes: b9a6ac4e4ede ("mm/damon: adaptively adjust regions")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/damon/core.c b/mm/damon/core.c
index 6392f1cc97a3..e66823d6b10b 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1358,14 +1358,31 @@ static void damon_merge_regions_of(struct damon_target *t, unsigned int thres,
* access frequencies are similar. This is for minimizing the monitoring
* overhead under the dynamically changeable access pattern. If a merge was
* unnecessarily made, later 'kdamond_split_regions()' will revert it.
+ *
+ * The total number of regions could be higher than the user-defined limit,
+ * max_nr_regions for some cases. For example, the user can update
+ * max_nr_regions to a number that lower than the current number of regions
+ * while DAMON is running. For such a case, repeat merging until the limit is
+ * met while increasing @threshold up to possible maximum level.
*/
static void kdamond_merge_regions(struct damon_ctx *c, unsigned int threshold,
unsigned long sz_limit)
{
struct damon_target *t;
+ unsigned int nr_regions;
+ unsigned int max_thres;
- damon_for_each_target(t, c)
- damon_merge_regions_of(t, threshold, sz_limit);
+ max_thres = c->attrs.aggr_interval /
+ (c->attrs.sample_interval ? c->attrs.sample_interval : 1);
+ do {
+ nr_regions = 0;
+ damon_for_each_target(t, c) {
+ damon_merge_regions_of(t, threshold, sz_limit);
+ nr_regions += damon_nr_regions(t);
+ }
+ threshold = max(1, threshold * 2);
+ } while (nr_regions > c->attrs.max_nr_regions &&
+ threshold / 2 < max_thres);
}
/*
I have created a bz for this issue as well -
https://bugzilla.redhat.com/show_bug.cgi?id=2293600
In my digging, it seems that the 6.9+ kernels do not have the xpad
kernel module (running # sudo modprobe xpad returns no value)
A workaround was to manually install xone or xpad kernel module and
self sign the key (keeping secure boot intact) however, this is
somewhat tedious since it did work before.
Commit ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in
gs_start_io") adds null pointer checks to gs_start_io(), but it doesn't
fully fix the potential null pointer dereference issue. While
gserial_connect() calls gs_start_io() with port_lock held, gs_start_rx()
and gs_start_tx() release the lock during endpoint request submission.
This creates a window where gs_close() could set port->port_tty to NULL,
leading to a dereference when the lock is reacquired.
This patch adds a null pointer check for port->port_tty after RX/TX
submission, and removes the initial null pointer check in gs_start_io()
since the caller must hold port_lock and guarantee non-null values for
port_usb and port_tty.
Fixes: ffd603f21423 ("usb: gadget: u_serial: Add null pointer check in gs_start_io")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kuen-Han Tsai <khtsai(a)google.com>
---
Explanation:
CPU1: CPU2:
gserial_connect() // lock
gs_close() // await lock
gs_start_rx() // unlock
usb_ep_queue()
gs_close() // lock, reset port_tty and unlock
gs_start_rx() // lock
tty_wakeup() // dereference
Stack traces:
[ 51.494375][ T278] ttyGS1: shutdown
[ 51.494817][ T269] android_work: sent uevent USB_STATE=DISCONNECTED
[ 52.115792][ T1508] usb: [dm_bind] generic ttyGS1: super speed IN/ep1in OUT/ep1out
[ 52.516288][ T1026] android_work: sent uevent USB_STATE=CONNECTED
[ 52.551667][ T1533] gserial_connect: start ttyGS1
[ 52.565634][ T1533] [khtsai] enter gs_start_io, ttyGS1, port->port.tty=0000000046bd4060
[ 52.565671][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[ 52.591552][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[ 52.619901][ T1533] [khtsai] gs_start_rx, unlock port ttyGS1
[ 52.638659][ T1325] [khtsai] gs_close, lock port ttyGS1
[ 52.656842][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) ...
[ 52.683005][ T1325] [khtsai] gs_close, clear ttyGS1
[ 52.683007][ T1325] gs_close: ttyGS1 (0000000046bd4060,00000000be9750a5) done!
[ 52.708643][ T1325] [khtsai] gs_close, unlock port ttyGS1
[ 52.747592][ T1533] [khtsai] gs_start_rx, lock port ttyGS1
[ 52.747616][ T1533] [khtsai] gs_start_io, ttyGS1, going to call tty_wakeup(), port->port.tty=0000000000000000
[ 52.747629][ T1533] Unable to handle kernel NULL pointer dereference at virtual address 00000000000001f8
---
drivers/usb/gadget/function/u_serial.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c
index a92eb6d90976..2f1890c8f473 100644
--- a/drivers/usb/gadget/function/u_serial.c
+++ b/drivers/usb/gadget/function/u_serial.c
@@ -539,20 +539,16 @@ static int gs_alloc_requests(struct usb_ep *ep, struct list_head *head,
static int gs_start_io(struct gs_port *port)
{
struct list_head *head = &port->read_pool;
- struct usb_ep *ep;
+ struct usb_ep *ep = port->port_usb->out;
int status;
unsigned started;
- if (!port->port_usb || !port->port.tty)
- return -EIO;
-
/* Allocate RX and TX I/O buffers. We can't easily do this much
* earlier (with GFP_KERNEL) because the requests are coupled to
* endpoints, as are the packet sizes we'll be using. Different
* configurations may use different endpoints with a given port;
* and high speed vs full speed changes packet sizes too.
*/
- ep = port->port_usb->out;
status = gs_alloc_requests(ep, head, gs_read_complete,
&port->read_allocated);
if (status)
@@ -569,12 +565,22 @@ static int gs_start_io(struct gs_port *port)
port->n_read = 0;
started = gs_start_rx(port);
+ /*
+ * The TTY may be set to NULL by gs_close() after gs_start_rx() or
+ * gs_start_tx() release locks for endpoint request submission.
+ */
+ if (!port->port.tty)
+ goto out;
+
if (started) {
gs_start_tx(port);
/* Unblock any pending writes into our circular buffer, in case
* we didn't in gs_start_tx() */
+ if (!port->port.tty)
+ goto out;
tty_wakeup(port->port.tty);
} else {
+out:
gs_free_requests(ep, head, &port->read_allocated);
gs_free_requests(port->port_usb->in, &port->write_pool,
&port->write_allocated);
--
2.43.0.275.g3460e3d667-goog
The page cache of the atomic file keeps new data pages which will be
stored in the COW file. It can also keep old data pages when GCing the
atomic file. In this case, new data can be overwritten by old data if a
GC thread sets the old data page as dirty after new data page was
evicted.
Also, since all writes to the atomic file are redirected to COW inodes,
GC for the atomic file is not working well as below.
f2fs_gc(gc_type=FG_GC)
- select A as a victim segment
do_garbage_collect
- iget atomic file's inode for block B
move_data_page
f2fs_do_write_data_page
- use dn of cow inode
- set fio->old_blkaddr from cow inode
- seg_freed is 0 since block B is still valid
- goto gc_more and A is selected as victim again
To solve the problem, let's separate GC writes and updates in the atomic
file by using the meta inode for GC writes.
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
---
v2:
- replace post_read to meta_gc
fs/f2fs/data.c | 4 ++--
fs/f2fs/f2fs.h | 7 ++++++-
fs/f2fs/gc.c | 6 +++---
fs/f2fs/segment.c | 6 +++---
4 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index b6dcb3bcaef7..9a213d03005d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2693,7 +2693,7 @@ int f2fs_do_write_data_page(struct f2fs_io_info *fio)
}
/* wait for GCed page writeback via META_MAPPING */
- if (fio->post_read)
+ if (fio->meta_gc)
f2fs_wait_on_block_writeback(inode, fio->old_blkaddr);
/*
@@ -2788,7 +2788,7 @@ int f2fs_write_single_data_page(struct page *page, int *submitted,
.submitted = 0,
.compr_blocks = compr_blocks,
.need_lock = compr_blocks ? LOCK_DONE : LOCK_RETRY,
- .post_read = f2fs_post_read_required(inode) ? 1 : 0,
+ .meta_gc = f2fs_meta_inode_gc_required(inode) ? 1 : 0,
.io_type = io_type,
.io_wbc = wbc,
.bio = bio,
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f7ee6c5e371e..796ae11c0fa3 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1211,7 +1211,7 @@ struct f2fs_io_info {
unsigned int in_list:1; /* indicate fio is in io_list */
unsigned int is_por:1; /* indicate IO is from recovery or not */
unsigned int encrypted:1; /* indicate file is encrypted */
- unsigned int post_read:1; /* require post read */
+ unsigned int meta_gc:1; /* require meta inode GC */
enum iostat_type io_type; /* io type */
struct writeback_control *io_wbc; /* writeback control */
struct bio **bio; /* bio for ipu */
@@ -4263,6 +4263,11 @@ static inline bool f2fs_post_read_required(struct inode *inode)
f2fs_compressed_file(inode);
}
+static inline bool f2fs_meta_inode_gc_required(struct inode *inode)
+{
+ return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode);
+}
+
/*
* compress.c
*/
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index ef667fec9a12..cb3006551ab5 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1589,7 +1589,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
start_bidx = f2fs_start_bidx_of_node(nofs, inode) +
ofs_in_node;
- if (f2fs_post_read_required(inode)) {
+ if (f2fs_meta_inode_gc_required(inode)) {
int err = ra_data_block(inode, start_bidx);
f2fs_up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
@@ -1640,7 +1640,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
start_bidx = f2fs_start_bidx_of_node(nofs, inode)
+ ofs_in_node;
- if (f2fs_post_read_required(inode))
+ if (f2fs_meta_inode_gc_required(inode))
err = move_data_block(inode, start_bidx,
gc_type, segno, off);
else
@@ -1648,7 +1648,7 @@ static int gc_data_segment(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
segno, off);
if (!err && (gc_type == FG_GC ||
- f2fs_post_read_required(inode)))
+ f2fs_meta_inode_gc_required(inode)))
submitted++;
if (locked) {
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 4db1add43e36..77ef46b384b4 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -3851,7 +3851,7 @@ int f2fs_inplace_write_data(struct f2fs_io_info *fio)
goto drop_bio;
}
- if (fio->post_read)
+ if (fio->meta_gc)
f2fs_truncate_meta_inode_pages(sbi, fio->new_blkaddr, 1);
stat_inc_inplace_blocks(fio->sbi);
@@ -4021,7 +4021,7 @@ void f2fs_wait_on_block_writeback(struct inode *inode, block_t blkaddr)
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *cpage;
- if (!f2fs_post_read_required(inode))
+ if (!f2fs_meta_inode_gc_required(inode))
return;
if (!__is_valid_data_blkaddr(blkaddr))
@@ -4040,7 +4040,7 @@ void f2fs_wait_on_block_writeback_range(struct inode *inode, block_t blkaddr,
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
block_t i;
- if (!f2fs_post_read_required(inode))
+ if (!f2fs_meta_inode_gc_required(inode))
return;
for (i = 0; i < len; i++)
--
2.25.1
Since the below commit, there are regressions for legacy setups:
1/ conntracks are created while there are no listener
2/ a listener starts and dumps all conntracks to get the current state
3/ conntracks deleted before the listener has started are not advertised
This is problematic in containers, where conntracks could be created early.
This sysctl is part of unsafe sysctl and could not be changed easily in
some environments.
Let's switch back to the legacy behavior.
CC: stable(a)vger.kernel.org
Fixes: 90d1daa45849 ("netfilter: conntrack: add nf_conntrack_events autodetect mode")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
---
Documentation/networking/nf_conntrack-sysctl.rst | 10 ++++++----
net/netfilter/nf_conntrack_ecache.c | 2 +-
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/Documentation/networking/nf_conntrack-sysctl.rst b/Documentation/networking/nf_conntrack-sysctl.rst
index c383a394c665..edc04f99e1aa 100644
--- a/Documentation/networking/nf_conntrack-sysctl.rst
+++ b/Documentation/networking/nf_conntrack-sysctl.rst
@@ -34,13 +34,15 @@ nf_conntrack_count - INTEGER (read-only)
nf_conntrack_events - BOOLEAN
- 0 - disabled
- - 1 - enabled
- - 2 - auto (default)
+ - 1 - enabled (default)
+ - 2 - auto
If this option is enabled, the connection tracking code will
provide userspace with connection tracking events via ctnetlink.
- The default allocates the extension if a userspace program is
- listening to ctnetlink events.
+ The 'auto' allocates the extension if a userspace program is
+ listening to ctnetlink events. Note that conntracks created
+ before the first listener has started won't trigger any netlink
+ event.
nf_conntrack_expect_max - INTEGER
Maximum size of expectation table. Default value is
diff --git a/net/netfilter/nf_conntrack_ecache.c b/net/netfilter/nf_conntrack_ecache.c
index 69948e1d6974..4c8559529e18 100644
--- a/net/netfilter/nf_conntrack_ecache.c
+++ b/net/netfilter/nf_conntrack_ecache.c
@@ -334,7 +334,7 @@ bool nf_ct_ecache_ext_add(struct nf_conn *ct, u16 ctmask, u16 expmask, gfp_t gfp
}
EXPORT_SYMBOL_GPL(nf_ct_ecache_ext_add);
-#define NF_CT_EVENTS_DEFAULT 2
+#define NF_CT_EVENTS_DEFAULT 1
static int nf_ct_events __read_mostly = NF_CT_EVENTS_DEFAULT;
void nf_conntrack_ecache_pernet_init(struct net *net)
--
2.43.1
The ov5675 specification says that the gap between XSHUTDN deassert and the
first I2C transaction should be a minimum of 8192 XVCLK cycles.
Right now we use a usleep_rage() that gives a sleep time of between about
430 and 860 microseconds.
On the Lenovo X13s we have observed that in about 1/20 cases the current
timing is too tight and we start transacting before the ov5675's reset
cycle completes, leading to I2C bus transaction failures.
The reset racing is sometimes triggered at initial chip probe but, more
usually on a subsequent power-off/power-on cycle e.g.
[ 71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
[ 71.451686] ov5675 24-0010: failed to set plls
The current quiescence period we have is too tight. Instead of expressing
the post reset delay in terms of the current XVCLK this patch converts the
power-on and power-off delays to the maximum theoretical delay @ 6 MHz with
an additional buffer.
1.365 milliseconds on the power-on path is 1.5 milliseconds with grace.
85.3 microseconds on the power-off path is 90 microseconds with grace.
Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and support runtime PM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
v3:
- Fixed my out-by-one 853 -> 85.3 us calc and the 900 us -> 90us calc as a
result.
- Link to v2: https://lore.kernel.org/r/20240711-linux-next-ov5675-v2-1-d0ea6ac2e6e9@lina…
v2:
- Drop patch to read and act on reported XVCLK
- Use worst-case timings + a reasonable grace period in-lieu of previous
xvclk calculations on power-on and power-off.
- Link to v1: https://lore.kernel.org/r/20240711-linux-next-ov5675-v1-0-69e9b6c62c16@lina…
v1:
One long running saga for me on the Lenovo X13s is the occasional failure
to either probe or subsequently bring-up the ov5675 main RGB sensor on the
laptop.
Initially I suspected the PMIC for this part as the PMIC is using a new
interface on an I2C bus instead of an SPMI bus. In particular I thought
perhaps the I2C write to PMIC had completed but the regulator output hadn't
become stable from the perspective of the SoC. This however doesn't appear
to be the case - I can introduce a delay of milliseconds on the PMIC path
without resolving the sensor reset problem.
Secondly I thought about reset pin polarity or drive-strength but, again
playing about with both didn't yield decent results.
I also played with the duration of reset to no avail.
The error manifested as an I2C write timeout to the sensor which indicated
that the chip likely hadn't come out reset. An intermittent fault appearing
in perhaps 1/10 or 1/20 reset cycles.
Looking at the expression of the reset we see that there is a minimum time
expressed in XVCLK cycles between reset completion and first I2C
transaction to the sensor. The specification calls out the minimum delay @
8192 XVCLK cycles and the ov5675 driver meets that timing almost exactly.
A little too exactly - testing finally showed that we were too racy with
respect to the minimum quiescence between reset completion and first
command to the chip.
Fixing this error I choose to base the fix again on the number of clocks
but to also support any clock rate the chip could support by moving away
from a define to reading and using the XVCLK.
True enough only 19.2 MHz is currently supported but for the hypothetical
case where some other frequency is supported in the future, I wanted the
fix introduced in this series to still hold.
Hence this series:
1. Allows for any clock rate to be used in the valid range for the reset.
2. Elongates the post-reset period based on clock cycles which can now
vary.
Patch #2 can still be backported to stable irrespective of patch #1.
---
drivers/media/i2c/ov5675.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
index 3641911bc73f..5b5127f8953f 100644
--- a/drivers/media/i2c/ov5675.c
+++ b/drivers/media/i2c/ov5675.c
@@ -972,12 +972,10 @@ static int ov5675_set_stream(struct v4l2_subdev *sd, int enable)
static int ov5675_power_off(struct device *dev)
{
- /* 512 xvclk cycles after the last SCCB transation or MIPI frame end */
- u32 delay_us = DIV_ROUND_UP(512, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
- usleep_range(delay_us, delay_us * 2);
+ usleep_range(90, 100);
clk_disable_unprepare(ov5675->xvclk);
gpiod_set_value_cansleep(ov5675->reset_gpio, 1);
@@ -988,7 +986,6 @@ static int ov5675_power_off(struct device *dev)
static int ov5675_power_on(struct device *dev)
{
- u32 delay_us = DIV_ROUND_UP(8192, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
int ret;
@@ -1014,8 +1011,11 @@ static int ov5675_power_on(struct device *dev)
gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
- /* 8192 xvclk cycles prior to the first SCCB transation */
- usleep_range(delay_us, delay_us * 2);
+ /* Worst case quiesence gap is 1.365 milliseconds @ 6MHz XVCLK
+ * Add an additional threshold grace period to ensure reset
+ * completion before initiating our first I2C transaction.
+ */
+ usleep_range(1500, 1600);
return 0;
}
---
base-commit: 523b23f0bee3014a7a752c9bb9f5c54f0eddae88
change-id: 20240710-linux-next-ov5675-60b0e83c73f1
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
The linux-5.10-y backport of commit b377c66ae350 ("x86/retpoline: Add
NOENDBR annotation to the SRSO dummy return thunk") misplaced the new
NOENDBR annotation, repeating the annotation on __x86_return_thunk,
rather than adding the annotation to the !CONFIG_CPU_SRSO version of
srso_alias_untrain_ret, as intended.
Move the annotation to the right place.
Fixes: 0bdc64e9e716 ("x86/retpoline: Add NOENDBR annotation to the SRSO dummy return thunk")
Reported-by: Greg Thelen <gthelen(a)google.com>
Signed-off-by: Jim Mattson <jmattson(a)google.com>
Acked-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: stable(a)vger.kernel.org
---
arch/x86/lib/retpoline.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index ab9b047790dd..d1902213a0d6 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -105,6 +105,7 @@ __EXPORT_THUNK(srso_alias_untrain_ret)
/* dummy definition for alternatives */
SYM_START(srso_alias_untrain_ret, SYM_L_GLOBAL, SYM_A_NONE)
ANNOTATE_UNRET_SAFE
+ ANNOTATE_NOENDBR
ret
int3
SYM_FUNC_END(srso_alias_untrain_ret)
@@ -258,7 +259,6 @@ SYM_CODE_START(__x86_return_thunk)
UNWIND_HINT_FUNC
ANNOTATE_NOENDBR
ANNOTATE_UNRET_SAFE
- ANNOTATE_NOENDBR
ret
int3
SYM_CODE_END(__x86_return_thunk)
--
2.45.2.803.g4e1b14247a-goog
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1f789a45c3f1aa77531db21768fca70b66c0eeb1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071532-alabaster-overstate-3512@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
1f789a45c3f1 ("mm/readahead: limit page cache size in page_cache_ra_order()")
e03c16fb4af1 ("readahead: use ilog2 instead of a while loop in page_cache_ra_order()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f789a45c3f1aa77531db21768fca70b66c0eeb1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 27 Jun 2024 10:39:50 +1000
Subject: [PATCH] mm/readahead: limit page cache size in page_cache_ra_order()
In page_cache_ra_order(), the maximal order of the page cache to be
allocated shouldn't be larger than MAX_PAGECACHE_ORDER. Otherwise, it's
possible the large page cache can't be supported by xarray when the
corresponding xarray entry is split.
For example, HPAGE_PMD_ORDER is 13 on ARM64 when the base page size is
64KB. The PMD-sized page cache can't be supported by xarray.
Link: https://lkml.kernel.org/r/20240627003953.1262512-3-gshan@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zhenyu Zhang <zhenyzha(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/readahead.c b/mm/readahead.c
index c1b23989d9ca..817b2a352d78 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -503,11 +503,11 @@ void page_cache_ra_order(struct readahead_control *ractl,
limit = min(limit, index + ra->size - 1);
- if (new_order < MAX_PAGECACHE_ORDER) {
+ if (new_order < MAX_PAGECACHE_ORDER)
new_order += 2;
- new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
- new_order = min_t(unsigned int, new_order, ilog2(ra->size));
- }
+
+ new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+ new_order = min_t(unsigned int, new_order, ilog2(ra->size));
/* See comment in page_cache_ra_unbounded() */
nofs = memalloc_nofs_save();
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1f789a45c3f1aa77531db21768fca70b66c0eeb1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071531-junkyard-cornea-9a80@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
1f789a45c3f1 ("mm/readahead: limit page cache size in page_cache_ra_order()")
e03c16fb4af1 ("readahead: use ilog2 instead of a while loop in page_cache_ra_order()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1f789a45c3f1aa77531db21768fca70b66c0eeb1 Mon Sep 17 00:00:00 2001
From: Gavin Shan <gshan(a)redhat.com>
Date: Thu, 27 Jun 2024 10:39:50 +1000
Subject: [PATCH] mm/readahead: limit page cache size in page_cache_ra_order()
In page_cache_ra_order(), the maximal order of the page cache to be
allocated shouldn't be larger than MAX_PAGECACHE_ORDER. Otherwise, it's
possible the large page cache can't be supported by xarray when the
corresponding xarray entry is split.
For example, HPAGE_PMD_ORDER is 13 on ARM64 when the base page size is
64KB. The PMD-sized page cache can't be supported by xarray.
Link: https://lkml.kernel.org/r/20240627003953.1262512-3-gshan@redhat.com
Fixes: 793917d997df ("mm/readahead: Add large folio readahead")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Darrick J. Wong <djwong(a)kernel.org>
Cc: Don Dutile <ddutile(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: Zhenyu Zhang <zhenyzha(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.18+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/readahead.c b/mm/readahead.c
index c1b23989d9ca..817b2a352d78 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -503,11 +503,11 @@ void page_cache_ra_order(struct readahead_control *ractl,
limit = min(limit, index + ra->size - 1);
- if (new_order < MAX_PAGECACHE_ORDER) {
+ if (new_order < MAX_PAGECACHE_ORDER)
new_order += 2;
- new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
- new_order = min_t(unsigned int, new_order, ilog2(ra->size));
- }
+
+ new_order = min_t(unsigned int, MAX_PAGECACHE_ORDER, new_order);
+ new_order = min_t(unsigned int, new_order, ilog2(ra->size));
/* See comment in page_cache_ra_unbounded() */
nofs = memalloc_nofs_save();
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ddab91f4b2de5c5b46e312a90107d9353087d8ea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071501-tasty-grandpa-318b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ddab91f4b2de ("pmdomain: qcom: rpmhpd: Skip retention level for Power Domains")
e2ad626f8f40 ("pmdomain: Rename the genpd subsystem to pmdomain")
b683a3620748 ("genpd: imx: relocate scu-pd under genpd")
fe38a2d570df ("MAINTAINERS: adjust file entry in STARFIVE JH71XX PMU CONTROLLER DRIVER")
7ed363cd8d0a ("genpd: move owl-sps-helper.c from drivers/soc")
b43f11e5b453 ("ARM: ux500: Move power-domain driver to the genpd dir")
444ffc820d90 ("soc: xilinx: Move power-domain driver to the genpd dir")
2449efaaf913 ("soc: ti: Mover power-domain drivers to the genpd dir")
27e0fef61ffd ("soc: tegra: Move powergate-bpmp driver to the genpd dir")
fd697e216040 ("soc: sunxi: Move power-domain driver to the genpd dir")
f3fb16291f48 ("soc: starfive: Move the power-domain driver to the genpd dir")
4419644bfc7f ("soc: samsung: Move power-domain driver to the genpd dir")
a8fcd3da73de ("soc: rockchip: Mover power-domain driver to the genpd dir")
86341a84495c ("soc: renesas: Move power-domain drivers to the genpd dir")
84e9c58c2166 ("soc: qcom: Move power-domain drivers to the genpd dir")
fcd9632122d7 ("soc: mediatek: Move power-domain drivers to the genpd dir")
e5300b2c3fe0 ("soc: imx: Move power-domain drivers to the genpd dir")
aded002384c1 ("soc: bcm: Move power-domain drivers to the genpd dir")
869b9dd3339a ("soc: apple: Move power-domain driver to the genpd dir")
22f86fab644b ("soc: amlogic: Move power-domain drivers to the genpd dir")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ddab91f4b2de5c5b46e312a90107d9353087d8ea Mon Sep 17 00:00:00 2001
From: Taniya Das <quic_tdas(a)quicinc.com>
Date: Tue, 25 Jun 2024 10:03:11 +0530
Subject: [PATCH] pmdomain: qcom: rpmhpd: Skip retention level for Power
Domains
In the cases where the power domain connected to logics is allowed to
transition from a level(L)-->power collapse(0)-->retention(1) or
vice versa retention(1)-->power collapse(0)-->level(L) will cause the
logic to lose the configurations. The ARC does not support retention
to collapse transition on MxC rails.
The targets from SM8450 onwards the PLL logics of clock controllers are
connected to MxC rails and the recommended configurations are carried
out during the clock controller probes. The MxC transition as mentioned
above should be skipped to ensure the PLL settings are intact across
clock controller power on & off.
On older targets that do not split MX into MxA and MxC does not collapse
the logic and it is parked always at RETENTION, thus this issue is never
observed on those targets.
Cc: stable(a)vger.kernel.org # v5.17
Reviewed-by: Bjorn Andersson <andersson(a)kernel.org>
Signed-off-by: Taniya Das <quic_tdas(a)quicinc.com>
Link: https://lore.kernel.org/r/20240625-avoid_mxc_retention-v2-1-af9c2f549a5f@qu…
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/pmdomain/qcom/rpmhpd.c b/drivers/pmdomain/qcom/rpmhpd.c
index de9121ef4216..d2cb4271a1ca 100644
--- a/drivers/pmdomain/qcom/rpmhpd.c
+++ b/drivers/pmdomain/qcom/rpmhpd.c
@@ -40,6 +40,7 @@
* @addr: Resource address as looped up using resource name from
* cmd-db
* @state_synced: Indicator that sync_state has been invoked for the rpmhpd resource
+ * @skip_retention_level: Indicate that retention level should not be used for the power domain
*/
struct rpmhpd {
struct device *dev;
@@ -56,6 +57,7 @@ struct rpmhpd {
const char *res_name;
u32 addr;
bool state_synced;
+ bool skip_retention_level;
};
struct rpmhpd_desc {
@@ -173,6 +175,7 @@ static struct rpmhpd mxc = {
.pd = { .name = "mxc", },
.peer = &mxc_ao,
.res_name = "mxc.lvl",
+ .skip_retention_level = true,
};
static struct rpmhpd mxc_ao = {
@@ -180,6 +183,7 @@ static struct rpmhpd mxc_ao = {
.active_only = true,
.peer = &mxc,
.res_name = "mxc.lvl",
+ .skip_retention_level = true,
};
static struct rpmhpd nsp = {
@@ -819,6 +823,9 @@ static int rpmhpd_update_level_mapping(struct rpmhpd *rpmhpd)
return -EINVAL;
for (i = 0; i < rpmhpd->level_count; i++) {
+ if (rpmhpd->skip_retention_level && buf[i] == RPMH_REGULATOR_LEVEL_RETENTION)
+ continue;
+
rpmhpd->level[i] = buf[i];
/* Remember the first corner with non-zero level */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1723f04caacb32cadc4e063725d836a0c4450694
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071539-magnetize-nimble-15ba@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
1723f04caacb ("Fix userfaultfd_api to return EINVAL as expected")
2ff559f31a5d ("Revert "userfaultfd: don't fail on unrecognized features"")
914eedcb9ba0 ("userfaultfd: don't fail on unrecognized features")
b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
824ddc601adc ("userfaultfd: provide unmasked address on page-fault")
964ab0040ff9 ("userfaultfd/shmem: advertise shmem minor fault support")
c949b097ef2e ("userfaultfd/shmem: support minor fault registration for shmem")
00b151f21f39 ("mm/userfaultfd: fail uffd-wp registration if not supported")
b8da5cd4e5f1 ("userfaultfd: update documentation to describe minor fault handling")
f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
7677f7fd8be7 ("userfaultfd: add minor fault registration mode")
44835d20b2a0 ("mm: add FGP_ENTRY")
8f251a3d5ce3 ("hugetlb: convert page_huge_active() HPageMigratable flag")
d6995da31122 ("hugetlb: use page.private for hugetlb specific page flags")
99ca0edb41aa ("Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1723f04caacb32cadc4e063725d836a0c4450694 Mon Sep 17 00:00:00 2001
From: Audra Mitchell <audra(a)redhat.com>
Date: Wed, 26 Jun 2024 09:05:11 -0400
Subject: [PATCH] Fix userfaultfd_api to return EINVAL as expected
Currently if we request a feature that is not set in the Kernel config we
fail silently and return all the available features. However, the man
page indicates we should return an EINVAL.
We need to fix this issue since we can end up with a Kernel warning should
a program request the feature UFFD_FEATURE_WP_UNPOPULATED on a kernel with
the config not set with this feature.
[ 200.812896] WARNING: CPU: 91 PID: 13634 at mm/memory.c:1660 zap_pte_range+0x43d/0x660
[ 200.820738] Modules linked in:
[ 200.869387] CPU: 91 PID: 13634 Comm: userfaultfd Kdump: loaded Not tainted 6.9.0-rc5+ #8
[ 200.877477] Hardware name: Dell Inc. PowerEdge R6525/0N7YGH, BIOS 2.7.3 03/30/2022
[ 200.885052] RIP: 0010:zap_pte_range+0x43d/0x660
Link: https://lkml.kernel.org/r/20240626130513.120193-1-audra@redhat.com
Fixes: e06f1e1dd499 ("userfaultfd: wp: enabled write protection in userfaultfd API")
Signed-off-by: Audra Mitchell <audra(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Jan Kara <jack(a)suse.cz>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rafael Aquini <raquini(a)redhat.com>
Cc: Shaohua Li <shli(a)fb.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index eee7320ab0b0..17e409ceaa33 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -2057,7 +2057,7 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
goto out;
features = uffdio_api.features;
ret = -EINVAL;
- if (uffdio_api.api != UFFD_API || (features & ~UFFD_API_FEATURES))
+ if (uffdio_api.api != UFFD_API)
goto err_out;
ret = -EPERM;
if ((features & UFFD_FEATURE_EVENT_FORK) && !capable(CAP_SYS_PTRACE))
@@ -2081,6 +2081,11 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
uffdio_api.features &= ~UFFD_FEATURE_WP_UNPOPULATED;
uffdio_api.features &= ~UFFD_FEATURE_WP_ASYNC;
#endif
+
+ ret = -EINVAL;
+ if (features & ~uffdio_api.features)
+ goto err_out;
+
uffdio_api.ioctls = UFFD_API_IOCTLS;
ret = -EFAULT;
if (copy_to_user(buf, &uffdio_api, sizeof(uffdio_api)))
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071511-stained-facebook-224d@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071536-cope-capillary-34c4@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071518-alfalfa-identify-0afb@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071554-rebel-footsore-1818@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071557-bullion-punk-ee0a@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
d6efb3ac3e6c ("Merge tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071556-cognition-backstab-964b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
d6efb3ac3e6c ("Merge tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071555-occupy-tables-792a@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071554-arrival-handiness-71ea@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 507786c51ccf8df726df804ae316a8c52537b407
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071554-plunder-envelope-5c9d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
507786c51ccf ("serial: qcom-geni: fix hard lockup on buffer flush")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 507786c51ccf8df726df804ae316a8c52537b407 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:04 +0200
Subject: [PATCH] serial: qcom-geni: fix hard lockup on buffer flush
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91d9 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders(a)chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5be ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable(a)vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index a41360d34790..b2bbd2d79dbb 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -906,13 +906,17 @@ static void qcom_geni_serial_handle_tx_fifo(struct uart_port *uport,
else
pending = kfifo_len(&tport->xmit_fifo);
- /* All data has been transmitted and acknowledged as received */
- if (!pending && !status && done) {
+ /* All data has been transmitted or command has been cancelled */
+ if (!pending && done) {
qcom_geni_serial_stop_tx_fifo(uport);
goto out_write_wakeup;
}
- avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ if (active)
+ avail = port->tx_fifo_depth - (status & TX_FIFO_WC);
+ else
+ avail = port->tx_fifo_depth;
+
avail *= BYTES_PER_FIFO_WORD;
chunk = min(avail, pending);
@@ -1091,6 +1095,11 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
qcom_geni_serial_cancel_tx_cmd(uport);
}
+static void qcom_geni_serial_flush_buffer(struct uart_port *uport)
+{
+ qcom_geni_serial_cancel_tx_cmd(uport);
+}
+
static int qcom_geni_serial_port_setup(struct uart_port *uport)
{
struct qcom_geni_serial_port *port = to_dev_port(uport);
@@ -1547,6 +1556,7 @@ static const struct uart_ops qcom_geni_console_pops = {
.request_port = qcom_geni_serial_request_port,
.config_port = qcom_geni_serial_config_port,
.shutdown = qcom_geni_serial_shutdown,
+ .flush_buffer = qcom_geni_serial_flush_buffer,
.type = qcom_geni_serial_get_type,
.set_mctrl = qcom_geni_serial_set_mctrl,
.get_mctrl = qcom_geni_serial_get_mctrl,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071534-designer-distract-6edf@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
9aff74cc4e9e ("serial: qcom-geni: fix console shutdown hang")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
d6efb3ac3e6c ("Merge tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071533-imply-commute-d073@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
9aff74cc4e9e ("serial: qcom-geni: fix console shutdown hang")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
d6efb3ac3e6c ("Merge tag 'tty-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071532-unmarked-deeply-9b62@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
9aff74cc4e9e ("serial: qcom-geni: fix console shutdown hang")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071531-frigidly-repost-8196@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
9aff74cc4e9e ("serial: qcom-geni: fix console shutdown hang")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
654a8d6c93e7 ("tty: serial: qcom-geni-serial: Implement start_rx callback")
c2194bc999d4 ("tty: serial: qcom-geni-serial: Remove uart frequency table. Instead, find suitable frequency with call to clk_round_rate.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071530-chastise-driveway-b99e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
947cc4ecc06c ("serial: qcom-geni: fix soft lockup on sw flow control and suspend")
9aff74cc4e9e ("serial: qcom-geni: fix console shutdown hang")
2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
40ec6d41c841 ("tty: serial: qcom-geni-serial: use of_device_id data")
0626afe57b1f ("tty: serial: qcom-geni-serial: drop the return value from handle_rx")
bd7955840cbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_send_chunk_fifo()")
d420fb491cbc ("tty: serial: qcom-geni-serial: split out the FIFO tx code")
fe6a00e8fcbe ("tty: serial: qcom-geni-serial: refactor qcom_geni_serial_isr()")
00ce7c6e86b5 ("tty: serial: qcom-geni-serial: improve the to_dev_port() macro")
6cde11dbf4b6 ("tty: serial: qcom-geni-serial: align #define values")
68c6bd92c86c ("tty: serial: qcom-geni-serial: remove unused symbols")
d0fabb0dc1a6 ("tty: serial: qcom-geni-serial: drop unneeded forward definitions")
d8aca2f96813 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
35781d8356a2 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 947cc4ecc06cb80a2aa2cebbbbf0e546fbaf0238 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Thu, 4 Jul 2024 12:18:03 +0200
Subject: [PATCH] serial: qcom-geni: fix soft lockup on sw flow control and
suspend
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable(a)vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 2bd25afe0d92..a41360d34790 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -649,15 +649,25 @@ static void qcom_geni_serial_start_tx_dma(struct uart_port *uport)
static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
{
+ unsigned char c;
u32 irq_en;
- if (qcom_geni_serial_main_active(uport) ||
- !qcom_geni_serial_tx_empty(uport))
- return;
+ /*
+ * Start a new transfer in case the previous command was cancelled and
+ * left data in the FIFO which may prevent the watermark interrupt
+ * from triggering. Note that the stale data is discarded.
+ */
+ if (!qcom_geni_serial_main_active(uport) &&
+ !qcom_geni_serial_tx_empty(uport)) {
+ if (uart_fifo_out(uport, &c, 1) == 1) {
+ writel(M_CMD_DONE_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+ qcom_geni_serial_setup_tx(uport, 1);
+ writel(c, uport->membase + SE_GENI_TX_FIFOn);
+ }
+ }
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN;
-
writel(DEF_TX_WM, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
}
@@ -665,13 +675,17 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport)
static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
{
u32 irq_en;
- struct qcom_geni_serial_port *port = to_dev_port(uport);
irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN);
irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN);
writel(0, uport->membase + SE_GENI_TX_WATERMARK_REG);
writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN);
- /* Possible stop tx is called multiple times. */
+}
+
+static void qcom_geni_serial_cancel_tx_cmd(struct uart_port *uport)
+{
+ struct qcom_geni_serial_port *port = to_dev_port(uport);
+
if (!qcom_geni_serial_main_active(uport))
return;
@@ -684,6 +698,8 @@ static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport)
writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
}
writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR);
+
+ port->tx_remaining = 0;
}
static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop)
@@ -1069,11 +1085,10 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport)
{
disable_irq(uport->irq);
- if (uart_console(uport))
- return;
-
qcom_geni_serial_stop_tx(uport);
qcom_geni_serial_stop_rx(uport);
+
+ qcom_geni_serial_cancel_tx_cmd(uport);
}
static int qcom_geni_serial_port_setup(struct uart_port *uport)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7a0a6d0a7c805f9380381f4deedffdf87b93f408
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071511-unwritten-backache-b96e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
7a0a6d0a7c80 ("nvmem: meson-efuse: Fix return value of nvmem callbacks")
8cde3c2153e8 ("firmware: meson_sm: Rework driver as a proper platform driver")
611fbca1c861 ("nvmem: meson-efuse: add peripheral clock")
8649dbe58d35 ("nvmem: meson-efuse: add error message on user_max failure.")
0789724f86a5 ("firmware: meson_sm: Add serial number sysfs entry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a0a6d0a7c805f9380381f4deedffdf87b93f408 Mon Sep 17 00:00:00 2001
From: Joy Chakraborty <joychakr(a)google.com>
Date: Fri, 28 Jun 2024 12:37:02 +0100
Subject: [PATCH] nvmem: meson-efuse: Fix return value of nvmem callbacks
Read/write callbacks registered with nvmem core expect 0 to be returned
on success and a negative value to be returned on failure.
meson_efuse_read() and meson_efuse_write() call into
meson_sm_call_read() and meson_sm_call_write() respectively which return
the number of bytes read or written on success as per their api
description.
Fix to return error if meson_sm_call_read()/meson_sm_call_write()
returns an error else return 0.
Fixes: a29a63bdaf6f ("nvmem: meson-efuse: simplify read callback")
Cc: stable(a)vger.kernel.org
Signed-off-by: Joy Chakraborty <joychakr(a)google.com>
Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla(a)linaro.org>
Link: https://lore.kernel.org/r/20240628113704.13742-3-srinivas.kandagatla@linaro…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/nvmem/meson-efuse.c b/drivers/nvmem/meson-efuse.c
index 33678d0af2c2..6c2f80e166e2 100644
--- a/drivers/nvmem/meson-efuse.c
+++ b/drivers/nvmem/meson-efuse.c
@@ -18,18 +18,24 @@ static int meson_efuse_read(void *context, unsigned int offset,
void *val, size_t bytes)
{
struct meson_sm_firmware *fw = context;
+ int ret;
- return meson_sm_call_read(fw, (u8 *)val, bytes, SM_EFUSE_READ, offset,
- bytes, 0, 0, 0);
+ ret = meson_sm_call_read(fw, (u8 *)val, bytes, SM_EFUSE_READ, offset,
+ bytes, 0, 0, 0);
+
+ return ret < 0 ? ret : 0;
}
static int meson_efuse_write(void *context, unsigned int offset,
void *val, size_t bytes)
{
struct meson_sm_firmware *fw = context;
+ int ret;
- return meson_sm_call_write(fw, (u8 *)val, bytes, SM_EFUSE_WRITE, offset,
- bytes, 0, 0, 0);
+ ret = meson_sm_call_write(fw, (u8 *)val, bytes, SM_EFUSE_WRITE, offset,
+ bytes, 0, 0, 0);
+
+ return ret < 0 ? ret : 0;
}
static const struct of_device_id meson_efuse_match[] = {
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x aaa18ff54b97706b84306b6613630262706b1f6b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071517-deepen-remedial-db3b@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
aaa18ff54b97 ("thermal: gov_power_allocator: Return early in manage if trip_max is NULL")
ca0e9728d372 ("thermal: gov_power_allocator: Eliminate a redundant variable")
41ddbcc6fd2c ("thermal: gov_power_allocator: Use .manage() callback instead of .throttle()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From aaa18ff54b97706b84306b6613630262706b1f6b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?N=C3=ADcolas=20F=2E=20R=2E=20A=2E=20Prado?=
<nfraprado(a)collabora.com>
Date: Tue, 2 Jul 2024 17:24:56 -0400
Subject: [PATCH] thermal: gov_power_allocator: Return early in manage if
trip_max is NULL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Commit da781936e7c3 ("thermal: gov_power_allocator: Allow binding
without trip points") allowed the governor to bind even when trip_max
is NULL. This allows a NULL pointer dereference to happen in the manage
callback.
Add an early return to prevent it, since the governor is expected to not do
anything in this case.
Fixes: da781936e7c3 ("thermal: gov_power_allocator: Allow binding without trip points")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Link: https://patch.msgid.link/20240702-power-allocator-null-trip-max-v1-1-47a60d…
Cc: All applicable <stable(a)vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c
index 45f04a25255a..1b2345a697c5 100644
--- a/drivers/thermal/gov_power_allocator.c
+++ b/drivers/thermal/gov_power_allocator.c
@@ -759,6 +759,9 @@ static void power_allocator_manage(struct thermal_zone_device *tz)
return;
}
+ if (!params->trip_max)
+ return;
+
allocate_power(tz, params->trip_max->temperature);
params->update_cdevs = true;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071523-amusing-backache-3d32@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7a6bbc2829d4 ("scsi: sd: Do not repeat the starting disk message")
0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
c4367ac83805 ("scsi: Remove scsi device no_start_on_resume flag")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Tue, 2 Jul 2024 06:53:26 +0900
Subject: [PATCH] scsi: sd: Do not repeat the starting disk message
The SCSI disk message "Starting disk" to signal resuming of a suspended
disk is printed in both sd_resume() and sd_resume_common() which results
in this message being printed twice when resuming from e.g. autosuspend:
$ echo 5000 > /sys/block/sda/device/power/autosuspend_delay_ms
$ echo auto > /sys/block/sda/device/power/control
[ 4962.438293] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 4962.501121] sd 0:0:0:0: [sda] Stopping disk
$ echo on > /sys/block/sda/device/power/control
[ 4972.805851] sd 0:0:0:0: [sda] Starting disk
[ 4980.558806] sd 0:0:0:0: [sda] Starting disk
Fix this double print by removing the call to sd_printk() from sd_resume()
and moving the call to sd_printk() in sd_resume_common() earlier in the
function, before the check using sd_do_start_stop(). Doing so, the message
is printed once regardless if sd_resume_common() actually executes
sd_start_stop_device() (i.e. SCSI device case) or not (libsas and libata
managed ATA devices case).
Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240701215326.128067-1-dlemoal@kernel.org
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Reviewed-by: John Garry <john.g.garry(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fe82baa924f8..6203915945a4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -4117,8 +4117,6 @@ static int sd_resume(struct device *dev)
{
struct scsi_disk *sdkp = dev_get_drvdata(dev);
- sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-
if (opal_unlock_from_suspend(sdkp->opal_dev)) {
sd_printk(KERN_NOTICE, sdkp, "OPAL unlock failed\n");
return -EIO;
@@ -4135,12 +4133,13 @@ static int sd_resume_common(struct device *dev, bool runtime)
if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
return 0;
+ sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
+
if (!sd_do_start_stop(sdkp->device, runtime)) {
sdkp->suspended = false;
return 0;
}
- sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
ret = sd_start_stop_device(sdkp, 1);
if (!ret) {
sd_resume(dev);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071518-municipal-zigzagged-a93d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7a6bbc2829d4 ("scsi: sd: Do not repeat the starting disk message")
0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
c4367ac83805 ("scsi: Remove scsi device no_start_on_resume flag")
99398d2070ab ("scsi: sd: Do not issue commands to suspended disks on shutdown")
8b4d9469d0b0 ("ata: libata-scsi: Fix delayed scsi_rescan_device() execution")
ff48b37802e5 ("scsi: Do not attempt to rescan suspended devices")
aa3998dbeb3a ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
3cc2ffe5c16d ("scsi: sd: Differentiate system and runtime start/stop management")
2a5a4326e583 ("Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a6bbc2829d4ab592c7e440a6f6f5deb3cd95db4 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <dlemoal(a)kernel.org>
Date: Tue, 2 Jul 2024 06:53:26 +0900
Subject: [PATCH] scsi: sd: Do not repeat the starting disk message
The SCSI disk message "Starting disk" to signal resuming of a suspended
disk is printed in both sd_resume() and sd_resume_common() which results
in this message being printed twice when resuming from e.g. autosuspend:
$ echo 5000 > /sys/block/sda/device/power/autosuspend_delay_ms
$ echo auto > /sys/block/sda/device/power/control
[ 4962.438293] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 4962.501121] sd 0:0:0:0: [sda] Stopping disk
$ echo on > /sys/block/sda/device/power/control
[ 4972.805851] sd 0:0:0:0: [sda] Starting disk
[ 4980.558806] sd 0:0:0:0: [sda] Starting disk
Fix this double print by removing the call to sd_printk() from sd_resume()
and moving the call to sd_printk() in sd_resume_common() earlier in the
function, before the check using sd_do_start_stop(). Doing so, the message
is printed once regardless if sd_resume_common() actually executes
sd_start_stop_device() (i.e. SCSI device case) or not (libsas and libata
managed ATA devices case).
Fixes: 0c76106cb975 ("scsi: sd: Fix TCG OPAL unlock on system resume")
Cc: stable(a)vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal(a)kernel.org>
Link: https://lore.kernel.org/r/20240701215326.128067-1-dlemoal@kernel.org
Reviewed-by: Bart Van Assche <bvanassche(a)acm.org>
Reviewed-by: John Garry <john.g.garry(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index fe82baa924f8..6203915945a4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -4117,8 +4117,6 @@ static int sd_resume(struct device *dev)
{
struct scsi_disk *sdkp = dev_get_drvdata(dev);
- sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
-
if (opal_unlock_from_suspend(sdkp->opal_dev)) {
sd_printk(KERN_NOTICE, sdkp, "OPAL unlock failed\n");
return -EIO;
@@ -4135,12 +4133,13 @@ static int sd_resume_common(struct device *dev, bool runtime)
if (!sdkp) /* E.g.: runtime resume at the start of sd_probe() */
return 0;
+ sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
+
if (!sd_do_start_stop(sdkp->device, runtime)) {
sdkp->suspended = false;
return 0;
}
- sd_printk(KERN_NOTICE, sdkp, "Starting disk\n");
ret = sd_start_stop_device(sdkp, 1);
if (!ret) {
sd_resume(dev);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071540-widow-expansion-f3c5@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
e21d4f419402 ("nilfs2: convert to ctime accessor functions")
21a87d88c225 ("nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()")
79ea65563ad8 ("nilfs2: Remove check for PageError")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071539-chess-overrun-78ac@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
e21d4f419402 ("nilfs2: convert to ctime accessor functions")
21a87d88c225 ("nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()")
79ea65563ad8 ("nilfs2: Remove check for PageError")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071538-overreach-eatable-7ae6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
e21d4f419402 ("nilfs2: convert to ctime accessor functions")
21a87d88c225 ("nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()")
79ea65563ad8 ("nilfs2: Remove check for PageError")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071537-wilt-goliath-e654@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
e21d4f419402 ("nilfs2: convert to ctime accessor functions")
21a87d88c225 ("nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()")
79ea65563ad8 ("nilfs2: Remove check for PageError")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071537-acetone-vastly-d974@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
e21d4f419402 ("nilfs2: convert to ctime accessor functions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a9e1ddc09ca55746079cc479aa3eb6411f0d99d4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071536-overstay-scholar-83a6@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
a9e1ddc09ca5 ("nilfs2: fix kernel bug on rename operation of broken directory")
6f133c97e5ce ("nilfs2: convert nilfs_rename() to use folios")
a4bf041e44d5 ("nilfs2: convert nilfs_find_entry to use a folio")
9b77f66f9927 ("nilfs2: switch to kmap_local for directory handling")
09a46acb3697 ("nilfs2: return the mapped address from nilfs_get_page()")
6af2191f8358 ("nilfs2: remove page_address() from nilfs_delete_entry")
6bb09fa1b44f ("nilfs2: remove page_address() from nilfs_set_link")
8cf57c6df818 ("nilfs2: eliminate staggered calls to kunmap in nilfs_rename")
584db20c181f ("nilfs2: move page release outside of nilfs_delete_entry and nilfs_set_link")
b3e1cc3935ff ("nilfs2: convert to new timestamp accessors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a9e1ddc09ca55746079cc479aa3eb6411f0d99d4 Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Sat, 29 Jun 2024 01:51:07 +0900
Subject: [PATCH] nilfs2: fix kernel bug on rename operation of broken
directory
Syzbot reported that in rename directory operation on broken directory on
nilfs2, __block_write_begin_int() called to prepare block write may fail
BUG_ON check for access exceeding the folio/page size.
This is because nilfs_dotdot(), which gets parent directory reference
entry ("..") of the directory to be moved or renamed, does not check
consistency enough, and may return location exceeding folio/page size for
broken directories.
Fix this issue by checking required directory entries ("." and "..") in
the first chunk of the directory in nilfs_dotdot().
Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+d3abed1ad3d367fa2627(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/dir.c b/fs/nilfs2/dir.c
index dddfa604491a..4a29b0138d75 100644
--- a/fs/nilfs2/dir.c
+++ b/fs/nilfs2/dir.c
@@ -383,11 +383,39 @@ struct nilfs_dir_entry *nilfs_find_entry(struct inode *dir,
struct nilfs_dir_entry *nilfs_dotdot(struct inode *dir, struct folio **foliop)
{
- struct nilfs_dir_entry *de = nilfs_get_folio(dir, 0, foliop);
+ struct folio *folio;
+ struct nilfs_dir_entry *de, *next_de;
+ size_t limit;
+ char *msg;
+ de = nilfs_get_folio(dir, 0, &folio);
if (IS_ERR(de))
return NULL;
- return nilfs_next_entry(de);
+
+ limit = nilfs_last_byte(dir, 0); /* is a multiple of chunk size */
+ if (unlikely(!limit || le64_to_cpu(de->inode) != dir->i_ino ||
+ !nilfs_match(1, ".", de))) {
+ msg = "missing '.'";
+ goto fail;
+ }
+
+ next_de = nilfs_next_entry(de);
+ /*
+ * If "next_de" has not reached the end of the chunk, there is
+ * at least one more record. Check whether it matches "..".
+ */
+ if (unlikely((char *)next_de == (char *)de + nilfs_chunk_size(dir) ||
+ !nilfs_match(2, "..", next_de))) {
+ msg = "missing '..'";
+ goto fail;
+ }
+ *foliop = folio;
+ return next_de;
+
+fail:
+ nilfs_error(dir->i_sb, "directory #%lu %s", dir->i_ino, msg);
+ folio_release_kmap(folio, de);
+ return NULL;
}
ino_t nilfs_inode_by_name(struct inode *dir, const struct qstr *qstr)
Simplify error handling in probe() and also fix one possible issue in
remove().
Best regards,
Krzysztof
---
Krzysztof Kozlowski (12):
thermal/drivers/broadcom: fix race between removal and clock disable
thermal/drivers/broadcom: simplify probe() with local dev variable
thermal/drivers/broadcom: simplify with dev_err_probe()
thermal/drivers/exynos: simplify probe() with local dev variable
thermal/drivers/exynos: simplify with dev_err_probe()
thermal/drivers/hisi: simplify with dev_err_probe()
thermal/drivers/imx: simplify probe() with local dev variable
thermal/drivers/imx: simplify with dev_err_probe()
thermal/drivers/qcom-spmi-adc-tm5: simplify with dev_err_probe()
thermal/drivers/qcom-tsens: simplify with dev_err_probe()
thermal/drivers/generic-adc: simplify probe() with local dev variable
thermal/drivers/generic-adc: simplify with dev_err_probe()
drivers/thermal/broadcom/bcm2835_thermal.c | 49 +++++++--------------------
drivers/thermal/hisi_thermal.c | 9 ++---
drivers/thermal/imx_thermal.c | 42 +++++++++++------------
drivers/thermal/qcom/qcom-spmi-adc-tm5.c | 9 ++---
drivers/thermal/qcom/tsens.c | 8 ++---
drivers/thermal/samsung/exynos_tmu.c | 54 +++++++++++++-----------------
drivers/thermal/thermal-generic-adc.c | 27 +++++++--------
7 files changed, 76 insertions(+), 122 deletions(-)
---
base-commit: 2e0171396caa83c9d908ba2676ba59bce333b550
change-id: 20240709-thermal-probe-a747013ed28d
Best regards,
--
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
The patch below does not apply to the 6.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.9.y
git checkout FETCH_HEAD
git cherry-pick -x f442fa6141379a20b48ae3efabee827a3d260787
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071539-recopy-opulently-c509@gregkh' --subject-prefix 'PATCH 6.9.y' HEAD^..
Possible dependencies:
f442fa614137 ("mm: gup: stop abusing try_grab_folio")
01d89b93e176 ("mm/gup: fix hugepd handling in hugetlb rework")
9cbe4954c6d9 ("gup: use folios for gup_devmap")
53e45c4f6d4f ("mm: convert put_devmap_managed_page_refs() to put_devmap_managed_folio_refs()")
6785c54a1b43 ("mm: remove put_devmap_managed_page()")
25176ad09ca3 ("mm/treewide: rename CONFIG_HAVE_FAST_GUP to CONFIG_HAVE_GUP_FAST")
23babe1934d7 ("mm/gup: consistently name GUP-fast functions")
a12083d721d7 ("mm/gup: handle hugepd for follow_page()")
4418c522f683 ("mm/gup: handle huge pmd for follow_pmd_mask()")
1b1676180246 ("mm/gup: handle huge pud for follow_pud_mask()")
caf8cab79857 ("mm/gup: cache *pudp in follow_pud_mask()")
878b0c451621 ("mm/gup: handle hugetlb for no_page_table()")
f3c94c625fe3 ("mm/gup: refactor record_subpages() to find 1st small page")
607c63195d63 ("mm/gup: drop gup_fast_folio_allowed() in hugepd processing")
f002882ca369 ("mm: merge folio_is_secretmem() and folio_fast_pin_allowed() into gup_fast_folio_allowed()")
1965e933ddeb ("mm/treewide: replace pXd_huge() with pXd_leaf()")
7db86dc389aa ("mm/gup: merge pXd huge mapping checks")
089f92141ed0 ("mm/gup: check p4d presence before going on")
e6fd5564c07c ("mm/gup: cache p4d in follow_p4d_mask()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f442fa6141379a20b48ae3efabee827a3d260787 Mon Sep 17 00:00:00 2001
From: Yang Shi <yang(a)os.amperecomputing.com>
Date: Fri, 28 Jun 2024 12:14:58 -0700
Subject: [PATCH] mm: gup: stop abusing try_grab_folio
A kernel warning was reported when pinning folio in CMA memory when
launching SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520] <TASK>
[ 464.325523] ? __get_user_pages+0x423/0x520
[ 464.325528] ? __warn+0x81/0x130
[ 464.325536] ? __get_user_pages+0x423/0x520
[ 464.325541] ? report_bug+0x171/0x1a0
[ 464.325549] ? handle_bug+0x3c/0x70
[ 464.325554] ? exc_invalid_op+0x17/0x70
[ 464.325558] ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567] ? __get_user_pages+0x423/0x520
[ 464.325575] __gup_longterm_locked+0x212/0x7a0
[ 464.325583] internal_get_user_pages_fast+0xfb/0x190
[ 464.325590] pin_user_pages_fast+0x47/0x60
[ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, when starting the SEV virtual machine, it
will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory.
But the page is in CMA area, so fast GUP will fail then fallback to the
slow path due to the longterm pinnalbe check in try_grab_folio().
The slow path will try to pin the pages then migrate them out of CMA area.
But the slow path also uses try_grab_folio() to pin the page, it will
also fail due to the same check then the above warning is triggered.
In addition, the try_grab_folio() is supposed to be used in fast path and
it elevates folio refcount by using add ref unless zero. We are guaranteed
to have at least one stable reference in slow path, so the simple atomic add
could be used. The performance difference should be trivial, but the
misuse may be confusing and misleading.
Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page()
to try_grab_folio(), and use them in the proper paths. This solves both
the abuse and the kernel warning.
The proper naming makes their usecase more clear and should prevent from
abusing in the future.
peterx said:
: The user will see the pin fails, for gpu-slow it further triggers the WARN
: right below that failure (as in the original report):
:
: folio = try_grab_folio(page, page_increm - 1,
: foll_flags);
: if (WARN_ON_ONCE(!folio)) { <------------------------ here
: /*
: * Release the 1st page ref if the
: * folio is problematic, fail hard.
: */
: gup_put_folio(page_folio(page), 1,
: foll_flags);
: ret = -EFAULT;
: goto out;
: }
[1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11…
[shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj…
Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.…
Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Reported-by: yangge <yangge1116(a)126.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 469799f805f1..f1d6bc06eb52 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
return folio;
}
-/**
- * try_grab_folio() - Attempt to get or pin a folio.
- * @page: pointer to page to be grabbed
- * @refs: the value to (effectively) add to the folio's refcount
- * @flags: gup flags: these are the FOLL_* flag values.
- *
- * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
- *
- * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
- * same time. (That's true throughout the get_user_pages*() and
- * pin_user_pages*() APIs.) Cases:
- *
- * FOLL_GET: folio's refcount will be incremented by @refs.
- *
- * FOLL_PIN on large folios: folio's refcount will be incremented by
- * @refs, and its pincount will be incremented by @refs.
- *
- * FOLL_PIN on single-page folios: folio's refcount will be incremented by
- * @refs * GUP_PIN_COUNTING_BIAS.
- *
- * Return: The folio containing @page (with refcount appropriately
- * incremented) for success, or NULL upon failure. If neither FOLL_GET
- * nor FOLL_PIN was set, that's considered failure, and furthermore,
- * a likely bug in the caller, so a warning is also emitted.
- */
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
-{
- struct folio *folio;
-
- if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
- return NULL;
-
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
- return NULL;
-
- if (flags & FOLL_GET)
- return try_get_folio(page, refs);
-
- /* FOLL_PIN is set */
-
- /*
- * Don't take a pin on the zero page - it's not going anywhere
- * and it is used in a *lot* of places.
- */
- if (is_zero_page(page))
- return page_folio(page);
-
- folio = try_get_folio(page, refs);
- if (!folio)
- return NULL;
-
- /*
- * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
- * right zone, so fail and let the caller fall back to the slow
- * path.
- */
- if (unlikely((flags & FOLL_LONGTERM) &&
- !folio_is_longterm_pinnable(folio))) {
- if (!put_devmap_managed_folio_refs(folio, refs))
- folio_put_refs(folio, refs);
- return NULL;
- }
-
- /*
- * When pinning a large folio, use an exact count to track it.
- *
- * However, be sure to *also* increment the normal folio
- * refcount field at least once, so that the folio really
- * is pinned. That's why the refcount from the earlier
- * try_get_folio() is left intact.
- */
- if (folio_test_large(folio))
- atomic_add(refs, &folio->_pincount);
- else
- folio_ref_add(folio,
- refs * (GUP_PIN_COUNTING_BIAS - 1));
- /*
- * Adjust the pincount before re-checking the PTE for changes.
- * This is essentially a smp_mb() and is paired with a memory
- * barrier in folio_try_share_anon_rmap_*().
- */
- smp_mb__after_atomic();
-
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
-
- return folio;
-}
-
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
@@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
}
/**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page: pointer to page to be grabbed
- * @flags: gup flags: these are the FOLL_* flag values.
+ * try_grab_folio() - add a folio's refcount by a flag-dependent amount
+ * @folio: pointer to folio to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values
*
* This might not do anything at all, depending on the flags argument.
*
* "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
*
* Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_folio() documentation, with
- * "refs=1".
+ * time.
*
* Return: 0 for success, or if no action was required (if neither FOLL_PIN
* nor FOLL_GET was set, nothing is done). A negative error code for failure:
*
- * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not
+ * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not
* be grabbed.
+ *
+ * It is called when we have a stable reference for the folio, typically in
+ * GUP slow path.
*/
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags)
{
- struct folio *folio = page_folio(page);
-
if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
return -ENOMEM;
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page)))
return -EREMOTEIO;
if (flags & FOLL_GET)
- folio_ref_inc(folio);
+ folio_ref_add(folio, refs);
else if (flags & FOLL_PIN) {
/*
* Don't take a pin on the zero page - it's not going anywhere
* and it is used in a *lot* of places.
*/
- if (is_zero_page(page))
+ if (is_zero_folio(folio))
return 0;
/*
- * Similar to try_grab_folio(): be sure to *also*
- * increment the normal page refcount field at least once,
+ * Increment the normal page refcount field at least once,
* so that the page really is pinned.
*/
if (folio_test_large(folio)) {
- folio_ref_add(folio, 1);
- atomic_add(1, &folio->_pincount);
+ folio_ref_add(folio, refs);
+ atomic_add(refs, &folio->_pincount);
} else {
- folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+ folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
}
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
}
return 0;
@@ -515,6 +427,102 @@ static int record_subpages(struct page *page, unsigned long sz,
return nr;
}
+
+/**
+ * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
+ * @page: pointer to page to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ * FOLL_GET: folio's refcount will be incremented by @refs.
+ *
+ * FOLL_PIN on large folios: folio's refcount will be incremented by
+ * @refs, and its pincount will be incremented by @refs.
+ *
+ * FOLL_PIN on single-page folios: folio's refcount will be incremented by
+ * @refs * GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
+ *
+ * It uses add ref unless zero to elevate the folio refcount and must be called
+ * in fast path only.
+ */
+static struct folio *try_grab_folio_fast(struct page *page, int refs,
+ unsigned int flags)
+{
+ struct folio *folio;
+
+ /* Raise warn if it is not called in fast GUP */
+ VM_WARN_ON_ONCE(!irqs_disabled());
+
+ if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
+ return NULL;
+
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ return NULL;
+
+ if (flags & FOLL_GET)
+ return try_get_folio(page, refs);
+
+ /* FOLL_PIN is set */
+
+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return page_folio(page);
+
+ folio = try_get_folio(page, refs);
+ if (!folio)
+ return NULL;
+
+ /*
+ * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+ * right zone, so fail and let the caller fall back to the slow
+ * path.
+ */
+ if (unlikely((flags & FOLL_LONGTERM) &&
+ !folio_is_longterm_pinnable(folio))) {
+ if (!put_devmap_managed_folio_refs(folio, refs))
+ folio_put_refs(folio, refs);
+ return NULL;
+ }
+
+ /*
+ * When pinning a large folio, use an exact count to track it.
+ *
+ * However, be sure to *also* increment the normal folio
+ * refcount field at least once, so that the folio really
+ * is pinned. That's why the refcount from the earlier
+ * try_get_folio() is left intact.
+ */
+ if (folio_test_large(folio))
+ atomic_add(refs, &folio->_pincount);
+ else
+ folio_ref_add(folio,
+ refs * (GUP_PIN_COUNTING_BIAS - 1));
+ /*
+ * Adjust the pincount before re-checking the PTE for changes.
+ * This is essentially a smp_mb() and is paired with a memory
+ * barrier in folio_try_share_anon_rmap_*().
+ */
+ smp_mb__after_atomic();
+
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+
+ return folio;
+}
#endif /* CONFIG_ARCH_HAS_HUGEPD || CONFIG_HAVE_GUP_FAST */
#ifdef CONFIG_ARCH_HAS_HUGEPD
@@ -535,7 +543,7 @@ static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
*/
static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz,
unsigned long addr, unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
unsigned long pte_end;
struct page *page;
@@ -558,9 +566,15 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz
page = pte_page(pte);
refs = record_subpages(page, sz, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
- if (!folio)
- return 0;
+ if (fast) {
+ folio = try_grab_folio_fast(page, refs, flags);
+ if (!folio)
+ return 0;
+ } else {
+ folio = page_folio(page);
+ if (try_grab_folio(folio, refs, flags))
+ return 0;
+ }
if (unlikely(pte_val(pte) != pte_val(ptep_get(ptep)))) {
gup_put_folio(folio, refs, flags);
@@ -588,7 +602,7 @@ static int gup_hugepte(struct vm_area_struct *vma, pte_t *ptep, unsigned long sz
static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
unsigned long addr, unsigned int pdshift,
unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
pte_t *ptep;
unsigned long sz = 1UL << hugepd_shift(hugepd);
@@ -598,7 +612,8 @@ static int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
ptep = hugepte_offset(hugepd, addr, pdshift);
do {
next = hugepte_addr_end(addr, end, sz);
- ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr);
+ ret = gup_hugepte(vma, ptep, sz, addr, end, flags, pages, nr,
+ fast);
if (ret != 1)
return ret;
} while (ptep++, addr = next, addr != end);
@@ -625,7 +640,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
ptep = hugepte_offset(hugepd, addr, pdshift);
ptl = huge_pte_lock(h, vma->vm_mm, ptep);
ret = gup_hugepd(vma, hugepd, addr, pdshift, addr + PAGE_SIZE,
- flags, &page, &nr);
+ flags, &page, &nr, false);
spin_unlock(ptl);
if (ret == 1) {
@@ -642,7 +657,7 @@ static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
static inline int gup_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
unsigned long addr, unsigned int pdshift,
unsigned long end, unsigned int flags,
- struct page **pages, int *nr)
+ struct page **pages, int *nr, bool fast)
{
return 0;
}
@@ -729,7 +744,7 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
gup_must_unshare(vma, flags, page))
return ERR_PTR(-EMLINK);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
else
@@ -806,7 +821,7 @@ static struct page *follow_huge_pmd(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
return ERR_PTR(ret);
@@ -968,8 +983,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
- ret = try_grab_page(page, flags);
+ /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (unlikely(ret)) {
page = ERR_PTR(ret);
goto out;
@@ -1233,7 +1248,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
goto unmap;
*page = pte_page(entry);
}
- ret = try_grab_page(*page, gup_flags);
+ ret = try_grab_folio(page_folio(*page), 1, gup_flags);
if (unlikely(ret))
goto unmap;
out:
@@ -1636,20 +1651,19 @@ static long __get_user_pages(struct mm_struct *mm,
* pages.
*/
if (page_increm > 1) {
- struct folio *folio;
+ struct folio *folio = page_folio(page);
/*
* Since we already hold refcount on the
* large folio, this should never fail.
*/
- folio = try_grab_folio(page, page_increm - 1,
- foll_flags);
- if (WARN_ON_ONCE(!folio)) {
+ if (try_grab_folio(folio, page_increm - 1,
+ foll_flags)) {
/*
* Release the 1st page ref if the
* folio is problematic, fail hard.
*/
- gup_put_folio(page_folio(page), 1,
+ gup_put_folio(folio, 1,
foll_flags);
ret = -EFAULT;
goto out;
@@ -2797,7 +2811,6 @@ EXPORT_SYMBOL(get_user_pages_unlocked);
* This code is based heavily on the PowerPC implementation by Nick Piggin.
*/
#ifdef CONFIG_HAVE_GUP_FAST
-
/*
* Used in the GUP-fast path to determine whether GUP is permitted to work on
* a specific folio.
@@ -2962,7 +2975,7 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio)
goto pte_unmap;
@@ -3049,7 +3062,7 @@ static int gup_fast_devmap_leaf(unsigned long pfn, unsigned long addr,
break;
}
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio) {
gup_fast_undo_dev_pagemap(nr, nr_start, flags, pages);
break;
@@ -3138,7 +3151,7 @@ static int gup_fast_pmd_leaf(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = pmd_page(orig);
refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3182,7 +3195,7 @@ static int gup_fast_pud_leaf(pud_t orig, pud_t *pudp, unsigned long addr,
page = pud_page(orig);
refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3222,7 +3235,7 @@ static int gup_fast_pgd_leaf(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = pgd_page(orig);
refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -3276,7 +3289,8 @@ static int gup_fast_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr,
* pmd format and THP pmd format
*/
if (gup_hugepd(NULL, __hugepd(pmd_val(pmd)), addr,
- PMD_SHIFT, next, flags, pages, nr) != 1)
+ PMD_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pte_range(pmd, pmdp, addr, next, flags,
pages, nr))
@@ -3306,7 +3320,8 @@ static int gup_fast_pud_range(p4d_t *p4dp, p4d_t p4d, unsigned long addr,
return 0;
} else if (unlikely(is_hugepd(__hugepd(pud_val(pud))))) {
if (gup_hugepd(NULL, __hugepd(pud_val(pud)), addr,
- PUD_SHIFT, next, flags, pages, nr) != 1)
+ PUD_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pmd_range(pudp, pud, addr, next, flags,
pages, nr))
@@ -3333,7 +3348,8 @@ static int gup_fast_p4d_range(pgd_t *pgdp, pgd_t pgd, unsigned long addr,
BUILD_BUG_ON(p4d_leaf(p4d));
if (unlikely(is_hugepd(__hugepd(p4d_val(p4d))))) {
if (gup_hugepd(NULL, __hugepd(p4d_val(p4d)), addr,
- P4D_SHIFT, next, flags, pages, nr) != 1)
+ P4D_SHIFT, next, flags, pages, nr,
+ true) != 1)
return 0;
} else if (!gup_fast_pud_range(p4dp, p4d, addr, next, flags,
pages, nr))
@@ -3362,7 +3378,8 @@ static void gup_fast_pgd_range(unsigned long addr, unsigned long end,
return;
} else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) {
if (gup_hugepd(NULL, __hugepd(pgd_val(pgd)), addr,
- PGDIR_SHIFT, next, flags, pages, nr) != 1)
+ PGDIR_SHIFT, next, flags, pages, nr,
+ true) != 1)
return;
} else if (!gup_fast_p4d_range(pgdp, pgd, addr, next, flags,
pages, nr))
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index db7946a0a28c..2120f7478e55 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1331,7 +1331,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
diff --git a/mm/internal.h b/mm/internal.h
index 6902b7dd8509..cc2c5e07fad3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1182,8 +1182,8 @@ int migrate_device_coherent_page(struct page *page);
/*
* mm/gup.c
*/
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
-int __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags);
/*
* mm/huge_memory.c
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x c15a688e49987385baa8804bf65d570e362f8576
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071538-nerd-march-746a@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
c15a688e4998 ("USB: serial: mos7840: fix crash on resume")
7183192196a6 ("USB: serial: mos7840: rip out broken interrupt handling")
32d8a6fc5bd6 ("USB: serial: mos7840: remove set but not used variables 'st, data1, iflag'")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c15a688e49987385baa8804bf65d570e362f8576 Mon Sep 17 00:00:00 2001
From: Dmitry Smirnov <d.smirnov(a)inbox.lv>
Date: Sat, 15 Jun 2024 01:45:56 +0300
Subject: [PATCH] USB: serial: mos7840: fix crash on resume
Since commit c49cfa917025 ("USB: serial: use generic method if no
alternative is provided in usb serial layer"), USB serial core calls the
generic resume implementation when the driver has not provided one.
This can trigger a crash on resume with mos7840 since support for
multiple read URBs was added back in 2011. Specifically, both port read
URBs are now submitted on resume for open ports, but the context pointer
of the second URB is left set to the core rather than mos7840 port
structure.
Fix this by implementing dedicated suspend and resume functions for
mos7840.
Tested with Delock 87414 USB 2.0 to 4x serial adapter.
Signed-off-by: Dmitry Smirnov <d.smirnov(a)inbox.lv>
[ johan: analyse crash and rewrite commit message; set busy flag on
resume; drop bulk-in check; drop unnecessary usb_kill_urb() ]
Fixes: d83b405383c9 ("USB: serial: add support for multiple read urbs")
Cc: stable(a)vger.kernel.org # 3.3
Signed-off-by: Johan Hovold <johan(a)kernel.org>
diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 8b0308d84270..85697466b147 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1737,6 +1737,49 @@ static void mos7840_port_remove(struct usb_serial_port *port)
kfree(mos7840_port);
}
+static int mos7840_suspend(struct usb_serial *serial, pm_message_t message)
+{
+ struct moschip_port *mos7840_port;
+ struct usb_serial_port *port;
+ int i;
+
+ for (i = 0; i < serial->num_ports; ++i) {
+ port = serial->port[i];
+ if (!tty_port_initialized(&port->port))
+ continue;
+
+ mos7840_port = usb_get_serial_port_data(port);
+
+ usb_kill_urb(mos7840_port->read_urb);
+ mos7840_port->read_urb_busy = false;
+ }
+
+ return 0;
+}
+
+static int mos7840_resume(struct usb_serial *serial)
+{
+ struct moschip_port *mos7840_port;
+ struct usb_serial_port *port;
+ int res;
+ int i;
+
+ for (i = 0; i < serial->num_ports; ++i) {
+ port = serial->port[i];
+ if (!tty_port_initialized(&port->port))
+ continue;
+
+ mos7840_port = usb_get_serial_port_data(port);
+
+ mos7840_port->read_urb_busy = true;
+ res = usb_submit_urb(mos7840_port->read_urb, GFP_NOIO);
+ if (res)
+ mos7840_port->read_urb_busy = false;
+ }
+
+ return 0;
+}
+
static struct usb_serial_driver moschip7840_4port_device = {
.driver = {
.owner = THIS_MODULE,
@@ -1764,6 +1807,8 @@ static struct usb_serial_driver moschip7840_4port_device = {
.port_probe = mos7840_port_probe,
.port_remove = mos7840_port_remove,
.read_bulk_callback = mos7840_bulk_in_callback,
+ .suspend = mos7840_suspend,
+ .resume = mos7840_resume,
};
static struct usb_serial_driver * const serial_drivers[] = {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x c15a688e49987385baa8804bf65d570e362f8576
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071537-emergency-sprawl-4c37@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
c15a688e4998 ("USB: serial: mos7840: fix crash on resume")
7183192196a6 ("USB: serial: mos7840: rip out broken interrupt handling")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c15a688e49987385baa8804bf65d570e362f8576 Mon Sep 17 00:00:00 2001
From: Dmitry Smirnov <d.smirnov(a)inbox.lv>
Date: Sat, 15 Jun 2024 01:45:56 +0300
Subject: [PATCH] USB: serial: mos7840: fix crash on resume
Since commit c49cfa917025 ("USB: serial: use generic method if no
alternative is provided in usb serial layer"), USB serial core calls the
generic resume implementation when the driver has not provided one.
This can trigger a crash on resume with mos7840 since support for
multiple read URBs was added back in 2011. Specifically, both port read
URBs are now submitted on resume for open ports, but the context pointer
of the second URB is left set to the core rather than mos7840 port
structure.
Fix this by implementing dedicated suspend and resume functions for
mos7840.
Tested with Delock 87414 USB 2.0 to 4x serial adapter.
Signed-off-by: Dmitry Smirnov <d.smirnov(a)inbox.lv>
[ johan: analyse crash and rewrite commit message; set busy flag on
resume; drop bulk-in check; drop unnecessary usb_kill_urb() ]
Fixes: d83b405383c9 ("USB: serial: add support for multiple read urbs")
Cc: stable(a)vger.kernel.org # 3.3
Signed-off-by: Johan Hovold <johan(a)kernel.org>
diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 8b0308d84270..85697466b147 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1737,6 +1737,49 @@ static void mos7840_port_remove(struct usb_serial_port *port)
kfree(mos7840_port);
}
+static int mos7840_suspend(struct usb_serial *serial, pm_message_t message)
+{
+ struct moschip_port *mos7840_port;
+ struct usb_serial_port *port;
+ int i;
+
+ for (i = 0; i < serial->num_ports; ++i) {
+ port = serial->port[i];
+ if (!tty_port_initialized(&port->port))
+ continue;
+
+ mos7840_port = usb_get_serial_port_data(port);
+
+ usb_kill_urb(mos7840_port->read_urb);
+ mos7840_port->read_urb_busy = false;
+ }
+
+ return 0;
+}
+
+static int mos7840_resume(struct usb_serial *serial)
+{
+ struct moschip_port *mos7840_port;
+ struct usb_serial_port *port;
+ int res;
+ int i;
+
+ for (i = 0; i < serial->num_ports; ++i) {
+ port = serial->port[i];
+ if (!tty_port_initialized(&port->port))
+ continue;
+
+ mos7840_port = usb_get_serial_port_data(port);
+
+ mos7840_port->read_urb_busy = true;
+ res = usb_submit_urb(mos7840_port->read_urb, GFP_NOIO);
+ if (res)
+ mos7840_port->read_urb_busy = false;
+ }
+
+ return 0;
+}
+
static struct usb_serial_driver moschip7840_4port_device = {
.driver = {
.owner = THIS_MODULE,
@@ -1764,6 +1807,8 @@ static struct usb_serial_driver moschip7840_4port_device = {
.port_probe = mos7840_port_probe,
.port_remove = mos7840_port_remove,
.read_bulk_callback = mos7840_bulk_in_callback,
+ .suspend = mos7840_suspend,
+ .resume = mos7840_resume,
};
static struct usb_serial_driver * const serial_drivers[] = {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 0913ec336a6c0c4a2b296bd9f74f8e41c4c83c8c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071524-yin-woozy-adaf@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
0913ec336a6c ("net: ks8851: Fix deadlock with the SPI chip variant")
317a215d4932 ("net: ks8851: Fix another TX stall caused by wrong ISR flag handling")
e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
f96f700449b6 ("net: ks8851: Inline ks8851_rx_skb()")
3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
90f77c1c512f ("net: ethernet: Use netif_rx().")
2dc95a4d30ed ("net: Add dm9051 driver")
47aeea0d57e8 ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED")
2e49761e4fd1 ("net: lan966x: Add support for multiple bridge flags")
e14f72398df4 ("net: lan966x: Extend switchdev bridge flags")
6d2c186afa5d ("net: lan966x: Add vlan support.")
cf2f60897e92 ("net: lan966x: Add support to offload the forwarding.")
5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer")
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics")
e18aba8941b4 ("net: lan966x: add mactable support")
d28d6d2e37d1 ("net: lan966x: add port module support")
db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
a97c69ba4f30 ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0913ec336a6c0c4a2b296bd9f74f8e41c4c83c8c Mon Sep 17 00:00:00 2001
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Date: Sat, 6 Jul 2024 12:13:37 +0200
Subject: [PATCH] net: ks8851: Fix deadlock with the SPI chip variant
When SMP is enabled and spinlocks are actually functional then there is
a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:
watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
call trace:
queued_spin_lock_slowpath+0x100/0x284
do_raw_spin_lock+0x34/0x44
ks8851_start_xmit_spi+0x30/0xb8
ks8851_start_xmit+0x14/0x20
netdev_start_xmit+0x40/0x6c
dev_hard_start_xmit+0x6c/0xbc
sch_direct_xmit+0xa4/0x22c
__qdisc_run+0x138/0x3fc
qdisc_run+0x24/0x3c
net_tx_action+0xf8/0x130
handle_softirqs+0x1ac/0x1f0
__do_softirq+0x14/0x20
____do_softirq+0x10/0x1c
call_on_irq_stack+0x3c/0x58
do_softirq_own_stack+0x1c/0x28
__irq_exit_rcu+0x54/0x9c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x50
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
__netif_schedule+0x6c/0x80
netif_tx_wake_queue+0x38/0x48
ks8851_irq+0xb8/0x2c8
irq_thread_fn+0x2c/0x74
irq_thread+0x10c/0x1b0
kthread+0xc8/0xd8
ret_from_fork+0x10/0x20
This issue has not been identified earlier because tests were done on
a device with SMP disabled and so spinlocks were actually NOPs.
Now use spin_(un)lock_bh for TX queue related locking to avoid execution
of softirq work synchronously that would lead to a deadlock.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20240706101337.854474-1-rwahl@gmx.de
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..13462811eaae 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -352,11 +352,11 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
netif_dbg(ks, intr, ks->netdev,
"%s: txspace %d\n", __func__, tx_space);
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
ks->tx_space = tx_space;
if (netif_queue_stopped(ks->netdev))
netif_wake_queue(ks->netdev);
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
}
if (status & IRQ_SPIBEI) {
@@ -635,14 +635,14 @@ static void ks8851_set_rx_mode(struct net_device *dev)
/* schedule work to do the actual set of the data if needed */
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
if (memcmp(&rxctrl, &ks->rxctrl, sizeof(rxctrl)) != 0) {
memcpy(&ks->rxctrl, &rxctrl, sizeof(ks->rxctrl));
schedule_work(&ks->rxctrl_work);
}
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
}
static int ks8851_set_mac_address(struct net_device *dev, void *addr)
diff --git a/drivers/net/ethernet/micrel/ks8851_spi.c b/drivers/net/ethernet/micrel/ks8851_spi.c
index 670c1de966db..3062cc0f9199 100644
--- a/drivers/net/ethernet/micrel/ks8851_spi.c
+++ b/drivers/net/ethernet/micrel/ks8851_spi.c
@@ -340,10 +340,10 @@ static void ks8851_tx_work(struct work_struct *work)
tx_space = ks8851_rdreg16_spi(ks, KS_TXMIR);
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
ks->queued_len -= dequeued_len;
ks->tx_space = tx_space;
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
ks8851_unlock_spi(ks, &flags);
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 0913ec336a6c0c4a2b296bd9f74f8e41c4c83c8c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071523-slate-cobweb-d1a8@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
0913ec336a6c ("net: ks8851: Fix deadlock with the SPI chip variant")
317a215d4932 ("net: ks8851: Fix another TX stall caused by wrong ISR flag handling")
e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
be0384bf599c ("net: ks8851: Handle softirqs at the end of IRQ thread to fix hang")
f96f700449b6 ("net: ks8851: Inline ks8851_rx_skb()")
3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
90f77c1c512f ("net: ethernet: Use netif_rx().")
2dc95a4d30ed ("net: Add dm9051 driver")
47aeea0d57e8 ("net: lan966x: Implement the callback SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED")
2e49761e4fd1 ("net: lan966x: Add support for multiple bridge flags")
e14f72398df4 ("net: lan966x: Extend switchdev bridge flags")
6d2c186afa5d ("net: lan966x: Add vlan support.")
cf2f60897e92 ("net: lan966x: Add support to offload the forwarding.")
5ccd66e01cbe ("net: lan966x: add support for interrupts from analyzer")
2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
12c2d0a5b8e2 ("net: lan966x: add ethtool configuration and statistics")
e18aba8941b4 ("net: lan966x: add mactable support")
d28d6d2e37d1 ("net: lan966x: add port module support")
db8bcaad5393 ("net: lan966x: add the basic lan966x driver")
a97c69ba4f30 ("net: ax88796c: ASIX AX88796C SPI Ethernet Adapter Driver")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0913ec336a6c0c4a2b296bd9f74f8e41c4c83c8c Mon Sep 17 00:00:00 2001
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Date: Sat, 6 Jul 2024 12:13:37 +0200
Subject: [PATCH] net: ks8851: Fix deadlock with the SPI chip variant
When SMP is enabled and spinlocks are actually functional then there is
a deadlock with the 'statelock' spinlock between ks8851_start_xmit_spi
and ks8851_irq:
watchdog: BUG: soft lockup - CPU#0 stuck for 27s!
call trace:
queued_spin_lock_slowpath+0x100/0x284
do_raw_spin_lock+0x34/0x44
ks8851_start_xmit_spi+0x30/0xb8
ks8851_start_xmit+0x14/0x20
netdev_start_xmit+0x40/0x6c
dev_hard_start_xmit+0x6c/0xbc
sch_direct_xmit+0xa4/0x22c
__qdisc_run+0x138/0x3fc
qdisc_run+0x24/0x3c
net_tx_action+0xf8/0x130
handle_softirqs+0x1ac/0x1f0
__do_softirq+0x14/0x20
____do_softirq+0x10/0x1c
call_on_irq_stack+0x3c/0x58
do_softirq_own_stack+0x1c/0x28
__irq_exit_rcu+0x54/0x9c
irq_exit_rcu+0x10/0x1c
el1_interrupt+0x38/0x50
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x64/0x68
__netif_schedule+0x6c/0x80
netif_tx_wake_queue+0x38/0x48
ks8851_irq+0xb8/0x2c8
irq_thread_fn+0x2c/0x74
irq_thread+0x10c/0x1b0
kthread+0xc8/0xd8
ret_from_fork+0x10/0x20
This issue has not been identified earlier because tests were done on
a device with SMP disabled and so spinlocks were actually NOPs.
Now use spin_(un)lock_bh for TX queue related locking to avoid execution
of softirq work synchronously that would lead to a deadlock.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://patch.msgid.link/20240706101337.854474-1-rwahl@gmx.de
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..13462811eaae 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -352,11 +352,11 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
netif_dbg(ks, intr, ks->netdev,
"%s: txspace %d\n", __func__, tx_space);
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
ks->tx_space = tx_space;
if (netif_queue_stopped(ks->netdev))
netif_wake_queue(ks->netdev);
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
}
if (status & IRQ_SPIBEI) {
@@ -635,14 +635,14 @@ static void ks8851_set_rx_mode(struct net_device *dev)
/* schedule work to do the actual set of the data if needed */
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
if (memcmp(&rxctrl, &ks->rxctrl, sizeof(rxctrl)) != 0) {
memcpy(&ks->rxctrl, &rxctrl, sizeof(ks->rxctrl));
schedule_work(&ks->rxctrl_work);
}
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
}
static int ks8851_set_mac_address(struct net_device *dev, void *addr)
diff --git a/drivers/net/ethernet/micrel/ks8851_spi.c b/drivers/net/ethernet/micrel/ks8851_spi.c
index 670c1de966db..3062cc0f9199 100644
--- a/drivers/net/ethernet/micrel/ks8851_spi.c
+++ b/drivers/net/ethernet/micrel/ks8851_spi.c
@@ -340,10 +340,10 @@ static void ks8851_tx_work(struct work_struct *work)
tx_space = ks8851_rdreg16_spi(ks, KS_TXMIR);
- spin_lock(&ks->statelock);
+ spin_lock_bh(&ks->statelock);
ks->queued_len -= dequeued_len;
ks->tx_space = tx_space;
- spin_unlock(&ks->statelock);
+ spin_unlock_bh(&ks->statelock);
ks8851_unlock_spi(ks, &flags);
}
console_lock is the outermost subsystem lock for a lot of subsystems,
which means get/put_user must nest within. Which means it cannot be
acquired somewhere deeply nested in other locks, and most definitely
not while holding fs locks potentially needed to resolve faults.
console_trylock is the best we can do here.
Including printk folks since even trylock feels realyl iffy here to
me.
Reported-by: syzbot+6cebc1af246fe020a2f0(a)syzkaller.appspotmail.com
References: https://lore.kernel.org/dri-devel/00000000000026c1ff061cd0de12@google.com/
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
Fixes: a8f354284304 ("bcachefs: bch2_print_string_as_lines()")
Cc: <stable(a)vger.kernel.org> # v6.7+
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Brian Foster <bfoster(a)redhat.com>
Cc: linux-bcachefs(a)vger.kernel.org
Cc: Petr Mladek <pmladek(a)suse.com>
Cc: Steven Rostedt <rostedt(a)goodmis.org>
Cc: John Ogness <john.ogness(a)linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Signed-off-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
---
fs/bcachefs/util.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/bcachefs/util.c b/fs/bcachefs/util.c
index de331dec2a99..02381c653603 100644
--- a/fs/bcachefs/util.c
+++ b/fs/bcachefs/util.c
@@ -255,13 +255,14 @@ void bch2_prt_u64_base2(struct printbuf *out, u64 v)
void bch2_print_string_as_lines(const char *prefix, const char *lines)
{
const char *p;
+ int locked;
if (!lines) {
printk("%s (null)\n", prefix);
return;
}
- console_lock();
+ locked = console_trylock();
while (1) {
p = strchrnul(lines, '\n');
printk("%s%.*s\n", prefix, (int) (p - lines), lines);
@@ -269,7 +270,8 @@ void bch2_print_string_as_lines(const char *prefix, const char *lines)
break;
lines = p + 1;
}
- console_unlock();
+ if (locked)
+ console_unlock();
}
int bch2_save_backtrace(bch_stacktrace *stack, struct task_struct *task, unsigned skipnr,
--
2.45.2
I'm announcing the release of the 6.6.40 kernel.
All users of the 6.1 kernel series that use the XHCI USB host controller driver
(i.e. USB 3) must upgrade.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/usb/host/xhci-ring.c | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)
Greg Kroah-Hartman (2):
Revert "usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB"
Linux 6.6.40
I'm announcing the release of the 6.1.99 kernel.
All users of the 6.1 kernel series that use the XHCI USB host controller driver
(i.e. USB 3) must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/usb/host/xhci-ring.c | 5 ++---
2 files changed, 3 insertions(+), 4 deletions(-)
Greg Kroah-Hartman (2):
Revert "usb: xhci: prevent potential failure in handle_tx_event() for Transfer events without TRB"
Linux 6.1.99
From: Ahmed Ehab <bottaawesome633(a)gmail.com>
Preventing lockdep_set_subclass from creating a new instance of the
string literal. Hence, we will always have the same class->name among
parent and subclasses. This prevents kernel panics when looking up a
lock class while comparing class locks and class names.
Reported-by: <syzbot+7f4a6f7f7051474e40ad(a)syzkaller.appspotmail.com>
Fixes: de8f5e4f2dc1f ("lockdep: Introduce wait-type checks")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ahmed Ehab <bottaawesome633(a)gmail.com>
---
include/linux/lockdep.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 08b0d1d9d78b..df8fa5929de7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -173,7 +173,7 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
(lock)->dep_map.lock_type)
#define lockdep_set_subclass(lock, sub) \
- lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+ lockdep_init_map_type(&(lock)->dep_map, (lock)->dep_map.name, (lock)->dep_map.key, sub,\
(lock)->dep_map.wait_type_inner, \
(lock)->dep_map.wait_type_outer, \
(lock)->dep_map.lock_type)
--
2.45.2
From: Ahmed Ehab <bottaawesome633(a)gmail.com>
Preventing lockdep_set_subclass from creating a new instance of the
string literal. Hence, we will always have the same class->name among
parent and subclasses. This prevents kernel panics when looking up a
lock class while comparing class locks and class names.
Reported-by: <syzbot+7f4a6f7f7051474e40ad(a)syzkaller.appspotmail.com>
Fixes: de8f5e4f2dc1f ("lockdep: Introduce wait-type checks")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ahmed Ehab <bottaawesome633(a)gmail.com>
---
include/linux/lockdep.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 08b0d1d9d78b..df8fa5929de7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -173,7 +173,7 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
(lock)->dep_map.lock_type)
#define lockdep_set_subclass(lock, sub) \
- lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+ lockdep_init_map_type(&(lock)->dep_map, (lock)->dep_map.name, (lock)->dep_map.key, sub,\
(lock)->dep_map.wait_type_inner, \
(lock)->dep_map.wait_type_outer, \
(lock)->dep_map.lock_type)
--
2.45.2
From: Ahmed Ehab <bottaawesome633(a)gmail.com>
Preventing lockdep_set_subclass from creating a new instance of the
string literal. Hence, we will always have the same class->name among
parent and subclasses. This prevents kernel panics when looking up a
lock class while comparing class locks and class names.
Reported-by: <syzbot+7f4a6f7f7051474e40ad(a)syzkaller.appspotmail.com>
Fixes: fd5e3f5fe27
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ahmed Ehab <bottaawesome633(a)gmail.com>
---
include/linux/lockdep.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 08b0d1d9d78b..df8fa5929de7 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -173,7 +173,7 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name,
(lock)->dep_map.lock_type)
#define lockdep_set_subclass(lock, sub) \
- lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\
+ lockdep_init_map_type(&(lock)->dep_map, (lock)->dep_map.name, (lock)->dep_map.key, sub,\
(lock)->dep_map.wait_type_inner, \
(lock)->dep_map.wait_type_outer, \
(lock)->dep_map.lock_type)
--
2.45.2
On Mon, Jul 15, 2024 at 12:39:45AM +0300, ahmed Ehab wrote:
> Ok, I will.
> I just put ext4 because the syzkaller bug was mentioned in the ext4
> subsystem.
> Thanks,
> Ahmed
>
Please avoid top-posting. And
> On Mon, Jul 15, 2024 at 12:22 AM Waiman Long <longman(a)redhat.com> wrote:
>
> > On 7/14/24 01:14, botta633 wrote:
> > > From: Ahmed Ehab <bottaawesome633(a)gmail.com>
> > >
> > > Preventing lockdep_set_subclass from creating a new instance of the
> > > string literal. Hence, we will always have the same class->name among
> > > parent and subclasses. This prevents kernel panics when looking up a
> > > lock class while comparing class locks and class names.
> > >
> > > Reported-by: <syzbot+7f4a6f7f7051474e40ad(a)syzkaller.appspotmail.com>
> > > Fixes: fd5e3f5fe27
please add the title of the commit here as well, e.g.
Fixes: <sha1> ("<title>")
see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
for example.
Regards,
Boqun
> > > Cc: <stable(a)vger.kernel.org>
> > > Signed-off-by: Ahmed Ehab <bottaawesome633(a)gmail.com>
> > > ---
> > > include/linux/lockdep.h | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
> > > index 08b0d1d9d78b..df8fa5929de7 100644
> > > --- a/include/linux/lockdep.h
> > > +++ b/include/linux/lockdep.h
> > > @@ -173,7 +173,7 @@ static inline void lockdep_init_map(struct
> > lockdep_map *lock, const char *name,
> > > (lock)->dep_map.lock_type)
> > >
> > > #define lockdep_set_subclass(lock, sub)
> > \
> > > - lockdep_init_map_type(&(lock)->dep_map, #lock,
> > (lock)->dep_map.key, sub,\
> > > + lockdep_init_map_type(&(lock)->dep_map, (lock)->dep_map.name,
> > (lock)->dep_map.key, sub,\
> > > (lock)->dep_map.wait_type_inner, \
> > > (lock)->dep_map.wait_type_outer, \
> > > (lock)->dep_map.lock_type)
> >
> > ext4 is a filesystem. It has nothing to do with locking/lockdep. Could
> > you resend the patches with the proper prefix of "lockdep:" or
> > "locking/lockdep:"?
> >
> > Thanks,
> > Longman
> >
> >
Currently, netconsole cleans up the netpoll structure before disabling
the target. This approach can lead to race conditions, as message
senders (write_ext_msg() and write_msg()) check if the target is
enabled before using netpoll. The sender can validate that the target is
enabled, but, the netpoll might be de-allocated already, causing
undesired behaviours.
This patch reverses the order of operations:
1. Disable the target
2. Clean up the netpoll structure
This change eliminates the potential race condition, ensuring that
no messages are sent through a partially cleaned-up netpoll structure.
Fixes: 2382b15bcc39 ("netconsole: take care of NETDEV_UNREGISTER event")
Cc: stable(a)vger.kernel.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
---
Changelog:
v2:
* Targeting "net" instead of "net-dev" (Jakub)
v1:
* https://lore.kernel.org/all/20240709144403.544099-4-leitao@debian.org/
drivers/net/netconsole.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d7070dd4fe73..aa66c923790f 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -974,6 +974,7 @@ static int netconsole_netdev_event(struct notifier_block *this,
/* rtnl_lock already held
* we might sleep in __netpoll_cleanup()
*/
+ nt->enabled = false;
spin_unlock_irqrestore(&target_list_lock, flags);
__netpoll_cleanup(&nt->np);
@@ -981,7 +982,6 @@ static int netconsole_netdev_event(struct notifier_block *this,
spin_lock_irqsave(&target_list_lock, flags);
netdev_put(nt->np.dev, &nt->np.dev_tracker);
nt->np.dev = NULL;
- nt->enabled = false;
stopped = true;
netconsole_target_put(nt);
goto restart;
--
2.43.0
In read_handle(), of_get_address() may return NULL which is later
dereferenced. Fix this by adding NULL check.
Based on our customized static analysis tool, extract vulnerability
features[1], then match similar vulnerability features in this function.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit
/?id=2d9adecc88ab678785b581ab021f039372c324cb
Cc: stable(a)vger.kernel.org
Fixes: 14baf4d9c739 ("cxl: Add guest-specific code")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v3:
- fixed up the changelog text as suggestions.
Changes in v2:
- added an explanation of how the potential vulnerability was discovered,
but not meet the description specification requirements.
---
drivers/misc/cxl/of.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/cxl/of.c b/drivers/misc/cxl/of.c
index bcc005dff1c0..d8dbb3723951 100644
--- a/drivers/misc/cxl/of.c
+++ b/drivers/misc/cxl/of.c
@@ -58,7 +58,7 @@ static int read_handle(struct device_node *np, u64 *handle)
/* Get address and size of the node */
prop = of_get_address(np, 0, &size, NULL);
- if (size)
+ if (!prop || size)
return -EINVAL;
/* Helper to read a big number; size is in cells (not bytes) */
--
2.25.1
The ov5675 specification says that the gap between XSHUTDN deassert and the
first I2C transaction should be a minimum of 8192 XVCLK cycles.
Right now we use a usleep_rage() that gives a sleep time of between about
430 and 860 microseconds.
On the Lenovo X13s we have observed that in about 1/20 cases the current
timing is too tight and we start transacting before the ov5675's reset
cycle completes, leading to I2C bus transaction failures.
The reset racing is sometimes triggered at initial chip probe but, more
usually on a subsequent power-off/power-on cycle e.g.
[ 71.451662] ov5675 24-0010: failed to write reg 0x0103. error = -5
[ 71.451686] ov5675 24-0010: failed to set plls
The current quiescence period we have is too tight. Instead of expressing
the post reset delay in terms of the current XVCLK this patch converts the
power-on and power-off delays to the maximum theoretical delay @ 6 MHz with
an additional buffer.
1.365 milliseconds on the power-on path is 1.5 milliseconds with grace.
853 microseconds on the power-off path is 900 microseconds with grace.
Fixes: 49d9ad719e89 ("media: ov5675: add device-tree support and support runtime PM")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
v2:
- Drop patch to read and act on reported XVCLK
- Use worst-case timings + a reasonable grace period in-lieu of previous
xvclk calculations on power-on and power-off.
- Link to v1: https://lore.kernel.org/r/20240711-linux-next-ov5675-v1-0-69e9b6c62c16@lina…
v1:
One long running saga for me on the Lenovo X13s is the occasional failure
to either probe or subsequently bring-up the ov5675 main RGB sensor on the
laptop.
Initially I suspected the PMIC for this part as the PMIC is using a new
interface on an I2C bus instead of an SPMI bus. In particular I thought
perhaps the I2C write to PMIC had completed but the regulator output hadn't
become stable from the perspective of the SoC. This however doesn't appear
to be the case - I can introduce a delay of milliseconds on the PMIC path
without resolving the sensor reset problem.
Secondly I thought about reset pin polarity or drive-strength but, again
playing about with both didn't yield decent results.
I also played with the duration of reset to no avail.
The error manifested as an I2C write timeout to the sensor which indicated
that the chip likely hadn't come out reset. An intermittent fault appearing
in perhaps 1/10 or 1/20 reset cycles.
Looking at the expression of the reset we see that there is a minimum time
expressed in XVCLK cycles between reset completion and first I2C
transaction to the sensor. The specification calls out the minimum delay @
8192 XVCLK cycles and the ov5675 driver meets that timing almost exactly.
A little too exactly - testing finally showed that we were too racy with
respect to the minimum quiescence between reset completion and first
command to the chip.
Fixing this error I choose to base the fix again on the number of clocks
but to also support any clock rate the chip could support by moving away
from a define to reading and using the XVCLK.
True enough only 19.2 MHz is currently supported but for the hypothetical
case where some other frequency is supported in the future, I wanted the
fix introduced in this series to still hold.
Hence this series:
1. Allows for any clock rate to be used in the valid range for the reset.
2. Elongates the post-reset period based on clock cycles which can now
vary.
Patch #2 can still be backported to stable irrespective of patch #1.
---
drivers/media/i2c/ov5675.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/media/i2c/ov5675.c b/drivers/media/i2c/ov5675.c
index 3641911bc73f..547d6fab816a 100644
--- a/drivers/media/i2c/ov5675.c
+++ b/drivers/media/i2c/ov5675.c
@@ -972,12 +972,10 @@ static int ov5675_set_stream(struct v4l2_subdev *sd, int enable)
static int ov5675_power_off(struct device *dev)
{
- /* 512 xvclk cycles after the last SCCB transation or MIPI frame end */
- u32 delay_us = DIV_ROUND_UP(512, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
- usleep_range(delay_us, delay_us * 2);
+ usleep_range(900, 1000);
clk_disable_unprepare(ov5675->xvclk);
gpiod_set_value_cansleep(ov5675->reset_gpio, 1);
@@ -988,7 +986,6 @@ static int ov5675_power_off(struct device *dev)
static int ov5675_power_on(struct device *dev)
{
- u32 delay_us = DIV_ROUND_UP(8192, OV5675_XVCLK_19_2 / 1000 / 1000);
struct v4l2_subdev *sd = dev_get_drvdata(dev);
struct ov5675 *ov5675 = to_ov5675(sd);
int ret;
@@ -1014,8 +1011,11 @@ static int ov5675_power_on(struct device *dev)
gpiod_set_value_cansleep(ov5675->reset_gpio, 0);
- /* 8192 xvclk cycles prior to the first SCCB transation */
- usleep_range(delay_us, delay_us * 2);
+ /* Worst case quiesence gap is 1.365 milliseconds @ 6MHz XVCLK
+ * Add an additional threshold grace period to ensure reset
+ * completion before initiating our first I2C transaction.
+ */
+ usleep_range(1500, 1600);
return 0;
}
---
base-commit: 523b23f0bee3014a7a752c9bb9f5c54f0eddae88
change-id: 20240710-linux-next-ov5675-60b0e83c73f1
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
when that drive has a write-mostly flag set. During such an attempt,
the following assertion in bio_split() is hit:
BUG_ON(sectors <= 0);
Call Trace:
? bio_split+0x96/0xb0
? exc_invalid_op+0x53/0x70
? bio_split+0x96/0xb0
? asm_exc_invalid_op+0x1b/0x20
? bio_split+0x96/0xb0
? raid1_read_request+0x890/0xd20
? __call_rcu_common.constprop.0+0x97/0x260
raid1_make_request+0x81/0xce0
? __get_random_u32_below+0x17/0x70
? new_slab+0x2b3/0x580
md_handle_request+0x77/0x210
md_submit_bio+0x62/0xa0
__submit_bio+0x17b/0x230
submit_bio_noacct_nocheck+0x18e/0x3c0
submit_bio_noacct+0x244/0x670
After investigation, it turned out that choose_slow_rdev() does not set
the value of max_sectors in some cases and because of it,
raid1_read_request calls bio_split with sectors == 0.
Fix it by filling in this variable.
This bug was introduced in
commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
but apparently hidden until
commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
shortly thereafter.
Cc: stable(a)vger.kernel.org # 6.9.x+
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
Cc: Song Liu <song(a)kernel.org>
Cc: Yu Kuai <yukuai3(a)huawei.com>
Cc: Paul Luse <paul.e.luse(a)linux.intel.com>
Cc: Xiao Ni <xni(a)redhat.com>
Cc: Mariusz Tkaczyk <mariusz.tkaczyk(a)linux.intel.com>
Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@o2.pl/
--
Tested on both Linux 6.10 and 6.9.8.
Inside a VM, mdadm testsuite for RAID1 on 6.10 did not find any problems:
./test --dev=loop --no-error --raidtype=raid1
(on 6.9.8 there was one failure, caused by external bitmap support not
compiled in).
Notes:
- I was reliably getting deadlocks when adding / removing devices
on such an array - while the array was loaded with fsstress with 20
concurrent processes. When the array was idle or loaded with fsstress
with 8 processes, no such deadlocks happened in my tests.
This occurred also on unpatched Linux 6.8.0 though, but not on
6.1.97-rc1, so this is likely an independent regression (to be
investigated).
- I was also getting deadlocks when adding / removing the bitmap on the
array in similar conditions - this happened on Linux 6.1.97-rc1
also though. fsstress with 8 concurrent processes did cause it only
once during many tests.
- in my testing, there was once a problem with hot adding an
internal bitmap to the array:
mdadm: Cannot add bitmap while array is resyncing or reshaping etc.
mdadm: failed to set internal bitmap.
even though no such reshaping was happening according to /proc/mdstat.
This seems unrelated, though.
---
drivers/md/raid1.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71ca66dd..82f70a4ce6ed 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -680,6 +680,7 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
len = r1_bio->sectors;
read_len = raid1_check_read_range(rdev, this_sector, &len);
if (read_len == r1_bio->sectors) {
+ *max_sectors = read_len;
update_read_sectors(conf, disk, this_sector, read_len);
return disk;
}
base-commit: 256abd8e550ce977b728be79a74e1729438b4948
--
2.25.1
mem_cgroup_calculate_protection() is not stateless and should only be
used as part of a top-down tree traversal. shrink_one() traverses the
per-node memcg LRU instead of the root_mem_cgroup tree, and therefore
it should not call mem_cgroup_calculate_protection().
The existing misuse in shrink_one() can cause ineffective protection
of sub-trees that are grandchildren of root_mem_cgroup. Fix it by
reusing lru_gen_age_node(), which already traverses the
root_mem_cgroup tree, to calculate the protection.
Previously lru_gen_age_node() opportunistically skips the first pass,
i.e., when scan_control->priority is DEF_PRIORITY. On the second pass,
lruvec_is_sizable() uses appropriate scan_control->priority, set by
set_initial_priority() from lru_gen_shrink_node(), to decide whether a
memcg is too small to reclaim from.
Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup
tree. So it should call set_initial_priority() upfront, to make sure
lruvec_is_sizable() uses appropriate scan_control->priority on the
first pass. Otherwise, lruvec_is_reclaimable() can return false
negatives and result in premature OOM kills when min_ttl_ms is used.
Reported-by: T.J. Mercier <tjmercier(a)google.com>
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
---
mm/vmscan.c | 86 +++++++++++++++++++++++++----------------------------
1 file changed, 40 insertions(+), 46 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 6216d79edb7f..525d3ffa8451 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3915,6 +3915,32 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec, unsigned long seq,
* working set protection
******************************************************************************/
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
+{
+ int priority;
+ unsigned long reclaimable;
+
+ if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+ return;
+ /*
+ * Determine the initial priority based on
+ * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
+ * where reclaimed_to_scanned_ratio = inactive / total.
+ */
+ reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
+ if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
+ reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
+
+ /* round down reclaimable and round up sc->nr_to_reclaim */
+ priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+ /*
+ * The estimation is based on LRU pages only, so cap it to prevent
+ * overshoots of shrinker objects by large margins.
+ */
+ sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
+}
+
static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
{
int gen, type, zone;
@@ -3948,19 +3974,17 @@ static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_control *sc
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
DEFINE_MIN_SEQ(lruvec);
+ if (mem_cgroup_below_min(NULL, memcg))
+ return false;
+
+ if (!lruvec_is_sizable(lruvec, sc))
+ return false;
+
/* see the comment on lru_gen_folio */
gen = lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
birth = READ_ONCE(lruvec->lrugen.timestamps[gen]);
- if (time_is_after_jiffies(birth + min_ttl))
- return false;
-
- if (!lruvec_is_sizable(lruvec, sc))
- return false;
-
- mem_cgroup_calculate_protection(NULL, memcg);
-
- return !mem_cgroup_below_min(NULL, memcg);
+ return time_is_before_jiffies(birth + min_ttl);
}
/* to protect the working set of the last N jiffies */
@@ -3970,23 +3994,20 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct mem_cgroup *memcg;
unsigned long min_ttl = READ_ONCE(lru_gen_min_ttl);
+ bool reclaimable = !min_ttl;
VM_WARN_ON_ONCE(!current_is_kswapd());
- /* check the order to exclude compaction-induced reclaim */
- if (!min_ttl || sc->order || sc->priority == DEF_PRIORITY)
- return;
+ set_initial_priority(pgdat, sc);
memcg = mem_cgroup_iter(NULL, NULL, NULL);
do {
struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
- if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
- mem_cgroup_iter_break(NULL, memcg);
- return;
- }
+ mem_cgroup_calculate_protection(NULL, memcg);
- cond_resched();
+ if (!reclaimable)
+ reclaimable = lruvec_is_reclaimable(lruvec, sc, min_ttl);
} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
/*
@@ -3994,7 +4015,7 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc)
* younger than min_ttl. However, another possibility is all memcgs are
* either too small or below min.
*/
- if (mutex_trylock(&oom_lock)) {
+ if (!reclaimable && mutex_trylock(&oom_lock)) {
struct oom_control oc = {
.gfp_mask = sc->gfp_mask,
};
@@ -4786,8 +4807,7 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
- mem_cgroup_calculate_protection(NULL, memcg);
-
+ /* lru_gen_age_node() called mem_cgroup_calculate_protection() */
if (mem_cgroup_below_min(NULL, memcg))
return MEMCG_LRU_YOUNG;
@@ -4911,32 +4931,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
blk_finish_plug(&plug);
}
-static void set_initial_priority(struct pglist_data *pgdat, struct scan_control *sc)
-{
- int priority;
- unsigned long reclaimable;
-
- if (sc->priority != DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
- return;
- /*
- * Determine the initial priority based on
- * (total >> priority) * reclaimed_to_scanned_ratio = nr_to_reclaim,
- * where reclaimed_to_scanned_ratio = inactive / total.
- */
- reclaimable = node_page_state(pgdat, NR_INACTIVE_FILE);
- if (can_reclaim_anon_pages(NULL, pgdat->node_id, sc))
- reclaimable += node_page_state(pgdat, NR_INACTIVE_ANON);
-
- /* round down reclaimable and round up sc->nr_to_reclaim */
- priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
-
- /*
- * The estimation is based on LRU pages only, so cap it to prevent
- * overshoots of shrinker objects by large margins.
- */
- sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
-}
-
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
{
struct blk_plug plug;
--
2.45.2.993.g49e7a77208-goog
The quilt patch titled
Subject: mm/mglru: fix overshooting shrinker memory
has been removed from the -mm tree. Its filename was
mm-mglru-fix-overshooting-shrinker-memory.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix overshooting shrinker memory
Date: Thu, 11 Jul 2024 13:19:57 -0600
set_initial_priority() tries to jump-start global reclaim by estimating
the priority based on cold/hot LRU pages. The estimation does not account
for shrinker objects, and it cannot do so because their sizes can be in
different units other than page.
If shrinker objects are the majority, e.g., on TrueNAS SCALE 24.04.0 where
ZFS ARC can use almost all system memory, set_initial_priority() can
vastly underestimate how much memory ARC shrinker can evict and assign
extreme low values to scan_control->priority, resulting in overshoots of
shrinker objects.
To reproduce the problem, using TrueNAS SCALE 24.04.0 with 32GB DRAM, a
test ZFS pool and the following commands:
fio --name=mglru.file --numjobs=36 --ioengine=io_uring \
--directory=/root/test-zfs-pool/ --size=1024m --buffered=1 \
--rw=randread --random_distribution=random \
--time_based --runtime=1h &
for ((i = 0; i < 20; i++))
do
sleep 120
fio --name=mglru.anon --numjobs=16 --ioengine=mmap \
--filename=/dev/zero --size=1024m --fadvise_hint=0 \
--rw=randrw --random_distribution=random \
--time_based --runtime=1m
done
To fix the problem:
1. Cap scan_control->priority at or above DEF_PRIORITY/2, to prevent
the jump-start from being overly aggressive.
2. Account for the progress from mm_account_reclaimed_pages(), to
prevent kswapd_shrink_node() from raising the priority
unnecessarily.
Link: https://lkml.kernel.org/r/20240711191957.939105-2-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Alexander Motin <mav(a)ixsystems.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-mglru-fix-overshooting-shrinker-memory
+++ a/mm/vmscan.c
@@ -4930,7 +4930,11 @@ static void set_initial_priority(struct
/* round down reclaimable and round up sc->nr_to_reclaim */
priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
- sc->priority = clamp(priority, 0, DEF_PRIORITY);
+ /*
+ * The estimation is based on LRU pages only, so cap it to prevent
+ * overshoots of shrinker objects by large margins.
+ */
+ sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
}
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
@@ -6754,6 +6758,7 @@ static bool kswapd_shrink_node(pg_data_t
{
struct zone *zone;
int z;
+ unsigned long nr_reclaimed = sc->nr_reclaimed;
/* Reclaim a number of pages proportional to the number of zones */
sc->nr_to_reclaim = 0;
@@ -6781,7 +6786,8 @@ static bool kswapd_shrink_node(pg_data_t
if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
sc->order = 0;
- return sc->nr_scanned >= sc->nr_to_reclaim;
+ /* account for progress from mm_account_reclaimed_pages() */
+ return max(sc->nr_scanned, sc->nr_reclaimed - nr_reclaimed) >= sc->nr_to_reclaim;
}
/* Page allocator PCP high watermark is lowered if reclaim is active. */
_
Patches currently in -mm which might be from yuzhao(a)google.com are
The quilt patch titled
Subject: mm/mglru: fix div-by-zero in vmpressure_calc_level()
has been removed from the -mm tree. Its filename was
mm-mglru-fix-div-by-zero-in-vmpressure_calc_level.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix div-by-zero in vmpressure_calc_level()
Date: Thu, 11 Jul 2024 13:19:56 -0600
evict_folios() uses a second pass to reclaim folios that have gone through
page writeback and become clean before it finishes the first pass, since
folio_rotate_reclaimable() cannot handle those folios due to the
isolation.
The second pass tries to avoid potential double counting by deducting
scan_control->nr_scanned. However, this can result in underflow of
nr_scanned, under a condition where shrink_folio_list() does not increment
nr_scanned, i.e., when folio_trylock() fails.
The underflow can cause the divisor, i.e., scale=scanned+reclaimed in
vmpressure_calc_level(), to become zero, resulting in the following crash:
[exception RIP: vmpressure_work_fn+101]
process_one_work at ffffffffa3313f2b
Since scan_control->nr_scanned has no established semantics, the potential
double counting has minimal risks. Therefore, fix the problem by not
deducting scan_control->nr_scanned in evict_folios().
Link: https://lkml.kernel.org/r/20240711191957.939105-1-yuzhao@google.com
Fixes: 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while isolated")
Reported-by: Wei Xu <weixugc(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Cc: Alexander Motin <mav(a)ixsystems.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 1 -
1 file changed, 1 deletion(-)
--- a/mm/vmscan.c~mm-mglru-fix-div-by-zero-in-vmpressure_calc_level
+++ a/mm/vmscan.c
@@ -4597,7 +4597,6 @@ retry:
/* retry folios that may have missed folio_rotate_reclaimable() */
list_move(&folio->lru, &clean);
- sc->nr_scanned -= folio_nr_pages(folio);
}
spin_lock_irq(&lruvec->lru_lock);
_
Patches currently in -mm which might be from yuzhao(a)google.com are
The quilt patch titled
Subject: mm/hugetlb: fix potential race with try_memory_failure_hugetlb()
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/hugetlb: fix potential race with try_memory_failure_hugetlb()
Date: Wed, 10 Jul 2024 16:14:45 +0800
There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():
CPU1 CPU2
__update_and_free_hugetlb_folio try_memory_failure_hugetlb
spin_lock_irq(&hugetlb_lock);
__get_huge_page_for_hwpoison
folio_test_hugetlb
-- It's still hugetlb folio.
folio_test_hugetlb_raw_hwp_unreliable
-- raw_hwp_unreliable flag is not set yet.
folio_set_hugetlb_hwpoison
-- raw_hwp_unreliable flag might
be set.
spin_unlock_irq(&hugetlb_lock);
spin_lock_irq(&hugetlb_lock);
__folio_clear_hugetlb(folio);
-- Hugetlb flag is cleared but too late!
spin_unlock_irq(&hugetlb_lock);
When this race occurs, raw error pages will hit pcplists/buddy. Fix this
issue by deferring folio_test_hugetlb_raw_hwp_unreliable() until
__folio_clear_hugetlb() is done. The raw_hwp_unreliable flag cannot be
set after hugetlb folio flag is cleared.
Link: https://lkml.kernel.org/r/20240710081445.3307355-1-linmiaohe@huawei.com
Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb
+++ a/mm/hugetlb.c
@@ -1706,13 +1706,6 @@ static void __update_and_free_hugetlb_fo
return;
/*
- * If we don't know which subpages are hwpoisoned, we can't free
- * the hugepage, so it's leaked intentionally.
- */
- if (folio_test_hugetlb_raw_hwp_unreliable(folio))
- return;
-
- /*
* If folio is not vmemmap optimized (!clear_flag), then the folio
* is no longer identified as a hugetlb page. hugetlb_vmemmap_restore_folio
* can only be passed hugetlb pages and will BUG otherwise.
@@ -1730,6 +1723,13 @@ static void __update_and_free_hugetlb_fo
}
/*
+ * If we don't know which subpages are hwpoisoned, we can't free
+ * the hugepage, so it's leaked intentionally.
+ */
+ if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+ return;
+
+ /*
* Move PageHWPoison flag from head page to the raw error pages,
* which makes any healthy subpages reusable.
*/
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-vm_bug_on_pagepagepoisonedpage-when-unpoison-memory.patch
mm-hugetlb-fix-possible-recursive-locking-detected-warning.patch
The quilt patch titled
Subject: mm: shmem: rename mTHP shmem counters
has been removed from the -mm tree. Its filename was
mm-shmem-rename-mthp-shmem-counters.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: shmem: rename mTHP shmem counters
Date: Wed, 10 Jul 2024 10:55:01 +0100
The legacy PMD-sized THP counters at /proc/vmstat include thp_file_alloc,
thp_file_fallback and thp_file_fallback_charge, which rather confusingly
refer to shmem THP and do not include any other types of file pages. This
is inconsistent since in most other places in the kernel, THP counters are
explicitly separated for anon, shmem and file flavours. However, we are
stuck with it since it constitutes a user ABI.
Recently, commit 66f44583f9b6 ("mm: shmem: add mTHP counters for anonymous
shmem") added equivalent mTHP stats for shmem, keeping the same "file_"
prefix in the names. But in future, we may want to add extra stats to
cover actual file pages, at which point, it would all become very
confusing.
So let's take the opportunity to rename these new counters "shmem_" before
the change makes it upstream and the ABI becomes immutable. While we are
at it, let's improve the documentation for the legacy counters to make it
clear that they count shmem pages only.
Link: https://lkml.kernel.org/r/20240710095503.3193901-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0(a)gmail.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Barry Song <baohua(a)kernel.org>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Daniel Gomez <da.gomez(a)samsung.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/mm/transhuge.rst | 29 ++++++++++---------
include/linux/huge_mm.h | 6 +--
mm/huge_memory.c | 12 +++----
mm/shmem.c | 8 ++---
4 files changed, 29 insertions(+), 26 deletions(-)
--- a/Documentation/admin-guide/mm/transhuge.rst~mm-shmem-rename-mthp-shmem-counters
+++ a/Documentation/admin-guide/mm/transhuge.rst
@@ -412,20 +412,23 @@ thp_collapse_alloc_failed
the allocation.
thp_file_alloc
- is incremented every time a file huge page is successfully
- allocated.
+ is incremented every time a shmem huge page is successfully
+ allocated (Note that despite being named after "file", the counter
+ measures only shmem).
thp_file_fallback
- is incremented if a file huge page is attempted to be allocated
- but fails and instead falls back to using small pages.
+ is incremented if a shmem huge page is attempted to be allocated
+ but fails and instead falls back to using small pages. (Note that
+ despite being named after "file", the counter measures only shmem).
thp_file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
- successful.
+ successful. (Note that despite being named after "file", the
+ counter measures only shmem).
thp_file_mapped
- is incremented every time a file huge page is mapped into
+ is incremented every time a file or shmem huge page is mapped into
user address space.
thp_split_page
@@ -496,16 +499,16 @@ swpout_fallback
Usually because failed to allocate some continuous swap space
for the huge page.
-file_alloc
- is incremented every time a file huge page is successfully
+shmem_alloc
+ is incremented every time a shmem huge page is successfully
allocated.
-file_fallback
- is incremented if a file huge page is attempted to be allocated
+shmem_fallback
+ is incremented if a shmem huge page is attempted to be allocated
but fails and instead falls back to using small pages.
-file_fallback_charge
- is incremented if a file huge page cannot be charged and instead
+shmem_fallback_charge
+ is incremented if a shmem huge page cannot be charged and instead
falls back to using small pages even though the allocation was
successful.
--- a/include/linux/huge_mm.h~mm-shmem-rename-mthp-shmem-counters
+++ a/include/linux/huge_mm.h
@@ -269,9 +269,9 @@ enum mthp_stat_item {
MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE,
MTHP_STAT_SWPOUT,
MTHP_STAT_SWPOUT_FALLBACK,
- MTHP_STAT_FILE_ALLOC,
- MTHP_STAT_FILE_FALLBACK,
- MTHP_STAT_FILE_FALLBACK_CHARGE,
+ MTHP_STAT_SHMEM_ALLOC,
+ MTHP_STAT_SHMEM_FALLBACK,
+ MTHP_STAT_SHMEM_FALLBACK_CHARGE,
MTHP_STAT_SPLIT,
MTHP_STAT_SPLIT_FAILED,
MTHP_STAT_SPLIT_DEFERRED,
--- a/mm/huge_memory.c~mm-shmem-rename-mthp-shmem-counters
+++ a/mm/huge_memory.c
@@ -568,9 +568,9 @@ DEFINE_MTHP_STAT_ATTR(anon_fault_fallbac
DEFINE_MTHP_STAT_ATTR(anon_fault_fallback_charge, MTHP_STAT_ANON_FAULT_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(swpout, MTHP_STAT_SWPOUT);
DEFINE_MTHP_STAT_ATTR(swpout_fallback, MTHP_STAT_SWPOUT_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
-DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
-DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+DEFINE_MTHP_STAT_ATTR(shmem_alloc, MTHP_STAT_SHMEM_ALLOC);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback, MTHP_STAT_SHMEM_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(shmem_fallback_charge, MTHP_STAT_SHMEM_FALLBACK_CHARGE);
DEFINE_MTHP_STAT_ATTR(split, MTHP_STAT_SPLIT);
DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
@@ -581,9 +581,9 @@ static struct attribute *stats_attrs[] =
&anon_fault_fallback_charge_attr.attr,
&swpout_attr.attr,
&swpout_fallback_attr.attr,
- &file_alloc_attr.attr,
- &file_fallback_attr.attr,
- &file_fallback_charge_attr.attr,
+ &shmem_alloc_attr.attr,
+ &shmem_fallback_attr.attr,
+ &shmem_fallback_charge_attr.attr,
&split_attr.attr,
&split_failed_attr.attr,
&split_deferred_attr.attr,
--- a/mm/shmem.c~mm-shmem-rename-mthp-shmem-counters
+++ a/mm/shmem.c
@@ -1777,7 +1777,7 @@ static struct folio *shmem_alloc_and_add
if (pages == HPAGE_PMD_NR)
count_vm_event(THP_FILE_FALLBACK);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+ count_mthp_stat(order, MTHP_STAT_SHMEM_FALLBACK);
#endif
order = next_order(&suitable_orders, order);
}
@@ -1804,8 +1804,8 @@ allocated:
count_vm_event(THP_FILE_FALLBACK_CHARGE);
}
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK);
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_FALLBACK_CHARGE);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_FALLBACK_CHARGE);
#endif
}
goto unlock;
@@ -2181,7 +2181,7 @@ repeat:
if (folio_test_pmd_mappable(folio))
count_vm_event(THP_FILE_ALLOC);
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SHMEM_ALLOC);
#endif
goto alloced;
}
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
The quilt patch titled
Subject: mm/migrate: putback split folios when numa hint migration fails
has been removed from the -mm tree. Its filename was
mm-migrate-putback-split-folios-when-numa-hint-migration-fails.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/migrate: putback split folios when numa hint migration fails
Date: Mon, 8 Jul 2024 17:55:37 -0400
This issue is not from any report yet, but by code observation only.
This is yet another fix besides Hugh's patch [1] but on relevant code
path, where eager split of folio can happen if the folio is already on
deferred list during a folio migration.
Here the issue is NUMA path (migrate_misplaced_folio()) may start to
encounter such folio split now even with MR_NUMA_MISPLACED hint applied.
Then when migrate_pages() didn't migrate all the folios, it's possible the
split small folios be put onto the list instead of the original folio.
Then putting back only the head page won't be enough.
Fix it by putting back all the folios on the list.
[1] https://lore.kernel.org/all/46c948b4-4dd8-6e03-4c7b-ce4e81cfa536@google.com/
[akpm(a)linux-foundation.org: remove now unused local `nr_pages']
Link: https://lkml.kernel.org/r/20240708215537.2630610-1-peterx@redhat.com
Fixes: 7262f208ca68 ("mm/migrate: split source folio if it is on deferred split list")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/migrate.c | 11 ++---------
1 file changed, 2 insertions(+), 9 deletions(-)
--- a/mm/migrate.c~mm-migrate-putback-split-folios-when-numa-hint-migration-fails
+++ a/mm/migrate.c
@@ -2621,20 +2621,13 @@ int migrate_misplaced_folio(struct folio
int nr_remaining;
unsigned int nr_succeeded;
LIST_HEAD(migratepages);
- int nr_pages = folio_nr_pages(folio);
list_add(&folio->lru, &migratepages);
nr_remaining = migrate_pages(&migratepages, alloc_misplaced_dst_folio,
NULL, node, MIGRATE_ASYNC,
MR_NUMA_MISPLACED, &nr_succeeded);
- if (nr_remaining) {
- if (!list_empty(&migratepages)) {
- list_del(&folio->lru);
- node_stat_mod_folio(folio, NR_ISOLATED_ANON +
- folio_is_file_lru(folio), -nr_pages);
- folio_putback_lru(folio);
- }
- }
+ if (nr_remaining && !list_empty(&migratepages))
+ putback_movable_pages(&migratepages);
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
_
Patches currently in -mm which might be from peterx(a)redhat.com are
The quilt patch titled
Subject: mm: fix khugepaged activation policy
has been removed from the -mm tree. Its filename was
mm-fix-khugepaged-activation-policy.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: fix khugepaged activation policy
Date: Thu, 4 Jul 2024 10:10:50 +0100
Since the introduction of mTHP, the docuementation has stated that
khugepaged would be enabled when any mTHP size is enabled, and disabled
when all mTHP sizes are disabled. There are 2 problems with this; 1.
this is not what was implemented by the code and 2. this is not the
desirable behavior.
Desirable behavior is for khugepaged to be enabled when any PMD-sized THP
is enabled, anon or file. (Note that file THP is still controlled by the
top-level control so we must always consider that, as well as the PMD-size
mTHP control for anon). khugepaged only supports collapsing to PMD-sized
THP so there is no value in enabling it when PMD-sized THP is disabled.
So let's change the code and documentation to reflect this policy.
Further, per-size enabled control modification events were not previously
forwarded to khugepaged to give it an opportunity to start or stop.
Consequently the following was resulting in khugepaged eroneously not
being activated:
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
[ryan.roberts(a)arm.com: v3]
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Link: https://lkml.kernel.org/r/20240704091051.2411934-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redha…
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/mm/transhuge.rst | 11 ++----
include/linux/huge_mm.h | 12 ------
mm/huge_memory.c | 7 ++++
mm/khugepaged.c | 33 ++++++++++++++-----
4 files changed, 38 insertions(+), 25 deletions(-)
--- a/Documentation/admin-guide/mm/transhuge.rst~mm-fix-khugepaged-activation-policy
+++ a/Documentation/admin-guide/mm/transhuge.rst
@@ -202,12 +202,11 @@ PMD-mappable transparent hugepage::
cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
-khugepaged will be automatically started when one or more hugepage
-sizes are enabled (either by directly setting "always" or "madvise",
-or by setting "inherit" while the top-level enabled is set to "always"
-or "madvise"), and it'll be automatically shutdown when the last
-hugepage size is disabled (either by directly setting "never", or by
-setting "inherit" while the top-level enabled is set to "never").
+khugepaged will be automatically started when PMD-sized THP is enabled
+(either of the per-size anon control or the top-level control are set
+to "always" or "madvise"), and it'll be automatically shutdown when
+PMD-sized THP is disabled (when both the per-size anon control and the
+top-level control are "never")
Khugepaged controls
-------------------
--- a/include/linux/huge_mm.h~mm-fix-khugepaged-activation-policy
+++ a/include/linux/huge_mm.h
@@ -128,18 +128,6 @@ static inline bool hugepage_global_alway
(1<<TRANSPARENT_HUGEPAGE_FLAG);
}
-static inline bool hugepage_flags_enabled(void)
-{
- /*
- * We cover both the anon and the file-backed case here; we must return
- * true if globally enabled, even when all anon sizes are set to never.
- * So we don't need to look at huge_anon_orders_inherit.
- */
- return hugepage_global_enabled() ||
- READ_ONCE(huge_anon_orders_always) ||
- READ_ONCE(huge_anon_orders_madvise);
-}
-
static inline int highest_order(unsigned long orders)
{
return fls_long(orders) - 1;
--- a/mm/huge_memory.c~mm-fix-khugepaged-activation-policy
+++ a/mm/huge_memory.c
@@ -502,6 +502,13 @@ static ssize_t thpsize_enabled_store(str
} else
ret = -EINVAL;
+ if (ret > 0) {
+ int err;
+
+ err = start_stop_khugepaged();
+ if (err)
+ ret = err;
+ }
return ret;
}
--- a/mm/khugepaged.c~mm-fix-khugepaged-activation-policy
+++ a/mm/khugepaged.c
@@ -413,6 +413,26 @@ static inline int hpage_collapse_test_ex
test_bit(MMF_DISABLE_THP, &mm->flags);
}
+static bool hugepage_pmd_enabled(void)
+{
+ /*
+ * We cover both the anon and the file-backed case here; file-backed
+ * hugepages, when configured in, are determined by the global control.
+ * Anon pmd-sized hugepages are determined by the pmd-size control.
+ */
+ if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+ hugepage_global_enabled())
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_always))
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
+ hugepage_global_enabled())
+ return true;
+ return false;
+}
+
void __khugepaged_enter(struct mm_struct *mm)
{
struct khugepaged_mm_slot *mm_slot;
@@ -449,7 +469,7 @@ void khugepaged_enter_vma(struct vm_area
unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) &&
- hugepage_flags_enabled()) {
+ hugepage_pmd_enabled()) {
if (thp_vma_allowable_order(vma, vm_flags, TVA_ENFORCE_SYSFS,
PMD_ORDER))
__khugepaged_enter(vma->vm_mm);
@@ -2462,8 +2482,7 @@ breakouterloop_mmap_lock:
static int khugepaged_has_work(void)
{
- return !list_empty(&khugepaged_scan.mm_head) &&
- hugepage_flags_enabled();
+ return !list_empty(&khugepaged_scan.mm_head) && hugepage_pmd_enabled();
}
static int khugepaged_wait_event(void)
@@ -2536,7 +2555,7 @@ static void khugepaged_wait_work(void)
return;
}
- if (hugepage_flags_enabled())
+ if (hugepage_pmd_enabled())
wait_event_freezable(khugepaged_wait, khugepaged_wait_event());
}
@@ -2567,7 +2586,7 @@ static void set_recommended_min_free_kby
int nr_zones = 0;
unsigned long recommended_min;
- if (!hugepage_flags_enabled()) {
+ if (!hugepage_pmd_enabled()) {
calculate_min_free_kbytes();
goto update_wmarks;
}
@@ -2617,7 +2636,7 @@ int start_stop_khugepaged(void)
int err = 0;
mutex_lock(&khugepaged_mutex);
- if (hugepage_flags_enabled()) {
+ if (hugepage_pmd_enabled()) {
if (!khugepaged_thread)
khugepaged_thread = kthread_run(khugepaged, NULL,
"khugepaged");
@@ -2643,7 +2662,7 @@ fail:
void khugepaged_min_free_kbytes_update(void)
{
mutex_lock(&khugepaged_mutex);
- if (hugepage_flags_enabled() && khugepaged_thread)
+ if (hugepage_pmd_enabled() && khugepaged_thread)
set_recommended_min_free_kbytes();
mutex_unlock(&khugepaged_mutex);
}
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
The quilt patch titled
Subject: mm-fix-khugepaged-activation-policy-v3
has been removed from the -mm tree. Its filename was
mm-fix-khugepaged-activation-policy-v3.patch
This patch was dropped because it was folded into mm-fix-khugepaged-activation-policy.patch
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm-fix-khugepaged-activation-policy-v3
Date: Fri, 5 Jul 2024 11:28:48 +0100
- Make hugepage_pmd_enabled() out-of-line static in khugepaged.c (per Andrew)
- Refactor hugepage_pmd_enabled() for better readability (per Andrew)
Link: https://lkml.kernel.org/r/20240705102849.2479686-1-ryan.roberts@arm.com
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Fixes: 3485b88390b0 ("mm: thp: introduce multi-size THP sysfs interface")
Closes: https://lore.kernel.org/linux-mm/7a0bbe69-1e3d-4263-b206-da007791a5c4@redha…
Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Cc: Barry Song <baohua(a)kernel.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Lance Yang <ioworker0(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/huge_mm.h | 13 -------------
mm/khugepaged.c | 20 ++++++++++++++++++++
2 files changed, 20 insertions(+), 13 deletions(-)
--- a/include/linux/huge_mm.h~mm-fix-khugepaged-activation-policy-v3
+++ a/include/linux/huge_mm.h
@@ -128,19 +128,6 @@ static inline bool hugepage_global_alway
(1<<TRANSPARENT_HUGEPAGE_FLAG);
}
-static inline bool hugepage_pmd_enabled(void)
-{
- /*
- * We cover both the anon and the file-backed case here; file-backed
- * hugepages, when configured in, are determined by the global control.
- * Anon pmd-sized hugepages are determined by the pmd-size control.
- */
- return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && hugepage_global_enabled()) ||
- test_bit(PMD_ORDER, &huge_anon_orders_always) ||
- test_bit(PMD_ORDER, &huge_anon_orders_madvise) ||
- (test_bit(PMD_ORDER, &huge_anon_orders_inherit) && hugepage_global_enabled());
-}
-
static inline int highest_order(unsigned long orders)
{
return fls_long(orders) - 1;
--- a/mm/khugepaged.c~mm-fix-khugepaged-activation-policy-v3
+++ a/mm/khugepaged.c
@@ -413,6 +413,26 @@ static inline int hpage_collapse_test_ex
test_bit(MMF_DISABLE_THP, &mm->flags);
}
+static bool hugepage_pmd_enabled(void)
+{
+ /*
+ * We cover both the anon and the file-backed case here; file-backed
+ * hugepages, when configured in, are determined by the global control.
+ * Anon pmd-sized hugepages are determined by the pmd-size control.
+ */
+ if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) &&
+ hugepage_global_enabled())
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_always))
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_madvise))
+ return true;
+ if (test_bit(PMD_ORDER, &huge_anon_orders_inherit) &&
+ hugepage_global_enabled())
+ return true;
+ return false;
+}
+
void __khugepaged_enter(struct mm_struct *mm)
{
struct khugepaged_mm_slot *mm_slot;
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-fix-khugepaged-activation-policy.patch
mm-shmem-rename-mthp-shmem-counters.patch
The patch titled
Subject: mm/hugetlb: fix possible recursive locking detected warning
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-possible-recursive-locking-detected-warning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/hugetlb: fix possible recursive locking detected warning
Date: Fri, 12 Jul 2024 11:13:14 +0800
When tries to demote 1G hugetlb folios, a lockdep warning is observed:
============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460
but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&h->resize_lock);
lock(&h->resize_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by bash/710:
#0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
#1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
#2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
#3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460
stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
__lock_acquire+0x10f2/0x1ca0
lock_acquire+0xbe/0x2d0
__mutex_lock+0x6d/0x400
demote_store+0x244/0x460
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x380/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xb9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
</TASK>
Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.
Link: https://lkml.kernel.org/r/20240712031314.2570452-1-linmiaohe@huawei.com
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb.h | 1 +
mm/hugetlb.c | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb.h~mm-hugetlb-fix-possible-recursive-locking-detected-warning
+++ a/include/linux/hugetlb.h
@@ -663,6 +663,7 @@ HPAGEFLAG(RawHwpUnreliable, raw_hwp_unre
/* Defines one hugetlb page size */
struct hstate {
struct mutex resize_lock;
+ struct lock_class_key resize_key;
int next_nid_to_alloc;
int next_nid_to_free;
unsigned int order;
--- a/mm/hugetlb.c~mm-hugetlb-fix-possible-recursive-locking-detected-warning
+++ a/mm/hugetlb.c
@@ -4645,7 +4645,7 @@ void __init hugetlb_add_hstate(unsigned
BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
BUG_ON(order < order_base_2(__NR_USED_SUBPAGE));
h = &hstates[hugetlb_max_hstate++];
- mutex_init(&h->resize_lock);
+ __mutex_init(&h->resize_lock, "resize mutex", &h->resize_key);
h->order = order;
h->mask = ~(huge_page_size(h) - 1);
for (i = 0; i < MAX_NUMNODES; ++i)
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-remove-obsolete-mf_msg_different_compound.patch
mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb.patch
mm-memory-failure-fix-vm_bug_on_pagepagepoisonedpage-when-unpoison-memory.patch
mm-hugetlb-fix-possible-recursive-locking-detected-warning.patch
The patch titled
Subject: mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-failure-fix-vm_bug_on_pagepagepoisonedpage-when-unpoison-memory.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/memory-failure: fix VM_BUG_ON_PAGE(PagePoisoned(page)) when unpoison memory
Date: Fri, 12 Jul 2024 14:42:49 +0800
When I did memory failure tests recently, below panic occurs:
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(page))
kernel BUG at include/linux/page-flags.h:616!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 720 Comm: bash Not tainted 6.10.0-rc1-00195-g148743902568 #40
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Call Trace:
<TASK>
unpoison_memory+0x2f3/0x590
simple_attr_write_xsigned.constprop.0.isra.0+0xb3/0x110
debugfs_attr_write+0x42/0x60
full_proxy_write+0x5b/0x80
vfs_write+0xd5/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xb9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f08f0314887
RSP: 002b:00007ffece710078 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f08f0314887
RDX: 0000000000000009 RSI: 0000564787a30410 RDI: 0000000000000001
RBP: 0000564787a30410 R08: 000000000000fefe R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
R13: 00007f08f041b780 R14: 00007f08f0417600 R15: 00007f08f0416a00
</TASK>
Modules linked in: hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:unpoison_memory+0x2f3/0x590
RSP: 0018:ffffa57fc8787d60 EFLAGS: 00000246
RAX: 0000000000000037 RBX: 0000000000000009 RCX: ffff9be25fcdc9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff9be25fcdc9c0
RBP: 0000000000300000 R08: ffffffffb4956f88 R09: 0000000000009ffb
R10: 0000000000000284 R11: ffffffffb4926fa0 R12: ffffe6b00c000000
R13: ffff9bdb453dfd00 R14: 0000000000000000 R15: fffffffffffffffe
FS: 00007f08f04e4740(0000) GS:ffff9be25fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564787a30410 CR3: 000000010d4e2000 CR4: 00000000000006f0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x31c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
---[ end Kernel panic - not syncing: Fatal exception ]---
The root cause is that unpoison_memory() tries to check the PG_HWPoison
flags of an uninitialized page. So VM_BUG_ON_PAGE(PagePoisoned(page)) is
triggered. This can be reproduced by below steps:
1.Offline memory block:
echo offline > /sys/devices/system/memory/memory12/state
2.Get offlined memory pfn:
page-types -b n -rlN
3.Write pfn to unpoison-pfn
echo <pfn> > /sys/kernel/debug/hwpoison/unpoison-pfn
Link: https://lkml.kernel.org/r/20240712064249.3882707-1-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/memory-failure.c~mm-memory-failure-fix-vm_bug_on_pagepagepoisonedpage-when-unpoison-memory
+++ a/mm/memory-failure.c
@@ -2553,6 +2553,13 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}
+ if (PagePoisoned(p)) {
+ unpoison_pr_info("%#lx: page is uninitialized\n",
+ pfn, &unpoison_rs);
+ ret = -EOPNOTSUPP;
+ goto unlock_mutex;
+ }
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-vm_bug_on_pagepagepoisonedpage-when-unpoison-memory.patch
mm-memory-failure-remove-obsolete-mf_msg_different_compound.patch
mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb.patch
The patch titled
Subject: mm/numa_balancing: teach mpol_to_str about the balancing mode
has been added to the -mm mm-unstable branch. Its filename is
mm-numa_balancing-teach-mpol_to_str-about-the-balancing-mode.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Subject: mm/numa_balancing: teach mpol_to_str about the balancing mode
Date: Mon, 8 Jul 2024 08:56:32 +0100
Since balancing mode was added in bda420b98505 ("numa balancing: migrate
on fault among multiple bound nodes"), it was possible to set this mode
but it wouldn't be shown in /proc/<pid>/numa_maps since there was no
support for it in the mpol_to_str() helper.
Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
would be displayed as 'default' due a workaround introduced a few years
earlier in 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in
numa_maps").
To tidy this up we implement two changes:
Replace the MPOL_F_MORON check by pointer comparison against the
preferred_node_policy array. By doing this we generalise the current
special casing and replace the incorrect 'default' with the correct 'bind'
for the mode.
Secondly, we add a string representation and corresponding handling for
the MPOL_F_NUMA_BALANCING flag.
With the two changes together we start showing the balancing flag when it
is set and therefore complete the fix.
Representation format chosen is to separate multiple flags with vertical
bars, following what existed long time ago in kernel 2.6.25. But as
between then and now there wasn't a way to display multiple flags, this
patch does not change the format in practice.
Some /proc/<pid>/numa_maps output examples:
555559580000 bind=balancing:0-1,3 file=...
555585800000 bind=balancing|static:0,2 file=...
555635240000 prefer=relative:0 file=
Link: https://lkml.kernel.org/r/20240708075632.95857-1-tursulin@igalia.com
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps")
Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.12+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
--- a/mm/mempolicy.c~mm-numa_balancing-teach-mpol_to_str-about-the-balancing-mode
+++ a/mm/mempolicy.c
@@ -3297,8 +3297,9 @@ out:
* @pol: pointer to mempolicy to be formatted
*
* Convert @pol into a string. If @buffer is too short, truncate the string.
- * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the
- * longest flag, "relative", and to display at least a few node ids.
+ * Recommend a @maxlen of at least 51 for the longest mode, "weighted
+ * interleave", plus the longest flag flags, "relative|balancing", and to
+ * display at least a few node ids.
*/
void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
{
@@ -3307,7 +3308,10 @@ void mpol_to_str(char *buffer, int maxle
unsigned short mode = MPOL_DEFAULT;
unsigned short flags = 0;
- if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) {
+ if (pol &&
+ pol != &default_policy &&
+ !(pol >= &preferred_node_policy[0] &&
+ pol <= &preferred_node_policy[ARRAY_SIZE(preferred_node_policy) - 1])) {
mode = pol->mode;
flags = pol->flags;
}
@@ -3335,12 +3339,18 @@ void mpol_to_str(char *buffer, int maxle
p += snprintf(p, buffer + maxlen - p, "=");
/*
- * Currently, the only defined flags are mutually exclusive
+ * Static and relative are mutually exclusive.
*/
if (flags & MPOL_F_STATIC_NODES)
p += snprintf(p, buffer + maxlen - p, "static");
else if (flags & MPOL_F_RELATIVE_NODES)
p += snprintf(p, buffer + maxlen - p, "relative");
+
+ if (flags & MPOL_F_NUMA_BALANCING) {
+ if (!is_power_of_2(flags & MPOL_MODE_FLAGS))
+ p += snprintf(p, buffer + maxlen - p, "|");
+ p += snprintf(p, buffer + maxlen - p, "balancing");
+ }
}
if (!nodes_empty(nodes))
_
Patches currently in -mm which might be from tvrtko.ursulin(a)igalia.com are
mm-numa_balancing-teach-mpol_to_str-about-the-balancing-mode.patch
The patch titled
Subject: mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-use-config_64bit-to-relax-huge-page-alignment-on-32-bit-machines.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yang Shi <yang(a)os.amperecomputing.com>
Subject: mm: huge_memory: use !CONFIG_64BIT to relax huge page alignment on 32 bit machines
Date: Fri, 12 Jul 2024 08:58:55 -0700
Yves-Alexis Perez reported commit 4ef9ad19e176 ("mm: huge_memory: don't
force huge page alignment on 32 bit") didn't work for x86_32 [1]. It is
because x86_32 uses CONFIG_X86_32 instead of CONFIG_32BIT.
!CONFIG_64BIT should cover all 32 bit machines.
[1] https://lore.kernel.org/linux-mm/CAHbLzkr1LwH3pcTgM+aGQ31ip2bKqiqEQ8=FQB+t2…
Link: https://lkml.kernel.org/r/20240712155855.1130330-1-yang@os.amperecomputing.…
Fixes: 4ef9ad19e176 ("mm: huge_memory: don't force huge page alignment on 32 bit")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Reported-by: Yves-Alexis Perez <corsac(a)debian.org>
Tested-by: Yves-Alexis Perez <corsac(a)debian.org>
Cc: Ben Hutchings <ben(a)decadent.org.uk>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Jiri Slaby <jirislaby(a)kernel.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Salvatore Bonaccorso <carnil(a)debian.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org> [6.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-use-config_64bit-to-relax-huge-page-alignment-on-32-bit-machines
+++ a/mm/huge_memory.c
@@ -858,7 +858,7 @@ static unsigned long __thp_get_unmapped_
loff_t off_align = round_up(off, size);
unsigned long len_pad, ret, off_sub;
- if (IS_ENABLED(CONFIG_32BIT) || in_compat_syscall())
+ if (!IS_ENABLED(CONFIG_64BIT) || in_compat_syscall())
return 0;
if (off_end <= off_align || (off_end - off_align) < size)
_
Patches currently in -mm which might be from yang(a)os.amperecomputing.com are
mm-huge_memory-use-config_64bit-to-relax-huge-page-alignment-on-32-bit-machines.patch
This is effectively a revert of commit 6ea4c0fe4570 ("soc/fsl/qbman:
Update device tree with reserved memory").
What that commit intended to do: Fix up the device tree that is passed
to a subsequent kexec-loaded kernel, so that the reserved-memory nodes
have the same base addresses as the currently running kernel.
What that commit actually does: Fix up the running device tree, which
has no effect whatsoever upon the device tree passed to the next kernel.
I would have refrained from making this kind of non-bugfix change in
stable kernels, but qbman_init_private_mem() grossly misrepresents
what this function does, and for an actual upcoming bug fix, it needs to
be refactored. There is no place for the bogus code afterwards, so it
needs to go as part of that, sadly.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean(a)nxp.com>
---
drivers/soc/fsl/qbman/dpaa_sys.c | 31 -------------------------------
1 file changed, 31 deletions(-)
diff --git a/drivers/soc/fsl/qbman/dpaa_sys.c b/drivers/soc/fsl/qbman/dpaa_sys.c
index e1d7b79cc450..b1cee145cbd7 100644
--- a/drivers/soc/fsl/qbman/dpaa_sys.c
+++ b/drivers/soc/fsl/qbman/dpaa_sys.c
@@ -39,8 +39,6 @@ int qbman_init_private_mem(struct device *dev, int idx, const char *compat,
{
struct device_node *mem_node;
struct reserved_mem *rmem;
- int err;
- __be32 *res_array;
mem_node = of_parse_phandle(dev->of_node, "memory-region", idx);
if (!mem_node) {
@@ -60,34 +58,5 @@ int qbman_init_private_mem(struct device *dev, int idx, const char *compat,
*addr = rmem->base;
*size = rmem->size;
- /*
- * Check if the reg property exists - if not insert the node
- * so upon kexec() the same memory region address will be preserved.
- * This is needed because QBMan HW does not allow the base address/
- * size to be modified once set.
- */
- if (!of_property_present(mem_node, "reg")) {
- struct property *prop;
-
- prop = devm_kzalloc(dev, sizeof(*prop), GFP_KERNEL);
- if (!prop)
- return -ENOMEM;
- prop->value = res_array = devm_kzalloc(dev, sizeof(__be32) * 4,
- GFP_KERNEL);
- if (!prop->value)
- return -ENOMEM;
- res_array[0] = cpu_to_be32(upper_32_bits(*addr));
- res_array[1] = cpu_to_be32(lower_32_bits(*addr));
- res_array[2] = cpu_to_be32(upper_32_bits(*size));
- res_array[3] = cpu_to_be32(lower_32_bits(*size));
- prop->length = sizeof(__be32) * 4;
- prop->name = devm_kstrdup(dev, "reg", GFP_KERNEL);
- if (!prop->name)
- return -ENOMEM;
- err = of_add_property(mem_node, prop);
- if (err)
- return err;
- }
-
return 0;
}
--
2.34.1
From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Since balancing mode was added in
bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes"),
it was possible to set this mode but it wouldn't be shown in
/proc/<pid>/numa_maps since there was no support for it in the
mpol_to_str() helper.
Furthermore, because the balancing mode sets the MPOL_F_MORON flag, it
would be displayed as 'default' due a workaround introduced a few years
earlier in
8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps").
To tidy this up we implement two changes:
Replace the MPOL_F_MORON check by pointer comparison against the
preferred_node_policy array. By doing this we generalise the current
special casing and replace the incorrect 'default' with the correct
'bind' for the mode.
Secondly, we add a string representation and corresponding handling for
the MPOL_F_NUMA_BALANCING flag.
With the two changes together we start showing the balancing flag when it
is set and therefore complete the fix.
Representation format chosen is to separate multiple flags with vertical
bars, following what existed long time ago in kernel 2.6.25. But as
between then and now there wasn't a way to display multiple flags, this
patch does not change the format in practice.
Some /proc/<pid>/numa_maps output examples:
555559580000 bind=balancing:0-1,3 file=...
555585800000 bind=balancing|static:0,2 file=...
555635240000 prefer=relative:0 file=
v2:
* Fully fix by introducing MPOL_F_KERNEL.
v3:
* Abandoned the MPOL_F_KERNEL approach in favour of pointer comparisons.
* Removed lookup generalisation for easier backporting.
* Replaced commas as separator with vertical bars.
* Added a few more words about the string format in the commit message.
v4:
* Use is_power_of_2.
* Use ARRAY_SIZE and update recommended buffer size for two flags.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com>
Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes")
References: 8790c71a18e5 ("mm/mempolicy.c: fix mempolicy printing in numa_maps")
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: <stable(a)vger.kernel.org> # v5.12+
---
mm/mempolicy.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index aec756ae5637..a1bf9aa15c33 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -3293,8 +3293,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
* @pol: pointer to mempolicy to be formatted
*
* Convert @pol into a string. If @buffer is too short, truncate the string.
- * Recommend a @maxlen of at least 32 for the longest mode, "interleave", the
- * longest flag, "relative", and to display at least a few node ids.
+ * Recommend a @maxlen of at least 51 for the longest mode, "weighted
+ * interleave", plus the longest flag flags, "relative|balancing", and to
+ * display at least a few node ids.
*/
void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
{
@@ -3303,7 +3304,10 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
unsigned short mode = MPOL_DEFAULT;
unsigned short flags = 0;
- if (pol && pol != &default_policy && !(pol->flags & MPOL_F_MORON)) {
+ if (pol &&
+ pol != &default_policy &&
+ !(pol >= &preferred_node_policy[0] &&
+ pol <= &preferred_node_policy[ARRAY_SIZE(preferred_node_policy) - 1])) {
mode = pol->mode;
flags = pol->flags;
}
@@ -3331,12 +3335,18 @@ void mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol)
p += snprintf(p, buffer + maxlen - p, "=");
/*
- * Currently, the only defined flags are mutually exclusive
+ * Static and relative are mutually exclusive.
*/
if (flags & MPOL_F_STATIC_NODES)
p += snprintf(p, buffer + maxlen - p, "static");
else if (flags & MPOL_F_RELATIVE_NODES)
p += snprintf(p, buffer + maxlen - p, "relative");
+
+ if (flags & MPOL_F_NUMA_BALANCING) {
+ if (!is_power_of_2(flags & MPOL_MODE_FLAGS))
+ p += snprintf(p, buffer + maxlen - p, "|");
+ p += snprintf(p, buffer + maxlen - p, "balancing");
+ }
}
if (!nodes_empty(nodes))
--
2.44.0
From: Xiubo Li <xiubli(a)redhat.com>
If a client sends out a cap update dropping caps with the prior 'seq'
just before an incoming cap revoke request, then the client may drop
the revoke because it believes it's already released the requested
capabilities.
This causes the MDS to wait indefinitely for the client to respond
to the revoke. It's therefore always a good idea to ack the cap
revoke request with the bumped up 'seq'.
Currently if the cap->issued equals to the newcaps the check_caps()
will do nothing, we should force flush the caps.
Cc: stable(a)vger.kernel.org
Link: https://tracker.ceph.com/issues/61782
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
V2:
- Improved the patch to force send the cap update only when no caps
being used.
fs/ceph/caps.c | 33 ++++++++++++++++++++++-----------
fs/ceph/super.h | 7 ++++---
2 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c
index 24c31f795938..b5473085a47b 100644
--- a/fs/ceph/caps.c
+++ b/fs/ceph/caps.c
@@ -2024,6 +2024,8 @@ bool __ceph_should_report_size(struct ceph_inode_info *ci)
* CHECK_CAPS_AUTHONLY - we should only check the auth cap
* CHECK_CAPS_FLUSH - we should flush any dirty caps immediately, without
* further delay.
+ * CHECK_CAPS_FLUSH_FORCE - we should flush any caps immediately, without
+ * further delay.
*/
void ceph_check_caps(struct ceph_inode_info *ci, int flags)
{
@@ -2105,7 +2107,7 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
}
doutc(cl, "%p %llx.%llx file_want %s used %s dirty %s "
- "flushing %s issued %s revoking %s retain %s %s%s%s\n",
+ "flushing %s issued %s revoking %s retain %s %s%s%s%s\n",
inode, ceph_vinop(inode), ceph_cap_string(file_wanted),
ceph_cap_string(used), ceph_cap_string(ci->i_dirty_caps),
ceph_cap_string(ci->i_flushing_caps),
@@ -2113,7 +2115,8 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
ceph_cap_string(retain),
(flags & CHECK_CAPS_AUTHONLY) ? " AUTHONLY" : "",
(flags & CHECK_CAPS_FLUSH) ? " FLUSH" : "",
- (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : "");
+ (flags & CHECK_CAPS_NOINVAL) ? " NOINVAL" : "",
+ (flags & CHECK_CAPS_FLUSH_FORCE) ? " FLUSH_FORCE" : "");
/*
* If we no longer need to hold onto old our caps, and we may
@@ -2223,6 +2226,9 @@ void ceph_check_caps(struct ceph_inode_info *ci, int flags)
goto ack;
}
+ if (flags & CHECK_CAPS_FLUSH_FORCE)
+ goto ack;
+
/* things we might delay */
if ((cap->issued & ~retain) == 0)
continue; /* nope, all good */
@@ -3518,6 +3524,8 @@ static void handle_cap_grant(struct inode *inode,
bool queue_invalidate = false;
bool deleted_inode = false;
bool fill_inline = false;
+ bool revoke_wait = false;
+ int flags = 0;
/*
* If there is at least one crypto block then we'll trust
@@ -3713,16 +3721,18 @@ static void handle_cap_grant(struct inode *inode,
ceph_cap_string(cap->issued), ceph_cap_string(newcaps),
ceph_cap_string(revoking));
if (S_ISREG(inode->i_mode) &&
- (revoking & used & CEPH_CAP_FILE_BUFFER))
+ (revoking & used & CEPH_CAP_FILE_BUFFER)) {
writeback = true; /* initiate writeback; will delay ack */
- else if (queue_invalidate &&
+ revoke_wait = true;
+ } else if (queue_invalidate &&
revoking == CEPH_CAP_FILE_CACHE &&
- (newcaps & CEPH_CAP_FILE_LAZYIO) == 0)
- ; /* do nothing yet, invalidation will be queued */
- else if (cap == ci->i_auth_cap)
+ (newcaps & CEPH_CAP_FILE_LAZYIO) == 0) {
+ revoke_wait = true; /* do nothing yet, invalidation will be queued */
+ } else if (cap == ci->i_auth_cap) {
check_caps = 1; /* check auth cap only */
- else
+ } else {
check_caps = 2; /* check all caps */
+ }
/* If there is new caps, try to wake up the waiters */
if (~cap->issued & newcaps)
wake = true;
@@ -3749,8 +3759,9 @@ static void handle_cap_grant(struct inode *inode,
BUG_ON(cap->issued & ~cap->implemented);
/* don't let check_caps skip sending a response to MDS for revoke msgs */
- if (le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) {
+ if (!revoke_wait && le32_to_cpu(grant->op) == CEPH_CAP_OP_REVOKE) {
cap->mds_wanted = 0;
+ flags |= CHECK_CAPS_FLUSH_FORCE;
if (cap == ci->i_auth_cap)
check_caps = 1; /* check auth cap only */
else
@@ -3806,9 +3817,9 @@ static void handle_cap_grant(struct inode *inode,
mutex_unlock(&session->s_mutex);
if (check_caps == 1)
- ceph_check_caps(ci, CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL);
+ ceph_check_caps(ci, flags | CHECK_CAPS_AUTHONLY | CHECK_CAPS_NOINVAL);
else if (check_caps == 2)
- ceph_check_caps(ci, CHECK_CAPS_NOINVAL);
+ ceph_check_caps(ci, flags | CHECK_CAPS_NOINVAL);
}
/*
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index b0b368ed3018..831e8ec4d5da 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -200,9 +200,10 @@ struct ceph_cap {
struct list_head caps_item;
};
-#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */
-#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */
-#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */
+#define CHECK_CAPS_AUTHONLY 1 /* only check auth cap */
+#define CHECK_CAPS_FLUSH 2 /* flush any dirty caps */
+#define CHECK_CAPS_NOINVAL 4 /* don't invalidate pagecache */
+#define CHECK_CAPS_FLUSH_FORCE 8 /* force flush any caps */
struct ceph_cap_flush {
u64 tid;
--
2.45.1
From: Ronald Wahl <ronald.wahl(a)raritan.com>
The amount of TX space in the hardware buffer is tracked in the tx_space
variable. The initial value is currently only set during driver probing.
After closing the interface and reopening it the tx_space variable has
the last value it had before close. If it is smaller than the size of
the first send packet after reopeing the interface the queue will be
stopped. The queue is woken up after receiving a TX interrupt but this
will never happen since we did not send anything.
This commit moves the initialization of the tx_space variable to the
ks8851_net_open function right before starting the TX queue. Also query
the value from the hardware instead of using a hard coded value.
Only the SPI chip variant is affected by this issue because only this
driver variant actually depends on the tx_space variable in the xmit
function.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
---
drivers/net/ethernet/micrel/ks8851_common.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6453c92f0fa7..03a554df6e7a 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -482,6 +482,7 @@ static int ks8851_net_open(struct net_device *dev)
ks8851_wrreg16(ks, KS_IER, ks->rc_ier);
ks->queued_len = 0;
+ ks->tx_space = ks8851_rdreg16(ks, KS_TXMIR);
netif_start_queue(ks->netdev);
netif_dbg(ks, ifup, ks->netdev, "network device up\n");
@@ -1101,7 +1102,6 @@ int ks8851_probe_common(struct net_device *netdev, struct device *dev,
int ret;
ks->netdev = netdev;
- ks->tx_space = 6144;
ks->gpio = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_HIGH);
ret = PTR_ERR_OR_ZERO(ks->gpio);
--
2.45.2
Originally, the check_unaligned_access_emulated_all_cpus function
only checked the boot hart. This fixes the function to check all
harts.
Fixes: 71c54b3d169d ("riscv: report misaligned accesses emulation to hwprobe")
Signed-off-by: Jesse Taube <jesse(a)rivosinc.com>
Cc: stable(a)vger.kernel.org
---
V1 -> V2:
- New patch
V2 -> V3:
- Split patch
V3 -> V4:
- Re-add check for a system where a heterogeneous
CPU is hotplugged into a previously homogenous
system.
---
arch/riscv/kernel/traps_misaligned.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c
index b62d5a2f4541..1a1bb41472ea 100644
--- a/arch/riscv/kernel/traps_misaligned.c
+++ b/arch/riscv/kernel/traps_misaligned.c
@@ -526,11 +526,11 @@ int handle_misaligned_store(struct pt_regs *regs)
return 0;
}
-static bool check_unaligned_access_emulated(int cpu)
+static void check_unaligned_access_emulated(struct work_struct *unused)
{
+ int cpu = smp_processor_id();
long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu);
unsigned long tmp_var, tmp_val;
- bool misaligned_emu_detected;
*mas_ptr = RISCV_HWPROBE_MISALIGNED_UNKNOWN;
@@ -538,19 +538,16 @@ static bool check_unaligned_access_emulated(int cpu)
" "REG_L" %[tmp], 1(%[ptr])\n"
: [tmp] "=r" (tmp_val) : [ptr] "r" (&tmp_var) : "memory");
- misaligned_emu_detected = (*mas_ptr == RISCV_HWPROBE_MISALIGNED_EMULATED);
/*
* If unaligned_ctl is already set, this means that we detected that all
* CPUS uses emulated misaligned access at boot time. If that changed
* when hotplugging the new cpu, this is something we don't handle.
*/
- if (unlikely(unaligned_ctl && !misaligned_emu_detected)) {
+ if (unlikely(unaligned_ctl && (*mas_ptr != RISCV_HWPROBE_MISALIGNED_EMULATED))) {
pr_crit("CPU misaligned accesses non homogeneous (expected all emulated)\n");
while (true)
cpu_relax();
}
-
- return misaligned_emu_detected;
}
bool check_unaligned_access_emulated_all_cpus(void)
@@ -562,8 +559,11 @@ bool check_unaligned_access_emulated_all_cpus(void)
* accesses emulated since tasks requesting such control can run on any
* CPU.
*/
+ schedule_on_each_cpu(check_unaligned_access_emulated);
+
for_each_online_cpu(cpu)
- if (!check_unaligned_access_emulated(cpu))
+ if (per_cpu(misaligned_access_speed, cpu)
+ != RISCV_HWPROBE_MISALIGNED_EMULATED)
return false;
unaligned_ctl = true;
--
2.45.2
The patch titled
Subject: watchdog/perf: Properly initialize the turbo mode timestamp and rearm counter
has been added to the -mm mm-nonmm-unstable branch. Its filename is
watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Thomas Gleixner <tglx(a)linutronix.de>
Subject: watchdog/perf: Properly initialize the turbo mode timestamp and rearm counter
Date: Thu, 11 Jul 2024 22:25:21 +0200
For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.
This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.
So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period. As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.
Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.
Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Arjan van de Ven <arjan(a)linux.intel.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/watchdog_perf.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/kernel/watchdog_perf.c~watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter
+++ a/kernel/watchdog_perf.c
@@ -75,11 +75,15 @@ static bool watchdog_check_timestamp(voi
__this_cpu_write(last_timestamp, now);
return true;
}
-#else
-static inline bool watchdog_check_timestamp(void)
+
+static void watchdog_init_timestamp(void)
{
- return true;
+ __this_cpu_write(nmi_rearmed, 0);
+ __this_cpu_write(last_timestamp, ktime_get_mono_fast_ns());
}
+#else
+static inline bool watchdog_check_timestamp(void) { return true; }
+static inline void watchdog_init_timestamp(void) { }
#endif
static struct perf_event_attr wd_hw_attr = {
@@ -161,6 +165,7 @@ void watchdog_hardlockup_enable(unsigned
if (!atomic_fetch_inc(&watchdog_cpus))
pr_info("Enabled. Permanently consumes one hw-PMU counter.\n");
+ watchdog_init_timestamp();
perf_event_enable(this_cpu_read(watchdog_ev));
}
_
Patches currently in -mm which might be from tglx(a)linutronix.de are
watchdog-perf-properly-initialize-the-turbo-mode-timestamp-and-rearm-counter.patch
With the introduction of binder_available_for_proc_work_ilocked() in
commit 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue") a binder
thread can only "wait_for_proc_work" after its thread->looper has been
marked as BINDER_LOOPER_STATE_{ENTERED|REGISTERED}.
This means an unregistered reader risks waiting indefinitely for work
since it never gets added to the proc->waiting_threads. If there are no
further references to its waitqueue either the task will hang. The same
applies to readers using the (e)poll interface.
I couldn't find the rationale behind this restriction. So this patch
restores the previous behavior of allowing unregistered threads to
"wait_for_proc_work". Note that an error message for this scenario,
which had previously become unreachable, is now re-enabled.
Fixes: 1b77e9dcc3da ("ANDROID: binder: remove proc waitqueue")
Cc: stable(a)vger.kernel.org
Cc: Martijn Coenen <maco(a)google.com>
Cc: Arve Hjønnevåg <arve(a)google.com>
Signed-off-by: Carlos Llamas <cmllamas(a)google.com>
---
drivers/android/binder.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index b21a7b246a0d..2d0a24a56508 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -570,9 +570,7 @@ static bool binder_has_work(struct binder_thread *thread, bool do_proc_work)
static bool binder_available_for_proc_work_ilocked(struct binder_thread *thread)
{
return !thread->transaction_stack &&
- binder_worklist_empty_ilocked(&thread->todo) &&
- (thread->looper & (BINDER_LOOPER_STATE_ENTERED |
- BINDER_LOOPER_STATE_REGISTERED));
+ binder_worklist_empty_ilocked(&thread->todo);
}
static void binder_wakeup_poll_threads_ilocked(struct binder_proc *proc,
--
2.45.2.993.g49e7a77208-goog
The patch titled
Subject: mm/huge_memory: avoid PMD-size page cache if needed
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Gavin Shan <gshan(a)redhat.com>
Subject: mm/huge_memory: avoid PMD-size page cache if needed
Date: Thu, 11 Jul 2024 20:48:40 +1000
Currently, xarray can't support arbitrary page cache size and the largest
and supported page cache size is defined as MAX_PAGECACHE_ORDER in commit
099d90642a71 ("mm/filemap: make MAX_PAGECACHE_ORDER acceptable to
xarray"). However, it's possible to have 512MB page cache in the huge
memory collapsing path on ARM64 system whose base page size is 64KB. A
warning is raised when the huge page cache is split as shown in the
following example.
[root@dhcp-10-26-1-207 ~]# cat /proc/1/smaps | grep KernelPageSize
KernelPageSize: 64 kB
[root@dhcp-10-26-1-207 ~]# cat /tmp/test.c
:
int main(int argc, char **argv)
{
const char *filename = TEST_XFS_FILENAME;
int fd = 0;
void *buf = (void *)-1, *p;
int pgsize = getpagesize();
int ret = 0;
if (pgsize != 0x10000) {
fprintf(stdout, "System with 64KB base page size is required!\n");
return -EPERM;
}
system("echo 0 > /sys/devices/virtual/bdi/253:0/read_ahead_kb");
system("echo 1 > /proc/sys/vm/drop_caches");
/* Open xfs or shmem file */
fd = open(filename, O_RDONLY);
assert(fd > 0);
/* Create VMA */
buf = mmap(NULL, TEST_MEM_SIZE, PROT_READ, MAP_SHARED, fd, 0);
assert(buf != (void *)-1);
fprintf(stdout, "mapped buffer at 0x%p\n", buf);
/* Populate VMA */
ret = madvise(buf, TEST_MEM_SIZE, MADV_NOHUGEPAGE);
assert(ret == 0);
ret = madvise(buf, TEST_MEM_SIZE, MADV_POPULATE_READ);
assert(ret == 0);
/* Collapse VMA */
ret = madvise(buf, TEST_MEM_SIZE, MADV_HUGEPAGE);
assert(ret == 0);
ret = madvise(buf, TEST_MEM_SIZE, MADV_COLLAPSE);
if (ret) {
fprintf(stdout, "Error %d to madvise(MADV_COLLAPSE)\n", errno);
goto out;
}
/* Split xarray. The file needs to reopened with write permission */
munmap(buf, TEST_MEM_SIZE);
buf = (void *)-1;
close(fd);
fd = open(filename, O_RDWR);
assert(fd > 0);
fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE,
TEST_MEM_SIZE - pgsize, pgsize);
out:
if (buf != (void *)-1)
munmap(buf, TEST_MEM_SIZE);
if (fd > 0)
close(fd);
return ret;
}
[root@dhcp-10-26-1-207 ~]# gcc /tmp/test.c -o /tmp/test
[root@dhcp-10-26-1-207 ~]# /tmp/test
------------[ cut here ]------------
WARNING: CPU: 25 PID: 7560 at lib/xarray.c:1025 xas_split_alloc+0xf8/0x128
Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib \
nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct \
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 \
ip_set rfkill nf_tables nfnetlink vfat fat virtio_balloon drm fuse \
xfs libcrc32c crct10dif_ce ghash_ce sha2_ce sha256_arm64 virtio_net \
sha1_ce net_failover virtio_blk virtio_console failover dimlib virtio_mmio
CPU: 25 PID: 7560 Comm: test Kdump: loaded Not tainted 6.10.0-rc7-gavin+ #9
Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-1.el9 05/24/2024
pstate: 83400005 (Nzcv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
pc : xas_split_alloc+0xf8/0x128
lr : split_huge_page_to_list_to_order+0x1c4/0x780
sp : ffff8000ac32f660
x29: ffff8000ac32f660 x28: ffff0000e0969eb0 x27: ffff8000ac32f6c0
x26: 0000000000000c40 x25: ffff0000e0969eb0 x24: 000000000000000d
x23: ffff8000ac32f6c0 x22: ffffffdfc0700000 x21: 0000000000000000
x20: 0000000000000000 x19: ffffffdfc0700000 x18: 0000000000000000
x17: 0000000000000000 x16: ffffd5f3708ffc70 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
x11: ffffffffffffffc0 x10: 0000000000000040 x9 : ffffd5f3708e692c
x8 : 0000000000000003 x7 : 0000000000000000 x6 : ffff0000e0969eb8
x5 : ffffd5f37289e378 x4 : 0000000000000000 x3 : 0000000000000c40
x2 : 000000000000000d x1 : 000000000000000c x0 : 0000000000000000
Call trace:
xas_split_alloc+0xf8/0x128
split_huge_page_to_list_to_order+0x1c4/0x780
truncate_inode_partial_folio+0xdc/0x160
truncate_inode_pages_range+0x1b4/0x4a8
truncate_pagecache_range+0x84/0xa0
xfs_flush_unmap_range+0x70/0x90 [xfs]
xfs_file_fallocate+0xfc/0x4d8 [xfs]
vfs_fallocate+0x124/0x2f0
ksys_fallocate+0x4c/0xa0
__arm64_sys_fallocate+0x24/0x38
invoke_syscall.constprop.0+0x7c/0xd8
do_el0_svc+0xb4/0xd0
el0_svc+0x44/0x1d8
el0t_64_sync_handler+0x134/0x150
el0t_64_sync+0x17c/0x180
Fix it by avoiding PMD-sized page cache in the huge memory collapsing
path. After this patch is applied, the test program fails with error
-EINVAL returned from __thp_vma_allowable_orders() and the madvise()
system call to collapse the page caches.
Link: https://lkml.kernel.org/r/20240711104840.200573-1-gshan@redhat.com
Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache")
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: William Kucharski <william.kucharski(a)oracle.com>
Cc: <stable(a)vger.kernel.org> [5.17+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-huge_memory-avoid-pmd-size-page-cache-if-needed
+++ a/mm/huge_memory.c
@@ -136,7 +136,8 @@ unsigned long __thp_vma_allowable_orders
while (orders) {
addr = vma->vm_end - (PAGE_SIZE << order);
- if (thp_vma_suitable_order(vma, addr, order))
+ if (!(vma->vm_file && order > MAX_PAGECACHE_ORDER) &&
+ thp_vma_suitable_order(vma, addr, order))
break;
order = next_order(&orders, order);
}
_
Patches currently in -mm which might be from gshan(a)redhat.com are
mm-huge_memory-avoid-pmd-size-page-cache-if-needed.patch
The patch titled
Subject: mm/mglru: fix overshooting shrinker memory
has been added to the -mm mm-unstable branch. Its filename is
mm-mglru-fix-overshooting-shrinker-memory.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix overshooting shrinker memory
Date: Thu, 11 Jul 2024 13:19:57 -0600
set_initial_priority() tries to jump-start global reclaim by estimating
the priority based on cold/hot LRU pages. The estimation does not account
for shrinker objects, and it cannot do so because their sizes can be in
different units other than page.
If shrinker objects are the majority, e.g., on TrueNAS SCALE 24.04.0 where
ZFS ARC can use almost all system memory, set_initial_priority() can
vastly underestimate how much memory ARC shrinker can evict and assign
extreme low values to scan_control->priority, resulting in overshoots of
shrinker objects.
To reproduce the problem, using TrueNAS SCALE 24.04.0 with 32GB DRAM, a
test ZFS pool and the following commands:
fio --name=mglru.file --numjobs=36 --ioengine=io_uring \
--directory=/root/test-zfs-pool/ --size=1024m --buffered=1 \
--rw=randread --random_distribution=random \
--time_based --runtime=1h &
for ((i = 0; i < 20; i++))
do
sleep 120
fio --name=mglru.anon --numjobs=16 --ioengine=mmap \
--filename=/dev/zero --size=1024m --fadvise_hint=0 \
--rw=randrw --random_distribution=random \
--time_based --runtime=1m
done
To fix the problem:
1. Cap scan_control->priority at or above DEF_PRIORITY/2, to prevent
the jump-start from being overly aggressive.
2. Account for the progress from mm_account_reclaimed_pages(), to
prevent kswapd_shrink_node() from raising the priority
unnecessarily.
Link: https://lkml.kernel.org/r/20240711191957.939105-2-yuzhao@google.com
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Alexander Motin <mav(a)ixsystems.com>
Cc: Wei Xu <weixugc(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-mglru-fix-overshooting-shrinker-memory
+++ a/mm/vmscan.c
@@ -4930,7 +4930,11 @@ static void set_initial_priority(struct
/* round down reclaimable and round up sc->nr_to_reclaim */
priority = fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
- sc->priority = clamp(priority, 0, DEF_PRIORITY);
+ /*
+ * The estimation is based on LRU pages only, so cap it to prevent
+ * overshoots of shrinker objects by large margins.
+ */
+ sc->priority = clamp(priority, DEF_PRIORITY / 2, DEF_PRIORITY);
}
static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
@@ -6754,6 +6758,7 @@ static bool kswapd_shrink_node(pg_data_t
{
struct zone *zone;
int z;
+ unsigned long nr_reclaimed = sc->nr_reclaimed;
/* Reclaim a number of pages proportional to the number of zones */
sc->nr_to_reclaim = 0;
@@ -6781,7 +6786,8 @@ static bool kswapd_shrink_node(pg_data_t
if (sc->order && sc->nr_reclaimed >= compact_gap(sc->order))
sc->order = 0;
- return sc->nr_scanned >= sc->nr_to_reclaim;
+ /* account for progress from mm_account_reclaimed_pages() */
+ return max(sc->nr_scanned, sc->nr_reclaimed - nr_reclaimed) >= sc->nr_to_reclaim;
}
/* Page allocator PCP high watermark is lowered if reclaim is active. */
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-truncate-batch-clear-shadow-entries.patch
mm-truncate-batch-clear-shadow-entries-v2.patch
mm-mglru-fix-div-by-zero-in-vmpressure_calc_level.patch
mm-mglru-fix-overshooting-shrinker-memory.patch
The patch titled
Subject: mm/mglru: fix div-by-zero in vmpressure_calc_level()
has been added to the -mm mm-unstable branch. Its filename is
mm-mglru-fix-div-by-zero-in-vmpressure_calc_level.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Yu Zhao <yuzhao(a)google.com>
Subject: mm/mglru: fix div-by-zero in vmpressure_calc_level()
Date: Thu, 11 Jul 2024 13:19:56 -0600
evict_folios() uses a second pass to reclaim folios that have gone through
page writeback and become clean before it finishes the first pass, since
folio_rotate_reclaimable() cannot handle those folios due to the
isolation.
The second pass tries to avoid potential double counting by deducting
scan_control->nr_scanned. However, this can result in underflow of
nr_scanned, under a condition where shrink_folio_list() does not increment
nr_scanned, i.e., when folio_trylock() fails.
The underflow can cause the divisor, i.e., scale=scanned+reclaimed in
vmpressure_calc_level(), to become zero, resulting in the following crash:
[exception RIP: vmpressure_work_fn+101]
process_one_work at ffffffffa3313f2b
Since scan_control->nr_scanned has no established semantics, the potential
double counting has minimal risks. Therefore, fix the problem by not
deducting scan_control->nr_scanned in evict_folios().
Link: https://lkml.kernel.org/r/20240711191957.939105-1-yuzhao@google.com
Fixes: 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while isolated")
Reported-by: Wei Xu <weixugc(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Cc: Alexander Motin <mav(a)ixsystems.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 1 -
1 file changed, 1 deletion(-)
--- a/mm/vmscan.c~mm-mglru-fix-div-by-zero-in-vmpressure_calc_level
+++ a/mm/vmscan.c
@@ -4597,7 +4597,6 @@ retry:
/* retry folios that may have missed folio_rotate_reclaimable() */
list_move(&folio->lru, &clean);
- sc->nr_scanned -= folio_nr_pages(folio);
}
spin_lock_irq(&lruvec->lru_lock);
_
Patches currently in -mm which might be from yuzhao(a)google.com are
mm-truncate-batch-clear-shadow-entries.patch
mm-truncate-batch-clear-shadow-entries-v2.patch
mm-mglru-fix-div-by-zero-in-vmpressure_calc_level.patch
mm-mglru-fix-overshooting-shrinker-memory.patch
evict_folios() uses a second pass to reclaim folios that have gone
through page writeback and become clean before it finishes the first
pass, since folio_rotate_reclaimable() cannot handle those folios due
to the isolation.
The second pass tries to avoid potential double counting by deducting
scan_control->nr_scanned. However, this can result in underflow of
nr_scanned, under a condition where shrink_folio_list() does not
increment nr_scanned, i.e., when folio_trylock() fails.
The underflow can cause the divisor, i.e., scale=scanned+reclaimed in
vmpressure_calc_level(), to become zero, resulting in the following
crash:
[exception RIP: vmpressure_work_fn+101]
process_one_work at ffffffffa3313f2b
Since scan_control->nr_scanned has no established semantics, the
potential double counting has minimal risks. Therefore, fix the
problem by not deducting scan_control->nr_scanned in evict_folios().
Reported-by: Wei Xu <weixugc(a)google.com>
Fixes: 359a5e1416ca ("mm: multi-gen LRU: retry folios written back while isolated")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
---
mm/vmscan.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0761f91b407f..6403038c776e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4597,7 +4597,6 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
/* retry folios that may have missed folio_rotate_reclaimable() */
list_move(&folio->lru, &clean);
- sc->nr_scanned -= folio_nr_pages(folio);
}
spin_lock_irq(&lruvec->lru_lock);
--
2.45.2.993.g49e7a77208-goog
The patch titled
Subject: crash: fix x86_32 memory reserve dead loop retry bug
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
crash-fix-x86_32-memory-reserve-dead-loop-retry-bug.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jinjie Ruan <ruanjinjie(a)huawei.com>
Subject: crash: fix x86_32 memory reserve dead loop retry bug
Date: Thu, 11 Jul 2024 15:31:18 +0800
On x86_32 Qemu machine with 1GB memory, the cmdline "crashkernel=1G,high"
will cause system stall as below:
ACPI: Reserving FACP table memory at [mem 0x3ffe18b8-0x3ffe192b]
ACPI: Reserving DSDT table memory at [mem 0x3ffe0040-0x3ffe18b7]
ACPI: Reserving FACS table memory at [mem 0x3ffe0000-0x3ffe003f]
ACPI: Reserving APIC table memory at [mem 0x3ffe192c-0x3ffe19bb]
ACPI: Reserving HPET table memory at [mem 0x3ffe19bc-0x3ffe19f3]
ACPI: Reserving WAET table memory at [mem 0x3ffe19f4-0x3ffe1a1b]
143MB HIGHMEM available.
879MB LOWMEM available.
mapped low ram: 0 - 36ffe000
low ram: 0 - 36ffe000
(stall here)
The reason is that the CRASH_ADDR_LOW_MAX is equal to CRASH_ADDR_HIGH_MAX
on x86_32, the first high crash kernel memory reservation will fail, then
go into the "retry" loop and never came out as below.
-> reserve_crashkernel_generic() and high is true
-> alloc at [CRASH_ADDR_LOW_MAX, CRASH_ADDR_HIGH_MAX] fail
-> alloc at [0, CRASH_ADDR_LOW_MAX] fail and repeatedly
(because CRASH_ADDR_LOW_MAX = CRASH_ADDR_HIGH_MAX).
Fix it by changing the out check condition.
After this patch, it prints:
cannot allocate crashkernel (size:0x40000000)
Link: https://lkml.kernel.org/r/20240711073118.1289866-1-ruanjinjie@huawei.com
Fixes: 9c08a2a139fe ("x86: kdump: use generic interface to simplify crashkernel reservation code")
Signed-off-by: Jinjie Ruan <ruanjinjie(a)huawei.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/crash_reserve.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/crash_reserve.c~crash-fix-x86_32-memory-reserve-dead-loop-retry-bug
+++ a/kernel/crash_reserve.c
@@ -421,7 +421,7 @@ retry:
* For crashkernel=size[KMG],high, if the first attempt was
* for high memory, fall back to low memory.
*/
- if (high && search_end == CRASH_ADDR_HIGH_MAX) {
+ if (high && search_base == CRASH_ADDR_LOW_MAX) {
search_end = CRASH_ADDR_LOW_MAX;
search_base = 0;
goto retry;
_
Patches currently in -mm which might be from ruanjinjie(a)huawei.com are
crash-fix-x86_32-memory-reserve-dead-loop-retry-bug.patch
Robert Gill reported below #GP when dosemu software was executing vm86()
system call:
general protection fault: 0000 [#1] PREEMPT SMP
CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1
Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010
EIP: restore_all_switch_stack+0xbe/0xcf
EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc
DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046
CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0
Call Trace:
show_regs+0x70/0x78
die_addr+0x29/0x70
exc_general_protection+0x13c/0x348
exc_bounds+0x98/0x98
handle_exception+0x14d/0x14d
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
exc_bounds+0x98/0x98
restore_all_switch_stack+0xbe/0xcf
This only happens when VERW based mitigations like MDS/RFDS are enabled.
This is because segment registers with an arbitrary user value can result
in #GP when executing VERW. Intel SDM vol. 2C documents the following
behavior for VERW instruction:
#GP(0) - If a memory operand effective address is outside the CS, DS, ES,
FS, or GS segment limit.
CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user
space. Replace CLEAR_CPU_BUFFERS with a safer version that uses %ss to
refer VERW operand mds_verw_sel. This ensures VERW will not #GP for an
arbitrary user %ds. Also, in NMI return path, move VERW to after
RESTORE_ALL_NMI that touches GPRs.
For clarity, below are the locations where the new CLEAR_CPU_BUFFERS_SAFE
version is being used:
* entry_INT80_32(), entry_SYSENTER_32() and interrupts (via
handle_exception_return) do:
restore_all_switch_stack:
[...]
mov %esi,%esi
verw %ss:0xc0fc92c0 <-------------
iret
* Opportunistic SYSEXIT:
[...]
verw %ss:0xc0fc92c0 <-------------
btrl $0x9,(%esp)
popf
pop %eax
sti
sysexit
* nmi_return and nmi_from_espfix:
mov %esi,%esi
verw %ss:0xc0fc92c0 <-------------
jmp .Lirq_return
Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition")
Cc: stable(a)vger.kernel.org # 5.10+
Reported-by: Robert Gill <rtgill82(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707
Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.i…
Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Suggested-by: Brian Gerst <brgerst(a)gmail.com> # Use %ss
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
v4:
- Further simplify the patch by using %ss for all VERW calls in 32-bit mode (Brian).
- In NMI exit path move VERW after RESTORE_ALL_NMI that touches GPRs (Dave).
v3: https://lore.kernel.org/r/20240701-fix-dosemu-vm86-v3-1-b1969532c75a@linux.…
- Simplify CLEAR_CPU_BUFFERS_SAFE by using %ss instead of %ds (Brian).
- Do verw before popf in SYSEXIT path (Jari).
v2: https://lore.kernel.org/r/20240627-fix-dosemu-vm86-v2-1-d5579f698e77@linux.…
- Safe guard against any other system calls like vm86() that might change %ds (Dave).
v1: https://lore.kernel.org/r/20240426-fix-dosemu-vm86-v1-1-88c826a3f378@linux.…
---
---
arch/x86/entry/entry_32.S | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index d3a814efbff6..d54f6002e5a0 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -253,6 +253,16 @@
.Lend_\@:
.endm
+/*
+ * Safer version of CLEAR_CPU_BUFFERS that uses %ss to reference VERW operand
+ * mds_verw_sel. This ensures VERW will not #GP for an arbitrary user %ds.
+ */
+.macro CLEAR_CPU_BUFFERS_SAFE
+ ALTERNATIVE "jmp .Lskip_verw\@", "", X86_FEATURE_CLEAR_CPU_BUF
+ verw %ss:_ASM_RIP(mds_verw_sel)
+.Lskip_verw\@:
+.endm
+
.macro RESTORE_INT_REGS
popl %ebx
popl %ecx
@@ -871,6 +881,8 @@ SYM_FUNC_START(entry_SYSENTER_32)
/* Now ready to switch the cr3 */
SWITCH_TO_USER_CR3 scratch_reg=%eax
+ /* Clobbers ZF */
+ CLEAR_CPU_BUFFERS_SAFE
/*
* Restore all flags except IF. (We restore IF separately because
@@ -881,7 +893,6 @@ SYM_FUNC_START(entry_SYSENTER_32)
BUG_IF_WRONG_CR3 no_user_check=1
popfl
popl %eax
- CLEAR_CPU_BUFFERS
/*
* Return back to the vDSO, which will pop ecx and edx.
@@ -951,7 +962,7 @@ restore_all_switch_stack:
/* Restore user state */
RESTORE_REGS pop=4 # skip orig_eax/error_code
- CLEAR_CPU_BUFFERS
+ CLEAR_CPU_BUFFERS_SAFE
.Lirq_return:
/*
* ARCH_HAS_MEMBARRIER_SYNC_CORE rely on IRET core serialization
@@ -1144,7 +1155,6 @@ SYM_CODE_START(asm_exc_nmi)
/* Not on SYSENTER stack. */
call exc_nmi
- CLEAR_CPU_BUFFERS
jmp .Lnmi_return
.Lnmi_from_sysenter_stack:
@@ -1165,6 +1175,7 @@ SYM_CODE_START(asm_exc_nmi)
CHECK_AND_APPLY_ESPFIX
RESTORE_ALL_NMI cr3_reg=%edi pop=4
+ CLEAR_CPU_BUFFERS_SAFE
jmp .Lirq_return
#ifdef CONFIG_X86_ESPFIX32
@@ -1206,6 +1217,7 @@ SYM_CODE_START(asm_exc_nmi)
* 1 - orig_ax
*/
lss (1+5+6)*4(%esp), %esp # back to espfix stack
+ CLEAR_CPU_BUFFERS_SAFE
jmp .Lirq_return
#endif
SYM_CODE_END(asm_exc_nmi)
---
base-commit: f2661062f16b2de5d7b6a5c42a9a5c96326b8454
change-id: 20240426-fix-dosemu-vm86-dd111a01737e
From: Kan Liang <kan.liang(a)linux.intel.com>
The EAX of the CPUID Leaf 023H enumerates the mask of valid sub-leaves.
To tell the availability of the sub-leaf 1 (enumerate the counter mask),
perf should check the bit 1 (0x2) of EAS, rather than bit 0 (0x1).
The error is not user-visible on bare metal. Because the sub-leaf 0 and
the sub-leaf 1 are always available. However, it may bring issues in a
virtualization environment when a VMM only enumerates the sub-leaf 0.
Fixes: eb467aaac21e ("perf/x86/intel: Support Architectural PerfMon Extension leaf")
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 4 ++--
arch/x86/include/asm/perf_event.h | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cd8f2db6cdf6..3fb81f7b618c 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4842,8 +4842,8 @@ static void update_pmu_cap(struct x86_hybrid_pmu *pmu)
if (ebx & ARCH_PERFMON_EXT_EQ)
pmu->config_mask |= ARCH_PERFMON_EVENTSEL_EQ;
- if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF_BIT) {
- cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF,
+ if (sub_bitmaps & ARCH_PERFMON_NUM_COUNTER_LEAF) {
+ cpuid_count(ARCH_PERFMON_EXT_LEAF, ARCH_PERFMON_NUM_COUNTER_LEAF_BIT,
&eax, &ebx, &ecx, &edx);
pmu->cntr_mask64 = eax;
pmu->fixed_cntr_mask64 = ebx;
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 91b73571412f..41ace8431e01 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -190,7 +190,7 @@ union cpuid10_edx {
#define ARCH_PERFMON_EXT_UMASK2 0x1
#define ARCH_PERFMON_EXT_EQ 0x2
#define ARCH_PERFMON_NUM_COUNTER_LEAF_BIT 0x1
-#define ARCH_PERFMON_NUM_COUNTER_LEAF 0x1
+#define ARCH_PERFMON_NUM_COUNTER_LEAF BIT(ARCH_PERFMON_NUM_COUNTER_LEAF_BIT)
/*
* Intel Architectural LBR CPUID detection/enumeration details:
--
2.38.1
The following commit has been merged into the timers/core branch of tip:
Commit-ID: f7d43dd206e7e18c182f200e67a8db8c209907fa
Gitweb: https://git.kernel.org/tip/f7d43dd206e7e18c182f200e67a8db8c209907fa
Author: Yu Liao <liaoyu15(a)huawei.com>
AuthorDate: Thu, 11 Jul 2024 20:48:43 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 11 Jul 2024 18:00:24 +02:00
tick/broadcast: Make takeover of broadcast hrtimer reliable
Running the LTP hotplug stress test on a aarch64 machine results in
rcu_sched stall warnings when the broadcast hrtimer was owned by the
un-plugged CPU. The issue is the following:
CPU1 (owns the broadcast hrtimer) CPU2
tick_broadcast_enter()
// shutdown local timer device
broadcast_shutdown_local()
...
tick_broadcast_exit()
clockevents_switch_state(dev, CLOCK_EVT_STATE_ONESHOT)
// timer device is not programmed
cpumask_set_cpu(cpu, tick_broadcast_force_mask)
initiates offlining of CPU1
take_cpu_down()
/*
* CPU1 shuts down and does not
* send broadcast IPI anymore
*/
takedown_cpu()
hotplug_cpu__broadcast_tick_pull()
// move broadcast hrtimer to this CPU
clockevents_program_event()
bc_set_next()
hrtimer_start()
/*
* timer device is not programmed
* because only the first expiring
* timer will trigger clockevent
* device reprogramming
*/
What happens is that CPU2 exits broadcast mode with force bit set, then the
local timer device is not reprogrammed and CPU2 expects to receive the
expired event by the broadcast IPI. But this does not happen because CPU1
is offlined by CPU2. CPU switches the clockevent device to ONESHOT state,
but does not reprogram the device.
The subsequent reprogramming of the hrtimer broadcast device does not
program the clockevent device of CPU2 either because the pending expiry
time is already in the past and the CPU expects the event to be delivered.
As a consequence all CPUs which wait for a broadcast event to be delivered
are stuck forever.
Fix this issue by reprogramming the local timer device if the broadcast
force bit of the CPU is set so that the broadcast hrtimer is delivered.
[ tglx: Massage comment and change log. Add Fixes tag ]
Fixes: 989dcb645ca7 ("tick: Handle broadcast wakeup of multiple cpus")
Signed-off-by: Yu Liao <liaoyu15(a)huawei.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240711124843.64167-1-liaoyu15@huawei.com
---
kernel/time/tick-broadcast.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/kernel/time/tick-broadcast.c b/kernel/time/tick-broadcast.c
index 771d1e0..b484309 100644
--- a/kernel/time/tick-broadcast.c
+++ b/kernel/time/tick-broadcast.c
@@ -1141,6 +1141,7 @@ void tick_broadcast_switch_to_oneshot(void)
#ifdef CONFIG_HOTPLUG_CPU
void hotplug_cpu__broadcast_tick_pull(int deadcpu)
{
+ struct tick_device *td = this_cpu_ptr(&tick_cpu_device);
struct clock_event_device *bc;
unsigned long flags;
@@ -1148,6 +1149,28 @@ void hotplug_cpu__broadcast_tick_pull(int deadcpu)
bc = tick_broadcast_device.evtdev;
if (bc && broadcast_needs_cpu(bc, deadcpu)) {
+ /*
+ * If the broadcast force bit of the current CPU is set,
+ * then the current CPU has not yet reprogrammed the local
+ * timer device to avoid a ping-pong race. See
+ * ___tick_broadcast_oneshot_control().
+ *
+ * If the broadcast device is hrtimer based then
+ * programming the broadcast event below does not have any
+ * effect because the local clockevent device is not
+ * running and not programmed because the broadcast event
+ * is not earlier than the pending event of the local clock
+ * event device. As a consequence all CPUs waiting for a
+ * broadcast event are stuck forever.
+ *
+ * Detect this condition and reprogram the cpu local timer
+ * device to avoid the starvation.
+ */
+ if (tick_check_broadcast_expired()) {
+ cpumask_clear_cpu(smp_processor_id(), tick_broadcast_force_mask);
+ tick_program_event(td->evtdev->next_event, 1);
+ }
+
/* This moves the broadcast assignment to this CPU: */
clockevents_program_event(bc, bc->next_event, 1);
}
No check is done on the size of the data to be transmiited. This causes
a kernel panic when this size exceeds the sg_miter's length.
Limit the number of transmitted bytes to sgm->length.
Cc: stable(a)vger.kernel.org
Fixes: ed01d210fd91 ("mmc: davinci_mmc: Use sg_miter for PIO")
Signed-off-by: Bastien Curutchet <bastien.curutchet(a)bootlin.com>
---
drivers/mmc/host/davinci_mmc.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/mmc/host/davinci_mmc.c b/drivers/mmc/host/davinci_mmc.c
index d7427894e0bc..c302eb380e42 100644
--- a/drivers/mmc/host/davinci_mmc.c
+++ b/drivers/mmc/host/davinci_mmc.c
@@ -224,6 +224,9 @@ static void davinci_fifo_data_trans(struct mmc_davinci_host *host,
}
p = sgm->addr;
+ if (n > sgm->length)
+ n = sgm->length;
+
/* NOTE: we never transfer more than rw_threshold bytes
* to/from the fifo here; there's no I/O overlap.
* This also assumes that access width( i.e. ACCWD) is 4 bytes
--
2.45.0
One long running saga for me on the Lenovo X13s is the occasional failure
to either probe or subsequently bring-up the ov5675 main RGB sensor on the
laptop.
Initially I suspected the PMIC for this part as the PMIC is using a new
interface on an I2C bus instead of an SPMI bus. In particular I thought
perhaps the I2C write to PMIC had completed but the regulator output hadn't
become stable from the perspective of the SoC. This however doesn't appear
to be the case - I can introduce a delay of milliseconds on the PMIC path
without resolving the sensor reset problem.
Secondly I thought about reset pin polarity or drive-strength but, again
playing about with both didn't yield decent results.
I also played with the duration of reset to no avail.
The error manifested as an I2C write timeout to the sensor which indicated
that the chip likely hadn't come out reset. An intermittent fault appearing
in perhaps 1/10 or 1/20 reset cycles.
Looking at the expression of the reset we see that there is a minimum time
expressed in XVCLK cycles between reset completion and first I2C
transaction to the sensor. The specification calls out the minimum delay @
8192 XVCLK cycles and the ov5675 driver meets that timing almost exactly.
A little too exactly - testing finally showed that we were too racy with
respect to the minimum quiescence between reset completion and first
command to the chip.
Fixing this error I choose to base the fix again on the number of clocks
but to also support any clock rate the chip could support by moving away
from a define to reading and using the XVCLK.
True enough only 19.2 MHz is currently supported but for the hypothetical
case where some other frequency is supported in the future, I wanted the
fix introduced in this series to still hold.
Hence this series:
1. Allows for any clock rate to be used in the valid range for the reset.
2. Elongates the post-reset period based on clock cycles which can now
vary.
Patch #2 can still be backported to stable irrespective of patch #1.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (2):
media: ov5675: Derive delay cycles from the clock rate reported
media: ov5675: Elongate reset to first transaction minimum gap
drivers/media/i2c/ov5675.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
---
base-commit: 523b23f0bee3014a7a752c9bb9f5c54f0eddae88
change-id: 20240710-linux-next-ov5675-60b0e83c73f1
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
From: Jonathan Denose <jdenose(a)google.com>
[ Upstream commit a69ce592cbe0417664bc5a075205aa75c2ec1273 ]
The Lenovo N24 on resume becomes stuck in a state where it
sends incorrect packets, causing elantech_packet_check_v4 to fail.
The only way for the device to resume sending the correct packets is for
it to be disabled and then re-enabled.
This change adds a dmi check to trigger this behavior on resume.
Signed-off-by: Jonathan Denose <jdenose(a)google.com>
Link: https://lore.kernel.org/r/20240503155020.v2.1.Ifa0e25ebf968d8f307f58d678036…
Signed-off-by: Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/input/mouse/elantech.c | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c
index 6759cab82a723..6f747c59cd652 100644
--- a/drivers/input/mouse/elantech.c
+++ b/drivers/input/mouse/elantech.c
@@ -1527,16 +1527,47 @@ static void elantech_disconnect(struct psmouse *psmouse)
psmouse->private = NULL;
}
+/*
+ * Some hw_version 4 models fail to properly activate absolute mode on
+ * resume without going through disable/enable cycle.
+ */
+static const struct dmi_system_id elantech_needs_reenable[] = {
+#if defined(CONFIG_DMI) && defined(CONFIG_X86)
+ {
+ /* Lenovo N24 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "81AF"),
+ },
+ },
+#endif
+ { }
+};
+
/*
* Put the touchpad back into absolute mode when reconnecting
*/
static int elantech_reconnect(struct psmouse *psmouse)
{
+ int err;
+
psmouse_reset(psmouse);
if (elantech_detect(psmouse, 0))
return -1;
+ if (dmi_check_system(elantech_needs_reenable)) {
+ err = ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_DISABLE);
+ if (err)
+ psmouse_warn(psmouse, "failed to deactivate mouse on %s: %d\n",
+ psmouse->ps2dev.serio->phys, err);
+
+ err = ps2_command(&psmouse->ps2dev, NULL, PSMOUSE_CMD_ENABLE);
+ if (err)
+ psmouse_warn(psmouse, "failed to reactivate mouse on %s: %d\n",
+ psmouse->ps2dev.serio->phys, err);
+ }
+
if (elantech_set_absolute_mode(psmouse)) {
psmouse_err(psmouse,
"failed to put touchpad back into absolute mode.\n");
--
2.43.0
We're seeing a GPU HANG issue on a CHV platform, which was caused by
bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8").
Gen8 platform has only timeslice and doesn't support a preemption mechanism
as engines do not have a preemption timer and doesn't send an irq if the
preemption timeout expires. So, add a fix to not consider preemption
during dequeuing for gen8 platforms.
Also move can_preemt() above need_preempt() function to resolve implicit
declaration of function ‘can_preempt' error and make can_preempt()
function param as const to resolve error: passing argument 1 of
‘can_preempt’ discards ‘const’ qualifier from the pointer target type.
v2: Simplify can_preemt() function (Tvrtko Ursulin)
Fixes: bac24f59f454 ("drm/i915/execlists: Enable coarse preemption boundaries for gen8")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11396
Suggested-by: Andi Shyti <andi.shyti(a)intel.com>
Signed-off-by: Nitin Gote <nitin.r.gote(a)intel.com>
Cc: Chris Wilson <chris.p.wilson(a)linux.intel.com>
CC: <stable(a)vger.kernel.org> # v5.2+
---
.../drm/i915/gt/intel_execlists_submission.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 21829439e686..59885d7721e4 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -294,11 +294,19 @@ static int virtual_prio(const struct intel_engine_execlists *el)
return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
}
+static bool can_preempt(const struct intel_engine_cs *engine)
+{
+ return GRAPHICS_VER(engine->i915) > 8;
+}
+
static bool need_preempt(const struct intel_engine_cs *engine,
const struct i915_request *rq)
{
int last_prio;
+ if (!can_preempt(engine))
+ return false;
+
if (!intel_engine_has_semaphores(engine))
return false;
@@ -3313,15 +3321,6 @@ static void remove_from_engine(struct i915_request *rq)
i915_request_notify_execute_cb_imm(rq);
}
-static bool can_preempt(struct intel_engine_cs *engine)
-{
- if (GRAPHICS_VER(engine->i915) > 8)
- return true;
-
- /* GPGPU on bdw requires extra w/a; not implemented */
- return engine->class != RENDER_CLASS;
-}
-
static void kick_execlists(const struct i915_request *rq, int prio)
{
struct intel_engine_cs *engine = rq->engine;
--
2.25.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071155-mushiness-lumpiness-edc1@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071154-kitten-oxidize-b3a1@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071153-cassette-savings-69dd@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071153-tanned-cobalt-76a1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071152-smoked-reheat-1908@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024071151-recital-stage-0612@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
7278a8fb8d03 ("s390: Mark psw in __load_psw_mask() as __unitialized")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7278a8fb8d032dfdc03d9b5d17e0bc451cdc1492 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Tue, 30 Apr 2024 16:30:01 +0200
Subject: [PATCH] s390: Mark psw in __load_psw_mask() as __unitialized
Without __unitialized, the following code is generated when
INIT_STACK_ALL_ZERO is enabled:
86: d7 0f f0 a0 f0 a0 xc 160(16,%r15), 160(%r15)
8c: e3 40 f0 a0 00 24 stg %r4, 160(%r15)
92: c0 10 00 00 00 08 larl %r1, 0xa2
98: e3 10 f0 a8 00 24 stg %r1, 168(%r15)
9e: b2 b2 f0 a0 lpswe 160(%r15)
The xc is not adding any security because psw is fully initialized
with the following instructions. Add __unitialized to the psw
definitiation to avoid the superfluous clearing of psw.
Reviewed-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 1e2fc6d6963c..07ad5a1df878 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -314,8 +314,8 @@ static inline void __load_psw(psw_t psw)
*/
static __always_inline void __load_psw_mask(unsigned long mask)
{
+ psw_t psw __uninitialized;
unsigned long addr;
- psw_t psw;
psw.mask = mask;
Switching to transparent mode leads to a loss of link synchronization,
so prevent doing this on an active link. This happened at least on an
Intel N100 system / DELL UD22 dock, the LTTPR residing either on the
host or the dock. To fix the issue, keep the current mode on an active
link, adjusting the LTTPR count accordingly (resetting it to 0 in
transparent mode).
v2: Adjust code comment during link training about reiniting the LTTPRs.
(Ville)
Fixes: 7b2a4ab8b0ef ("drm/i915: Switch to LTTPR transparent mode link training")
Reported-and-tested-by: Gareth Yu <gareth.yu(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10902
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
.../drm/i915/display/intel_dp_link_training.c | 55 ++++++++++++++++---
1 file changed, 48 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_link_training.c b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
index 1bc4ef84ff3bc..d044c8e36bb3d 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_link_training.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_link_training.c
@@ -117,10 +117,24 @@ intel_dp_set_lttpr_transparent_mode(struct intel_dp *intel_dp, bool enable)
return drm_dp_dpcd_write(&intel_dp->aux, DP_PHY_REPEATER_MODE, &val, 1) == 1;
}
-static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+static bool intel_dp_lttpr_transparent_mode_enabled(struct intel_dp *intel_dp)
+{
+ return intel_dp->lttpr_common_caps[DP_PHY_REPEATER_MODE -
+ DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV] ==
+ DP_PHY_REPEATER_MODE_TRANSPARENT;
+}
+
+/*
+ * Read the LTTPR common capabilities and switch the LTTPR PHYs to
+ * non-transparent mode if this is supported. Preserve the
+ * transparent/non-transparent mode on an active link.
+ *
+ * Return the number of detected LTTPRs in non-transparent mode or 0 if the
+ * LTTPRs are in transparent mode or the detection failed.
+ */
+static int intel_dp_init_lttpr_phys(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
{
int lttpr_count;
- int i;
if (!intel_dp_read_lttpr_common_caps(intel_dp, dpcd))
return 0;
@@ -134,6 +148,19 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
if (lttpr_count == 0)
return 0;
+ /*
+ * Don't change the mode on an active link, to prevent a loss of link
+ * synchronization. See DP Standard v2.0 3.6.7. about the LTTPR
+ * resetting its internal state when the mode is changed from
+ * non-transparent to transparent.
+ */
+ if (intel_dp->link_trained) {
+ if (lttpr_count < 0 || intel_dp_lttpr_transparent_mode_enabled(intel_dp))
+ goto out_reset_lttpr_count;
+
+ return lttpr_count;
+ }
+
/*
* See DP Standard v2.0 3.6.6.1. about the explicit disabling of
* non-transparent mode and the disable->enable non-transparent mode
@@ -154,11 +181,25 @@ static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEI
"Switching to LTTPR non-transparent LT mode failed, fall-back to transparent mode\n");
intel_dp_set_lttpr_transparent_mode(intel_dp, true);
- intel_dp_reset_lttpr_count(intel_dp);
- return 0;
+ goto out_reset_lttpr_count;
}
+ return lttpr_count;
+
+out_reset_lttpr_count:
+ intel_dp_reset_lttpr_count(intel_dp);
+
+ return 0;
+}
+
+static int intel_dp_init_lttpr(struct intel_dp *intel_dp, const u8 dpcd[DP_RECEIVER_CAP_SIZE])
+{
+ int lttpr_count;
+ int i;
+
+ lttpr_count = intel_dp_init_lttpr_phys(intel_dp, dpcd);
+
for (i = 0; i < lttpr_count; i++)
intel_dp_read_lttpr_phy_caps(intel_dp, dpcd, DP_PHY_LTTPR(i));
@@ -1482,10 +1523,10 @@ void intel_dp_start_link_train(struct intel_atomic_state *state,
struct intel_digital_port *dig_port = dp_to_dig_port(intel_dp);
struct intel_encoder *encoder = &dig_port->base;
bool passed;
-
/*
- * TODO: Reiniting LTTPRs here won't be needed once proper connector
- * HW state readout is added.
+ * Reinit the LTTPRs here to ensure that they are switched to
+ * non-transparent mode. During an earlier LTTPR detection this
+ * could've been prevented by an active link.
*/
int lttpr_count = intel_dp_init_lttpr_and_dprx_caps(intel_dp);
--
2.43.3
Regularly retraining a link during an atomic commit happens with the
given pipe/link already disabled and hence intel_dp->link_trained being
false. Ensure this also for retraining a DP SST link via direct calls to
the link training functions (vs. an actual commit as for DP MST). So far
nothing depended on this, however the next patch will depend on
link_trained==false for changing the LTTPR mode to non-transparent.
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_dp.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 3903f6ead6e66..59f11af3b0a1d 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -5314,6 +5314,8 @@ static int intel_dp_retrain_link(struct intel_encoder *encoder,
const struct intel_crtc_state *crtc_state =
to_intel_crtc_state(crtc->base.state);
+ intel_dp->link_trained = false;
+
intel_dp_check_frl_training(intel_dp);
intel_dp_pcon_dsc_configure(intel_dp, crtc_state);
intel_dp_start_link_train(NULL, intel_dp, crtc_state);
--
2.43.3
We are accessing the start and len field in em after it is free'd.
This patch moves the line accessing the free'd values in em before
they were free'd so we won't access free'd memory.
Reported-by: syzbot+853d80cba98ce1157ae6(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=853d80cba98ce1157ae6
Signed-off-by: Pei Li <peili.dev(a)gmail.com>
---
Syzbot reported the following error:
BUG: KASAN: slab-use-after-free in add_ra_bio_pages.constprop.0.isra.0+0xf03/0xfb0 fs/btrfs/compression.c:529
This is because we are reading the values from em right after freeing it
before through free_extent_map(em).
This patch moves the line accessing the free'd values in em before
they were free'd so we won't access free'd memory.
Fixes: 6a4049102055 ("btrfs: subpage: make add_ra_bio_pages() compatible")
---
Changes in v2:
- Adapt Qu's suggestion to move the read-after-free line before freeing
- Cc stable kernel
- Link to v1: https://lore.kernel.org/r/20240710-bug11-v1-1-aa02297fbbc9@gmail.com
---
fs/btrfs/compression.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 6441e47d8a5e..f271df10ef1c 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -514,6 +514,8 @@ static noinline int add_ra_bio_pages(struct inode *inode,
put_page(page);
break;
}
+ add_size = min(em->start + em->len, page_end + 1) - cur;
+
free_extent_map(em);
if (page->index == end_index) {
@@ -526,7 +528,6 @@ static noinline int add_ra_bio_pages(struct inode *inode,
}
}
- add_size = min(em->start + em->len, page_end + 1) - cur;
ret = bio_add_page(orig_bio, page, add_size, offset_in_page(cur));
if (ret != add_size) {
unlock_extent(tree, cur, page_end, NULL);
---
base-commit: 563a50672d8a86ec4b114a4a2f44d6e7ff855f5b
change-id: 20240710-bug11-a8ac18afb724
Best regards,
--
Pei Li <peili.dev(a)gmail.com>
There's no reason to have jbd2_journal_get_max_txn_bufs() public
function. Currently all users are internal and can use
journal->j_max_transaction_buffers instead. This saves some unnecessary
recomputations of the limit as a bonus which becomes important as this
function gets more complex in the following patch.
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/jbd2/commit.c | 2 +-
fs/jbd2/journal.c | 5 +++++
include/linux/jbd2.h | 5 -----
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 75ea4e9a5cab..e7fc912693bd 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -766,7 +766,7 @@ void jbd2_journal_commit_transaction(journal_t *journal)
if (first_block < journal->j_tail)
freed += journal->j_last - journal->j_first;
/* Update tail only if we free significant amount of space */
- if (freed < jbd2_journal_get_max_txn_bufs(journal))
+ if (freed < journal->j_max_transaction_buffers)
update_tail = 0;
}
J_ASSERT(commit_transaction->t_state == T_COMMIT);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 03c4b9214f56..1bb73750d307 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1698,6 +1698,11 @@ journal_t *jbd2_journal_init_inode(struct inode *inode)
return journal;
}
+static int jbd2_journal_get_max_txn_bufs(journal_t *journal)
+{
+ return (journal->j_total_len - journal->j_fc_wbufsize) / 4;
+}
+
/*
* Given a journal_t structure, initialise the various fields for
* startup of a new journaling session. We use this both when creating
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index ab04c1c27fae..f91b930abe20 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1660,11 +1660,6 @@ int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode);
int jbd2_fc_wait_bufs(journal_t *journal, int num_blks);
int jbd2_fc_release_bufs(journal_t *journal);
-static inline int jbd2_journal_get_max_txn_bufs(journal_t *journal)
-{
- return (journal->j_total_len - journal->j_fc_wbufsize) / 4;
-}
-
/*
* is_journal_abort
*
--
2.35.3
The patch titled
Subject: mm/hugetlb: fix potential race with try_memory_failure_hugetlb()
has been added to the -mm mm-unstable branch. Its filename is
mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/hugetlb: fix potential race with try_memory_failure_hugetlb()
Date: Wed, 10 Jul 2024 16:14:45 +0800
There is a potential race between __update_and_free_hugetlb_folio() and
try_memory_failure_hugetlb():
CPU1 CPU2
__update_and_free_hugetlb_folio try_memory_failure_hugetlb
spin_lock_irq(&hugetlb_lock);
__get_huge_page_for_hwpoison
folio_test_hugetlb
-- It's still hugetlb folio.
folio_test_hugetlb_raw_hwp_unreliable
-- raw_hwp_unreliable flag is not set yet.
folio_set_hugetlb_hwpoison
-- raw_hwp_unreliable flag might
be set.
spin_unlock_irq(&hugetlb_lock);
spin_lock_irq(&hugetlb_lock);
__folio_clear_hugetlb(folio);
-- Hugetlb flag is cleared but too late!
spin_unlock_irq(&hugetlb_lock);
When this race occurs, raw error pages will hit pcplists/buddy. Fix this
issue by deferring folio_test_hugetlb_raw_hwp_unreliable() until
__folio_clear_hugetlb() is done. The raw_hwp_unreliable flag cannot be
set after hugetlb folio flag is cleared.
Link: https://lkml.kernel.org/r/20240710081445.3307355-1-linmiaohe@huawei.com
Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb
+++ a/mm/hugetlb.c
@@ -1706,13 +1706,6 @@ static void __update_and_free_hugetlb_fo
return;
/*
- * If we don't know which subpages are hwpoisoned, we can't free
- * the hugepage, so it's leaked intentionally.
- */
- if (folio_test_hugetlb_raw_hwp_unreliable(folio))
- return;
-
- /*
* If folio is not vmemmap optimized (!clear_flag), then the folio
* is no longer identified as a hugetlb page. hugetlb_vmemmap_restore_folio
* can only be passed hugetlb pages and will BUG otherwise.
@@ -1730,6 +1723,13 @@ static void __update_and_free_hugetlb_fo
}
/*
+ * If we don't know which subpages are hwpoisoned, we can't free
+ * the hugepage, so it's leaked intentionally.
+ */
+ if (folio_test_hugetlb_raw_hwp_unreliable(folio))
+ return;
+
+ /*
* Move PageHWPoison flag from head page to the raw error pages,
* which makes any healthy subpages reusable.
*/
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-remove-obsolete-mf_msg_different_compound.patch
mm-hugetlb-fix-potential-race-with-try_memory_failure_hugetlb.patch
The patch titled
Subject: mm: fix old/young bit handling in the faulting path
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-old-young-bit-handling-in-the-faulting-path.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ram Tummala <rtummala(a)nvidia.com>
Subject: mm: fix old/young bit handling in the faulting path
Date: Tue, 9 Jul 2024 18:45:39 -0700
Commit 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()")
replaced do_set_pte() with set_pte_range() and that introduced a
regression in the following faulting path of non-anonymous vmas which
caused the PTE for the faulting address to be marked as old instead of
young.
handle_pte_fault()
do_pte_missing()
do_fault()
do_read_fault() || do_cow_fault() || do_shared_fault()
finish_fault()
set_pte_range()
The polarity of prefault calculation is incorrect. This leads to prefault
being incorrectly set for the faulting address. The following check will
incorrectly mark the PTE old rather than young. On some architectures
this will cause a double fault to mark it young when the access is
retried.
if (prefault && arch_wants_old_prefaulted_pte())
entry = pte_mkold(entry);
On a subsequent fault on the same address, the faulting path will see a
non NULL vmf->pte and instead of reaching the do_pte_missing() path, PTE
will then be correctly marked young in handle_pte_fault() itself.
Due to this bug, performance degradation in the fault handling path will
be observed due to unnecessary double faulting.
Link: https://lkml.kernel.org/r/20240710014539.746200-1-rtummala@nvidia.com
Fixes: 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()")
Signed-off-by: Ram Tummala <rtummala(a)nvidia.com>
Reviewed-by: Yin Fengwei <fengwei.yin(a)intel.com>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Yin Fengwei <fengwei.yin(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory.c~mm-fix-old-young-bit-handling-in-the-faulting-path
+++ a/mm/memory.c
@@ -4681,7 +4681,7 @@ void set_pte_range(struct vm_fault *vmf,
{
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
- bool prefault = in_range(vmf->address, addr, nr * PAGE_SIZE);
+ bool prefault = !in_range(vmf->address, addr, nr * PAGE_SIZE);
pte_t entry;
flush_icache_pages(vma, page, nr);
_
Patches currently in -mm which might be from rtummala(a)nvidia.com are
mm-fix-old-young-bit-handling-in-the-faulting-path.patch
In read_handle(), of_get_address() may return NULL which is later
dereferenced. Fix this by adding NULL check.
Cc: stable(a)vger.kernel.org
Fixes: 14baf4d9c739 ("cxl: Add guest-specific code")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
Changes in v2:
- The potential vulnerability was discovered as follows: based on our
customized static analysis tool, extract vulnerability features[1], and
then match similar vulnerability features in this function.
- Reference link:
[1] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
---
drivers/misc/cxl/of.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/cxl/of.c b/drivers/misc/cxl/of.c
index bcc005dff1c0..d8dbb3723951 100644
--- a/drivers/misc/cxl/of.c
+++ b/drivers/misc/cxl/of.c
@@ -58,7 +58,7 @@ static int read_handle(struct device_node *np, u64 *handle)
/* Get address and size of the node */
prop = of_get_address(np, 0, &size, NULL);
- if (size)
+ if (!prop || size)
return -EINVAL;
/* Helper to read a big number; size is in cells (not bytes) */
--
2.25.1
The patch titled
Subject: mm: fix PTE_AF handling in fault path on architectures with HW AF support
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-pte_af-handling-in-fault-path-on-architectures-with-hw-af-support.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ram Tummala <rtummala(a)nvidia.com>
Subject: mm: fix PTE_AF handling in fault path on architectures with HW AF support
Date: Tue, 9 Jul 2024 17:09:42 -0700
Commit 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()")
replaced do_set_pte() with set_pte_range() and that introduced a
regression in the following faulting path of non-anonymous vmas on CPUs
with HW AF (Access Flag) support.
handle_pte_fault()
do_pte_missing()
do_fault()
do_read_fault() || do_cow_fault() || do_shared_fault()
finish_fault()
set_pte_range()
The polarity of prefault calculation is incorrect. This leads to prefault
being incorrectly set for the faulting address. The following if check
will incorrectly clear the PTE_AF bit instead of setting it and the access
will fault again on the same address due to the missing PTE_AF bit.
if (prefault && arch_wants_old_prefaulted_pte())
entry = pte_mkold(entry);
On a subsequent fault on the same address, the faulting path will see a
non NULL vmf->pte and instead of reaching the do_pte_missing() path,
PTE_AF will be correctly set in handle_pte_fault() itself.
Due to this bug, performance degradation in the fault handling path will
be observed due to unnecessary double faulting.
Link: https://lkml.kernel.org/r/20240710000942.623704-1-rtummala@nvidia.com
Fixes: 3bd786f76de2 ("mm: convert do_set_pte() to set_pte_range()")
Signed-off-by: Ram Tummala <rtummala(a)nvidia.com>
Reviewed-by: Yin Fengwei <fengwei.yin(a)intel.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Alistair Popple <apopple(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memory.c~mm-fix-pte_af-handling-in-fault-path-on-architectures-with-hw-af-support
+++ a/mm/memory.c
@@ -4681,7 +4681,7 @@ void set_pte_range(struct vm_fault *vmf,
{
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
- bool prefault = in_range(vmf->address, addr, nr * PAGE_SIZE);
+ bool prefault = !in_range(vmf->address, addr, nr * PAGE_SIZE);
pte_t entry;
flush_icache_pages(vma, page, nr);
_
Patches currently in -mm which might be from rtummala(a)nvidia.com are
mm-fix-pte_af-handling-in-fault-path-on-architectures-with-hw-af-support.patch
From: Martin Wilck <martin.wilck(a)suse.com>
[ Upstream commit 10157b1fc1a762293381e9145041253420dfc6ad ]
When a host is configured with a few LUNs and I/O is running, injecting FC
faults repeatedly leads to path recovery problems. The LUNs have 4 paths
each and 3 of them come back active after say an FC fault which makes 2 of
the paths go down, instead of all 4. This happens after several iterations
of continuous FC faults.
Reason here is that we're returning an I/O error whenever we're
encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC
ACCESS STATE TRANSITION) instead of retrying.
[mwilck: The original patch was developed by Rajashekhar M A and Hannes
Reinecke. I moved the code to alua_check_sense() as suggested by Mike
Christie [1]. Evan Milne had raised the question whether pg->state should
be set to transitioning in the UA case [2]. I believe that doing this is
correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
errors. Our handler schedules an RTPG, which will only result in an I/O
error condition if the transitioning timeout expires.]
[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVE…
Co-developed-by: Rajashekhar M A <rajs(a)netapp.com>
Co-developed-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Hannes Reinecke <hare(a)suse.de>
Signed-off-by: Martin Wilck <martin.wilck(a)suse.com>
Link: https://lore.kernel.org/r/20240514140344.19538-1-mwilck@suse.com
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Mike Christie <michael.christie(a)oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 31 +++++++++++++++-------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 0781f991e7845..f5fc8631883d5 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -406,28 +406,40 @@ static char print_alua_state(unsigned char state)
}
}
-static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
- struct scsi_sense_hdr *sense_hdr)
+static void alua_handle_state_transition(struct scsi_device *sdev)
{
struct alua_dh_data *h = sdev->handler_data;
struct alua_port_group *pg;
+ rcu_read_lock();
+ pg = rcu_dereference(h->pg);
+ if (pg)
+ pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+ rcu_read_unlock();
+ alua_check(sdev, false);
+}
+
+static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
+ struct scsi_sense_hdr *sense_hdr)
+{
switch (sense_hdr->sense_key) {
case NOT_READY:
if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
/*
* LUN Not Accessible - ALUA state transition
*/
- rcu_read_lock();
- pg = rcu_dereference(h->pg);
- if (pg)
- pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
- rcu_read_unlock();
- alua_check(sdev, false);
+ alua_handle_state_transition(sdev);
return NEEDS_RETRY;
}
break;
case UNIT_ATTENTION:
+ if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
+ /*
+ * LUN Not Accessible - ALUA state transition
+ */
+ alua_handle_state_transition(sdev);
+ return NEEDS_RETRY;
+ }
if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
/*
* Power On, Reset, or Bus Device Reset.
@@ -494,7 +506,8 @@ static int alua_tur(struct scsi_device *sdev)
retval = scsi_test_unit_ready(sdev, ALUA_FAILOVER_TIMEOUT * HZ,
ALUA_FAILOVER_RETRIES, &sense_hdr);
- if (sense_hdr.sense_key == NOT_READY &&
+ if ((sense_hdr.sense_key == NOT_READY ||
+ sense_hdr.sense_key == UNIT_ATTENTION) &&
sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
return SCSI_DH_RETRY;
else if (retval)
--
2.43.0
In case of the COW file, new updates and GC writes are already
separated to page caches of the atomic file and COW file. As some cases
that use the meta inode for GC, there are some race issues between a
foreground thread and GC thread.
To handle them, we need to take care when to invalidate and wait
writeback of GC pages in COW files as the case of using the meta inode.
Also, a pointer from the COW inode to the original inode is required to
check the state of original pages.
For the former, we can solve the problem by using the meta inode for GC
of COW files. Then let's get a page from the original inode in
move_data_block when GCing the COW file to avoid race condition.
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
Reviewed-by: Chao Yu <chao(a)kernel.org>
---
v3:
- make the mapping variable to select a proper inode
v2:
- use union for cow inode to point to atomic inode
fs/f2fs/data.c | 2 +-
fs/f2fs/f2fs.h | 13 +++++++++++--
fs/f2fs/file.c | 3 +++
fs/f2fs/gc.c | 7 +++++--
fs/f2fs/inline.c | 2 +-
fs/f2fs/inode.c | 3 ++-
6 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 9a213d03005d..f6b1782f965a 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2606,7 +2606,7 @@ bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
return true;
if (IS_NOQUOTA(inode))
return true;
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return true;
/* rewrite low ratio compress data w/ OPU mode to avoid fragmentation */
if (f2fs_compressed_file(inode) &&
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 796ae11c0fa3..4a8621e4a33a 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -843,7 +843,11 @@ struct f2fs_inode_info {
struct task_struct *atomic_write_task; /* store atomic write task */
struct extent_tree *extent_tree[NR_EXTENT_CACHES];
/* cached extent_tree entry */
- struct inode *cow_inode; /* copy-on-write inode for atomic write */
+ union {
+ struct inode *cow_inode; /* copy-on-write inode for atomic write */
+ struct inode *atomic_inode;
+ /* point to atomic_inode, available only for cow_inode */
+ };
/* avoid racing between foreground op and gc */
struct f2fs_rwsem i_gc_rwsem[2];
@@ -4263,9 +4267,14 @@ static inline bool f2fs_post_read_required(struct inode *inode)
f2fs_compressed_file(inode);
}
+static inline bool f2fs_used_in_atomic_write(struct inode *inode)
+{
+ return f2fs_is_atomic_file(inode) || f2fs_is_cow_file(inode);
+}
+
static inline bool f2fs_meta_inode_gc_required(struct inode *inode)
{
- return f2fs_post_read_required(inode) || f2fs_is_atomic_file(inode);
+ return f2fs_post_read_required(inode) || f2fs_used_in_atomic_write(inode);
}
/*
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index e4a7cff00796..547e7ec32b1f 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -2183,6 +2183,9 @@ static int f2fs_ioc_start_atomic_write(struct file *filp, bool truncate)
set_inode_flag(fi->cow_inode, FI_COW_FILE);
clear_inode_flag(fi->cow_inode, FI_INLINE_DATA);
+
+ /* Set the COW inode's atomic_inode to the atomic inode */
+ F2FS_I(fi->cow_inode)->atomic_inode = inode;
} else {
/* Reuse the already created COW inode */
ret = f2fs_do_truncate_blocks(fi->cow_inode, 0, true);
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index cb3006551ab5..724bbcb447d3 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -1171,7 +1171,8 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
static int ra_data_block(struct inode *inode, pgoff_t index)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct address_space *mapping = inode->i_mapping;
+ struct address_space *mapping = f2fs_is_cow_file(inode) ?
+ F2FS_I(inode)->atomic_inode->i_mapping : inode->i_mapping;
struct dnode_of_data dn;
struct page *page;
struct f2fs_io_info fio = {
@@ -1260,6 +1261,8 @@ static int ra_data_block(struct inode *inode, pgoff_t index)
static int move_data_block(struct inode *inode, block_t bidx,
int gc_type, unsigned int segno, int off)
{
+ struct address_space *mapping = f2fs_is_cow_file(inode) ?
+ F2FS_I(inode)->atomic_inode->i_mapping : inode->i_mapping;
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
.ino = inode->i_ino,
@@ -1282,7 +1285,7 @@ static int move_data_block(struct inode *inode, block_t bidx,
CURSEG_ALL_DATA_ATGC : CURSEG_COLD_DATA;
/* do not read out */
- page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
+ page = f2fs_grab_cache_page(mapping, bidx, false);
if (!page)
return -ENOMEM;
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 1fba5728be70..cca7d448e55c 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -16,7 +16,7 @@
static bool support_inline_data(struct inode *inode)
{
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_used_in_atomic_write(inode))
return false;
if (!S_ISREG(inode->i_mode) && !S_ISLNK(inode->i_mode))
return false;
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 7a3e2458b2d9..18dea43e694b 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -804,8 +804,9 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_abort_atomic_write(inode, true);
- if (fi->cow_inode) {
+ if (fi->cow_inode && f2fs_is_cow_file(fi->cow_inode)) {
clear_inode_flag(fi->cow_inode, FI_COW_FILE);
+ F2FS_I(fi->cow_inode)->atomic_inode = NULL;
iput(fi->cow_inode);
fi->cow_inode = NULL;
}
--
2.25.1