The following crash was seen on a 1k-block ext4 filesystem when the user
passed a fmr_physical of value zero for the high key:
kernel BUG at fs/ext4/ext4.h:3354!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 422 Comm: syz-executor339 Not tainted 5.15.74-syzkaller-00001-g4ec71a9ec769 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:ext4_get_group_info fs/ext4/ext4.h:3354 [inline]
RIP: 0010:ext4_mb_load_buddy_gfp+0xe9d/0xeb0 fs/ext4/mballoc.c:1498
Code: 99 ed c1 ff e9 47 f4 ff ff e8 5f a0 7f ff 48 c7 c7 c0 c3 a9 86 48 89 de 4c 89 ea e8 ad 8e 92 00 e9 a1 f2 ff ff e8 43 a0 7f ff <0f> 0b e8 3c a0 7f ff 0f 0b e8 35 a0 7f ff 0f 0b 0f 1f 00 55 48 89
RSP: 0018:ffffc9000044f320 EFLAGS: 00010293
RAX: ffffffff81f1f14d RBX: 0000000000000001 RCX: ffff888106194f00
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000001
RBP: ffffc9000044f3b0 R08: ffffffff81f1e3a4 R09: ffffc9000044f440
R10: fffff52000089e8f R11: 1ffff92000089e88 R12: ffff888100a553c8
R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888105f9a000
FS: 000055555675b300(0000) GS:ffff8881f7000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c4f7cde3e8 CR3: 000000011c064000 CR4: 00000000003506b0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
ext4_mb_load_buddy fs/ext4/mballoc.c:1620 [inline]
ext4_mballoc_query_range+0xb8/0x7b0 fs/ext4/mballoc.c:6560
ext4_getfsmap_datadev+0x1c8a/0x28c0 fs/ext4/fsmap.c:537
ext4_getfsmap+0xcff/0x1040 fs/ext4/fsmap.c:708
ext4_ioc_getfsmap fs/ext4/ioctl.c:660 [inline]
__ext4_ioctl fs/ext4/ioctl.c:862 [inline]
ext4_ioctl+0x3020/0x50e0 fs/ext4/ioctl.c:1276
vfs_ioctl fs/ioctl.c:51 [inline]
__do_sys_ioctl fs/ioctl.c:874 [inline]
__se_sys_ioctl+0x115/0x190 fs/ioctl.c:860
__x64_sys_ioctl+0x7b/0x90 fs/ioctl.c:860
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x61/0xcb
RIP: 0033:0x7f8d2372e3e9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcae7c2578 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f8d2372e3e9
RDX: 0000000020000200 RSI: 00000000c0c0583b RDI: 0000000000000004
RBP: 0000000000000000 R08: 000000000000000d R09: 000000000000000d
R10: 00000000000003f1 R11: 0000000000000246 R12: 00007f8d236ed5c0
R13: 00007ffcae7c25a0 R14: 00007ffcae7c258c R15: 00007ffcae7c2590
The crash is due to insufficient key validation.
When high_key->fmr_physical is zero for a 1k-block ext4 filesystem we
encounter an underflow in the second call to
ext4_get_group_no_and_offset():
blocknr = blocknr - le32_to_cpu(es->s_first_data_block);
blocknr comes with value zero (high_key->fmr_physical is zero) and
the first_data_block is one, thus blocknr becomes -1. Since blocknr is
of type ext4_fsblk_t (unsigned long long) the value is all ones. Due to
the high values for blocknr and offset we hit the BUG_ON in
ext4_get_group_info() as the groupnr exceeds EXT4_SB(sb)->s_groups_count).
Fix it by returning early when keys[1].fmr_physical is less than or equal
to the first data block, as there's no data to return. This replicates the
behavior of returning zero, like in keys[0].fmr_physical >= eofs. In both
cases -EINVAL could be returned but for some reason the author decided to
just ignore out of bounds requests.
One may notice that the check introduced addresses just the
ext4_getfsmap_datadev() handler. This is intentional, because for the
ext4_getfsmap_logdev() handler, the value of the high key passed by the
user is ignored and instead the code uses:
memset(&info->gfi_high, 0xFF, sizeof(info->gfi_high));
Cc: stable(a)vger.kernel.org
Fixes: 0c9ec4beecac ("ext4: support GETFSMAP ioctls")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c300…
Reported-by: syzbot+6be2b977c89f79b6b153(a)syzkaller.appspotmail.com
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
---
fs/ext4/fsmap.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/fs/ext4/fsmap.c b/fs/ext4/fsmap.c
index 4493ef0c715e..b5289378a761 100644
--- a/fs/ext4/fsmap.c
+++ b/fs/ext4/fsmap.c
@@ -479,6 +479,8 @@ static int ext4_getfsmap_datadev(struct super_block *sb,
int error = 0;
bofs = le32_to_cpu(sbi->s_es->s_first_data_block);
+ if (keys[1].fmr_physical <= bofs)
+ return 0;
eofs = ext4_blocks_count(sbi->s_es);
if (keys[0].fmr_physical >= eofs)
return 0;
--
2.39.2.637.g21b0678d19-goog
From: Matthias Maennich <maennich(a)google.com>
Can we please pick this series up for 5.15? I am particularly interested
in the last patch to enable BTF + DWARF5, but the cleanup patches before
are a very reasonable choice for stable@ as well as they simplify the
pahole version calculation and allow future BTF/pahole related patches
to apply cleanly as well. I intentionally kept the config
PAHOLE_HAS_BTF_TAG and hence its patch complete, even though there is no
user for it.
Cheers,
Matthias
Cc: <stable(a)vger.kernel.org> # v5.15+
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andrii Nakryiko <andrii(a)kernel.org>
Signed-off-by: Matthias Maennich <maennich(a)google.com>
Nathan Chancellor (5):
MAINTAINERS: Add scripts/pahole-flags.sh to BPF section
kbuild: Add CONFIG_PAHOLE_VERSION
scripts/pahole-flags.sh: Use pahole-version.sh
lib/Kconfig.debug: Use CONFIG_PAHOLE_VERSION
lib/Kconfig.debug: Allow BTF + DWARF5 with pahole 1.21+
MAINTAINERS | 2 ++
init/Kconfig | 4 ++++
lib/Kconfig.debug | 12 ++++++++++--
scripts/pahole-flags.sh | 2 +-
scripts/pahole-version.sh | 13 +++++++++++++
5 files changed, 30 insertions(+), 3 deletions(-)
create mode 100755 scripts/pahole-version.sh
--
2.39.2.637.g21b0678d19-goog
From: Nathan Chancellor <nathan(a)kernel.org>
Commit 98cd6f521f10 ("Kconfig: allow explicit opt in to DWARF v5")
prevented CONFIG_DEBUG_INFO_DWARF5 from being selected when
CONFIG_DEBUG_INFO_BTF is enabled because pahole had issues with clang's
DWARF5 info. This was resolved by [1], which is in pahole v1.21.
Allow DEBUG_INFO_DWARF5 to be selected with DEBUG_INFO_BTF when using
pahole v1.21 or newer.
[1]: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=7d8e829f6…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-6-nathan@kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
lib/Kconfig.debug | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0743c9567d7e..dd86de130cdf 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -302,7 +302,7 @@ config DEBUG_INFO_DWARF4
config DEBUG_INFO_DWARF5
bool "Generate DWARF Version 5 debuginfo"
depends on !CC_IS_CLANG || AS_IS_LLVM || (AS_IS_GNU && AS_VERSION >= 23502 && AS_HAS_NON_CONST_LEB128)
- depends on !DEBUG_INFO_BTF
+ depends on !DEBUG_INFO_BTF || PAHOLE_VERSION >= 121
help
Generate DWARF v5 debug info. Requires binutils 2.35.2, gcc 5.0+ (gcc
5.0+ accepts the -gdwarf-5 flag but only had partial support for some
--
2.39.2.637.g21b0678d19-goog
From: Nathan Chancellor <nathan(a)kernel.org>
There are a few different places where pahole's version is turned into a
three digit form with the exact same command. Move this command into
scripts/pahole-version.sh to reduce the amount of duplication across the
tree.
Create CONFIG_PAHOLE_VERSION so the version code can be used in Kconfig
to enable and disable configuration options based on the pahole version,
which is already done in a couple of places.
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Acked-by: Andrii Nakryiko <andrii(a)kernel.org>
Link: https://lore.kernel.org/bpf/20220201205624.652313-3-nathan@kernel.org
Signed-off-by: Matthias Maennich <maennich(a)google.com>
---
MAINTAINERS | 1 +
init/Kconfig | 4 ++++
scripts/pahole-version.sh | 13 +++++++++++++
3 files changed, 18 insertions(+)
create mode 100755 scripts/pahole-version.sh
diff --git a/MAINTAINERS b/MAINTAINERS
index 176485e625a0..ee8a1b5c28a6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3408,6 +3408,7 @@ F: net/sched/cls_bpf.c
F: samples/bpf/
F: scripts/bpf_doc.py
F: scripts/pahole-flags.sh
+F: scripts/pahole-version.sh
F: tools/bpf/
F: tools/lib/bpf/
F: tools/testing/selftests/bpf/
diff --git a/init/Kconfig b/init/Kconfig
index a4144393717b..dafc3ba6fa7a 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -91,6 +91,10 @@ config CC_HAS_ASM_INLINE
config CC_HAS_NO_PROFILE_FN_ATTR
def_bool $(success,echo '__attribute__((no_profile_instrument_function)) int x();' | $(CC) -x c - -c -o /dev/null -Werror)
+config PAHOLE_VERSION
+ int
+ default $(shell,$(srctree)/scripts/pahole-version.sh $(PAHOLE))
+
config CONSTRUCTORS
bool
diff --git a/scripts/pahole-version.sh b/scripts/pahole-version.sh
new file mode 100755
index 000000000000..f8a32ab93ad1
--- /dev/null
+++ b/scripts/pahole-version.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Usage: $ ./pahole-version.sh pahole
+#
+# Prints pahole's version in a 3-digit form, such as 119 for v1.19.
+
+if [ ! -x "$(command -v "$@")" ]; then
+ echo 0
+ exit 1
+fi
+
+"$@" --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/'
--
2.39.2.637.g21b0678d19-goog
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam serdecznie,
Wiktor Nurek
This is the start of the stable review cycle for the 5.10.169 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 22 Feb 2023 13:35:35 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.169-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.169-rc1
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix return value
Dan Carpenter <error27(a)gmail.com>
net: sched: sch: Fix off by one in htb_activate_prios()
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: SOF: Intel: hda-dai: fix possible stream_tag leak
Thomas Gleixner <tglx(a)linutronix.de>
alarmtimer: Prevent starvation by small intervals and SIG_IGN
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
kvm: initialize all of the kvm_debugregs structure before sending it to userspace
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: tcindex: search key must be 16 bits
Natalia Petrova <n.petrova(a)fintech.ru>
i40e: Add checking for null for nlmsg_find_attr()
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: act_ctinfo: use percpu stats
Baowen Zheng <baowen.zheng(a)corigine.com>
flow_offload: fill flags to action structure
Matt Roper <matthew.d.roper(a)intel.com>
drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list
Raviteja Goud Talla <ravitejax.goud.talla(a)intel.com>
drm/i915/gen11: Moving WAs to icl_gt_workarounds_init()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix underflow in second superblock position calculations
Guillaume Nault <gnault(a)redhat.com>
ipv6: Fix tcp socket connection with DSCP.
Guillaume Nault <gnault(a)redhat.com>
ipv6: Fix datagram socket connection with DSCP.
Jason Xing <kernelxing(a)tencent.com>
ixgbe: add double of VLAN header when computing the max MTU
Jakub Kicinski <kuba(a)kernel.org>
net: mpls: fix stale pointer if allocation fails during device rename
Cristian Ciocaltea <cristian.ciocaltea(a)collabora.com>
net: stmmac: Restrict warning on disabling DMA store and fwd mode
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix mqprio and XDP ring checking logic
Johannes Zink <j.zink(a)pengutronix.de>
net: stmmac: fix order of dwmac5 FlexPPS parametrization sequence
Hangyu Hua <hbh25y(a)gmail.com>
net: openvswitch: fix possible memory leak in ovs_meter_cmd_set()
Miko Larsson <mikoxyzzz(a)gmail.com>
net/usb: kalmia: Don't pass act_len in usb_bulk_msg error path
Kuniyuki Iwashima <kuniyu(a)amazon.com>
dccp/tcp: Avoid negative sk_forward_alloc by ipv6_pinfo.pktoptions.
Pedro Tammela <pctammela(a)mojatatu.com>
net/sched: tcindex: update imperfect hash filters respecting rcu
Pietro Borrello <borrello(a)diag.uniroma1.it>
sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list
Rafał Miłecki <rafal(a)milecki.pl>
net: bgmac: fix BCM5358 support by setting correct flags
Jason Xing <kernelxing(a)tencent.com>
i40e: add double of VLAN header when computing the max MTU
Jason Xing <kernelxing(a)tencent.com>
ixgbe: allow to increase MTU to 3K with XDP enabled
Andrew Morton <akpm(a)linux-foundation.org>
revert "squashfs: harden sanity check in squashfs_read_xattr_id_table"
Felix Riemann <felix.riemann(a)sma.de>
net: Fix unwanted sign extension in netdev_stats_to_stats64()
Aaron Thompson <dev(a)aaront.org>
Revert "mm: Always release pages to the buddy allocator in memblock_free_late()."
Mike Kravetz <mike.kravetz(a)oracle.com>
hugetlb: check for undefined shift on 32 bit architectures
Munehisa Kamata <kamatam(a)amazon.com>
sched/psi: Fix use-after-free in ep_remove_wait_queue()
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - fixed wrong gpio assigned
Bo Liu <bo.liu(a)senarytech.com>
ALSA: hda/conexant: add a new hda codec SN6180
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: mmc_spi: fix error handling in mmc_spi_probe()
Yang Yingliang <yangyingliang(a)huawei.com>
mmc: sdio: fix possible resource leaks in some error paths
Paul Cercueil <paul(a)crapouillou.net>
mmc: jz4740: Work around bug on JZ4760(B)
Florian Westphal <fw(a)strlen.de>
netfilter: nft_tproxy: restrict to prerouting hook
Amir Goldstein <amir73il(a)gmail.com>
ovl: remove privs in ovl_fallocate()
Amir Goldstein <amir73il(a)gmail.com>
ovl: remove privs in ovl_copyfile()
Sumanth Korikkar <sumanthk(a)linux.ibm.com>
s390/signal: fix endless loop in do_signal
Seth Jenkins <sethjenkins(a)google.com>
aio: fix mremap after fork null-deref
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix registration vs use race
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: fix cleanup after dev_set_name()
Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
nvmem: core: remove nvmem_config wp_gpio
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nvmem: core: add error handling for dev_set_name
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: touchscreen_dmi: Add Chuwi Vi8 (CWI501) DMI match
Amit Engel <Amit.Engel(a)dell.com>
nvme-fc: fix a missing queue put in nvmet_fc_ls_create_association
Vasily Gorbik <gor(a)linux.ibm.com>
s390/decompressor: specify __decompress() buf len to avoid overflow
Kees Cook <keescook(a)chromium.org>
net: sched: sch: Bounds check priority
Andrey Konovalov <andrey.konovalov(a)linaro.org>
net: stmmac: do not stop RX_CLK in Rx LPI state for qcs404 SoC
Hyunwoo Kim <v4bel(a)theori.io>
net/rose: Fix to not accept on connected socket
Shunsuke Mie <mie(a)igel.co.jp>
tools/virtio: fix the vringh test for virtio ring changes
Arnd Bergmann <arnd(a)arndb.de>
ASoC: cs42l56: fix DT probe
Cezary Rojewski <cezary.rojewski(a)intel.com>
ALSA: hda: Do not unset preset when cleaning up codec
Eduard Zingerman <eddyz87(a)gmail.com>
selftests/bpf: Verify copy_register_state() preserves parent/live fields
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: Intel: sof_rt5682: always set dpcm_capture for amplifiers
-------------
Diffstat:
Makefile | 4 +-
arch/s390/boot/compressed/decompressor.c | 2 +-
arch/s390/kernel/signal.c | 2 +-
arch/x86/kvm/x86.c | 3 +-
drivers/gpu/drm/i915/gt/intel_workarounds.c | 32 ++++++++--------
drivers/mmc/core/sdio_bus.c | 17 +++++++--
drivers/mmc/core/sdio_cis.c | 12 ------
drivers/mmc/host/jz4740_mmc.c | 10 +++++
drivers/mmc/host/mmc_spi.c | 8 ++--
drivers/net/ethernet/broadcom/bgmac-bcma.c | 6 +--
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 +++-
drivers/net/ethernet/intel/i40e/i40e_main.c | 4 +-
drivers/net/ethernet/intel/ixgbe/ixgbe.h | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 28 ++++++++------
.../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 2 +
drivers/net/ethernet/stmicro/stmmac/dwmac5.c | 3 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +-
.../net/ethernet/stmicro/stmmac/stmmac_platform.c | 2 +-
drivers/net/usb/kalmia.c | 8 ++--
drivers/nvme/target/fc.c | 4 +-
drivers/nvmem/core.c | 43 +++++++++++-----------
drivers/platform/x86/touchscreen_dmi.c | 9 +++++
fs/aio.c | 4 ++
fs/nilfs2/ioctl.c | 7 ++++
fs/nilfs2/super.c | 9 +++++
fs/nilfs2/the_nilfs.c | 8 +++-
fs/overlayfs/file.c | 28 ++++++++++++--
fs/squashfs/xattr_id.c | 2 +-
include/linux/hugetlb.h | 5 ++-
include/linux/nvmem-provider.h | 2 -
include/linux/stmmac.h | 1 +
include/net/sock.h | 13 +++++++
kernel/sched/psi.c | 7 ++--
kernel/time/alarmtimer.c | 33 +++++++++++++++--
mm/memblock.c | 8 +---
net/core/dev.c | 2 +-
net/dccp/ipv6.c | 7 +---
net/ipv6/datagram.c | 2 +-
net/ipv6/tcp_ipv6.c | 11 ++----
net/mpls/af_mpls.c | 4 ++
net/netfilter/nft_tproxy.c | 8 ++++
net/openvswitch/meter.c | 4 +-
net/rose/af_rose.c | 8 ++++
net/sched/act_bpf.c | 2 +-
net/sched/act_connmark.c | 2 +-
net/sched/act_ctinfo.c | 6 +--
net/sched/act_gate.c | 2 +-
net/sched/act_ife.c | 2 +-
net/sched/act_ipt.c | 2 +-
net/sched/act_mpls.c | 2 +-
net/sched/act_nat.c | 2 +-
net/sched/act_pedit.c | 2 +-
net/sched/act_police.c | 2 +-
net/sched/act_sample.c | 2 +-
net/sched/act_simple.c | 2 +-
net/sched/act_skbedit.c | 2 +-
net/sched/act_skbmod.c | 2 +-
net/sched/cls_tcindex.c | 34 +++++++++++++++--
net/sched/sch_htb.c | 5 ++-
net/sctp/diag.c | 4 +-
sound/pci/hda/hda_bind.c | 2 +
sound/pci/hda/hda_codec.c | 1 -
sound/pci/hda/patch_conexant.c | 1 +
sound/pci/hda/patch_realtek.c | 2 +-
sound/soc/codecs/cs42l56.c | 6 ---
sound/soc/intel/boards/sof_rt5682.c | 3 ++
sound/soc/sof/intel/hda-dai.c | 8 ++--
.../selftests/bpf/verifier/search_pruning.c | 36 ++++++++++++++++++
tools/virtio/linux/bug.h | 8 ++--
tools/virtio/linux/build_bug.h | 7 ++++
tools/virtio/linux/cpumask.h | 7 ++++
tools/virtio/linux/gfp.h | 7 ++++
tools/virtio/linux/kernel.h | 1 +
tools/virtio/linux/kmsan.h | 12 ++++++
tools/virtio/linux/scatterlist.h | 1 +
tools/virtio/linux/topology.h | 7 ++++
76 files changed, 404 insertions(+), 165 deletions(-)
Hi Andrew,
On Tue, 21 Feb 2023 17:03:13 +0800 Andrew Yang <andrew.yang(a)mediatek.com> wrote:
> From: "andrew.yang" <andrew.yang(a)mediatek.com>
>
> damon_get_page() would always increase page _refcount and
> isolate_lru_page() would increase page _refcount if the page's lru
> flag is set.
>
> If a unevictable page isolated successfully, there will be two more
> _refcount. The one from isolate_lru_page() will be decreased in
> putback_lru_page(), but the other one from damon_get_page() will be
> left behind. This causes a pin page.
>
> Whatever the case, the _refcount from damon_get_page() should be
> decreased.
Thank you for finding this issue! I think the David suggested subject[1] is
better, though.
I think we could add below Fixes: and Cc: tags?
Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme")
Cc: <stable(a)vger.kernel.org> # 5.16.x
>
> Signed-off-by: andrew.yang <andrew.yang(a)mediatek.com>
> ---
> mm/damon/paddr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index e1a4315c4be6..56d8abd08fb1 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -223,8 +223,8 @@ static unsigned long damon_pa_pageout(struct damon_region *r)
> putback_lru_page(page);
> } else {
> list_add(&page->lru, &page_list);
> - put_page(page);
> }
> + put_page(page);
Seems your patch is not based on mm-unstable tree[2]. Could you please rebase
on it?
Also, let's remove the braces for the single statements[3].
[1] https://lore.kernel.org/damon/1b3e8e88-ed5c-7302-553f-4ddb3400d466@redhat.c…
[2] https://docs.kernel.org/next/mm/damon/maintainer-profile.html#scm-trees
[3] https://docs.kernel.org/process/coding-style.html?highlight=coding+style#pl…
Thanks,
SJ
> }
> applied = reclaim_pages(&page_list);
> cond_resched();
> --
> 2.18.0
在 2023/2/22 4:26, Seth Jenkins 写道:
> > So you are letting an opaque US government agency, and random third
> > party companies, dictate your company's internal engineering policies
> > and resource allocations? That feels very very odd and ripe for abuse.
> I wholeheartedly agree with gregkh that the CVE system is flawed in
> numerous ways and generates many false positives, however in this case,
> this issue is a real issue with real security impact that received a
> real patch upstream. The idea of backporting this to earlier kernels
> probably warrants discussion if someone is willing to bite off that
> work. But this gets into the fuzzy "backporting exploit mitigations to
> stable" conversation.
>
> > And are you sure this can really happen? Have you proven this?
> Yes this can really, provably, happen:
> https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-br…
> <https://googleprojectzero.blogspot.com/2022/12/exploiting-CVE-2022-42703-br…>
>
> > And why is this really an issue, KAS[L]R is a known-weak-defense and
> almost
> > useless against local attacks.
> Granted, Peter Z's fix does help to mitigate the impact from remote
> attackers but yes this fix does not resolve the security impact from
> local exploits.
Thank you seth.
Yes, for this specific CVE issue, KASLR is indeed a mitigation measure.
>
>
> On Tue, Feb 21, 2023 at 7:30 AM Tong Tiangen <tongtiangen(a)huawei.com
> <mailto:tongtiangen@huawei.com>> wrote:
>
>
>
> 在 2023/2/21 18:40, Greg KH 写道:
> > On Tue, Feb 21, 2023 at 05:19:42PM +0800, Tong Tiangen wrote:
> >>
> >>
> >> 在 2023/2/21 16:40, Greg KH 写道:
> >>> On Tue, Feb 21, 2023 at 03:46:27PM +0800, Tong Tiangen wrote:
> >>>>
> >>>>
> >>>> 在 2023/2/21 15:30, Greg KH 写道:
> >>>>> On Tue, Feb 21, 2023 at 03:19:05PM +0800, Tong Tiangen wrote:
> >>>>>> Hi peter:
> >>>>>>
> >>>>>> Do you have any plans to backport this patch[1] to the
> stable branch of the
> >>>>>> lower version, such as 4.19.y ?
> >>>>>
> >>>>> Why? That is a new feature for 6.2 why would it be needed to fix
> >>>>> anything in really old kernels?
> >>>>
> >>>> Hi Greg:
> >>>>
> >>>> This patch fix CVE-2023-0597[1],
> >>>
> >>> The kernel developers do not care about CVEs as they are almost
> always
> >>> invalid and do not mean anything,
> >>
> >> Ok, thanks.
> >>
> >>
> >>> sorry. It is well known that, companies like Red Hat use them
> to make
> >>> up for broken internal engineering policies.
> >>
> >> Yeah, For company's internal engineering policies, the CVE with
> certain
> >> impact must be repaired.
> >
> > So you are letting an opaque US government agency, and random third
> > party companies, dictate your company's internal engineering policies
> > and resource allocations? That feels very very odd and ripe for
> abuse.
> >
> > Also note that MITRE refuses to allocate CVEs for many real kernel
> > issues for unknown reasons, (i.e. they reject all of my requests), so
> > you are getting only a small subset of real issues here.
> >
> > Also, how do you handle revocation of CVEs that are obviously invalid
> > and/or don't actually do anything (like this one?)
> >
> >>> Are you sure this really is a valid problem that must be fixed
> in older
> >>> kernels?
> >>>
> >>>> this CVE report a flaw possibility of memory leak. And this is
> >>>> important for some products using this stable version.
> >>>
> >>> What exact memory leak are you referring to?
> >>
> >> Sorry for Inaccurate description, the memory leak means: a potential
> >> security risk of kernel memory information disclosure caused by no
> >> randomization of the exception stacks.
> >
> > And are you sure this can really happen? Have you proven this?
> >
> > And why is this really an issue, KASR is a known-week-defense and
> almost
> > useless against local attacks.
> >
> > Anyway, please provide working patches if you think this really is an
> > issue.
> >
> > And please revisit your company's policies, they do not seem very
> sane :)
>
> Hi Greg:
>
> Thanks for these very useful suggestions and we will revisit our
> policies :)
>
> Thanks,
> Tong
> .
>
> >
> > thanks,
> >
> > greg k-h
> > .
>
Bom dia stable(a)vger.kernel.org,
Sou o advogado Mustafa Ayvaz, advogado pessoal do falecido Sr.
Robert, que perdeu a vida devido ao coronavírus, contraído
durante sua viagem de negócios à China. Entrei em contato com
você para trabalhar comigo na transferência de um fundo:
$4,420,000.00 (quatro milhões, quatrocentos e vinte mil dólares)
legado deixado por ele.
Procurei minuciosamente o parente mais próximo de meu cliente
falecido, mas falhei porque não tenho sua residência atual e
detalhes de contato. Enquanto pesquisava, encontrei seu perfil
com o mesmo sobrenome e na mesma localidade com o parente mais
próximo. Decidi entrar em contato com você e usá-lo como parente
genuíno.
Solicito seu consentimento para apresentá-lo como parente mais
próximo de meu falecido cliente, já que ambos têm o mesmo
sobrenome. Os fundos serão então transferidos para você como
beneficiário e compartilhados de acordo com um padrão/proporção
de compartilhamento proposto de 60:40, ou seja, 60% para mim e
40% para você. Por favor, entre em contato comigo imediatamente
para obter mais informações.
Cumprimentos
Mustafá Ayvaz
The riscv defconfig and tinyconfig builds failed with clang-nightly
due to below build warnings / errors on latest stable-rc 5.10.
Regression on riscv:
- build/clang-nightly-tinyconfig - FAILED
- build/clang-nightly-defconfig - FAILED
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
metadata:
-----
git_repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git_sha: 7d11e4c4fc56eb25c5b41da93748dbcf21956316
git_short_log: 7d11e4c4fc56 ("Linux 5.10.169-rc1")
git_describe: v5.10.168-58-g7d11e4c4fc56
Build log:
----
make --silent --keep-going --jobs=8
O=/home/tuxbuild/.cache/tuxmake/builds/1/build ARCH=riscv
CROSS_COMPILE=riscv64-linux-gnu- HOSTCC=clang CC=clang LLVM=1
LLVM_IAS=1 LD=riscv64-linux-gnu-ld
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/version.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/do_mounts.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/noinitramfs.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
riscv64-linux-gnu-ld: failed to merge target specific data of file
init/calibrate.o
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_zicsr2p0_zifencei2p0:
Invalid or unknown z ISA extension: 'zifencei'
Build details,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…
Build history,
https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10…
steps to reproduce:
-----------
# To install tuxmake on your system globally:
# sudo pip3 install -U tuxmake
#
# See https://docs.tuxmake.org/ for complete documentation.
# Original tuxmake command with fragments listed below.
# tuxmake --runtime podman --target-arch riscv --toolchain
clang-nightly --kconfig tinyconfig LLVM=1 LLVM_IAS=1
LD=riscv64-linux-gnu-ld
--
Linaro LKFT
https://lkft.linaro.org
Beste gebruiker,
Hallo mijn gelukkige vriend, ik ben Charles W. Jackson Jr., de
megawinnaar van $ 344,6 miljoen Mega Millions Jackpot, ik doneer aan 5
willekeurige individuen, als je deze e-mail ontvangt, is je e-mail
geselecteerd na een draaibal. Ik heb het grootste deel van mijn vermogen
verdeeld over een aantal goede doelen en organisaties. Ik heb vrijwillig
besloten om de som van $ 3 miljoen USD aan jou of je organisatie te
doneren als een van de geselecteerde 5, je kunt mijn winst verifiëren
via de onderstaande YouTube-pagina.
BEKIJK MIJ HIER: https://www.youtube.com/watch?v=0MUR8QEIMQI
Deze donatie van $ 3 miljoen USD is gedaan om u in staat te stellen uw
persoonlijke problemen te versterken en genereus de hand te reiken aan
de minst bevoorrechte, verweesde en liefdadigheidsorganisaties in uw
gemeenschap.
Wat voor mij het belangrijkste is, is dat u de gedoneerde fondsen op de
beste manier toewijst die u, uw familie, vrienden ten goede komt en om
de behoeftigen in uw directe gemeenschap te helpen.
DIT IS JE DONATIECODE: DON207152
Stuur uw DONATIECODE zo snel mogelijk naar onderstaand e-mailadres,
zodat we de donatieprocedure snel kunnen afronden.
E-mail contact: charlesjacksonj1(a)hotmail.com
charlesjackson06(a)bahnhof.se
Ik hoop u en uw gezin dit jaar gelukkig te maken, ik wens u het beste en
gefeliciteerd.
Groeten,
Charles W.Jackson Jr
Summary:
--------
CXL RAM support allows for the dynamic provisioning of new CXL RAM
regions, and more routinely, assembling a region from an existing
configuration established by platform-firmware. The latter is motivated
by CXL memory RAS (Reliability, Availability and Serviceability)
support, that requires associating device events with System Physical
Address ranges and vice versa.
The 'Soft Reserved' policy rework arranges for performance
differentiated memory like CXL attached DRAM, or high-bandwidth memory,
to be designated for 'System RAM' by default, rather than the device-dax
dedicated access mode. That current device-dax default is confusing and
surprising for the Pareto of users that do not expect memory to be
quarantined for dedicated access by default. Most users expect all
'System RAM'-capable memory to show up in FREE(1).
Details:
--------
Recall that the Linux 'Soft Reserved' designation for memory is a
reaction to platform-firmware, like EFI EDK2, delineating memory with
the EFI Specific Purpose Memory attribute (EFI_MEMORY_SP). An
alternative way to think of that attribute is that it specifies the
*not* general-purpose memory pool. It is memory that may be too precious
for general usage or not performant enough for some hot data structures.
However, in the absence of explicit policy it should just be 'System
RAM' by default.
Rather than require every distribution to ship a udev policy to assign
dax devices to dax_kmem (the device-memory hotplug driver) just make
that the kernel default. This is similar to the rationale in:
commit 8604d9e534a3 ("memory_hotplug: introduce CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE")
With this change the relatively niche use case of accessing this memory
via mapping a device-dax instance can be achieved by building with
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n, or specifying
memhp_default_state=offline at boot, and then use:
daxctl reconfigure-device $device -m devdax --force
...to shift the corresponding address range to device-dax access.
The process of assembling a device-dax instance for a given CXL region
device configuration is similar to the process of assembling a
Device-Mapper or MDRAID storage-device array. Specifically, asynchronous
probing by the PCI and driver core enumerates all CXL endpoints and
their decoders. Then, once enough decoders have arrived to a describe a
given region, that region is passed to the device-dax subsystem where it
is subject to the above 'dax_kmem' policy. This assignment and policy
choice is only possible if memory is set aside by the 'Soft Reserved'
designation. Otherwise, CXL that is mapped as 'System RAM' becomes
immutable by CXL driver mechanisms, but is still enumerated for RAS
purposes.
This series is also available via:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/…
...and has gone through some preview testing in various forms.
---
Dan Williams (18):
cxl/Documentation: Update references to attributes added in v6.0
cxl/region: Add a mode attribute for regions
cxl/region: Support empty uuids for non-pmem regions
cxl/region: Validate region mode vs decoder mode
cxl/region: Add volatile region creation support
cxl/region: Refactor attach_target() for autodiscovery
cxl/region: Move region-position validation to a helper
kernel/range: Uplevel the cxl subsystem's range_contains() helper
cxl/region: Enable CONFIG_CXL_REGION to be toggled
cxl/region: Fix passthrough-decoder detection
cxl/region: Add region autodiscovery
tools/testing/cxl: Define a fixed volatile configuration to parse
dax/hmem: Move HMAT and Soft reservation probe initcall level
dax/hmem: Drop unnecessary dax_hmem_remove()
dax/hmem: Convey the dax range via memregion_info()
dax/hmem: Move hmem device registration to dax_hmem.ko
dax: Assign RAM regions to memory-hotplug by default
cxl/dax: Create dax devices for CXL RAM regions
Documentation/ABI/testing/sysfs-bus-cxl | 64 +-
MAINTAINERS | 1
drivers/acpi/numa/hmat.c | 4
drivers/cxl/Kconfig | 12
drivers/cxl/acpi.c | 3
drivers/cxl/core/core.h | 7
drivers/cxl/core/hdm.c | 8
drivers/cxl/core/pci.c | 5
drivers/cxl/core/port.c | 34 +
drivers/cxl/core/region.c | 848 ++++++++++++++++++++++++++++---
drivers/cxl/cxl.h | 46 ++
drivers/cxl/cxlmem.h | 3
drivers/cxl/port.c | 26 +
drivers/dax/Kconfig | 17 +
drivers/dax/Makefile | 2
drivers/dax/bus.c | 53 +-
drivers/dax/bus.h | 12
drivers/dax/cxl.c | 53 ++
drivers/dax/device.c | 3
drivers/dax/hmem/Makefile | 3
drivers/dax/hmem/device.c | 102 ++--
drivers/dax/hmem/hmem.c | 148 +++++
drivers/dax/kmem.c | 1
include/linux/dax.h | 7
include/linux/memregion.h | 2
include/linux/range.h | 5
lib/stackinit_kunit.c | 6
tools/testing/cxl/test/cxl.c | 146 +++++
28 files changed, 1355 insertions(+), 266 deletions(-)
create mode 100644 drivers/dax/cxl.c
base-commit: 172738bbccdb4ef76bdd72fc72a315c741c39161
A recent commit moved enabling of runtime PM from adreno_gpu_init() to
adreno_load_gpu() (called on first open()), which means that unbind()
may now be called with runtime PM disabled in case the device was never
opened in between.
Make sure to only forcibly suspend and disable runtime PM at unbind() in
case runtime PM has been enabled to prevent a disable count imbalance.
This specifically avoids leaving runtime PM disabled when the device
is later opened after a successful bind:
msm_dpu ae01000.display-controller: [drm:adreno_load_gpu [msm]] *ERROR* Couldn't power up the GPU: -13
Fixes: 4b18299b3365 ("drm/msm/adreno: Defer enabling runpm until hw_init()")
Reported-by: Bjorn Andersson <quic_bjorande(a)quicinc.com>
Link: https://lore.kernel.org/lkml/20230203181245.3523937-1-quic_bjorande@quicinc…
Cc: stable(a)vger.kernel.org # 6.0
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/gpu/drm/msm/adreno/adreno_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 36f062c7582f..c5c4c93b3689 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -558,7 +558,8 @@ static void adreno_unbind(struct device *dev, struct device *master,
struct msm_drm_private *priv = dev_get_drvdata(master);
struct msm_gpu *gpu = dev_to_gpu(dev);
- WARN_ON_ONCE(adreno_system_suspend(dev));
+ if (pm_runtime_enabled(dev))
+ WARN_ON_ONCE(adreno_system_suspend(dev));
gpu->funcs->destroy(gpu);
priv->gpu_pdev = NULL;
--
2.39.2
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 0af2795f936f1ea1f9f1497447145dfcc7ed2823
Gitweb: https://git.kernel.org/tip/0af2795f936f1ea1f9f1497447145dfcc7ed2823
Author: Marc Zyngier <maz(a)kernel.org>
AuthorDate: Mon, 20 Feb 2023 19:01:01
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 20 Feb 2023 22:29:54 +01:00
genirq/msi: Take the per-device MSI lock before validating the control structure
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, in msi_domain_populate_irqs() the lock is taken right after having
called msi_ctrl_valid(), which is just a tad too late.
Take the lock before invoking msi_ctrl_valid().
Fixes: 40742716f294 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux(a)armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
Link: https://lore.kernel.org/r/20230220190101.314446-1-maz@kernel.org
---
kernel/irq/msi.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 783a3e6..13d9649 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
struct xarray *xa;
int ret, virq;
- if (!msi_ctrl_valid(dev, &ctrl))
- return -EINVAL;
-
msi_lock_descs(dev);
+
+ if (!msi_ctrl_valid(dev, &ctrl)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
if (ret)
goto unlock;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
c06ba7b892a5 ("nvme-apple: reset controller during shutdown")
285b6e9b5717 ("nvme: merge nvme_shutdown_ctrl into nvme_disable_ctrl")
e6d275de2e4a ("nvme: use nvme_wait_ready in nvme_shutdown_ctrl")
c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable")
9f27bd701d18 ("nvme: rename the queue quiescing helpers")
91c11d5f3254 ("nvme-rdma: stop auth work after tearing down queues in error recovery")
1f1a4f89562d ("nvme-tcp: stop auth work after tearing down queues in error recovery")
eac3ef262941 ("nvme-pci: split the initial probe from the rest path")
a6ee7f19ebfd ("nvme-pci: call nvme_pci_configure_admin_queue from nvme_pci_enable")
3f30a79c2e2c ("nvme-pci: set constant paramters in nvme_pci_alloc_ctrl")
2e87570be9d2 ("nvme-pci: factor out a nvme_pci_alloc_dev helper")
081a7d958ce4 ("nvme-pci: factor the iod mempool creation into a helper")
94cc781f69f4 ("nvme: move OPAL setup from PCIe to core")
cd50f9b24726 ("nvme: split nvme_kill_queues")
6bcd5089ee13 ("nvme: don't unquiesce the admin queue in nvme_kill_queues")
0ffc7e98bfaa ("nvme-pci: refactor the tagset handling in nvme_reset_work")
71b26083d59c ("block: set the disk capacity to 0 in blk_mark_disk_dead")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c06ba7b892a50b48522ad441a40053f483dfee9e Mon Sep 17 00:00:00 2001
From: Janne Grunau <j(a)jannau.net>
Date: Tue, 17 Jan 2023 19:25:00 +0100
Subject: [PATCH] nvme-apple: reset controller during shutdown
This is a functional revert of c76b8308e4c9 ("nvme-apple: fix controller
shutdown in apple_nvme_disable").
The commit broke suspend/resume since apple_nvme_reset_work() tries to
disable the controller on resume. This does not work for the apple NVMe
controller since register access only works while the co-processor
firmware is running.
Disabling the NVMe controller in the shutdown path is also required
for shutting the co-processor down. The original code was appropriate
for this hardware. Add a comment to prevent a similar breaking changes
in the future.
Fixes: c76b8308e4c9 ("nvme-apple: fix controller shutdown in apple_nvme_disable")
Reported-by: Janne Grunau <j(a)jannau.net>
Link: https://lore.kernel.org/all/20230110174745.GA3576@jannau.net/
Signed-off-by: Janne Grunau <j(a)jannau.net>
[hch: updated with a more descriptive comment from Hector Martin]
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index bf1c60edb7f9..146c9e63ce77 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -829,7 +829,23 @@ static void apple_nvme_disable(struct apple_nvme *anv, bool shutdown)
apple_nvme_remove_cq(anv);
}
- nvme_disable_ctrl(&anv->ctrl, shutdown);
+ /*
+ * Always disable the NVMe controller after shutdown.
+ * We need to do this to bring it back up later anyway, and we
+ * can't do it while the firmware is not running (e.g. in the
+ * resume reset path before RTKit is initialized), so for Apple
+ * controllers it makes sense to unconditionally do it here.
+ * Additionally, this sequence of events is reliable, while
+ * others (like disabling after bringing back the firmware on
+ * resume) seem to run into trouble under some circumstances.
+ *
+ * Both U-Boot and m1n1 also use this convention (i.e. an ANS
+ * NVMe controller is handed off with firmware shut down, in an
+ * NVMe disabled state, after a clean shutdown).
+ */
+ if (shutdown)
+ nvme_disable_ctrl(&anv->ctrl, shutdown);
+ nvme_disable_ctrl(&anv->ctrl, false);
}
WRITE_ONCE(anv->ioq.enabled, false);
Current btrfs_log_dev_io_error() increases the read error count even if the
erroneous IO is a WRITE request. This is because it forget to use "else
if", and all the error WRITE requests counts as READ error as there is (of
course) no REQ_RAHEAD bit set.
Fixes: c3a62baf21ad ("btrfs: use chained bios when cloning")
CC: stable(a)vger.kernel.org # 6.1
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
---
fs/btrfs/bio.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c
index d8b90f95b157..726592868e9c 100644
--- a/fs/btrfs/bio.c
+++ b/fs/btrfs/bio.c
@@ -287,7 +287,7 @@ static void btrfs_log_dev_io_error(struct bio *bio, struct btrfs_device *dev)
if (btrfs_op(bio) == BTRFS_MAP_WRITE)
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
- if (!(bio->bi_opf & REQ_RAHEAD))
+ else if (!(bio->bi_opf & REQ_RAHEAD))
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_READ_ERRS);
if (bio->bi_opf & REQ_PREFLUSH)
btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_FLUSH_ERRS);
--
2.39.1
Calling msi_ctrl_valid() ultimately results in calling
msi_get_device_domain(), which requires holding the device MSI lock.
However, we take that lock right after having called msi_ctrl_valid(),
which is just a tad too late. Taking the lock earlier solves the issue.
Fixes: 40742716f294 ("genirq/msi: Make msi_add_simple_msi_descs() device domain aware")
Reported-by: "Russell King (Oracle)" <linux(a)armlinux.org.uk>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/Y/Opu6ETe3ZzZ/8E@shell.armlinux.org.uk
---
kernel/irq/msi.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 783a3e6a0b10..13d96495e6d0 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -1084,10 +1084,13 @@ int msi_domain_populate_irqs(struct irq_domain *domain, struct device *dev,
struct xarray *xa;
int ret, virq;
- if (!msi_ctrl_valid(dev, &ctrl))
- return -EINVAL;
-
msi_lock_descs(dev);
+
+ if (!msi_ctrl_valid(dev, &ctrl)) {
+ ret = -EINVAL;
+ goto unlock;
+ }
+
ret = msi_domain_add_simple_msi_descs(dev, &ctrl);
if (ret)
goto unlock;
--
2.34.1
From: Yu Kuai <yukuai3(a)huawei.com>
commit 940c264984fd1457918393c49674f6b39ee16506 upstream.
If 'part_shift' is not zero, then 'index << part_shift' might
overflow to a value that is not greater than '0xfffff', then sysfs
might complains about duplicate creation.
Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices")
Signed-off-by: Yu Kuai <yukuai3(a)huawei.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Link: https://lore.kernel.org/r/20211102015237.2309763-3-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Wen Yang <wenyang.linux(a)foxmail.com>
---
drivers/block/nbd.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index bd05eaf04143..1ddae8f768d5 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1773,11 +1773,11 @@ static int nbd_dev_add(int index)
disk->major = NBD_MAJOR;
/* Too big first_minor can cause duplicate creation of
- * sysfs files/links, since MKDEV() expect that the max bits of
- * first_minor is 20.
+ * sysfs files/links, since index << part_shift might overflow, or
+ * MKDEV() expect that the max bits of first_minor is 20.
*/
disk->first_minor = index << part_shift;
- if (disk->first_minor > MINORMASK) {
+ if (disk->first_minor < index || disk->first_minor > MINORMASK) {
err = -EINVAL;
goto out_free_idr;
}
--
2.37.2
From: Wen Yang <wenyang.linux(a)foxmail.com>
This reverts commit 0daa75bf750c400af0a0127fae37cd959d36dee7.
These problems such as:
https://lore.kernel.org/all/CACPK8XfUWoOHr-0RwRoYoskia4fbAbZ7DYf5wWBnv6qUnG…
It was introduced by introduced by commit b1a811633f73 ("block: nbd: add sanity check for first_minor")
and has been have been fixed by commit e4c4871a7394 ("nbd: fix max value for 'first_minor'").
Cc: Joel Stanley <joel(a)jms.id.au>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Pavel Skripkin <paskripkin(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Wen Yang <wenyang.linux(a)foxmail.com>
---
drivers/block/nbd.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index b0d3dadeb964..bf8148ebd858 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1771,7 +1771,17 @@ static int nbd_dev_add(int index)
refcount_set(&nbd->refs, 1);
INIT_LIST_HEAD(&nbd->list);
disk->major = NBD_MAJOR;
+
+ /* Too big first_minor can cause duplicate creation of
+ * sysfs files/links, since first_minor will be truncated to
+ * byte in __device_add_disk().
+ */
disk->first_minor = index << part_shift;
+ if (disk->first_minor > 0xff) {
+ err = -EINVAL;
+ goto out_free_idr;
+ }
+
disk->fops = &nbd_fops;
disk->private_data = nbd;
sprintf(disk->disk_name, "nbd%d", index);
--
2.37.2
With the introduction of KERNEL_IBRS, STIBP is no longer needed
to prevent cross thread training in the kernel space. When KERNEL_IBRS
was added, it also disabled the user-mode protections for spectre_v2.
KERNEL_IBRS does not mitigate cross thread training in the userspace.
In order to demonstrate the issue, one needs to avoid syscalls in the
victim as syscalls can shorten the window size due to
a user -> kernel -> user transition which sets the
IBRS bit when entering kernel space and clearing any training the
attacker may have done.
Allow users to select a spectre_v2_user mitigation (STIBP always on,
opt-in via prctl) when KERNEL_IBRS is enabled.
Reported-by: José Oliveira <joseloliveira11(a)gmail.com>
Reported-by: Rodrigo Branco <rodrigo(a)kernelhacking.com>
Reviewed-by: Alexandra Sandulescu <aesa(a)google.com>
Reviewed-by: Jim Mattson <jmattson(a)google.com>
Fixes: 7c693f54c873 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Cc: stable(a)vger.kernel.org
Signed-off-by: KP Singh <kpsingh(a)kernel.org>
---
arch/x86/kernel/cpu/bugs.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index bca0bd8f4846..b05ca1575d81 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1132,6 +1132,19 @@ static inline bool spectre_v2_in_ibrs_mode(enum spectre_v2_mitigation mode)
mode == SPECTRE_V2_EIBRS_LFENCE;
}
+static inline bool spectre_v2_user_no_stibp(enum spectre_v2_mitigation mode)
+{
+ /* When IBRS or enhanced IBRS is enabled, STIBP is not needed.
+ *
+ * However, With KERNEL_IBRS, the IBRS bit is cleared on return
+ * to user and the user-mode code needs to be able to enable protection
+ * from cross-thread training, either by always enabling STIBP or
+ * by enabling it via prctl.
+ */
+ return (spectre_v2_in_ibrs_mode(mode) &&
+ !cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS));
+}
+
static void __init
spectre_v2_user_select_mitigation(void)
{
@@ -1193,13 +1206,8 @@ spectre_v2_user_select_mitigation(void)
"always-on" : "conditional");
}
- /*
- * If no STIBP, IBRS or enhanced IBRS is enabled, or SMT impossible,
- * STIBP is not required.
- */
- if (!boot_cpu_has(X86_FEATURE_STIBP) ||
- !smt_possible ||
- spectre_v2_in_ibrs_mode(spectre_v2_enabled))
+ if (!boot_cpu_has(X86_FEATURE_STIBP) || !smt_possible ||
+ spectre_v2_user_no_stibp(spectre_v2_enabled))
return;
/*
@@ -1496,6 +1504,7 @@ static void __init spectre_v2_select_mitigation(void)
break;
case SPECTRE_V2_IBRS:
+ pr_err("enabling KERNEL_IBRS");
setup_force_cpu_cap(X86_FEATURE_KERNEL_IBRS);
if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED))
pr_warn(SPECTRE_V2_IBRS_PERF_MSG);
@@ -2327,7 +2336,7 @@ static ssize_t mmio_stale_data_show_state(char *buf)
static char *stibp_state(void)
{
- if (spectre_v2_in_ibrs_mode(spectre_v2_enabled))
+ if (spectre_v2_user_no_stibp(spectre_v2_enabled))
return "";
switch (spectre_v2_user_stibp) {
--
2.39.2.637.g21b0678d19-goog
This is a request for adding the following patches to stable 5.10.y.
Poisoned shmem and hugetlb pages are removed from the pagecache.
Subsequent access to the offset in the file results in a NEW zero
filled page. Application code does not get notified of the data
loss, and the only 'clue' is a message in the system log. Data
loss has been experienced by real users.
This was addressed upstream. Most commits were marked for backports,
but some were not. This was discussed here [1] and here [2].
Patches apply cleanly to v5.4.224 and pass tests checking for this
specific data loss issue. LTP mm tests show no regressions.
All patches except 4 "mm: hwpoison: handle non-anonymous THP correctly"
required a small bit of change to apply correctly: mostly for context.
linux-mm Cc'ed as it would be great to get at least an ACK from others
familiar with this issue.
[1] https://lore.kernel.org/linux-mm/Y2UTUNBHVY5U9si2@monkey/
[2] https://lore.kernel.org/stable/20221114131403.GA3807058@u2004/
James Houghton (1):
hugetlbfs: don't delete error page from pagecache
Yang Shi (5):
mm: hwpoison: remove the unnecessary THP check
mm: filemap: check if THP has hwpoisoned subpage for PMD page fault
mm: hwpoison: refactor refcount check handling
mm: hwpoison: handle non-anonymous THP correctly
mm: shmem: don't truncate page if memory failure happens
fs/hugetlbfs/inode.c | 13 ++--
include/linux/page-flags.h | 23 ++++++
mm/huge_memory.c | 2 +
mm/hugetlb.c | 4 +
mm/memory-failure.c | 153 ++++++++++++++++++++++++-------------
mm/memory.c | 9 +++
mm/page_alloc.c | 4 +-
mm/shmem.c | 51 +++++++++++--
8 files changed, 191 insertions(+), 68 deletions(-)
--
2.38.1
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
There were just too many code changes since 5.4 stable to prevent clean
picking the fix from mainline.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 5.4.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index b3cdd10c83ae..c30e450fa970 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -211,6 +211,10 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
int stream_tag;
int ret;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
/* get stored dma data if resuming from system suspend */
link_dev = snd_soc_dai_get_dma_data(dai, substream);
if (!link_dev) {
@@ -231,10 +235,6 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
if (ret < 0)
return ret;
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the stream tag in the codec dai dma params */
if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0);
--
2.39.2
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
There were just too many code changes since 5.10/5.15 stable to prevent
clean picking the fix from mainline.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 5.15.x 5.10.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index de80f1b3d7f2..a6275cc92a40 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -212,6 +212,10 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
int stream_tag;
int ret;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
/* get stored dma data if resuming from system suspend */
link_dev = snd_soc_dai_get_dma_data(dai, substream);
if (!link_dev) {
@@ -232,10 +236,6 @@ static int hda_link_hw_params(struct snd_pcm_substream *substream,
if (ret < 0)
return ret;
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the hdac_stream in the codec dai */
snd_soc_dai_set_stream(codec_dai, hdac_stream(link_dev), substream->stream);
--
2.39.2
[ Upstream commit 1f810d2b6b2fbdc5279644d8b2c140b1f7c9d43d ]
The reason for manual backport is that in 6.2 there were a naming cleanup
around hdac, link, hlink, for example snd_hdac_ext_bus_get_link is renamed
to snd_hdac_ext_bus_get_hlink_by_name in 6.2.
The HDaudio stream allocation is done first, and in a second step the
LOSIDV parameter is programmed for the multi-link used by a codec.
This leads to a possible stream_tag leak, e.g. if a DisplayAudio link
is not used. This would happen when a non-Intel graphics card is used
and userspace unconditionally uses the Intel Display Audio PCMs without
checking if they are connected to a receiver with jack controls.
We should first check that there is a valid multi-link entry to
configure before allocating a stream_tag. This change aligns the
dma_assign and dma_cleanup phases.
Cc: stable(a)vger.kernel.org # 6.1.x 6.0.x
Complements: b0cd60f3e9f5 ("ALSA/ASoC: hda: clarify bus_get_link() and bus_link_get() helpers")
Link: https://github.com/thesofproject/linux/issues/4151
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan(a)linux.intel.com>
Reviewed-by: Rander Wang <rander.wang(a)intel.com>
Reviewed-by: Bard Liao <yung-chuan.liao(a)linux.intel.com>
Signed-off-by: Peter Ujfalusi <peter.ujfalusi(a)linux.intel.com>
Link: https://lore.kernel.org/r/20230216162340.19480-1-peter.ujfalusi@linux.intel…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
sound/soc/sof/intel/hda-dai.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/sound/soc/sof/intel/hda-dai.c b/sound/soc/sof/intel/hda-dai.c
index 556e883a32ed..5f03ee390d54 100644
--- a/sound/soc/sof/intel/hda-dai.c
+++ b/sound/soc/sof/intel/hda-dai.c
@@ -216,6 +216,10 @@ static int hda_link_dma_hw_params(struct snd_pcm_substream *substream,
struct hdac_bus *bus = hstream->bus;
struct hdac_ext_link *link;
+ link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
+ if (!link)
+ return -EINVAL;
+
hext_stream = snd_soc_dai_get_dma_data(cpu_dai, substream);
if (!hext_stream) {
hext_stream = hda_link_stream_assign(bus, substream);
@@ -225,10 +229,6 @@ static int hda_link_dma_hw_params(struct snd_pcm_substream *substream,
snd_soc_dai_set_dma_data(cpu_dai, substream, (void *)hext_stream);
}
- link = snd_hdac_ext_bus_get_link(bus, codec_dai->component->name);
- if (!link)
- return -EINVAL;
-
/* set the hdac_stream in the codec dai */
snd_soc_dai_set_stream(codec_dai, hdac_stream(hext_stream), substream->stream);
--
2.39.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d125d1349abe ("alarmtimer: Prevent starvation by small intervals and SIG_IGN")
41b3b8dffc1f ("alarmtimer: Rename gettime() callback to get_ktime()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d125d1349abeb46945dc5e98f7824bf688266f13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 9 Feb 2023 23:25:49 +0100
Subject: [PATCH] alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..7e5dff602585 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d125d1349abe ("alarmtimer: Prevent starvation by small intervals and SIG_IGN")
41b3b8dffc1f ("alarmtimer: Rename gettime() callback to get_ktime()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d125d1349abeb46945dc5e98f7824bf688266f13 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Thu, 9 Feb 2023 23:25:49 +0100
Subject: [PATCH] alarmtimer: Prevent starvation by small intervals and SIG_IGN
syzbot reported a RCU stall which is caused by setting up an alarmtimer
with a very small interval and ignoring the signal. The reproducer arms the
alarm timer with a relative expiry of 8ns and an interval of 9ns. Not a
problem per se, but that's an issue when the signal is ignored because then
the timer is immediately rearmed because there is no way to delay that
rearming to the signal delivery path. See posix_timer_fn() and commit
58229a189942 ("posix-timers: Prevent softirq starvation by small intervals
and SIG_IGN") for details.
The reproducer does not set SIG_IGN explicitely, but it sets up the timers
signal with SIGCONT. That has the same effect as explicitely setting
SIG_IGN for a signal as SIGCONT is ignored if there is no handler set and
the task is not ptraced.
The log clearly shows that:
[pid 5102] --- SIGCONT {si_signo=SIGCONT, si_code=SI_TIMER, si_timerid=0, si_overrun=316014, si_int=0, si_ptr=NULL} ---
It works because the tasks are traced and therefore the signal is queued so
the tracer can see it, which delays the restart of the timer to the signal
delivery path. But then the tracer is killed:
[pid 5087] kill(-5102, SIGKILL <unfinished ...>
...
./strace-static-x86_64: Process 5107 detached
and after it's gone the stall can be observed:
syzkaller login: [ 79.439102][ C0] hrtimer: interrupt took 68471 ns
[ 184.460538][ C1] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
...
[ 184.658237][ C1] rcu: Stack dump where RCU GP kthread last ran:
[ 184.664574][ C1] Sending NMI from CPU 1 to CPUs 0:
[ 184.669821][ C0] NMI backtrace for cpu 0
[ 184.669831][ C0] CPU: 0 PID: 5108 Comm: syz-executor192 Not tainted 6.2.0-rc6-next-20230203-syzkaller #0
...
[ 184.670036][ C0] Call Trace:
[ 184.670041][ C0] <IRQ>
[ 184.670045][ C0] alarmtimer_fired+0x327/0x670
posix_timer_fn() prevents that by checking whether the interval for
timers which have the signal ignored is smaller than a jiffie and
artifically delay it by shifting the next expiry out by a jiffie. That's
accurate vs. the overrun accounting, but slightly inaccurate
vs. timer_gettimer(2).
The comment in that function says what needs to be done and there was a fix
available for the regular userspace induced SIG_IGN mechanism, but that did
not work due to the implicit ignore for SIGCONT and similar signals. This
needs to be worked on, but for now the only available workaround is to do
exactly what posix_timer_fn() does:
Increase the interval of self-rearming timers, which have their signal
ignored, to at least a jiffie.
Interestingly this has been fixed before via commit ff86bf0c65f1
("alarmtimer: Rate limit periodic intervals") already, but that fix got
lost in a later rework.
Reported-by: syzbot+b9564ba6e8e00694511b(a)syzkaller.appspotmail.com
Fixes: f2c45807d399 ("alarmtimer: Switch over to generic set/get/rearm routine")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Acked-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87k00q1no2.ffs@tglx
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..7e5dff602585 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -470,11 +470,35 @@ u64 alarm_forward(struct alarm *alarm, ktime_t now, ktime_t interval)
}
EXPORT_SYMBOL_GPL(alarm_forward);
-u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+static u64 __alarm_forward_now(struct alarm *alarm, ktime_t interval, bool throttle)
{
struct alarm_base *base = &alarm_bases[alarm->type];
+ ktime_t now = base->get_ktime();
+
+ if (IS_ENABLED(CONFIG_HIGH_RES_TIMERS) && throttle) {
+ /*
+ * Same issue as with posix_timer_fn(). Timers which are
+ * periodic but the signal is ignored can starve the system
+ * with a very small interval. The real fix which was
+ * promised in the context of posix_timer_fn() never
+ * materialized, but someone should really work on it.
+ *
+ * To prevent DOS fake @now to be 1 jiffie out which keeps
+ * the overrun accounting correct but creates an
+ * inconsistency vs. timer_gettime(2).
+ */
+ ktime_t kj = NSEC_PER_SEC / HZ;
+
+ if (interval < kj)
+ now = ktime_add(now, kj);
+ }
+
+ return alarm_forward(alarm, now, interval);
+}
- return alarm_forward(alarm, base->get_ktime(), interval);
+u64 alarm_forward_now(struct alarm *alarm, ktime_t interval)
+{
+ return __alarm_forward_now(alarm, interval, false);
}
EXPORT_SYMBOL_GPL(alarm_forward_now);
@@ -551,9 +575,10 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (posix_timer_event(ptr, si_private) && ptr->it_interval) {
/*
* Handle ignored signals and rearm the timer. This will go
- * away once we handle ignored signals proper.
+ * away once we handle ignored signals proper. Ensure that
+ * small intervals cannot starve the system.
*/
- ptr->it_overrun += alarm_forward_now(alarm, ptr->it_interval);
+ ptr->it_overrun += __alarm_forward_now(alarm, ptr->it_interval, true);
++ptr->it_requeue_pending;
ptr->it_active = 1;
result = ALARMTIMER_RESTART;
While checking Pin Assignments of the port and partner during probe, we
don't take into account whether the peripheral is a plug or receptacle.
This manifests itself in a mode entry failure on certain docks and
dongles with captive cables. For instance, the Startech.com Type-C to DP
dongle (Model #CDP2DP) advertises its DP VDO as 0x405. This would fail
the Pin Assignment compatibility check, despite it supporting
Pin Assignment C as a UFP.
Update the check to use the correct DP Pin Assign macros that
take the peripheral's receptacle bit into account.
Fixes: c1e5c2f0cb8a ("usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles")
Cc: stable(a)vger.kernel.org
Reported-by: Diana Zigterman <dzigterman(a)chromium.org>
Signed-off-by: Prashant Malani <pmalani(a)chromium.org>
---
I realize this is a bit late in the release cycle, but figured since it
is a fix it might still be considered. Please let me know if it's too
late and I can re-send this after the 6.3-rc1 is released. Thanks!
drivers/usb/typec/altmodes/displayport.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/altmodes/displayport.c b/drivers/usb/typec/altmodes/displayport.c
index 20db51471c98..662cd043b50e 100644
--- a/drivers/usb/typec/altmodes/displayport.c
+++ b/drivers/usb/typec/altmodes/displayport.c
@@ -547,10 +547,10 @@ int dp_altmode_probe(struct typec_altmode *alt)
/* FIXME: Port can only be DFP_U. */
/* Make sure we have compatiple pin configurations */
- if (!(DP_CAP_DFP_D_PIN_ASSIGN(port->vdo) &
- DP_CAP_UFP_D_PIN_ASSIGN(alt->vdo)) &&
- !(DP_CAP_UFP_D_PIN_ASSIGN(port->vdo) &
- DP_CAP_DFP_D_PIN_ASSIGN(alt->vdo)))
+ if (!(DP_CAP_PIN_ASSIGN_DFP_D(port->vdo) &
+ DP_CAP_PIN_ASSIGN_UFP_D(alt->vdo)) &&
+ !(DP_CAP_PIN_ASSIGN_UFP_D(port->vdo) &
+ DP_CAP_PIN_ASSIGN_DFP_D(alt->vdo)))
return -ENODEV;
ret = sysfs_create_group(&alt->dev.kobj, &dp_altmode_group);
--
2.39.1.519.gcb327c4b5f-goog
On default driver load device gets configured with unexpected
higher interrupt coalescing values instead of default expected
values as memory allocated from krealloc() is not supposed to
be zeroed out and may contain garbage values.
Fix this by allocating the memory of required size first with
kcalloc() and then use krealloc() to resize and preserve the
contents across down/up of the interface.
Signed-off-by: Manish Chopra <manishc(a)marvell.com>
Fixes: b0ec5489c480 ("qede: preserve per queue stats across up/down of interface")
Cc: stable(a)vger.kernel.org
Cc: Bhaskar Upadhaya <bupadhaya(a)marvell.com>
Cc: David S. Miller <davem(a)davemloft.net>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2160054
Signed-off-by: Alok Prasad <palok(a)marvell.com>
Signed-off-by: Ariel Elior <aelior(a)marvell.com>
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 953f304b8588..af39513db1ba 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -970,8 +970,15 @@ static int qede_alloc_fp_array(struct qede_dev *edev)
goto err;
}
- mem = krealloc(edev->coal_entry, QEDE_QUEUE_CNT(edev) *
- sizeof(*edev->coal_entry), GFP_KERNEL);
+ if (!edev->coal_entry) {
+ mem = kcalloc(QEDE_MAX_RSS_CNT(edev),
+ sizeof(*edev->coal_entry), GFP_KERNEL);
+ } else {
+ mem = krealloc(edev->coal_entry,
+ QEDE_QUEUE_CNT(edev) * sizeof(*edev->coal_entry),
+ GFP_KERNEL);
+ }
+
if (!mem) {
DP_ERR(edev, "coalesce entry allocation failed\n");
kfree(edev->coal_entry);
--
2.27.0
While experimenting with CXL region removal the following corruption of
/proc/iomem appeared.
Before:
f010000000-f04fffffff : CXL Window 0
f010000000-f02fffffff : region4
f010000000-f02fffffff : dax4.0
f010000000-f02fffffff : System RAM (kmem)
After (modprobe -r cxl_test):
f010000000-f02fffffff : **redacted binary garbage**
f010000000-f02fffffff : System RAM (kmem)
...and testing further the same is visible with persistent memory
assigned to kmem:
Before:
480000000-243fffffff : Persistent Memory
480000000-57e1fffff : namespace3.0
580000000-243fffffff : dax3.0
580000000-243fffffff : System RAM (kmem)
After (ndctl disable-region all):
480000000-243fffffff : Persistent Memory
580000000-243fffffff : ***redacted binary garbage***
580000000-243fffffff : System RAM (kmem)
The corrupted data is from a use-after-free of the "dax4.0" and "dax3.0"
resources, and it also shows that the "System RAM (kmem)" resource is
not being removed. The bug does not appear after "modprobe -r kmem", it
requires the parent of "dax4.0" and "dax3.0" to be removed which
re-parents the leaked "System RAM (kmem)" instances. Those in turn
reference the freed resource as a parent.
First up for the fix is release_mem_region_adjustable() needs to
reliably delete the resource inserted by add_memory_driver_managed().
That is thwarted by a check for IORESOURCE_SYSRAM that predates the
dax/kmem driver, from commit:
65c78784135f ("kernel, resource: check for IORESOURCE_SYSRAM in release_mem_region_adjustable")
That appears to be working around the behavior of HMM's
"MEMORY_DEVICE_PUBLIC" facility that has since been deleted. With that
check removed the "System RAM (kmem)" resource gets removed, but
corruption still occurs occasionally because the "dax" resource is not
reliably removed.
The dax range information is freed before the device is unregistered, so
the driver can not reliably recall (another use after free) what it is
meant to release. Lastly if that use after free got lucky, the driver
was covering up the leak of "System RAM (kmem)" due to its use of
release_resource() which detaches, but does not free, child resources.
The switch to remove_resource() forces remove_memory() to be responsible
for the deletion of the resource added by add_memory_driver_managed().
Fixes: c2f3011ee697 ("device-dax: add an allocation interface for device-dax instances")
Cc: <stable(a)vger.kernel.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Pavel Tatashin <pasha.tatashin(a)soleen.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/dax/bus.c | 2 +-
drivers/dax/kmem.c | 4 ++--
kernel/resource.c | 14 --------------
3 files changed, 3 insertions(+), 17 deletions(-)
diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 012d576004e9..67a64f4c472d 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -441,8 +441,8 @@ static void unregister_dev_dax(void *dev)
dev_dbg(dev, "%s\n", __func__);
kill_dev_dax(dev_dax);
- free_dev_dax_ranges(dev_dax);
device_del(dev);
+ free_dev_dax_ranges(dev_dax);
put_device(dev);
}
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 918d01d3fbaa..7b36db6f1cbd 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -146,7 +146,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
if (rc) {
dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
i, range.start, range.end);
- release_resource(res);
+ remove_resource(res);
kfree(res);
data->res[i] = NULL;
if (mapped)
@@ -195,7 +195,7 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
rc = remove_memory(range.start, range_len(&range));
if (rc == 0) {
- release_resource(data->res[i]);
+ remove_resource(data->res[i]);
kfree(data->res[i]);
data->res[i] = NULL;
success++;
diff --git a/kernel/resource.c b/kernel/resource.c
index ddbbacb9fb50..b1763b2fd7ef 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -1343,20 +1343,6 @@ void release_mem_region_adjustable(resource_size_t start, resource_size_t size)
continue;
}
- /*
- * All memory regions added from memory-hotplug path have the
- * flag IORESOURCE_SYSTEM_RAM. If the resource does not have
- * this flag, we know that we are dealing with a resource coming
- * from HMM/devm. HMM/devm use another mechanism to add/release
- * a resource. This goes via devm_request_mem_region and
- * devm_release_mem_region.
- * HMM/devm take care to release their resources when they want,
- * so if we are dealing with them, let us just back off here.
- */
- if (!(res->flags & IORESOURCE_SYSRAM)) {
- break;
- }
-
if (!(res->flags & IORESOURCE_MEM))
break;
The implementation of syscall_get_nr on mips used to ignore the task
argument and return the syscall number of the calling thread instead of
the target thread.
The bug was exposed to user space by commit 201766a20e30f ("ptrace: add
PTRACE_GET_SYSCALL_INFO request") and detected by strace test suite.
Link: https://github.com/strace/strace/issues/235
Fixes: c2d9f1775731 ("MIPS: Fix syscall_get_nr for the syscall exit tracing.")
Cc: <stable(a)vger.kernel.org> # v3.19+
Co-developed-by: Dmitry V. Levin <ldv(a)strace.io>
Signed-off-by: Dmitry V. Levin <ldv(a)strace.io>
Signed-off-by: Elvira Khabirova <lineprinter0(a)gmail.com>
---
arch/mips/include/asm/syscall.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/include/asm/syscall.h b/arch/mips/include/asm/syscall.h
index 25fa651c937d..ebdf4d910af2 100644
--- a/arch/mips/include/asm/syscall.h
+++ b/arch/mips/include/asm/syscall.h
@@ -38,7 +38,7 @@ static inline bool mips_syscall_is_indirect(struct task_struct *task,
static inline long syscall_get_nr(struct task_struct *task,
struct pt_regs *regs)
{
- return current_thread_info()->syscall;
+ return task_thread_info(task)->syscall;
}
static inline void mips_syscall_update_nr(struct task_struct *task,
--
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG),
indirect call targets are validated against the expected function
pointer prototype to make sure the call target is valid to help mitigate
ROP attacks. If they are not identical, there is a failure at run time,
which manifests as either a kernel panic or thread getting killed.
ext4_feat_ktype was setting the "release" handler to "kfree", which
doesn't have a matching function prototype. Add a simple wrapper
with the correct prototype.
This was found as a result of Clang's new -Wcast-function-type-strict
flag, which is more sensitive than the simpler -Wcast-function-type,
which only checks for type width mismatches.
Note that this code is only reached when ext4 is a loadable module and
it is being unloaded:
CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698)
...
RIP: 0010:kobject_put+0xbb/0x1b0
...
Call Trace:
<TASK>
ext4_exit_sysfs+0x14/0x60 [ext4]
cleanup_module+0x67/0xedb [ext4]
Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically")
Cc: Theodore Ts'o <tytso(a)mit.edu>
Cc: Eric Biggers <ebiggers(a)kernel.org>
Cc: stable(a)vger.kernel.org
Build-tested-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Link: https://lore.kernel.org/r/20230103234616.never.915-kees@kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
v2: rename callback, improve commit log (ebiggers)
v1: https://lore.kernel.org/lkml/20230103234616.never.915-kees@kernel.org
---
fs/ext4/sysfs.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index d233c24ea342..e2b8b3437c58 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -491,6 +491,11 @@ static void ext4_sb_release(struct kobject *kobj)
complete(&sbi->s_kobj_unregister);
}
+static void ext4_feat_release(struct kobject *kobj)
+{
+ kfree(kobj);
+}
+
static const struct sysfs_ops ext4_attr_ops = {
.show = ext4_attr_show,
.store = ext4_attr_store,
@@ -505,7 +510,7 @@ static struct kobj_type ext4_sb_ktype = {
static struct kobj_type ext4_feat_ktype = {
.default_groups = ext4_feat_groups,
.sysfs_ops = &ext4_attr_ops,
- .release = (void (*)(struct kobject *))kfree,
+ .release = ext4_feat_release,
};
void ext4_notify_error_sysfs(struct ext4_sb_info *sbi)
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
5956592ce337 ("mm/filemap: fix page end in filemap_get_read_batch")
25d6a23e8d28 ("filemap: Convert filemap_get_read_batch() to use a folio_batch")
d996fc7f615f ("filemap: Convert filemap_read() to use a folio")
65bca53b5f63 ("filemap: Convert filemap_get_pages to use folios")
a5d4ad098528 ("filemap: Convert filemap_create_page to folio")
9d427b4eb456 ("filemap: Convert filemap_read_page to take a folio")
bdb729329769 ("filemap: Convert filemap_get_read_batch to use folios")
512b7931ad05 ("Merge branch 'akpm' (patches from Andrew)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5956592ce337330cdff0399a6f8b6a5aea397a8e Mon Sep 17 00:00:00 2001
From: Qian Yingjin <qian(a)ddn.com>
Date: Wed, 8 Feb 2023 10:24:00 +0800
Subject: [PATCH] mm/filemap: fix page end in filemap_get_read_batch
I was running traces of the read code against an RAID storage system to
understand why read requests were being misaligned against the underlying
RAID strips. I found that the page end offset calculation in
filemap_get_read_batch() was off by one.
When a read is submitted with end offset 1048575, then it calculates the
end page for read of 256 when it should be 255. "last_index" is the index
of the page beyond the end of the read and it should be skipped when get a
batch of pages for read in @filemap_get_read_batch().
The below simple patch fixes the problem. This code was introduced in
kernel 5.12.
Link: https://lkml.kernel.org/r/20230208022400.28962-1-coolqyj@163.com
Fixes: cbd59c48ae2b ("mm/filemap: use head pages in generic_file_buffered_read")
Signed-off-by: Qian Yingjin <qian(a)ddn.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/filemap.c b/mm/filemap.c
index c4d4ace9cc70..0e20a8d6dd93 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2588,18 +2588,19 @@ static int filemap_get_pages(struct kiocb *iocb, struct iov_iter *iter,
struct folio *folio;
int err = 0;
+ /* "last_index" is the index of the page beyond the end of the read */
last_index = DIV_ROUND_UP(iocb->ki_pos + iter->count, PAGE_SIZE);
retry:
if (fatal_signal_pending(current))
return -EINTR;
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & IOCB_NOIO)
return -EAGAIN;
page_cache_sync_readahead(mapping, ra, filp, index,
last_index - index);
- filemap_get_read_batch(mapping, index, last_index, fbatch);
+ filemap_get_read_batch(mapping, index, last_index - 1, fbatch);
}
if (!folio_batch_count(fbatch)) {
if (iocb->ki_flags & (IOCB_NOWAIT | IOCB_WAITQ))
In some specific situation, the return value of __bch_btree_node_alloc may
be NULL. This may lead to poential NULL pointer dereference in caller
function like a calling chaion :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initialize return value in __bch_btree_node_alloc before return.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
---
v3:
- Add Cc: stable(a)vger.kernel.org suggested by Eric
v2:
- split patch v1 into two patches to make it clearer suggested by Coly Li
---
drivers/md/bcache/btree.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 147c493a989a..cae25e74b9e0 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1090,10 +1090,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
struct btree *parent)
{
BKEY_PADDED(key) k;
- struct btree *b = ERR_PTR(-EAGAIN);
+ struct btree *b;
mutex_lock(&c->bucket_lock);
retry:
+ /* return ERR_PTR(-EAGAIN) when it fails */
+ b = ERR_PTR(-EAGAIN);
if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
goto err;
--
2.25.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
0967bf837784 ("ixgbe: add double of VLAN header when computing the max MTU")
f9cd6a4418ba ("ixgbe: allow to increase MTU to 3K with XDP enabled")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0967bf837784a11c65d66060623a74e65211af0b Mon Sep 17 00:00:00 2001
From: Jason Xing <kernelxing(a)tencent.com>
Date: Thu, 9 Feb 2023 10:41:28 +0800
Subject: [PATCH] ixgbe: add double of VLAN header when computing the max MTU
Include the second VLAN HLEN into account when computing the maximum
MTU size as other drivers do.
Fixes: fabf1bce103a ("ixgbe: Prevent unsupported configurations with XDP")
Signed-off-by: Jason Xing <kernelxing(a)tencent.com>
Reviewed-by: Alexander Duyck <alexanderduyck(a)fb.com>
Tested-by: Chandan Kumar Rout <chandanx.rout(a)intel.com> (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index bc68b8f2176d..8736ca4b2628 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -73,6 +73,8 @@
#define IXGBE_RXBUFFER_4K 4096
#define IXGBE_MAX_RXBUFFER 16384 /* largest size for a single descriptor */
+#define IXGBE_PKT_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
+
/* Attempt to maximize the headroom available for incoming frames. We
* use a 2K buffer for receives and need 1536/1534 to store the data for
* the frame. This leaves us with 512 bytes of room. From that we need
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 25ca329f7d3c..4507fba8747a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -6801,8 +6801,7 @@ static int ixgbe_change_mtu(struct net_device *netdev, int new_mtu)
struct ixgbe_adapter *adapter = netdev_priv(netdev);
if (ixgbe_enabled_xdp_adapter(adapter)) {
- int new_frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN +
- VLAN_HLEN;
+ int new_frame_size = new_mtu + IXGBE_PKT_HDR_PAD;
if (new_frame_size > ixgbe_max_xdp_frame_size(adapter)) {
e_warn(probe, "Requested MTU size is not supported with XDP\n");
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
40bd094d65fc ("flow_offload: fill flags to action structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21c167aa0ba943a7cac2f6969814f83bb701666b Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Fri, 10 Feb 2023 17:08:25 -0300
Subject: [PATCH] net/sched: act_ctinfo: use percpu stats
The tc action act_ctinfo was using shared stats, fix it to use percpu stats
since bstats_update() must be called with locks or with a percpu pointer argument.
tdc results:
1..12
ok 1 c826 - Add ctinfo action with default setting
ok 2 0286 - Add ctinfo action with dscp
ok 3 4938 - Add ctinfo action with valid cpmark and zone
ok 4 7593 - Add ctinfo action with drop control
ok 5 2961 - Replace ctinfo action zone and action control
ok 6 e567 - Delete ctinfo action with valid index
ok 7 6a91 - Delete ctinfo action with invalid index
ok 8 5232 - List ctinfo actions
ok 9 7702 - Flush ctinfo actions
ok 10 3201 - Add ctinfo action with duplicate index
ok 11 8295 - Add ctinfo action with invalid index
ok 12 3964 - Replace ctinfo action with invalid goto_chain control
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Reviewed-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba(a)intel.com>
Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 4b1b59da5c0b..4d15b6a6169c 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -93,7 +93,7 @@ TC_INDIRECT_SCOPE int tcf_ctinfo_act(struct sk_buff *skb,
cp = rcu_dereference_bh(ca->params);
tcf_lastuse_update(&ca->tcf_tm);
- bstats_update(&ca->tcf_bstats, skb);
+ tcf_action_update_bstats(&ca->common, skb);
action = READ_ONCE(ca->tcf_action);
wlen = skb_network_offset(skb);
@@ -212,8 +212,8 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
index = actparm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
if (!err) {
- ret = tcf_idr_create(tn, index, est, a,
- &act_ctinfo_ops, bind, false, flags);
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_ctinfo_ops, bind, flags);
if (ret) {
tcf_idr_cleanup(tn, index);
return ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
21c167aa0ba9 ("net/sched: act_ctinfo: use percpu stats")
40bd094d65fc ("flow_offload: fill flags to action structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 21c167aa0ba943a7cac2f6969814f83bb701666b Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Fri, 10 Feb 2023 17:08:25 -0300
Subject: [PATCH] net/sched: act_ctinfo: use percpu stats
The tc action act_ctinfo was using shared stats, fix it to use percpu stats
since bstats_update() must be called with locks or with a percpu pointer argument.
tdc results:
1..12
ok 1 c826 - Add ctinfo action with default setting
ok 2 0286 - Add ctinfo action with dscp
ok 3 4938 - Add ctinfo action with valid cpmark and zone
ok 4 7593 - Add ctinfo action with drop control
ok 5 2961 - Replace ctinfo action zone and action control
ok 6 e567 - Delete ctinfo action with valid index
ok 7 6a91 - Delete ctinfo action with invalid index
ok 8 5232 - List ctinfo actions
ok 9 7702 - Flush ctinfo actions
ok 10 3201 - Add ctinfo action with duplicate index
ok 11 8295 - Add ctinfo action with invalid index
ok 12 3964 - Replace ctinfo action with invalid goto_chain control
Fixes: 24ec483cec98 ("net: sched: Introduce act_ctinfo action")
Reviewed-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Reviewed-by: Larysa Zaremba <larysa.zaremba(a)intel.com>
Link: https://lore.kernel.org/r/20230210200824.444856-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/act_ctinfo.c b/net/sched/act_ctinfo.c
index 4b1b59da5c0b..4d15b6a6169c 100644
--- a/net/sched/act_ctinfo.c
+++ b/net/sched/act_ctinfo.c
@@ -93,7 +93,7 @@ TC_INDIRECT_SCOPE int tcf_ctinfo_act(struct sk_buff *skb,
cp = rcu_dereference_bh(ca->params);
tcf_lastuse_update(&ca->tcf_tm);
- bstats_update(&ca->tcf_bstats, skb);
+ tcf_action_update_bstats(&ca->common, skb);
action = READ_ONCE(ca->tcf_action);
wlen = skb_network_offset(skb);
@@ -212,8 +212,8 @@ static int tcf_ctinfo_init(struct net *net, struct nlattr *nla,
index = actparm->index;
err = tcf_idr_check_alloc(tn, &index, a, bind);
if (!err) {
- ret = tcf_idr_create(tn, index, est, a,
- &act_ctinfo_ops, bind, false, flags);
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_ctinfo_ops, bind, flags);
if (ret) {
tcf_idr_cleanup(tn, index);
return ret;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a1221703a0f7 ("sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1221703a0f75a9d81748c516457e0fc76951496 Mon Sep 17 00:00:00 2001
From: Pietro Borrello <borrello(a)diag.uniroma1.it>
Date: Thu, 9 Feb 2023 12:13:05 +0000
Subject: [PATCH] sctp: sctp_sock_filter(): avoid list_entry() on possibly
empty list
Use list_is_first() to check whether tsp->asoc matches the first
element of ep->asocs, as the list is not guaranteed to have an entry.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Pietro Borrello <borrello(a)diag.uniroma1.it>
Acked-by: Xin Long <lucien.xin(a)gmail.com>
Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniro…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index a557009e9832..c3d6b92dd386 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -343,11 +343,9 @@ static int sctp_sock_filter(struct sctp_endpoint *ep, struct sctp_transport *tsp
struct sctp_comm_param *commp = p;
struct sock *sk = ep->base.sk;
const struct inet_diag_req_v2 *r = commp->r;
- struct sctp_association *assoc =
- list_entry(ep->asocs.next, struct sctp_association, asocs);
/* find the ep only once through the transports by this condition */
- if (tsp->asoc != assoc)
+ if (!list_is_first(&tsp->asoc->asocs, &ep->asocs))
return 0;
if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
a1221703a0f7 ("sctp: sctp_sock_filter(): avoid list_entry() on possibly empty list")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a1221703a0f75a9d81748c516457e0fc76951496 Mon Sep 17 00:00:00 2001
From: Pietro Borrello <borrello(a)diag.uniroma1.it>
Date: Thu, 9 Feb 2023 12:13:05 +0000
Subject: [PATCH] sctp: sctp_sock_filter(): avoid list_entry() on possibly
empty list
Use list_is_first() to check whether tsp->asoc matches the first
element of ep->asocs, as the list is not guaranteed to have an entry.
Fixes: 8f840e47f190 ("sctp: add the sctp_diag.c file")
Signed-off-by: Pietro Borrello <borrello(a)diag.uniroma1.it>
Acked-by: Xin Long <lucien.xin(a)gmail.com>
Link: https://lore.kernel.org/r/20230208-sctp-filter-v2-1-6e1f4017f326@diag.uniro…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sctp/diag.c b/net/sctp/diag.c
index a557009e9832..c3d6b92dd386 100644
--- a/net/sctp/diag.c
+++ b/net/sctp/diag.c
@@ -343,11 +343,9 @@ static int sctp_sock_filter(struct sctp_endpoint *ep, struct sctp_transport *tsp
struct sctp_comm_param *commp = p;
struct sock *sk = ep->base.sk;
const struct inet_diag_req_v2 *r = commp->r;
- struct sctp_association *assoc =
- list_entry(ep->asocs.next, struct sctp_association, asocs);
/* find the ep only once through the transports by this condition */
- if (tsp->asoc != assoc)
+ if (!list_is_first(&tsp->asoc->asocs, &ep->asocs))
return 0;
if (r->sdiag_family != AF_UNSPEC && sk->sk_family != r->sdiag_family)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
304e024216a8 ("net_sched: add a temporary refcnt for struct tcindex_data")
599be01ee567 ("net_sched: fix an OOB access in cls_tcindex")
14215108a1fd ("net_sched: initialize net pointer inside tcf_exts_init()")
51dcb69de67a ("net_sched: fix a memory leak in cls_tcindex")
3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()")
12db03b65c2b ("net: sched: extend proto ops to support unlocked classifiers")
7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback")
726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution")
fe2923afc124 ("net: sched: traverse classifiers in chain with tcf_get_next_proto()")
4dbfa766440c ("net: sched: introduce reference counting for tcf_proto")
ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex")
a5654820bb4b ("net: sched: protect chain template accesses with block lock")
bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()")
165f01354c52 ("net: sched: protect block->chain0 with block->lock")
c266f64dbfa2 ("net: sched: protect block state with mutex")
a030598690c6 ("net: sched: cls_u32: simplify the hell out u32_delete() emptiness check")
787ce6d02d95 ("net: sched: use reference counting for tcf blocks on rules update")
0607e439943b ("net: sched: implement tcf_block_refcnt_{get|put}()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee059170b1f7e94e55fa6cadee544e176a6e59c2 Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Thu, 9 Feb 2023 11:37:39 -0300
Subject: [PATCH] net/sched: tcindex: update imperfect hash filters respecting
rcu
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.
CPU 0: CPU 1:
tcindex_set_parms tcindex_classify
tcindex_lookup
tcindex_lookup
tcf_exts_change
tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec(a)valis.email>
Suggested-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index ee2a050c887b..ba7f22a49397 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -12,6 +12,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/refcount.h>
+#include <linux/rcupdate.h>
#include <net/act_api.h>
#include <net/netlink.h>
#include <net/pkt_cls.h>
@@ -339,6 +340,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
+ bool update_h = false;
err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
@@ -456,10 +458,13 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
}
- if (cp->perfect)
+ if (cp->perfect) {
r = cp->perfect + handle;
- else
- r = tcindex_lookup(cp, handle) ? : &new_filter_result;
+ } else {
+ /* imperfect area is updated in-place using rcu */
+ update_h = !!tcindex_lookup(cp, handle);
+ r = &new_filter_result;
+ }
if (r == &new_filter_result) {
f = kzalloc(sizeof(*f), GFP_KERNEL);
@@ -485,7 +490,28 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
rcu_assign_pointer(tp->root, cp);
- if (r == &new_filter_result) {
+ if (update_h) {
+ struct tcindex_filter __rcu **fp;
+ struct tcindex_filter *cf;
+
+ f->result.res = r->res;
+ tcf_exts_change(&f->result.exts, &r->exts);
+
+ /* imperfect area bucket */
+ fp = cp->h + (handle % cp->hash);
+
+ /* lookup the filter, guaranteed to exist */
+ for (cf = rcu_dereference_bh_rtnl(*fp); cf;
+ fp = &cf->next, cf = rcu_dereference_bh_rtnl(*fp))
+ if (cf->key == handle)
+ break;
+
+ f->next = cf->next;
+
+ cf = rcu_replace_pointer(*fp, f, 1);
+ tcf_exts_get_net(&cf->result.exts);
+ tcf_queue_work(&cf->rwork, tcindex_destroy_fexts_work);
+ } else if (r == &new_filter_result) {
struct tcindex_filter *nfp;
struct tcindex_filter __rcu **fp;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
304e024216a8 ("net_sched: add a temporary refcnt for struct tcindex_data")
599be01ee567 ("net_sched: fix an OOB access in cls_tcindex")
14215108a1fd ("net_sched: initialize net pointer inside tcf_exts_init()")
51dcb69de67a ("net_sched: fix a memory leak in cls_tcindex")
3d210534cc93 ("net_sched: fix a race condition in tcindex_destroy()")
12db03b65c2b ("net: sched: extend proto ops to support unlocked classifiers")
7d5509fa0d3d ("net: sched: extend proto ops with 'put' callback")
726d061286ce ("net: sched: prevent insertion of new classifiers during chain flush")
8b64678e0af8 ("net: sched: refactor tp insert/delete for concurrent execution")
fe2923afc124 ("net: sched: traverse classifiers in chain with tcf_get_next_proto()")
4dbfa766440c ("net: sched: introduce reference counting for tcf_proto")
ed76f5edccc9 ("net: sched: protect filter_chain list with filter_chain_lock mutex")
a5654820bb4b ("net: sched: protect chain template accesses with block lock")
bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()")
165f01354c52 ("net: sched: protect block->chain0 with block->lock")
c266f64dbfa2 ("net: sched: protect block state with mutex")
a030598690c6 ("net: sched: cls_u32: simplify the hell out u32_delete() emptiness check")
787ce6d02d95 ("net: sched: use reference counting for tcf blocks on rules update")
0607e439943b ("net: sched: implement tcf_block_refcnt_{get|put}()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee059170b1f7e94e55fa6cadee544e176a6e59c2 Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Thu, 9 Feb 2023 11:37:39 -0300
Subject: [PATCH] net/sched: tcindex: update imperfect hash filters respecting
rcu
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.
CPU 0: CPU 1:
tcindex_set_parms tcindex_classify
tcindex_lookup
tcindex_lookup
tcf_exts_change
tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec(a)valis.email>
Suggested-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index ee2a050c887b..ba7f22a49397 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -12,6 +12,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/refcount.h>
+#include <linux/rcupdate.h>
#include <net/act_api.h>
#include <net/netlink.h>
#include <net/pkt_cls.h>
@@ -339,6 +340,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
+ bool update_h = false;
err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
@@ -456,10 +458,13 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
}
- if (cp->perfect)
+ if (cp->perfect) {
r = cp->perfect + handle;
- else
- r = tcindex_lookup(cp, handle) ? : &new_filter_result;
+ } else {
+ /* imperfect area is updated in-place using rcu */
+ update_h = !!tcindex_lookup(cp, handle);
+ r = &new_filter_result;
+ }
if (r == &new_filter_result) {
f = kzalloc(sizeof(*f), GFP_KERNEL);
@@ -485,7 +490,28 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
rcu_assign_pointer(tp->root, cp);
- if (r == &new_filter_result) {
+ if (update_h) {
+ struct tcindex_filter __rcu **fp;
+ struct tcindex_filter *cf;
+
+ f->result.res = r->res;
+ tcf_exts_change(&f->result.exts, &r->exts);
+
+ /* imperfect area bucket */
+ fp = cp->h + (handle % cp->hash);
+
+ /* lookup the filter, guaranteed to exist */
+ for (cf = rcu_dereference_bh_rtnl(*fp); cf;
+ fp = &cf->next, cf = rcu_dereference_bh_rtnl(*fp))
+ if (cf->key == handle)
+ break;
+
+ f->next = cf->next;
+
+ cf = rcu_replace_pointer(*fp, f, 1);
+ tcf_exts_get_net(&cf->result.exts);
+ tcf_queue_work(&cf->rwork, tcindex_destroy_fexts_work);
+ } else if (r == &new_filter_result) {
struct tcindex_filter *nfp;
struct tcindex_filter __rcu **fp;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
ee059170b1f7 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
304e024216a8 ("net_sched: add a temporary refcnt for struct tcindex_data")
599be01ee567 ("net_sched: fix an OOB access in cls_tcindex")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee059170b1f7e94e55fa6cadee544e176a6e59c2 Mon Sep 17 00:00:00 2001
From: Pedro Tammela <pctammela(a)mojatatu.com>
Date: Thu, 9 Feb 2023 11:37:39 -0300
Subject: [PATCH] net/sched: tcindex: update imperfect hash filters respecting
rcu
The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.
CPU 0: CPU 1:
tcindex_set_parms tcindex_classify
tcindex_lookup
tcindex_lookup
tcf_exts_change
tcf_exts_exec [UAF]
Stop operating on the shared area directly, by using a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after a rcu grace period elapsed.
Fixes: 9b0d4446b569 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec(a)valis.email>
Suggested-by: valis <sec(a)valis.email>
Signed-off-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index ee2a050c887b..ba7f22a49397 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -12,6 +12,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/refcount.h>
+#include <linux/rcupdate.h>
#include <net/act_api.h>
#include <net/netlink.h>
#include <net/pkt_cls.h>
@@ -339,6 +340,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
struct tcf_result cr = {};
int err, balloc = 0;
struct tcf_exts e;
+ bool update_h = false;
err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
if (err < 0)
@@ -456,10 +458,13 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
}
}
- if (cp->perfect)
+ if (cp->perfect) {
r = cp->perfect + handle;
- else
- r = tcindex_lookup(cp, handle) ? : &new_filter_result;
+ } else {
+ /* imperfect area is updated in-place using rcu */
+ update_h = !!tcindex_lookup(cp, handle);
+ r = &new_filter_result;
+ }
if (r == &new_filter_result) {
f = kzalloc(sizeof(*f), GFP_KERNEL);
@@ -485,7 +490,28 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
rcu_assign_pointer(tp->root, cp);
- if (r == &new_filter_result) {
+ if (update_h) {
+ struct tcindex_filter __rcu **fp;
+ struct tcindex_filter *cf;
+
+ f->result.res = r->res;
+ tcf_exts_change(&f->result.exts, &r->exts);
+
+ /* imperfect area bucket */
+ fp = cp->h + (handle % cp->hash);
+
+ /* lookup the filter, guaranteed to exist */
+ for (cf = rcu_dereference_bh_rtnl(*fp); cf;
+ fp = &cf->next, cf = rcu_dereference_bh_rtnl(*fp))
+ if (cf->key == handle)
+ break;
+
+ f->next = cf->next;
+
+ cf = rcu_replace_pointer(*fp, f, 1);
+ tcf_exts_get_net(&cf->result.exts);
+ tcf_queue_work(&cf->rwork, tcindex_destroy_fexts_work);
+ } else if (r == &new_filter_result) {
struct tcindex_filter *nfp;
struct tcindex_filter __rcu **fp;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
43fbca02c2dd ("ice: fix lost multicast packets in promisc mode")
abddafd4585c ("ice: Fix clearing of promisc mode with bridge over bond")
1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
7bd527aa174f ("ice: Adjust naming for inner VLAN operations")
2bfefa2dab6b ("ice: Use the proto argument for VLAN ops")
a19d7f7f0122 ("ice: Refactor vf->port_vlan_info to use ice_vlan")
fb05ba1257d7 ("ice: Introduce ice_vlan struct")
bc42afa95487 ("ice: Add new VSI VLAN ops")
3e0b59714bd4 ("ice: Add helper function for adding VLAN 0")
daf4dd16438b ("ice: Refactor spoofcheck configuration functions")
c1e5da5dd465 ("ice: improve switchdev's slow-path")
c36a2b971627 ("ice: replay advanced rules after reset")
3a7496234d17 ("ice: implement basic E822 PTP support")
b2ee72565cd0 ("ice: introduce ice_ptp_init_phc function")
39b2810642e8 ("ice: use 'int err' instead of 'int status' in ice_ptp_hw.c")
78267d0c9cab ("ice: introduce ice_base_incval function")
4809671015a1 ("ice: Fix E810 PTP reset flow")
c14846914ed6 ("ice: Propagate error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43fbca02c2ddc39ff5879b6f3a4a097b1ba02098 Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Mon, 6 Feb 2023 15:54:36 -0800
Subject: [PATCH] ice: fix lost multicast packets in promisc mode
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b288a01a321a..8ec24f6cf6be 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -275,6 +275,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (status && status != -EEXIST)
return status;
+ netdev_dbg(vsi->netdev, "set promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return 0;
}
@@ -300,6 +302,8 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
promisc_m, 0);
}
+ netdev_dbg(vsi->netdev, "clear promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return status;
}
@@ -414,6 +418,16 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
}
err = 0;
vlan_ops->dis_rx_filtering(vsi);
+
+ /* promiscuous mode implies allmulticast so
+ * that VSIs that are in promiscuous mode are
+ * subscribed to multicast packets coming to
+ * the port
+ */
+ err = ice_set_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err)
+ goto out_promisc;
}
} else {
/* Clear Rx filter to remove traffic from wire */
@@ -430,6 +444,18 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
NETIF_F_HW_VLAN_CTAG_FILTER)
vlan_ops->ena_rx_filtering(vsi);
}
+
+ /* disable allmulti here, but only if allmulti is not
+ * still enabled for the netdev
+ */
+ if (!(vsi->current_netdev_flags & IFF_ALLMULTI)) {
+ err = ice_clear_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err) {
+ netdev_err(netdev, "Error %d clearing multicast promiscuous on VSI %i\n",
+ err, vsi->vsi_num);
+ }
+ }
}
}
goto exit;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
43fbca02c2dd ("ice: fix lost multicast packets in promisc mode")
abddafd4585c ("ice: Fix clearing of promisc mode with bridge over bond")
1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
7bd527aa174f ("ice: Adjust naming for inner VLAN operations")
2bfefa2dab6b ("ice: Use the proto argument for VLAN ops")
a19d7f7f0122 ("ice: Refactor vf->port_vlan_info to use ice_vlan")
fb05ba1257d7 ("ice: Introduce ice_vlan struct")
bc42afa95487 ("ice: Add new VSI VLAN ops")
3e0b59714bd4 ("ice: Add helper function for adding VLAN 0")
daf4dd16438b ("ice: Refactor spoofcheck configuration functions")
c1e5da5dd465 ("ice: improve switchdev's slow-path")
c36a2b971627 ("ice: replay advanced rules after reset")
3a7496234d17 ("ice: implement basic E822 PTP support")
b2ee72565cd0 ("ice: introduce ice_ptp_init_phc function")
39b2810642e8 ("ice: use 'int err' instead of 'int status' in ice_ptp_hw.c")
78267d0c9cab ("ice: introduce ice_base_incval function")
4809671015a1 ("ice: Fix E810 PTP reset flow")
c14846914ed6 ("ice: Propagate error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43fbca02c2ddc39ff5879b6f3a4a097b1ba02098 Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Mon, 6 Feb 2023 15:54:36 -0800
Subject: [PATCH] ice: fix lost multicast packets in promisc mode
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b288a01a321a..8ec24f6cf6be 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -275,6 +275,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (status && status != -EEXIST)
return status;
+ netdev_dbg(vsi->netdev, "set promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return 0;
}
@@ -300,6 +302,8 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
promisc_m, 0);
}
+ netdev_dbg(vsi->netdev, "clear promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return status;
}
@@ -414,6 +418,16 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
}
err = 0;
vlan_ops->dis_rx_filtering(vsi);
+
+ /* promiscuous mode implies allmulticast so
+ * that VSIs that are in promiscuous mode are
+ * subscribed to multicast packets coming to
+ * the port
+ */
+ err = ice_set_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err)
+ goto out_promisc;
}
} else {
/* Clear Rx filter to remove traffic from wire */
@@ -430,6 +444,18 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
NETIF_F_HW_VLAN_CTAG_FILTER)
vlan_ops->ena_rx_filtering(vsi);
}
+
+ /* disable allmulti here, but only if allmulti is not
+ * still enabled for the netdev
+ */
+ if (!(vsi->current_netdev_flags & IFF_ALLMULTI)) {
+ err = ice_clear_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err) {
+ netdev_err(netdev, "Error %d clearing multicast promiscuous on VSI %i\n",
+ err, vsi->vsi_num);
+ }
+ }
}
}
goto exit;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
43fbca02c2dd ("ice: fix lost multicast packets in promisc mode")
abddafd4585c ("ice: Fix clearing of promisc mode with bridge over bond")
1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
7bd527aa174f ("ice: Adjust naming for inner VLAN operations")
2bfefa2dab6b ("ice: Use the proto argument for VLAN ops")
a19d7f7f0122 ("ice: Refactor vf->port_vlan_info to use ice_vlan")
fb05ba1257d7 ("ice: Introduce ice_vlan struct")
bc42afa95487 ("ice: Add new VSI VLAN ops")
3e0b59714bd4 ("ice: Add helper function for adding VLAN 0")
daf4dd16438b ("ice: Refactor spoofcheck configuration functions")
c1e5da5dd465 ("ice: improve switchdev's slow-path")
c36a2b971627 ("ice: replay advanced rules after reset")
3a7496234d17 ("ice: implement basic E822 PTP support")
b2ee72565cd0 ("ice: introduce ice_ptp_init_phc function")
39b2810642e8 ("ice: use 'int err' instead of 'int status' in ice_ptp_hw.c")
78267d0c9cab ("ice: introduce ice_base_incval function")
4809671015a1 ("ice: Fix E810 PTP reset flow")
c14846914ed6 ("ice: Propagate error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43fbca02c2ddc39ff5879b6f3a4a097b1ba02098 Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Mon, 6 Feb 2023 15:54:36 -0800
Subject: [PATCH] ice: fix lost multicast packets in promisc mode
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b288a01a321a..8ec24f6cf6be 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -275,6 +275,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (status && status != -EEXIST)
return status;
+ netdev_dbg(vsi->netdev, "set promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return 0;
}
@@ -300,6 +302,8 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
promisc_m, 0);
}
+ netdev_dbg(vsi->netdev, "clear promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return status;
}
@@ -414,6 +418,16 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
}
err = 0;
vlan_ops->dis_rx_filtering(vsi);
+
+ /* promiscuous mode implies allmulticast so
+ * that VSIs that are in promiscuous mode are
+ * subscribed to multicast packets coming to
+ * the port
+ */
+ err = ice_set_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err)
+ goto out_promisc;
}
} else {
/* Clear Rx filter to remove traffic from wire */
@@ -430,6 +444,18 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
NETIF_F_HW_VLAN_CTAG_FILTER)
vlan_ops->ena_rx_filtering(vsi);
}
+
+ /* disable allmulti here, but only if allmulti is not
+ * still enabled for the netdev
+ */
+ if (!(vsi->current_netdev_flags & IFF_ALLMULTI)) {
+ err = ice_clear_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err) {
+ netdev_err(netdev, "Error %d clearing multicast promiscuous on VSI %i\n",
+ err, vsi->vsi_num);
+ }
+ }
}
}
goto exit;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
43fbca02c2dd ("ice: fix lost multicast packets in promisc mode")
abddafd4585c ("ice: Fix clearing of promisc mode with bridge over bond")
1273f89578f2 ("ice: Fix broken IFF_ALLMULTI handling")
1babaf77f49d ("ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev")
c31af68a1b94 ("ice: Add outer_vlan_ops and VSI specific VLAN ops implementations")
7bd527aa174f ("ice: Adjust naming for inner VLAN operations")
2bfefa2dab6b ("ice: Use the proto argument for VLAN ops")
a19d7f7f0122 ("ice: Refactor vf->port_vlan_info to use ice_vlan")
fb05ba1257d7 ("ice: Introduce ice_vlan struct")
bc42afa95487 ("ice: Add new VSI VLAN ops")
3e0b59714bd4 ("ice: Add helper function for adding VLAN 0")
daf4dd16438b ("ice: Refactor spoofcheck configuration functions")
c1e5da5dd465 ("ice: improve switchdev's slow-path")
c36a2b971627 ("ice: replay advanced rules after reset")
3a7496234d17 ("ice: implement basic E822 PTP support")
b2ee72565cd0 ("ice: introduce ice_ptp_init_phc function")
39b2810642e8 ("ice: use 'int err' instead of 'int status' in ice_ptp_hw.c")
78267d0c9cab ("ice: introduce ice_base_incval function")
4809671015a1 ("ice: Fix E810 PTP reset flow")
c14846914ed6 ("ice: Propagate error codes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43fbca02c2ddc39ff5879b6f3a4a097b1ba02098 Mon Sep 17 00:00:00 2001
From: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Date: Mon, 6 Feb 2023 15:54:36 -0800
Subject: [PATCH] ice: fix lost multicast packets in promisc mode
There was a problem reported to us where the addition of a VF with an IPv6
address ending with a particular sequence would cause the parent device on
the PF to no longer be able to respond to neighbor discovery packets.
In this case, we had an ovs-bridge device living on top of a VLAN, which
was on top of a PF, and it would not be able to talk anymore (the neighbor
entry would expire and couldn't be restored).
The root cause of the issue is that if the PF is asked to be in IFF_PROMISC
mode (promiscuous mode) and it had an ipv6 address that needed the
33:33:ff:00:00:04 multicast address to work, then when the VF was added
with the need for the same multicast address, the VF would steal all the
traffic destined for that address.
The ice driver didn't auto-subscribe a request of IFF_PROMISC to the
"multicast replication from other port's traffic" meaning that it won't get
for instance, packets with an exact destination in the VF, as above.
The VF's IPv6 address, which adds a "perfect filter" for 33:33:ff:00:00:04,
results in no packets for that multicast address making it to the PF (which
is in promisc but NOT "multicast replication").
The fix is to enable "multicast promiscuous" whenever the driver is asked
to enable IFF_PROMISC, and make sure to disable it when appropriate.
Fixes: e94d44786693 ("ice: Implement filter sync, NDO operations and bump version")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index b288a01a321a..8ec24f6cf6be 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -275,6 +275,8 @@ static int ice_set_promisc(struct ice_vsi *vsi, u8 promisc_m)
if (status && status != -EEXIST)
return status;
+ netdev_dbg(vsi->netdev, "set promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return 0;
}
@@ -300,6 +302,8 @@ static int ice_clear_promisc(struct ice_vsi *vsi, u8 promisc_m)
promisc_m, 0);
}
+ netdev_dbg(vsi->netdev, "clear promisc filter bits for VSI %i: 0x%x\n",
+ vsi->vsi_num, promisc_m);
return status;
}
@@ -414,6 +418,16 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
}
err = 0;
vlan_ops->dis_rx_filtering(vsi);
+
+ /* promiscuous mode implies allmulticast so
+ * that VSIs that are in promiscuous mode are
+ * subscribed to multicast packets coming to
+ * the port
+ */
+ err = ice_set_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err)
+ goto out_promisc;
}
} else {
/* Clear Rx filter to remove traffic from wire */
@@ -430,6 +444,18 @@ static int ice_vsi_sync_fltr(struct ice_vsi *vsi)
NETIF_F_HW_VLAN_CTAG_FILTER)
vlan_ops->ena_rx_filtering(vsi);
}
+
+ /* disable allmulti here, but only if allmulti is not
+ * still enabled for the netdev
+ */
+ if (!(vsi->current_netdev_flags & IFF_ALLMULTI)) {
+ err = ice_clear_promisc(vsi,
+ ICE_MCAST_PROMISC_BITS);
+ if (err) {
+ netdev_err(netdev, "Error %d clearing multicast promiscuous on VSI %i\n",
+ err, vsi->vsi_num);
+ }
+ }
}
}
goto exit;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d5a1224aa68c ("drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list")
67b858dd8993 ("drm/i915/gen11: Moving WAs to icl_gt_workarounds_init()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d5a1224aa68c8b124a4c5c390186e571815ed390 Mon Sep 17 00:00:00 2001
From: Matt Roper <matthew.d.roper(a)intel.com>
Date: Wed, 1 Feb 2023 14:28:29 -0800
Subject: [PATCH] drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT
list
The UNSLICE_UNIT_LEVEL_CLKGATE register programmed by this workaround
has 'BUS' style reset, indicating that it does not lose its value on
engine resets. Furthermore, this register is part of the GT forcewake
domain rather than the RENDER domain, so it should not be impacted by
RCS engine resets. As such, we should implement this on the GT
workaround list rather than an engine list.
Bspec: 19219
Fixes: 3551ff928744 ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()")
Signed-off-by: Matt Roper <matthew.d.roper(a)intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-2-matth…
(cherry picked from commit 5f21dc07b52eb54a908e66f5d6e05a87bcb5b049)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 949c19339015..a0740308555d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1355,6 +1355,13 @@ icl_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal)
GAMT_CHKN_BIT_REG,
GAMT_CHKN_DISABLE_L3_COH_PIPE);
+ /*
+ * Wa_1408615072:icl,ehl (vsunit)
+ * Wa_1407596294:icl,ehl (hsunit)
+ */
+ wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE,
+ VSUNIT_CLKGATE_DIS | HSUNIT_CLKGATE_DIS);
+
/* Wa_1407352427:icl,ehl */
wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2,
PSDUNIT_CLKGATE_DIS);
@@ -2539,13 +2546,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
wa_masked_en(wal, GEN9_CSFE_CHICKEN1_RCS,
GEN11_ENABLE_32_PLANE_MODE);
- /*
- * Wa_1408615072:icl,ehl (vsunit)
- * Wa_1407596294:icl,ehl (hsunit)
- */
- wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE,
- VSUNIT_CLKGATE_DIS | HSUNIT_CLKGATE_DIS);
-
/*
* Wa_1408767742:icl[a2..forever],ehl[all]
* Wa_1605460711:icl[a0..c0]
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
d5a1224aa68c ("drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT list")
67b858dd8993 ("drm/i915/gen11: Moving WAs to icl_gt_workarounds_init()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d5a1224aa68c8b124a4c5c390186e571815ed390 Mon Sep 17 00:00:00 2001
From: Matt Roper <matthew.d.roper(a)intel.com>
Date: Wed, 1 Feb 2023 14:28:29 -0800
Subject: [PATCH] drm/i915/gen11: Wa_1408615072/Wa_1407596294 should be on GT
list
The UNSLICE_UNIT_LEVEL_CLKGATE register programmed by this workaround
has 'BUS' style reset, indicating that it does not lose its value on
engine resets. Furthermore, this register is part of the GT forcewake
domain rather than the RENDER domain, so it should not be impacted by
RCS engine resets. As such, we should implement this on the GT
workaround list rather than an engine list.
Bspec: 19219
Fixes: 3551ff928744 ("drm/i915/gen11: Moving WAs to rcs_engine_wa_init()")
Signed-off-by: Matt Roper <matthew.d.roper(a)intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230201222831.608281-2-matth…
(cherry picked from commit 5f21dc07b52eb54a908e66f5d6e05a87bcb5b049)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 949c19339015..a0740308555d 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -1355,6 +1355,13 @@ icl_gt_workarounds_init(struct intel_gt *gt, struct i915_wa_list *wal)
GAMT_CHKN_BIT_REG,
GAMT_CHKN_DISABLE_L3_COH_PIPE);
+ /*
+ * Wa_1408615072:icl,ehl (vsunit)
+ * Wa_1407596294:icl,ehl (hsunit)
+ */
+ wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE,
+ VSUNIT_CLKGATE_DIS | HSUNIT_CLKGATE_DIS);
+
/* Wa_1407352427:icl,ehl */
wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE2,
PSDUNIT_CLKGATE_DIS);
@@ -2539,13 +2546,6 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
wa_masked_en(wal, GEN9_CSFE_CHICKEN1_RCS,
GEN11_ENABLE_32_PLANE_MODE);
- /*
- * Wa_1408615072:icl,ehl (vsunit)
- * Wa_1407596294:icl,ehl (hsunit)
- */
- wa_write_or(wal, UNSLICE_UNIT_LEVEL_CLKGATE,
- VSUNIT_CLKGATE_DIS | HSUNIT_CLKGATE_DIS);
-
/*
* Wa_1408767742:icl[a2..forever],ehl[all]
* Wa_1605460711:icl[a0..c0]
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6b77b16de75a ("drm/vc4: Fix YUV plane handling when planes are in different buffers")
8c30eecc6769 ("drm/gem: rename struct drm_gem_dma_object.{paddr => dma_addr}")
4a83c26a1d87 ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
6bcfe8eaeef0 ("drm/fb: rename FB CMA helpers to FB DMA helpers")
5e8bf00ea915 ("drm/fb: remove unused includes of drm_fb_cma_helper.h")
a4d847df8b44 ("drm/fsl-dcu: Use drm_plane_helper_destroy()")
254e5e8829a9 ("drm: Remove unnecessary include statements of drm_plane_helper.h")
382fc1f68132 ("drm/atomic-helper: Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers")
b7345c9799da ("drm/vc4: txp: Protect device resources")
e23a5e14aa27 ("Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b77b16de75a6efc0870b1fa467209387cbee8f3 Mon Sep 17 00:00:00 2001
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Date: Fri, 27 Jan 2023 16:57:08 +0100
Subject: [PATCH] drm/vc4: Fix YUV plane handling when planes are in different
buffers
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxim…
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 8b92a45a3c89..bd5acc4a8687 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -340,7 +340,7 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
{
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
- struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0);
+ struct drm_gem_dma_object *bo;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -359,8 +359,10 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
if (ret)
return ret;
- for (i = 0; i < num_planes; i++)
+ for (i = 0; i < num_planes; i++) {
+ bo = drm_fb_dma_get_gem_obj(fb, i);
vc4_state->offsets[i] = bo->dma_addr + fb->offsets[i];
+ }
/*
* We don't support subpixel source positioning for scaling,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6b77b16de75a ("drm/vc4: Fix YUV plane handling when planes are in different buffers")
8c30eecc6769 ("drm/gem: rename struct drm_gem_dma_object.{paddr => dma_addr}")
4a83c26a1d87 ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
6bcfe8eaeef0 ("drm/fb: rename FB CMA helpers to FB DMA helpers")
5e8bf00ea915 ("drm/fb: remove unused includes of drm_fb_cma_helper.h")
a4d847df8b44 ("drm/fsl-dcu: Use drm_plane_helper_destroy()")
254e5e8829a9 ("drm: Remove unnecessary include statements of drm_plane_helper.h")
382fc1f68132 ("drm/atomic-helper: Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers")
b7345c9799da ("drm/vc4: txp: Protect device resources")
e23a5e14aa27 ("Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b77b16de75a6efc0870b1fa467209387cbee8f3 Mon Sep 17 00:00:00 2001
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Date: Fri, 27 Jan 2023 16:57:08 +0100
Subject: [PATCH] drm/vc4: Fix YUV plane handling when planes are in different
buffers
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxim…
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 8b92a45a3c89..bd5acc4a8687 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -340,7 +340,7 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
{
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
- struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0);
+ struct drm_gem_dma_object *bo;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -359,8 +359,10 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
if (ret)
return ret;
- for (i = 0; i < num_planes; i++)
+ for (i = 0; i < num_planes; i++) {
+ bo = drm_fb_dma_get_gem_obj(fb, i);
vc4_state->offsets[i] = bo->dma_addr + fb->offsets[i];
+ }
/*
* We don't support subpixel source positioning for scaling,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6b77b16de75a ("drm/vc4: Fix YUV plane handling when planes are in different buffers")
8c30eecc6769 ("drm/gem: rename struct drm_gem_dma_object.{paddr => dma_addr}")
4a83c26a1d87 ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
6bcfe8eaeef0 ("drm/fb: rename FB CMA helpers to FB DMA helpers")
5e8bf00ea915 ("drm/fb: remove unused includes of drm_fb_cma_helper.h")
a4d847df8b44 ("drm/fsl-dcu: Use drm_plane_helper_destroy()")
254e5e8829a9 ("drm: Remove unnecessary include statements of drm_plane_helper.h")
382fc1f68132 ("drm/atomic-helper: Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers")
b7345c9799da ("drm/vc4: txp: Protect device resources")
e23a5e14aa27 ("Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b77b16de75a6efc0870b1fa467209387cbee8f3 Mon Sep 17 00:00:00 2001
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Date: Fri, 27 Jan 2023 16:57:08 +0100
Subject: [PATCH] drm/vc4: Fix YUV plane handling when planes are in different
buffers
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxim…
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 8b92a45a3c89..bd5acc4a8687 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -340,7 +340,7 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
{
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
- struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0);
+ struct drm_gem_dma_object *bo;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -359,8 +359,10 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
if (ret)
return ret;
- for (i = 0; i < num_planes; i++)
+ for (i = 0; i < num_planes; i++) {
+ bo = drm_fb_dma_get_gem_obj(fb, i);
vc4_state->offsets[i] = bo->dma_addr + fb->offsets[i];
+ }
/*
* We don't support subpixel source positioning for scaling,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6b77b16de75a ("drm/vc4: Fix YUV plane handling when planes are in different buffers")
8c30eecc6769 ("drm/gem: rename struct drm_gem_dma_object.{paddr => dma_addr}")
4a83c26a1d87 ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
6bcfe8eaeef0 ("drm/fb: rename FB CMA helpers to FB DMA helpers")
5e8bf00ea915 ("drm/fb: remove unused includes of drm_fb_cma_helper.h")
a4d847df8b44 ("drm/fsl-dcu: Use drm_plane_helper_destroy()")
254e5e8829a9 ("drm: Remove unnecessary include statements of drm_plane_helper.h")
382fc1f68132 ("drm/atomic-helper: Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers")
b7345c9799da ("drm/vc4: txp: Protect device resources")
e23a5e14aa27 ("Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b77b16de75a6efc0870b1fa467209387cbee8f3 Mon Sep 17 00:00:00 2001
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Date: Fri, 27 Jan 2023 16:57:08 +0100
Subject: [PATCH] drm/vc4: Fix YUV plane handling when planes are in different
buffers
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxim…
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 8b92a45a3c89..bd5acc4a8687 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -340,7 +340,7 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
{
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
- struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0);
+ struct drm_gem_dma_object *bo;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -359,8 +359,10 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
if (ret)
return ret;
- for (i = 0; i < num_planes; i++)
+ for (i = 0; i < num_planes; i++) {
+ bo = drm_fb_dma_get_gem_obj(fb, i);
vc4_state->offsets[i] = bo->dma_addr + fb->offsets[i];
+ }
/*
* We don't support subpixel source positioning for scaling,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
6b77b16de75a ("drm/vc4: Fix YUV plane handling when planes are in different buffers")
8c30eecc6769 ("drm/gem: rename struct drm_gem_dma_object.{paddr => dma_addr}")
4a83c26a1d87 ("drm/gem: rename GEM CMA helpers to GEM DMA helpers")
6bcfe8eaeef0 ("drm/fb: rename FB CMA helpers to FB DMA helpers")
5e8bf00ea915 ("drm/fb: remove unused includes of drm_fb_cma_helper.h")
a4d847df8b44 ("drm/fsl-dcu: Use drm_plane_helper_destroy()")
254e5e8829a9 ("drm: Remove unnecessary include statements of drm_plane_helper.h")
382fc1f68132 ("drm/atomic-helper: Move DRM_PLANE_HELPER_NO_SCALING to atomic helpers")
b7345c9799da ("drm/vc4: txp: Protect device resources")
e23a5e14aa27 ("Backmerge tag 'v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6b77b16de75a6efc0870b1fa467209387cbee8f3 Mon Sep 17 00:00:00 2001
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Date: Fri, 27 Jan 2023 16:57:08 +0100
Subject: [PATCH] drm/vc4: Fix YUV plane handling when planes are in different
buffers
YUV images can either be presented as one allocation with offsets
for the different planes, or multiple allocations with 0 offsets.
The driver only ever calls drm_fb_[dma|cma]_get_gem_obj with plane
index 0, therefore any application using the second approach was
incorrectly rendered.
Correctly determine the address for each plane, removing the
assumption that the base address is the same for each.
Fixes: fc04023fafec ("drm/vc4: Add support for YUV planes.")
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Maxime Ripard <maxime(a)cerno.tech>
Link: https://patchwork.freedesktop.org/patch/msgid/20230127155708.454704-1-maxim…
diff --git a/drivers/gpu/drm/vc4/vc4_plane.c b/drivers/gpu/drm/vc4/vc4_plane.c
index 8b92a45a3c89..bd5acc4a8687 100644
--- a/drivers/gpu/drm/vc4/vc4_plane.c
+++ b/drivers/gpu/drm/vc4/vc4_plane.c
@@ -340,7 +340,7 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
{
struct vc4_plane_state *vc4_state = to_vc4_plane_state(state);
struct drm_framebuffer *fb = state->fb;
- struct drm_gem_dma_object *bo = drm_fb_dma_get_gem_obj(fb, 0);
+ struct drm_gem_dma_object *bo;
int num_planes = fb->format->num_planes;
struct drm_crtc_state *crtc_state;
u32 h_subsample = fb->format->hsub;
@@ -359,8 +359,10 @@ static int vc4_plane_setup_clipping_and_scaling(struct drm_plane_state *state)
if (ret)
return ret;
- for (i = 0; i < num_planes; i++)
+ for (i = 0; i < num_planes; i++) {
+ bo = drm_fb_dma_get_gem_obj(fb, i);
vc4_state->offsets[i] = bo->dma_addr + fb->offsets[i];
+ }
/*
* We don't support subpixel source positioning for scaling,
commit 90091c367e74d5b58d9ebe979cc363f7468f58d3 upstream.
This patch fixes the stack-entropy.sh test to exit gracefully when the LKDTM is
not available. Test will hang otherwise as reported in [1].
Applicability of this fix to other LTS kernels:
- 4.14: No lkdtm selftest
- 4.19: No lkdtm selftest
- 5.4: No lkdtm selftests
- 5.10: Inital selftest version introduced in 46d1a0f03d661 ("selftests/lkdtm:
Add tests for LKDTM targets") is a single script which has the LKDTM
availability check
- 6.1: Fix applied
This patch applies cleanly to stable-5.15 tree. Updated test was executed in
Qemu VM with different kernels:
- CONFIG_LKDTM not enabled. Test finished with status SKIP.
- CONFIG_LKDTM enabled. Test failed (but not hanged) with error 'Stack entropy
is low'.
- CONFIG_LKDTM enabled and randomize_kstack_offset=on boot argument provided.
Test succeed.
[1] https://lore.kernel.org/lkml/2836f48a-d4e2-7f00-f06c-9f556fbd6332@linuxfoun…
commit ce4d9a1ea35ac5429e822c4106cb2859d5c71f3e upstream.
Patch series "Fix kmemleak crashes when scanning CMA regions", v2.
When trying to boot a device with an ARM64 kernel with the following
config options enabled:
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT=y
CONFIG_DEBUG_KMEMLEAK=y
a crash is encountered when kmemleak starts to scan the list of gray
or allocated objects that it maintains. Upon closer inspection, it was
observed that these page-faults always occurred when kmemleak attempted
to scan a CMA region.
At the moment, kmemleak is made aware of CMA regions that are specified
through the devicetree to be dynamically allocated within a range of
addresses. However, kmemleak should not need to scan CMA regions or any
reserved memory region, as those regions can be used for DMA transfers
between drivers and peripherals, and thus wouldn't contain anything
useful for kmemleak.
Additionally, since CMA regions are unmapped from the kernel's address
space when they are freed to the buddy allocator at boot when
CONFIG_DEBUG_PAGEALLOC is enabled, kmemleak shouldn't attempt to access
those memory regions, as that will trigger a crash. Thus, kmemleak
should ignore all dynamically allocated reserved memory regions.
This patch (of 1):
Currently, kmemleak ignores dynamically allocated reserved memory regions
that don't have a kernel mapping. However, regions that do retain a
kernel mapping (e.g. CMA regions) do get scanned by kmemleak.
This is not ideal for two reasons:
1 kmemleak works by scanning memory regions for pointers to allocated
objects to determine if those objects have been leaked or not.
However, reserved memory regions can be used between drivers and
peripherals for DMA transfers, and thus, would not contain pointers to
allocated objects, making it unnecessary for kmemleak to scan these
reserved memory regions.
2 When CONFIG_DEBUG_PAGEALLOC is enabled, along with kmemleak, the
CMA reserved memory regions are unmapped from the kernel's address
space when they are freed to buddy at boot. These CMA reserved regions
are still tracked by kmemleak, however, and when kmemleak attempts to
scan them, a crash will happen, as accessing the CMA region will result
in a page-fault, since the regions are unmapped.
Thus, use kmemleak_ignore_phys() for all dynamically allocated reserved
memory regions, instead of those that do not have a kernel mapping
associated with them.
Link: https://lkml.kernel.org/r/20230208232001.2052777-1-isaacmanjarres@google.com
Link: https://lkml.kernel.org/r/20230208232001.2052777-2-isaacmanjarres@google.com
Fixes: a7259df76702 ("memblock: make memblock_find_in_range method private")
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
Acked-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Frank Rowand <frowand.list(a)gmail.com>
Cc: Kirill A. Shutemov <kirill.shtuemov(a)linux.intel.com>
Cc: Nick Kossifidis <mick(a)ics.forth.gr>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Cc: Rob Herring <robh(a)kernel.org>
Cc: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Cc: Saravana Kannan <saravanak(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.15+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/of/of_reserved_mem.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 9da8835ba5a5..9e949ddcb146 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -47,9 +47,10 @@ static int __init early_init_dt_alloc_reserved_memory_arch(phys_addr_t size,
err = memblock_mark_nomap(base, size);
if (err)
memblock_free(base, size);
- kmemleak_ignore_phys(base);
}
+ kmemleak_ignore_phys(base);
+
return err;
}
--
2.39.2.637.g21b0678d19-goog
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
99b9402a36f0 ("nilfs2: fix underflow in second superblock position calculations")
4fcd69798d7f ("nilfs2: use bdev_nr_bytes instead of open coding it")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99b9402a36f0799f25feee4465bfa4b8dfa74b4d Mon Sep 17 00:00:00 2001
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Date: Wed, 15 Feb 2023 07:40:43 +0900
Subject: [PATCH] nilfs2: fix underflow in second superblock position
calculations
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/nilfs2/ioctl.c b/fs/nilfs2/ioctl.c
index 87e1004b606d..b4041d0566a9 100644
--- a/fs/nilfs2/ioctl.c
+++ b/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(struct inode *inode, void __user *argp)
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
diff --git a/fs/nilfs2/super.c b/fs/nilfs2/super.c
index 6edb6e0dd61f..1422b8ba24ed 100644
--- a/fs/nilfs2/super.c
+++ b/fs/nilfs2/super.c
@@ -408,6 +408,15 @@ int nilfs_resize_fs(struct super_block *sb, __u64 newsize)
if (newsize > devsize)
goto out;
+ /*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
/*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index 2064e6473d30..3a4c9c150cbf 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cf4c9d2ac1e4 ("mmc: mmc_spi: fix error handling in mmc_spi_probe()")
1ae51603528c ("mmc: mmc_spi: Indentation fixes")
5716fb9bd9c6 ("mmc: spi: Convert to use GPIO descriptors")
14cd18a89337 ("ARM: ep93xx: simone: let the mmc_spi driver handle the card detect")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cf4c9d2ac1e42c7d18b921bec39486896645b714 Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 31 Jan 2023 09:38:35 +0800
Subject: [PATCH] mmc: mmc_spi: fix error handling in mmc_spi_probe()
If mmc_add_host() fails, it doesn't need to call mmc_remove_host(),
or it will cause null-ptr-deref, because of deleting a not added
device in mmc_remove_host().
To fix this, goto label 'fail_glue_init', if mmc_add_host() fails,
and change the label 'fail_add_host' to 'fail_gpiod_request'.
Fixes: 15a0580ced08 ("mmc_spi host driver")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230131013835.3564011-1-yangyingliang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c
index 106dd204b1a7..cc333ad67cac 100644
--- a/drivers/mmc/host/mmc_spi.c
+++ b/drivers/mmc/host/mmc_spi.c
@@ -1437,7 +1437,7 @@ static int mmc_spi_probe(struct spi_device *spi)
status = mmc_add_host(mmc);
if (status != 0)
- goto fail_add_host;
+ goto fail_glue_init;
/*
* Index 0 is card detect
@@ -1445,7 +1445,7 @@ static int mmc_spi_probe(struct spi_device *spi)
*/
status = mmc_gpiod_request_cd(mmc, NULL, 0, false, 1000);
if (status == -EPROBE_DEFER)
- goto fail_add_host;
+ goto fail_gpiod_request;
if (!status) {
/*
* The platform has a CD GPIO signal that may support
@@ -1460,7 +1460,7 @@ static int mmc_spi_probe(struct spi_device *spi)
/* Index 1 is write protect/read only */
status = mmc_gpiod_request_ro(mmc, NULL, 1, 0);
if (status == -EPROBE_DEFER)
- goto fail_add_host;
+ goto fail_gpiod_request;
if (!status)
has_ro = true;
@@ -1474,7 +1474,7 @@ static int mmc_spi_probe(struct spi_device *spi)
? ", cd polling" : "");
return 0;
-fail_add_host:
+fail_gpiod_request:
mmc_remove_host(mmc);
fail_glue_init:
mmc_spi_dma_free(host);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
cf4c9d2ac1e4 ("mmc: mmc_spi: fix error handling in mmc_spi_probe()")
1ae51603528c ("mmc: mmc_spi: Indentation fixes")
5716fb9bd9c6 ("mmc: spi: Convert to use GPIO descriptors")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cf4c9d2ac1e42c7d18b921bec39486896645b714 Mon Sep 17 00:00:00 2001
From: Yang Yingliang <yangyingliang(a)huawei.com>
Date: Tue, 31 Jan 2023 09:38:35 +0800
Subject: [PATCH] mmc: mmc_spi: fix error handling in mmc_spi_probe()
If mmc_add_host() fails, it doesn't need to call mmc_remove_host(),
or it will cause null-ptr-deref, because of deleting a not added
device in mmc_remove_host().
To fix this, goto label 'fail_glue_init', if mmc_add_host() fails,
and change the label 'fail_add_host' to 'fail_gpiod_request'.
Fixes: 15a0580ced08 ("mmc_spi host driver")
Signed-off-by: Yang Yingliang <yangyingliang(a)huawei.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230131013835.3564011-1-yangyingliang@huawei.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c
index 106dd204b1a7..cc333ad67cac 100644
--- a/drivers/mmc/host/mmc_spi.c
+++ b/drivers/mmc/host/mmc_spi.c
@@ -1437,7 +1437,7 @@ static int mmc_spi_probe(struct spi_device *spi)
status = mmc_add_host(mmc);
if (status != 0)
- goto fail_add_host;
+ goto fail_glue_init;
/*
* Index 0 is card detect
@@ -1445,7 +1445,7 @@ static int mmc_spi_probe(struct spi_device *spi)
*/
status = mmc_gpiod_request_cd(mmc, NULL, 0, false, 1000);
if (status == -EPROBE_DEFER)
- goto fail_add_host;
+ goto fail_gpiod_request;
if (!status) {
/*
* The platform has a CD GPIO signal that may support
@@ -1460,7 +1460,7 @@ static int mmc_spi_probe(struct spi_device *spi)
/* Index 1 is write protect/read only */
status = mmc_gpiod_request_ro(mmc, NULL, 1, 0);
if (status == -EPROBE_DEFER)
- goto fail_add_host;
+ goto fail_gpiod_request;
if (!status)
has_ro = true;
@@ -1474,7 +1474,7 @@ static int mmc_spi_probe(struct spi_device *spi)
? ", cd polling" : "");
return 0;
-fail_add_host:
+fail_gpiod_request:
mmc_remove_host(mmc);
fail_glue_init:
mmc_spi_dma_free(host);
The quilt patch titled
Subject: nilfs2: fix underflow in second superblock position calculations
has been removed from the -mm tree. Its filename was
nilfs2-fix-underflow-in-second-superblock-position-calculations.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix underflow in second superblock position calculations
Date: Wed, 15 Feb 2023 07:40:43 +0900
Macro NILFS_SB2_OFFSET_BYTES, which computes the position of the second
superblock, underflows when the argument device size is less than 4096
bytes. Therefore, when using this macro, it is necessary to check in
advance that the device size is not less than a lower limit, or at least
that underflow does not occur.
The current nilfs2 implementation lacks this check, causing out-of-bound
block access when mounting devices smaller than 4096 bytes:
I/O error, dev loop0, sector 36028797018963960 op 0x0:(READ) flags 0x0
phys_seg 1 prio class 2
NILFS (loop0): unable to read secondary superblock (blocksize = 1024)
In addition, when trying to resize the filesystem to a size below 4096
bytes, this underflow occurs in nilfs_resize_fs(), passing a huge number
of segments to nilfs_sufile_resize(), corrupting parameters such as the
number of segments in superblocks. This causes excessive loop iterations
in nilfs_sufile_resize() during a subsequent resize ioctl, causing
semaphore ns_segctor_sem to block for a long time and hang the writer
thread:
INFO: task segctord:5067 blocked for more than 143 seconds.
Not tainted 6.2.0-rc8-syzkaller-00015-gf6feea56f66d #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:segctord state:D stack:23456 pid:5067 ppid:2
flags:0x00004000
Call Trace:
<TASK>
context_switch kernel/sched/core.c:5293 [inline]
__schedule+0x1409/0x43f0 kernel/sched/core.c:6606
schedule+0xc3/0x190 kernel/sched/core.c:6682
rwsem_down_write_slowpath+0xfcf/0x14a0 kernel/locking/rwsem.c:1190
nilfs_transaction_lock+0x25c/0x4f0 fs/nilfs2/segment.c:357
nilfs_segctor_thread_construct fs/nilfs2/segment.c:2486 [inline]
nilfs_segctor_thread+0x52f/0x1140 fs/nilfs2/segment.c:2570
kthread+0x270/0x300 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
</TASK>
...
Call Trace:
<TASK>
folio_mark_accessed+0x51c/0xf00 mm/swap.c:515
__nilfs_get_page_block fs/nilfs2/page.c:42 [inline]
nilfs_grab_buffer+0x3d3/0x540 fs/nilfs2/page.c:61
nilfs_mdt_submit_block+0xd7/0x8f0 fs/nilfs2/mdt.c:121
nilfs_mdt_read_block+0xeb/0x430 fs/nilfs2/mdt.c:176
nilfs_mdt_get_block+0x12d/0xbb0 fs/nilfs2/mdt.c:251
nilfs_sufile_get_segment_usage_block fs/nilfs2/sufile.c:92 [inline]
nilfs_sufile_truncate_range fs/nilfs2/sufile.c:679 [inline]
nilfs_sufile_resize+0x7a3/0x12b0 fs/nilfs2/sufile.c:777
nilfs_resize_fs+0x20c/0xed0 fs/nilfs2/super.c:422
nilfs_ioctl_resize fs/nilfs2/ioctl.c:1033 [inline]
nilfs_ioctl+0x137c/0x2440 fs/nilfs2/ioctl.c:1301
...
This fixes these issues by inserting appropriate minimum device size
checks or anti-underflow checks, depending on where the macro is used.
Link: https://lkml.kernel.org/r/0000000000004e1dfa05f4a48e6b@google.com
Link: https://lkml.kernel.org/r/20230214224043.24141-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: <syzbot+f0c4082ce5ebebdac63b(a)syzkaller.appspotmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/fs/nilfs2/ioctl.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/ioctl.c
@@ -1114,7 +1114,14 @@ static int nilfs_ioctl_set_alloc_range(s
minseg = range[0] + segbytes - 1;
do_div(minseg, segbytes);
+
+ if (range[1] < 4096)
+ goto out;
+
maxseg = NILFS_SB2_OFFSET_BYTES(range[1]);
+ if (maxseg < segbytes)
+ goto out;
+
do_div(maxseg, segbytes);
maxseg--;
--- a/fs/nilfs2/super.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/super.c
@@ -409,6 +409,15 @@ int nilfs_resize_fs(struct super_block *
goto out;
/*
+ * Prevent underflow in second superblock position calculation.
+ * The exact minimum size check is done in nilfs_sufile_resize().
+ */
+ if (newsize < 4096) {
+ ret = -ENOSPC;
+ goto out;
+ }
+
+ /*
* Write lock is required to protect some functions depending
* on the number of segments, the number of reserved segments,
* and so forth.
--- a/fs/nilfs2/the_nilfs.c~nilfs2-fix-underflow-in-second-superblock-position-calculations
+++ a/fs/nilfs2/the_nilfs.c
@@ -544,9 +544,15 @@ static int nilfs_load_super_block(struct
{
struct nilfs_super_block **sbp = nilfs->ns_sbp;
struct buffer_head **sbh = nilfs->ns_sbh;
- u64 sb2off = NILFS_SB2_OFFSET_BYTES(bdev_nr_bytes(nilfs->ns_bdev));
+ u64 sb2off, devsize = bdev_nr_bytes(nilfs->ns_bdev);
int valid[2], swp = 0;
+ if (devsize < NILFS_SEG_MIN_BLOCKS * NILFS_MIN_BLOCK_SIZE + 4096) {
+ nilfs_err(sb, "device size too small");
+ return -EINVAL;
+ }
+ sb2off = NILFS_SB2_OFFSET_BYTES(devsize);
+
sbp[0] = nilfs_read_super_block(sb, NILFS_SB_OFFSET_BYTES, blocksize,
&sbh[0]);
sbp[1] = nilfs_read_super_block(sb, sb2off, blocksize, &sbh[1]);
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: hugetlb: check for undefined shift on 32 bit architectures
has been removed from the -mm tree. Its filename was
hugetlb-check-for-undefined-shift-on-32-bit-architectures.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: check for undefined shift on 32 bit architectures
Date: Wed, 15 Feb 2023 17:35:42 -0800
Users can specify the hugetlb page size in the mmap, shmget and
memfd_create system calls. This is done by using 6 bits within the flags
argument to encode the base-2 logarithm of the desired page size. The
routine hstate_sizelog() uses the log2 value to find the corresponding
hugetlb hstate structure. Converting the log2 value (page_size_log) to
potential hugetlb page size is the simple statement:
1UL << page_size_log
Because only 6 bits are used for page_size_log, the left shift can not be
greater than 63. This is fine on 64 bit architectures where a long is 64
bits. However, if a value greater than 31 is passed on a 32 bit
architecture (where long is 32 bits) the shift will result in undefined
behavior. This was generally not an issue as the result of the undefined
shift had to exactly match hugetlb page size to proceed.
Recent improvements in runtime checking have resulted in this undefined
behavior throwing errors such as reported below.
Fix by comparing page_size_log to BITS_PER_LONG before doing shift.
Link: https://lkml.kernel.org/r/20230216013542.138708-1-mike.kravetz@oracle.com
Link: https://lore.kernel.org/lkml/CA+G9fYuei_Tr-vN9GS7SfFyU1y9hNysnf=PB7kT0=yv4M…
Fixes: 42d7395feb56 ("mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reviewed-by: Jesper Juhl <jesperjuhl76(a)gmail.com>
Acked-by: Muchun Song <songmuchun(a)bytedance.com>
Tested-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Cc: Anders Roxell <anders.roxell(a)linaro.org>
Cc: Andi Kleen <ak(a)linux.intel.com>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/include/linux/hugetlb.h~hugetlb-check-for-undefined-shift-on-32-bit-architectures
+++ a/include/linux/hugetlb.h
@@ -743,7 +743,10 @@ static inline struct hstate *hstate_size
if (!page_size_log)
return &default_hstate;
- return size_to_hstate(1UL << page_size_log);
+ if (page_size_log < BITS_PER_LONG)
+ return size_to_hstate(1UL << page_size_log);
+
+ return NULL;
}
static inline struct hstate *hstate_vma(struct vm_area_struct *vma)
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The quilt patch titled
Subject: mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
has been removed from the -mm tree. Its filename was
mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/migrate: fix wrongly apply write bit after mkdirty on sparc64
Date: Thu, 16 Feb 2023 10:30:59 -0500
Nick Bowler reported another sparc64 breakage after the young/dirty
persistent work for page migration (per "Link:" below). That's after a
similar report [2].
It turns out page migration was overlooked, and it wasn't failing before
because page migration was not enabled in the initial report test
environment.
David proposed another way [2] to fix this from sparc64 side, but that
patch didn't land somehow. Neither did I check whether there's any other
arch that has similar issues.
Let's fix it for now as simple as moving the write bit handling to be
after dirty, like what we did before.
Note: this is based on mm-unstable, because the breakage was since 6.1 and
we're at a very late stage of 6.2 (-rc8), so I assume for this specific
case we should target this at 6.3.
[1] https://lore.kernel.org/all/20221021160603.GA23307@u164.east.ru/
[2] https://lore.kernel.org/all/20221212130213.136267-1-david@redhat.com/
Link: https://lkml.kernel.org/r/20230216153059.256739-1-peterx@redhat.com
Fixes: 2e3468778dbe ("mm: remember young/dirty bit for page migrations")
Link: https://lore.kernel.org/all/CADyTPExpEqaJiMGoV+Z6xVgL50ZoMJg49B10LcZ=8eg19u…
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Nick Bowler <nbowler(a)draconx.ca>
Acked-by: David Hildenbrand <david(a)redhat.com>
Tested-by: Nick Bowler <nbowler(a)draconx.ca>
Cc: <regressions(a)lists.linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/huge_memory.c~mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64
+++ a/mm/huge_memory.c
@@ -3272,8 +3272,6 @@ void remove_migration_pmd(struct page_vm
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
- if (is_writable_migration_entry(entry))
- pmde = maybe_pmd_mkwrite(pmde, vma);
if (pmd_swp_uffd_wp(*pvmw->pmd))
pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
if (!is_migration_entry_young(entry))
@@ -3281,6 +3279,10 @@ void remove_migration_pmd(struct page_vm
/* NOTE: this may contain setting soft-dirty on some archs */
if (PageDirty(new) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);
+ if (is_writable_migration_entry(entry))
+ pmde = maybe_pmd_mkwrite(pmde, vma);
+ else
+ pmde = pmd_wrprotect(pmde);
if (PageAnon(new)) {
rmap_t rmap_flags = RMAP_COMPOUND;
--- a/mm/migrate.c~mm-migrate-fix-wrongly-apply-write-bit-after-mkdirty-on-sparc64
+++ a/mm/migrate.c
@@ -224,6 +224,8 @@ static bool remove_migration_pte(struct
pte = maybe_mkwrite(pte, vma);
else if (pte_swp_uffd_wp(*pvmw.pte))
pte = pte_mkuffd_wp(pte);
+ else
+ pte = pte_wrprotect(pte);
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-uffd-fix-comment-in-handling-pte-markers.patch
From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that ring buffers should never be mapped
via the BAR on systems with LLC. There are too many caching pitfalls
due to the way BAR accesses are routed. So it is safest to just not
use it.
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: 9d80841ea4c9 ("drm/i915: Allow ringbuffers to be bound anywhere")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index fb1d2595392ed..fb99143be98e7 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -53,7 +53,7 @@ int intel_ring_pin(struct intel_ring *ring, struct i915_gem_ww_ctx *ww)
if (unlikely(ret))
goto err_unpin;
- if (i915_vma_is_map_and_fenceable(vma)) {
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915)) {
addr = (void __force *)i915_vma_pin_iomap(vma);
} else {
int type = i915_coherent_map_type(vma->vm->i915, vma->obj, false);
@@ -98,7 +98,7 @@ void intel_ring_unpin(struct intel_ring *ring)
return;
i915_vma_unset_ggtt_write(vma);
- if (i915_vma_is_map_and_fenceable(vma))
+ if (i915_vma_is_map_and_fenceable(vma) && !HAS_LLC(vma->vm->i915))
i915_vma_unpin_iomap(vma);
else
i915_gem_object_unpin_map(vma->obj);
--
2.39.1
BugLink: https://bugs.launchpad.net/bugs/2007581
GPIO chip irq members are exposed before they could be completely
initialized and this leads to race conditions.
One such issue was observed for the gc->irq.domain variable which
was accessed through the I2C interface in gpiochip_to_irq() before
it could be initialized by gpiochip_add_irqchip(). This resulted in
Kernel NULL pointer dereference.
Following are the logs for reference :-
kernel: Call Trace:
kernel: gpiod_to_irq+0x53/0x70
kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0
kernel: i2c_acpi_get_irq+0xc0/0xd0
kernel: i2c_device_probe+0x28a/0x2a0
kernel: really_probe+0xf2/0x460
kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0
To avoid such scenarios, restrict usage of GPIO chip irq members before
they are completely initialized.
Signed-off-by: Shreeya Patel <shreeya.patel(a)collabora.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Andy Shevchenko <andy.shevchenko(a)gmail.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Bartosz Golaszewski <brgl(a)bgdev.pl>
(backported from commit 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320)
Signed-off-by: Asmaa Mnebhi <asmaa(a)nvidia.com>
---
drivers/gpio/gpiolib.c | 19 +++++++++++++++++++
include/linux/gpio/driver.h | 9 +++++++++
2 files changed, 28 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index abdf448b11a3..e4d47e00c392 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2146,6 +2146,16 @@ static int gpiochip_to_irq(struct gpio_chip *chip, unsigned offset)
{
struct irq_domain *domain = chip->irq.domain;
+#ifdef CONFIG_GPIOLIB_IRQCHIP
+ /*
+ * Avoid race condition with other code, which tries to lookup
+ * an IRQ before the irqchip has been properly registered,
+ * i.e. while gpiochip is still being brought up.
+ */
+ if (!chip->irq.initialized)
+ return -EPROBE_DEFER;
+#endif
+
if (!gpiochip_irqchip_irq_valid(chip, offset))
return -ENXIO;
@@ -2321,6 +2331,15 @@ static int gpiochip_add_irqchip(struct gpio_chip *gpiochip,
acpi_gpiochip_request_interrupts(gpiochip);
+ /*
+ * Using barrier() here to prevent compiler from reordering
+ * gc->irq.initialized before initialization of above
+ * GPIO chip irq members.
+ */
+ barrier();
+
+ gpiochip->irq.initialized = true;
+
return 0;
}
diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h
index 5dd9c982e2cb..15418caf76fc 100644
--- a/include/linux/gpio/driver.h
+++ b/include/linux/gpio/driver.h
@@ -201,6 +201,15 @@ struct gpio_irq_chip {
*/
bool threaded;
+ /**
+ * @initialized:
+ *
+ * Flag to track GPIO chip irq member's initialization.
+ * This flag will make sure GPIO chip irq members are not used
+ * before they are initialized.
+ */
+ bool initialized;
+
/**
* @init_hw: optional routine to initialize hardware before
* an IRQ chip will be added. This is quite useful when
--
2.30.1
From: John Harrison <John.C.Harrison(a)Intel.com>
Direction from hardware is that stolen memory should never be used for
ring buffer allocations on platforms with LLC. There are too many
caching pitfalls due to the way stolen memory accesses are routed. So
it is safest to just not use it.
Signed-off-by: John Harrison <John.C.Harrison(a)Intel.com>
Fixes: c58b735fc762 ("drm/i915: Allocate rings from stolen")
Cc: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: intel-gfx(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.9+
---
drivers/gpu/drm/i915/gt/intel_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 15ec64d881c44..fb1d2595392ed 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -116,7 +116,7 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
I915_BO_ALLOC_PM_VOLATILE);
- if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
+ if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt) && !HAS_LLC(i915))
obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(obj))
obj = i915_gem_object_create_internal(i915, size);
--
2.39.1
Hi Greg,
I noticed that gcc 4.6 is also the requirement for kernel 5.4:
https://www.kernel.org/doc/html/v5.4/process/changes.html
So anyone who compiles 5.4 with gcc4.6 should also run into the problem.
Best regards,
Michael
-----Ursprüngliche Nachricht-----
Von: Michael Nies
Gesendet: Donnerstag, 9. Februar 2023 08:30
An: 'Greg KH' <gregkh(a)linuxfoundation.org>
Cc: 'stable(a)vger.kernel.org' <stable(a)vger.kernel.org>
Betreff: AW: Kernel Bug 217013
Hi Greg,
thanks for your answer.
We do not use kernel 4.19 at the moment.
On the systems itself we have only gcc 4.4.5 - so requirements for kernel 4.19 are not matched and we stay on 4.14.
But if anyone else should try to compile 4.19 with gcc 4.6 - they should run into the same problem because of "_Alignof".
On my side it is planned to compile kernel 4.19 with gcc 4.7 or higher - so we should not have problems with "_Alignof" in the future.
Best regards,
Michael
-----Ursprüngliche Nachricht-----
Von: Greg KH <mailto:gregkh@linuxfoundation.org>
Gesendet: Donnerstag, 9. Februar 2023 08:10
An: Michael Nies <mailto:michael.nies@netclusive.com>
Cc: 'stable(a)vger.kernel.org' <mailto:stable@vger.kernel.org>
Betreff: Re: Kernel Bug 217013
On Thu, Feb 09, 2023 at 06:54:51AM +0000, Michael Nies wrote:
> Hello,
>
> could you please have a look at Kernel Bug 217013 that was reported by me yesterday?
> https://bugzilla.kernel.org/show_bug.cgi?id=217013
>
> Greg wrote that I should write a Mail to this address.
Thanks for the email, and the bug report, I'll work on this later today.
Do you also see the same problem with the 4.19.y kernel tree? Or are you not using that one too, or with a newer compiler?
thanks,
greg k-h
Greg,
These two patches have been (correctly) auto selected to 5.15.y
along with the two dependency patches tagged with:
Stable-dep-of: b306e90ffabd ("ovl: remove privs in ovl_copyfile()")
9636e70ee2d3 ("ovl: use ovl_copy_{real,upper}attr() wrappers")
a54843833caf ("ovl: store lower path in ovl_inode")
It wasn't wrong to apply those patches with the two dependencies
to 5.15.y, but it is not as easy to do for 5.10.y, so here is a
very simple backport of the two fixes to 5.10.y, i.e.:
replaced ovl_copyattr(X) with ovl_copyattr(ovl_inode_real(X), X).
Note that the language "This fixes some failure in fstests..."
in commit message means that those fixes are not enough for the
tests to pass. Additional backports from v6.2 are needed for the
tests to pass and I am collaborating those backports with Leah,
so they will hit 5.15.y first before posting them for 5.10.y.
Never the less, these overlayfs fixes are important security
fixes, so they should be applied to LTS kernel even before
all the cases in the fstests are fixed.
Thanks,
Amir.
Amir Goldstein (2):
ovl: remove privs in ovl_copyfile()
ovl: remove privs in ovl_fallocate()
fs/overlayfs/file.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
--
2.34.1