The patch titled
Subject: nilfs2: fix leak of nilfs_root in case of writer thread creation failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix leak of nilfs_root in case of writer thread creation failure
Date: Fri, 7 Oct 2022 17:52:26 +0900
If nilfs_attach_log_writer() fails to create a log writer thread, it
frees the log writer's data structure without any cleanup. After
commit e912a5b66837 ("nilfs2: use root object to get ifile"), this causes
a leak of struct nilfs_root, which in turn leaks an ifile metadata inode
and a kobject on that struct.
In addition, if the kernel is booted with panic_on_warn, the above
ifile metadata inode leak will cause the following panic when the
nilfs2 kernel module is removed:
kmem_cache_destroy nilfs2_inode_cache: Slab cache still has objects when
called from nilfs_destroy_cachep+0x16/0x3a [nilfs2]
WARNING: CPU: 8 PID: 1464 at mm/slab_common.c:494 kmem_cache_destroy+0x138/0x140
...
RIP: 0010:kmem_cache_destroy+0x138/0x140
Code: 00 20 00 00 e8 a9 55 d8 ff e9 76 ff ff ff 48 8b 53 60 48 c7 c6 20 70 65 86 48 c7 c7 d8 69 9c 86 48 8b 4c 24 28 e8 ef 71 c7 00 <0f> 0b e9 53 ff ff ff c3 48 81 ff ff 0f 00 00 77 03 31 c0 c3 53 48
...
Call Trace:
<TASK>
? nilfs_palloc_freev.cold.24+0x58/0x58 [nilfs2]
nilfs_destroy_cachep+0x16/0x3a [nilfs2]
exit_nilfs_fs+0xa/0x1b [nilfs2]
__x64_sys_delete_module+0x1d9/0x3a0
? __sanitizer_cov_trace_pc+0x1a/0x50
? syscall_trace_enter.isra.19+0x119/0x190
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
...
</TASK>
Kernel panic - not syncing: panic_on_warn set ...
This patch fixes these issues by calling the nilfs_detach_log_writer()
cleanup function if spawning the log writer thread fails.
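The error-path pattern described above can be sketched in userspace C. This is only a minimal model under stated assumptions: `struct root`, `struct writer`, `struct fs`, `attach_writer()` and `detach_writer()` are hypothetical stand-ins and not the nilfs2 types or functions, and the thread-start result is passed in as a parameter instead of coming from a real thread creation call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; none of these are the nilfs2 types. */
struct root { int refcount; };
struct writer { struct root *root; };
struct fs { struct writer *writer; };

/* Full teardown: drop every reference the writer holds, then free it. */
void detach_writer(struct fs *fs)
{
	if (!fs->writer)
		return;
	fs->writer->root->refcount--;	/* release the reference taken in attach */
	free(fs->writer);
	fs->writer = NULL;
}

/* start_thread_err simulates the result of spawning the writer thread. */
int attach_writer(struct fs *fs, struct root *root, int start_thread_err)
{
	struct writer *w;

	assert(fs && root);
	w = malloc(sizeof(*w));
	if (!w)
		return -1;
	root->refcount++;		/* the writer now pins the root object */
	w->root = root;
	fs->writer = w;

	/* A bare free(w) here would leave root->refcount elevated forever. */
	if (start_thread_err)
		detach_writer(fs);
	return start_thread_err;
}
```

In this model, a failed attach leaves the root's reference count exactly where it started, which is the property the leak violated.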
Link: https://lkml.kernel.org/r/20221007085226.57667-1-konishi.ryusuke@gmail.com
Fixes: e912a5b66837 ("nilfs2: use root object to get ifile")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+7381dc4ad60658ca4c05(a)syzkaller.appspotmail.com
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure
+++ a/fs/nilfs2/segment.c
@@ -2786,10 +2786,9 @@ int nilfs_attach_log_writer(struct super
 	inode_attach_wb(nilfs->ns_bdev->bd_inode, NULL);
 	err = nilfs_segctor_start_thread(nilfs->ns_writer);
-	if (err) {
-		kfree(nilfs->ns_writer);
-		nilfs->ns_writer = NULL;
-	}
+	if (unlikely(err))
+		nilfs_detach_log_writer(sb);
+
 	return err;
 }
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-bug-of-struct-nilfs_root.patch
nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level.patch
nilfs2-fix-leak-of-nilfs_root-in-case-of-writer-thread-creation-failure.patch
nilfs2-replace-warn_ons-by-nilfs_error-for-checkpoint-acquisition-failure.patch
From: Gou Hao <gouhao(a)uniontech.com>
patch 1: fixes a memory leak of the audit rule
patches 2-3: fix memory leaks of the 'fsname' field of struct ima_rule_entry
Tyler Hicks (3):
ima: Have the LSM free its audit rule
ima: Free the entire rule when deleting a list of rules
ima: Free the entire rule if it fails to parse
security/integrity/ima/ima.h | 5 +++++
security/integrity/ima/ima_policy.c | 24 ++++++++++++++++++------
2 files changed, 23 insertions(+), 6 deletions(-)
--
2.20.1
This is the start of the stable review cycle for the 5.10.147 release.
There are 52 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 05 Oct 2022 07:07:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.147-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.147-rc1
Kai Vehmanen <kai.vehmanen(a)linux.intel.com>
ALSA: hda/hdmi: fix warning about PCM count when used with SOF
Nadav Amit <namit(a)vmware.com>
x86/alternative: Fix race in try_get_desc()
Jim Mattson <jmattson(a)google.com>
KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest
Florian Fainelli <f.fainelli(a)gmail.com>
clk: iproc: Do not rely on node name for correct PLL setup
Han Xu <han.xu(a)nxp.com>
clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks
Wang Yufen <wangyufen(a)huawei.com>
selftests: Fix the if conditions of in test_extra_filter()
Junxiao Chang <junxiao.chang(a)intel.com>
net: stmmac: power up/down serdes in stmmac_open/release
Michael Kelley <mikelley(a)microsoft.com>
nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
nvme: add new line after variable declatation
Rafael Mendonca <rafaelmendsr(a)gmail.com>
cxgb4: fix missing unlock on ETHOFLD desc collect fail path
Hangyu Hua <hbh25y(a)gmail.com>
net: sched: act_ct: fix possible refcount leak in tcf_ct_init()
Peilin Ye <peilin.ye(a)bytedance.com>
usbnet: Fix memory leak in usbnet_disconnect()
Yang Yingliang <yangyingliang(a)huawei.com>
Input: melfas_mip4 - fix return value check in mip4_probe()
Brian Norris <briannorris(a)chromium.org>
Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
Martin Povišer <povik+lin(a)cutebit.org>
ASoC: tas2770: Reinit regcache on reset
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix debugfs info for A64 SRAM C
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix probe function ordering issues
Cai Huoqing <caihuoqing(a)baidu.com>
soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource()
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Prevent the driver from being unbound
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Actually claim SRAM regions
Richard Zhu <hongxing.zhu(a)nxp.com>
reset: imx7: Fix the iMX8MP PCIe PHY PERST support
YuTong Chang <mtwget(a)gmail.com>
ARM: dts: am33xx: Fix MMCHS0 dma properties
Yu Kuai <yukuai3(a)huawei.com>
scsi: hisi_sas: Revert "scsi: hisi_sas: Limit max hw sectors for v3 HW"
Tianyu Lan <Tianyu.Lan(a)microsoft.com>
swiotlb: max mapping size takes min align mask into account
Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
media: rkvdec: Disable H.264 error detection
Hangyu Hua <hbh25y(a)gmail.com>
media: dvb_vb2: fix possible out of bound access
Minchan Kim <minchan(a)kernel.org>
mm: fix madivse_pageout mishandling on non-LRU page
Alistair Popple <apopple(a)nvidia.com>
mm/migrate_device.c: flush TLB while holding PTL
Maurizio Lombardi <mlombard(a)redhat.com>
mm: prevent page_frag_alloc() from corrupting the memory
Mel Gorman <mgorman(a)techsingularity.net>
mm/page_alloc: fix race condition between build_all_zonelists and page allocation
Wenchao Chen <wenchao.chen(a)unisoc.com>
mmc: hsq: Fix data stomping during mmc recovery
Sergei Antonov <saproj(a)gmail.com>
mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
Niklas Cassel <niklas.cassel(a)wdc.com>
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Yang Shi <shy828301(a)gmail.com>
powerpc/64s/radix: don't need to broadcast IPI for radix pmd collapse flush
Alexander Couzens <lynxis(a)fe80.eu>
net: mt7531: only do PLL once after the reset
ChenXiaoSong <chenxiaosong2(a)huawei.com>
ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: integrator: Tag PCI host with device_type
Aidan MacDonald <aidanmacdonald.0x0(a)gmail.com>
clk: ingenic-tcu: Properly enable registers before accessing timers
Sebastian Krzyszkowiak <sebastian.krzyszkowiak(a)puri.sm>
Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
Frank Wunderlich <frank-w(a)public-files.de>
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
Mario Limonciello <mario.limonciello(a)amd.com>
thunderbolt: Explicitly reset plug events delay back to USB4 spec value
Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
usb: typec: ucsi: Remove incorrect warning
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: ignore UAS for Thinkplus chips
Hongling Zeng <zenghongling(a)kylinos.cn>
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: add no-uas quirk for Hiksemi usb_disk
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix hang during unmount when stopping a space reclaim worker
Mohan Kumar <mkumard(a)nvidia.com>
ALSA: hda: Fix Nvidia dp infoframe
Hui Wang <hui.wang(a)canonical.com>
ALSA: hda/hdmi: let new platforms assign the pcm slot dynamically
Dmitry Osipenko <digetx(a)gmail.com>
ALSA: hda/tegra: Reset hardware
Dmitry Osipenko <digetx(a)gmail.com>
ALSA: hda/tegra: Use clk_bulk helpers
Gil Fine <gil.fine(a)intel.com>
thunderbolt: Add support for Intel Maple Ridge single port controller
Mika Westerberg <mika.westerberg(a)linux.intel.com>
thunderbolt: Add support for Intel Maple Ridge
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/am33xx-l4.dtsi | 3 +-
arch/arm/boot/dts/integratorap.dts | 1 +
arch/powerpc/mm/book3s64/radix_pgtable.c | 9 ---
arch/x86/kernel/alternative.c | 45 +++++------
arch/x86/kvm/cpuid.c | 2 -
drivers/ata/libata-core.c | 4 +
drivers/clk/bcm/clk-iproc-pll.c | 12 ++-
drivers/clk/imx/clk-imx6sx.c | 4 +-
drivers/clk/ingenic/tcu.c | 15 ++--
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 ----
drivers/input/keyboard/snvs_pwrkey.c | 2 +-
drivers/input/touchscreen/melfas_mip4.c | 2 +-
drivers/media/dvb-core/dvb_vb2.c | 11 +++
drivers/mmc/host/mmc_hsq.c | 2 +-
drivers/mmc/host/moxart-mmc.c | 17 +----
drivers/net/dsa/mt7530.c | 15 ++--
drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 28 ++++---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 23 +++---
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/usb/usbnet.c | 7 +-
drivers/nvme/host/core.c | 9 ++-
drivers/reset/reset-imx7.c | 1 +
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 --
drivers/soc/sunxi/sunxi_sram.c | 27 +++----
drivers/staging/media/rkvdec/rkvdec-h264.c | 4 +-
drivers/thunderbolt/icm.c | 12 +++
drivers/thunderbolt/nhi.h | 2 +
drivers/thunderbolt/switch.c | 1 +
drivers/usb/storage/unusual_uas.h | 21 ++++++
drivers/usb/typec/ucsi/ucsi.c | 2 -
fs/btrfs/disk-io.c | 25 ++++++
fs/ntfs/super.c | 3 +-
kernel/dma/swiotlb.c | 13 +++-
mm/madvise.c | 7 +-
mm/migrate.c | 5 +-
mm/page_alloc.c | 65 +++++++++++++---
net/sched/act_ct.c | 5 +-
sound/pci/hda/hda_tegra.c | 88 +++++++---------------
sound/pci/hda/patch_hdmi.c | 47 ++++++++++--
sound/soc/codecs/tas2770.c | 3 +
tools/testing/selftests/net/reuseport_bpf.c | 2 +-
42 files changed, 346 insertions(+), 223 deletions(-)
This is the start of the stable review cycle for the 4.19.261 release.
There are 25 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 05 Oct 2022 07:07:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.261-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.261-rc1
Florian Fainelli <f.fainelli(a)gmail.com>
clk: iproc: Do not rely on node name for correct PLL setup
Wang Yufen <wangyufen(a)huawei.com>
selftests: Fix the if conditions of in test_extra_filter()
Michael Kelley <mikelley(a)microsoft.com>
nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
nvme: add new line after variable declatation
Peilin Ye <peilin.ye(a)bytedance.com>
usbnet: Fix memory leak in usbnet_disconnect()
Yang Yingliang <yangyingliang(a)huawei.com>
Input: melfas_mip4 - fix return value check in mip4_probe()
Brian Norris <briannorris(a)chromium.org>
Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix debugfs info for A64 SRAM C
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix probe function ordering issues
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Prevent the driver from being unbound
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Actually claim SRAM regions
Tyler Hicks <tyhicks(a)linux.microsoft.com>
ima: Free the entire rule if it fails to parse
Tyler Hicks <tyhicks(a)linux.microsoft.com>
ima: Free the entire rule when deleting a list of rules
Tyler Hicks <tyhicks(a)linux.microsoft.com>
ima: Have the LSM free its audit rule
Alistair Popple <apopple(a)nvidia.com>
mm/migrate_device.c: flush TLB while holding PTL
Maurizio Lombardi <mlombard(a)redhat.com>
mm: prevent page_frag_alloc() from corrupting the memory
Mel Gorman <mgorman(a)techsingularity.net>
mm/page_alloc: fix race condition between build_all_zonelists and page allocation
Sergei Antonov <saproj(a)gmail.com>
mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
Niklas Cassel <niklas.cassel(a)wdc.com>
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
ChenXiaoSong <chenxiaosong2(a)huawei.com>
ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: integrator: Tag PCI host with device_type
Frank Wunderlich <frank-w(a)public-files.de>
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: ignore UAS for Thinkplus chips
Hongling Zeng <zenghongling(a)kylinos.cn>
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: add no-uas quirk for Hiksemi usb_disk
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/integratorap.dts | 1 +
drivers/ata/libata-core.c | 4 ++
drivers/clk/bcm/clk-iproc-pll.c | 12 ++--
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 -----
drivers/input/touchscreen/melfas_mip4.c | 2 +-
drivers/mmc/host/moxart-mmc.c | 17 +-----
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/usb/usbnet.c | 7 ++-
drivers/nvme/host/core.c | 9 ++-
drivers/soc/sunxi/sunxi_sram.c | 23 ++++----
drivers/usb/storage/unusual_uas.h | 21 +++++++
fs/ntfs/super.c | 3 +-
mm/migrate.c | 5 +-
mm/page_alloc.c | 65 ++++++++++++++++++----
security/integrity/ima/ima.h | 5 ++
security/integrity/ima/ima_policy.c | 24 ++++++--
tools/testing/selftests/net/reuseport_bpf.c | 2 +-
18 files changed, 147 insertions(+), 71 deletions(-)
This is the start of the stable review cycle for the 5.4.217 release.
There are 51 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 07 Oct 2022 11:31:56 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.217-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.217-rc1
Shuah Khan <skhan(a)linuxfoundation.org>
docs: update mediator information in CoC docs
Sami Tolvanen <samitolvanen(a)google.com>
Makefile.extrawarn: Move -Wcast-function-type-strict to W=1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "drm/amdgpu: use dirty framebuffer helper"
YueHaibing <yuehaibing(a)huawei.com>
xfs: remove unused variable 'done'
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix uninitialized variable in xfs_attr3_leaf_inactive
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: streamline xfs_attr3_leaf_inactive
Christoph Hellwig <hch(a)lst.de>
xfs: move incore structures out of xfs_da_format.h
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix memory corruption during remote attr value buffer invalidation
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: refactor remote attr value buffer invalidation
Christoph Hellwig <hch(a)lst.de>
xfs: fix IOCB_NOWAIT handling in xfs_file_dio_aio_read
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix s_maxbytes computation on 32-bit kernels
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: truncate should remove all blocks, not just to the end of the page cache
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: introduce XFS_MAX_FILEOFF
Christoph Hellwig <hch(a)lst.de>
xfs: fix misuse of the XFS_ATTR_INCOMPLETE flag
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
x86/speculation: Add RSB VM Exit protections
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts
Nathan Chancellor <nathan(a)kernel.org>
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Disable RRSBA behavior
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Add Cannon lake to RETBleed affected CPU list
Andrew Cooper <andrew.cooper3(a)citrix.com>
x86/cpu/amd: Enumerate BTC_NO
Peter Zijlstra <peterz(a)infradead.org>
x86/common: Stamp out the stepping madness
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fill RSB on vmexit for IBRS
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Fix IBRS handling after vmexit
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
KVM: VMX: Convert launched argument to flags
Josh Poimboeuf <jpoimboe(a)kernel.org>
KVM: VMX: Flatten __vmx_vcpu_run()
Uros Bizjak <ubizjak(a)gmail.com>
KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw
Uros Bizjak <ubizjak(a)gmail.com>
KVM/VMX: Use TEST %REG,%REG instead of CMP $0,%REG in vmenter.S
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Remove x86_spec_ctrl_mask
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix SPEC_CTRL write on SMT state change
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix firmware entry SPEC_CTRL handling
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
Peter Zijlstra <peterz(a)infradead.org>
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
Peter Zijlstra <peterz(a)infradead.org>
intel_idle: Disable IBRS during long idle
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Report Intel retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Split spectre_v2_select_mitigation() and spectre_v2_user_select_mitigation()
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Optimize SPEC_CTRL MSR writes
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Add kernel IBRS implementation
Peter Zijlstra <peterz(a)infradead.org>
x86/entry: Remove skip_r11rcx
Peter Zijlstra <peterz(a)infradead.org>
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Add AMD retbleed= boot parameter
Alexandre Chartre <alexandre.chartre(a)oracle.com>
x86/bugs: Report AMD retbleed vulnerability
Peter Zijlstra <peterz(a)infradead.org>
x86/cpufeatures: Move RETPOLINE flags to word 11
Peter Zijlstra <peterz(a)infradead.org>
x86/kvm/vmx: Make noinstr clean
Mark Gross <mgross(a)linux.intel.com>
x86/cpu: Add a steppings field to struct x86_cpu_id
Thomas Gleixner <tglx(a)linutronix.de>
x86/cpu: Add consistent CPU match macros
Thomas Gleixner <tglx(a)linutronix.de>
x86/devicetable: Move x86 specific macro out of generic code
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Revert "x86/speculation: Add RSB VM Exit protections"
-------------
Diffstat:
Documentation/admin-guide/kernel-parameters.txt | 13 +
.../process/code-of-conduct-interpretation.rst | 2 +-
Makefile | 4 +-
arch/x86/entry/calling.h | 68 +++-
arch/x86/entry/entry_32.S | 2 -
arch/x86/entry/entry_64.S | 34 +-
arch/x86/entry/entry_64_compat.S | 11 +-
arch/x86/include/asm/cpu_device_id.h | 132 +++++++-
arch/x86/include/asm/cpufeatures.h | 13 +-
arch/x86/include/asm/intel-family.h | 6 +
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/nospec-branch.h | 54 +--
arch/x86/kernel/cpu/amd.c | 21 +-
arch/x86/kernel/cpu/bugs.c | 365 ++++++++++++++++-----
arch/x86/kernel/cpu/common.c | 61 ++--
arch/x86/kernel/cpu/match.c | 13 +-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/process.c | 2 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx/nested.c | 32 +-
arch/x86/kvm/vmx/run_flags.h | 8 +
arch/x86/kvm/vmx/vmenter.S | 161 +++++----
arch/x86/kvm/vmx/vmx.c | 72 ++--
arch/x86/kvm/vmx/vmx.h | 5 +
arch/x86/kvm/x86.c | 4 +-
drivers/base/cpu.c | 8 +
drivers/cpufreq/acpi-cpufreq.c | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 2 -
drivers/idle/intel_idle.c | 43 ++-
fs/xfs/libxfs/xfs_attr.c | 2 +-
fs/xfs/libxfs/xfs_attr_leaf.c | 4 +-
fs/xfs/libxfs/xfs_attr_leaf.h | 26 +-
fs/xfs/libxfs/xfs_attr_remote.c | 85 +++--
fs/xfs/libxfs/xfs_attr_remote.h | 2 +
fs/xfs/libxfs/xfs_da_btree.h | 17 +-
fs/xfs/libxfs/xfs_da_format.c | 1 +
fs/xfs/libxfs/xfs_da_format.h | 59 ----
fs/xfs/libxfs/xfs_dir2.h | 2 +
fs/xfs/libxfs/xfs_dir2_priv.h | 19 ++
fs/xfs/libxfs/xfs_format.h | 7 +
fs/xfs/xfs_attr_inactive.c | 146 +++------
fs/xfs/xfs_file.c | 7 +-
fs/xfs/xfs_inode.c | 25 +-
fs/xfs/xfs_reflink.c | 3 +-
fs/xfs/xfs_super.c | 48 ++-
include/linux/cpu.h | 2 +
include/linux/kvm_host.h | 2 +-
include/linux/mod_devicetable.h | 4 +-
scripts/Makefile.extrawarn | 1 +
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
51 files changed, 1056 insertions(+), 558 deletions(-)
Using static analysis tools, we found that strncpy() is used in rpmsg. This
function is not safe: it can leave the destination without a terminating
NUL, which can lead to buffer overreads. This patchset replaces strncpy()
with strscpy_pad().
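To illustrate the difference, here is a userspace sketch. strscpy_pad() is a kernel helper, so `demo_strscpy_pad()` below is a hypothetical approximation of its behaviour (always NUL-terminate, zero-fill the tail, return -E2BIG on truncation) and not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace approximation of strscpy_pad() semantics (hypothetical helper):
 * the destination is always NUL-terminated, the unused tail is zero-filled,
 * and truncation is reported to the caller.
 */
long demo_strscpy_pad(char *dst, const char *src, size_t size)
{
	size_t len;

	assert(dst && src);
	if (size == 0)
		return -E2BIG;

	len = strlen(src);
	if (len >= size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;		/* truncated, but still terminated */
	}
	memcpy(dst, src, len);
	memset(dst + len, 0, size - len);	/* terminator plus padding */
	return (long)len;
}
```

By contrast, strncpy(dst, src, size) with a source of `size` or more characters fills the whole buffer and writes no terminator, so a later strlen() or "%s" on that buffer reads past its end.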
This patchset backports the following commit from v5.16:
commit 766279a8f85d ("rpmsg: qcom: glink: replace strncpy() with strscpy_pad()")
Link: https://lore.kernel.org/all/20220519073330.7187-1-krzysztof.kozlowski@linar…
Found by Linux Verification Center (linuxtesting.org) with SVACE.
The patch below does not apply to the 5.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
e9233917a7e5 ("mmc: core: Terminate infinite loop in SD-UHS voltage switch")
e42726646082 ("mmc: core: Replace with already defined values for readability")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e9233917a7e53980664efbc565888163c0a33c3f Mon Sep 17 00:00:00 2001
From: Brian Norris <briannorris(a)chromium.org>
Date: Tue, 13 Sep 2022 18:40:10 -0700
Subject: [PATCH] mmc: core: Terminate infinite loop in SD-UHS voltage switch
This loop intends to retry a max of 10 times, with some implicit
termination based on the SD_{R,}OCR_S18A bit. Unfortunately, the
termination condition depends on the value reported by the SD card
(*rocr), which may or may not correctly reflect what we asked it to do.
Needless to say, it's not wise to rely on the card doing what we expect;
we should at least terminate the loop regardless. So, check both the
input and output values, so we ensure we will terminate regardless of
the SD card behavior.
Note that SDIO learned a similar retry loop in commit 0797e5f1453b
("mmc: core: Fixup signal voltage switch"), but that used the 'ocr'
result, and so the current pre-terminating condition looks like:
rocr & ocr & R4_18V_PRESENT
(i.e., it doesn't have the same bug.)
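A small userspace model of the termination problem follows. It is a sketch under loud assumptions: the bit values, `card_reply()` and `negotiate()` are hypothetical, the card is modelled as always reporting S18A, every switch attempt is treated as an -EAGAIN failure, and a safety cap stands in for "would loop forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; not taken from kernel headers. */
#define DEMO_OCR_S18R  (1u << 24)	/* host requests 1.8V signalling */
#define DEMO_ROCR_S18A (1u << 24)	/* card claims it accepted 1.8V */

/* A misbehaving card: reports S18A no matter what the host asked for. */
static uint32_t card_reply(uint32_t ocr)
{
	(void)ocr;
	return DEMO_ROCR_S18A;
}

/*
 * Returns the number of voltage-switch attempts, or -1 if the loop had to
 * be cut off by the safety cap. check_request_bit selects whether the
 * host's own request bit takes part in the loop condition.
 */
int negotiate(bool check_request_bit, int max_retries)
{
	int retries = max_retries, attempts = 0;
	uint32_t ocr = DEMO_OCR_S18R;

	assert(max_retries > 0);
	for (int guard = 0; guard < 1000; guard++) {
		if (retries <= 0)
			ocr &= ~DEMO_OCR_S18R;	/* give up on 1.8V */

		uint32_t rocr = card_reply(ocr);
		bool requested = !check_request_bit || (ocr & DEMO_OCR_S18R);

		if (!(requested && (rocr & DEMO_ROCR_S18A)))
			return attempts;	/* loop terminates */

		attempts++;	/* the switch attempt "fails" and is retried */
		retries--;
	}
	return -1;
}
```

With the request bit checked, `negotiate(true, 10)` stops after ten attempts even though the card keeps claiming S18A. With only the card's reply checked, `negotiate(false, 10)` runs into the cap.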
This addresses a number of crash reports seen on ChromeOS that look
like the following:
... // lots of repeated: ...
<4>[13142.846061] mmc1: Skipping voltage switch
<4>[13143.406087] mmc1: Skipping voltage switch
<4>[13143.964724] mmc1: Skipping voltage switch
<4>[13144.526089] mmc1: Skipping voltage switch
<4>[13145.086088] mmc1: Skipping voltage switch
<4>[13145.645941] mmc1: Skipping voltage switch
<3>[13146.153969] INFO: task halt:30352 blocked for more than 122 seconds.
...
Fixes: f2119df6b764 ("mmc: sd: add support for signal voltage switch procedure")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Brian Norris <briannorris(a)chromium.org>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Link: https://lore.kernel.org/r/20220914014010.2076169-1-briannorris@chromium.org
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index 06aa62ce0ed1..3662bf5320ce 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -870,7 +870,8 @@ int mmc_sd_get_cid(struct mmc_host *host, u32 ocr, u32 *cid, u32 *rocr)
 	 * the CCS bit is set as well. We deliberately deviate from the spec in
 	 * regards to this, which allows UHS-I to be supported for SDSC cards.
 	 */
-	if (!mmc_host_is_spi(host) && rocr && (*rocr & SD_ROCR_S18A)) {
+	if (!mmc_host_is_spi(host) && (ocr & SD_OCR_S18R) &&
+	    rocr && (*rocr & SD_ROCR_S18A)) {
 		err = mmc_set_uhs_voltage(host, pocr);
 		if (err == -EAGAIN) {
 			retries--;
Hi,
The following patch should be backported to the stable kernel versions
starting from 5.10:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
The patch fixes inheritance of the need_wakeup flag for AF_XDP sockets.
Without it, sockets using XDP_SHARED_UMEM perform poorly because they
do not inherit this flag from the first socket.
Fixes: b5aea28dca134 ("xsk: Add shared umem support between queue ids")
Commit ID: 60240bc26114 ("xsk: Inherit need_wakeup flag for shared sockets")
Kernels: 5.10 - 5.15 - 5.19 - 6.0
Best Regards,
Jalal
P.S.: Sorry if you received multiple copies of this email. Plain-text
mode is enabled now.
[ Upstream commit 8782fb61cc848364e1e1599d76d3c9dd58a1cc06 ]
The mmap lock protects the page walker from changes to the page tables
during the walk. However, a read lock is insufficient to protect those
areas which don't have a VMA, as munmap() detaches the VMAs before
downgrading to a read lock and actually tearing down PTEs/page tables.
For users of walk_page_range() the solution is to simply call pte_hole()
immediately without checking the actual page tables when a VMA is not
present. We now never call __walk_page_range() without a valid vma.
For walk_page_range_novma() the locking requirements are tightened to
require the mmap write lock to be taken, and then walking the pgd
directly with 'no_vma' set.
This in turn means that all page walkers either have a valid vma, or
it's that special 'novma' case for page table debugging. As a result,
all the odd '(!walk->vma && !walk->no_vma)' tests can be removed.
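The range-splitting logic described above can be modelled in userspace as follows. This is only a sketch: `struct demo_vma`, `struct demo_walk`, `demo_find_vma()` and `demo_walk_range()` are hypothetical names, VMAs are a sorted array, and "walking page tables" is reduced to counting bytes. The point it shows is that gaps without a VMA are reported as holes directly, and the page-table path is only ever entered with a valid VMA.

```c
#include <assert.h>
#include <stddef.h>

struct demo_vma { unsigned long start, end; };	/* covers [start, end) */

struct demo_walk {
	unsigned long hole_bytes;	/* reported via the hole callback path */
	unsigned long mapped_bytes;	/* ranges where page tables are walked */
};

/* First VMA whose end lies above addr, or NULL (sorted, non-overlapping). */
static const struct demo_vma *demo_find_vma(const struct demo_vma *v,
					    size_t n, unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr < v[i].end)
			return &v[i];
	return NULL;
}

void demo_walk_range(const struct demo_vma *v, size_t n,
		     unsigned long start, unsigned long end,
		     struct demo_walk *w)
{
	assert(start < end);
	do {
		const struct demo_vma *vma = demo_find_vma(v, n, start);
		unsigned long next;

		if (!vma) {				/* after the last vma */
			next = end;
			w->hole_bytes += next - start;	/* no page-table access */
		} else if (start < vma->start) {	/* gap before the vma */
			next = end < vma->start ? end : vma->start;
			w->hole_bytes += next - start;	/* no page-table access */
		} else {				/* inside the vma */
			next = end < vma->end ? end : vma->end;
			w->mapped_bytes += next - start;
		}
		start = next;
	} while (start < end);
}
```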
Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Steven Price <steven.price(a)arm.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
[manually backported. backport note: walk_page_range_novma() does not exist in
5.4, so I'm omitting it from the backport]
Signed-off-by: Jann Horn <jannh(a)google.com>
---
mm/pagewalk.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 4eb09e0898817..ec41e7552f37c 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -38,7 +38,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
do {
again:
next = pmd_addr_end(addr, end);
- if (pmd_none(*pmd) || !walk->vma) {
+ if (pmd_none(*pmd)) {
if (ops->pte_hole)
err = ops->pte_hole(addr, next, walk);
if (err)
@@ -84,7 +84,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
do {
again:
next = pud_addr_end(addr, end);
- if (pud_none(*pud) || !walk->vma) {
+ if (pud_none(*pud)) {
if (ops->pte_hole)
err = ops->pte_hole(addr, next, walk);
if (err)
@@ -254,7 +254,7 @@ static int __walk_page_range(unsigned long start, unsigned long end,
int err = 0;
struct vm_area_struct *vma = walk->vma;
- if (vma && is_vm_hugetlb_page(vma)) {
+ if (is_vm_hugetlb_page(vma)) {
if (walk->ops->hugetlb_entry)
err = walk_hugetlb_range(start, end, walk);
} else
@@ -324,9 +324,13 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
if (!vma) { /* after the last vma */
walk.vma = NULL;
next = end;
+ if (ops->pte_hole)
+ err = ops->pte_hole(start, next, &walk);
} else if (start < vma->vm_start) { /* outside vma */
walk.vma = NULL;
next = min(end, vma->vm_start);
+ if (ops->pte_hole)
+ err = ops->pte_hole(start, next, &walk);
} else { /* inside vma */
walk.vma = vma;
next = min(end, vma->vm_end);
@@ -344,9 +348,8 @@ int walk_page_range(struct mm_struct *mm, unsigned long start,
}
if (err < 0)
break;
- }
- if (walk.vma || walk.ops->pte_hole)
err = __walk_page_range(start, next, &walk);
+ }
if (err)
break;
} while (start = next, start < end);
base-commit: f28b7414ab715e6069e72a7bbe2f1354b2524beb
--
2.38.0.rc1.362.ged0d419d3c-goog
tl;dr: The existing mitigation for eIBRS PBRSB predictions uses an INT3 to
ensure a call instruction retires before a following unbalanced RET. Replace
this with a WRMSR serialising instruction which has a lower performance
penalty.
== Background ==
eIBRS (enhanced indirect branch restricted speculation) is used to prevent
predictor addresses from one privilege domain from being used for prediction
in a higher privilege domain.
== Problem ==
On processors with eIBRS protections there can be a case where upon VM exit
a guest address may be used as an RSB prediction for an unbalanced RET if a
CALL instruction hasn't yet been retired. This is termed PBRSB (Post-Barrier
Return Stack Buffer).
A mitigation for this was introduced in:
commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
This mitigation [1] has a ~1% performance impact on VM exit compared to without
it [2].
== Solution ==
The WRMSR instruction can be used as a speculation barrier and a serialising
instruction. Use this on the VM exit path instead to ensure that a CALL
instruction (in this case the call to vmx_spec_ctrl_restore_host) has retired
before the prediction of a following unbalanced RET.
This mitigation [3] has a negligible performance impact.
== Testing ==
Run the outl_to_kernel test from kvm-unit-tests, which counts the cycles
for an exit to kernel mode, 200 times per configuration.
[1] With existing mitigation:
Average: 2026 cycles
[2] With no mitigation:
Average: 2008 cycles
[3] With proposed mitigation:
Average: 2008 cycles
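For reference, the "~1%" figure follows from the averages above:
(2026 - 2008) / 2008 is roughly 0.9%. A minimal sketch of that arithmetic,
where overhead_percent() is a hypothetical helper and not part of
kvm-unit-tests:

```c
/* Relative slowdown of `measured` versus `baseline`, in percent. */
double overhead_percent(unsigned long measured, unsigned long baseline)
{
	return 100.0 * ((double)measured - (double)baseline) / (double)baseline;
}
```

Called with (2026, 2008) it gives a bit under 0.9, and with (2008, 2008)
it gives 0, matching the "negligible" claim for the proposed mitigation.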
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/include/asm/nospec-branch.h | 7 +++----
arch/x86/kvm/vmx/vmenter.S | 3 +--
arch/x86/kvm/vmx/vmx.c | 5 +++++
3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index c936ce9f0c47..e5723e024b47 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -159,10 +159,9 @@
* A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
* monstrosity above, manually.
*/
-.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req ftr2=ALT_NOT(X86_FEATURE_ALWAYS)
- ALTERNATIVE_2 "jmp .Lskip_rsb_\@", \
- __stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr, \
- __stringify(__FILL_ONE_RETURN), \ftr2
+.macro FILL_RETURN_BUFFER reg:req nr:req ftr:req
+ ALTERNATIVE "jmp .Lskip_rsb_\@", \
+ __stringify(__FILL_RETURN_BUFFER(\reg,\nr)), \ftr
.Lskip_rsb_\@:
.endm
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 6de96b943804..eb82797bd7bf 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -231,8 +231,7 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
* single call to retire, before the first unbalanced RET.
*/
- FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
- X86_FEATURE_RSB_VMEXIT_LITE
+ FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT
pop %_ASM_ARG2 /* @flags */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index c9b49a09e6b5..fdcd8e10c2ab 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7049,8 +7049,13 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
* For legacy IBRS, the IBRS bit always needs to be written after
* transitioning from a less privileged predictor mode, regardless of
* whether the guest/host values differ.
+ *
+ * For eIBRS affected by Post Barrier RSB Predictions a serialising
+ * instruction (wrmsr) must be executed to ensure a call instruction has
+ * retired before the prediction of a following unbalanced ret.
*/
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
+ cpu_feature_enabled(X86_FEATURE_RSB_VMEXIT_LITE) ||
vmx->spec_ctrl != hostval)
native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
--
2.17.1
From: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
Before a new crtc is enabled, stream_count in dc_state is not in sync with
that in drm_atomic_state. Validating dsc at that point leaves the newly
added stream out of the joint dsc optimization with the existing streams;
it simply uses the default-initialized vcpi all the time, which gives the
wrong dsc decision.
Consider the scenario where one 4k60 display is connected to the dock under
dp-alt mode. Since dp-alt mode is a 2-lane setup, stream 1 consumes 63 slots
with dsc needed.
Then hook up a second 4k60 display to the dock.
Stream 2 is connected with 65 slots, initialized by default without dsc.
dsc pre-validation will not jointly optimize stream 2 with stream 1 before
crtc 2 is added into the dc_state. That leads to stream 2 not getting dsc
optimization, and triggers an atomic_check failure every time, as 65 > 63 limit.
After all new crtcs are added into the state, stream_count in dc_state
correctly reflects that in drm_atomic_state, which yields the correct dsc decision.
Fixes: 71be4b16d39a ("drm/amd/display: dsc validate fail not pass to atomic check")
Reviewed-by: Roman Li <Roman.Li(a)amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo(a)amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo(a)amd.com>
Tested-by: Mark Broadworth <mark.broadworth(a)amd.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 17c3daac837a..63f076a46260 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -9408,10 +9408,6 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
}
}
}
- if (!pre_validate_dsc(state, &dm_state, vars)) {
- ret = -EINVAL;
- goto fail;
- }
}
#endif
for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) {
@@ -9545,6 +9541,15 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev,
}
}
+#if defined(CONFIG_DRM_AMD_DC_DCN)
+ if (dc_resource_is_dsc_encoding_supported(dc)) {
+ if (!pre_validate_dsc(state, &dm_state, vars)) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ }
+#endif
+
/* Run this here since we want to validate the streams we created */
ret = drm_atomic_helper_check_planes(dev, state);
if (ret) {
--
2.25.1
Good morning
I know that you will be surprised to receive this letter from me today.
I am Anna S. William, and I work at the Royal Bank of Scotland. This letter
is highly privileged and requires your immediate
attention, because we have lost one of our customers, who is
also from your country and has the same surname as you, and he had a
fixed deposit of 4.7 million dollars in our bank before his
death.
Given your shared nationality with our late customer Alexander, I want
to present you to the bank as the beneficiary of the inheritance fund, and
we will both share the funds 50% 50% as soon as the money is transferred
to your account.
I look forward to your immediate reply.
Best regards,
Anna S. William.
[CCing regression and stable lists, to make sure they are aware of the
regression]
On 05.10.22 17:47, Hamza Mahfooz wrote:
> This reverts commit 10b6e91bd1ee9cd237ffbc244ad9c25b5fd3e167.
/me can't find that id and wonders what he did wrong -- or is this not
meant to refer to Linus tree?
And isn't this reverting both 66f99628eb24409cb8feb5061f78283c8b65f820
and abbc7a3dafb91b9d4ec56b70ec9a7520f8e13334 in one go?
> Unfortunately, this commit causes performance regressions on non-PSR
> setups. So, just revert it until FB_DAMAGE_CLIPS support can be added.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2189
> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
This seems to be missing a Reported-by tag, a CC: stable tag (needed to
ensure backporting), and a Fixes: tag.
But the reason why I started writing this mail is totally different from
the comments above:
In case you are not aware of it, that patch apparently broke amdgpu for
some users of 5.4.215:
https://bugzilla.kernel.org/show_bug.cgi?id=216554
So more Link: and Reported-by: tags would be nice.
Ciao, Thorsten
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 14 ++------------
> 1 file changed, 2 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 23998f727c7f..1a06b8d724f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -38,8 +38,6 @@
> #include <linux/pci.h>
> #include <linux/pm_runtime.h>
> #include <drm/drm_crtc_helper.h>
> -#include <drm/drm_damage_helper.h>
> -#include <drm/drm_drv.h>
> #include <drm/drm_edid.h>
> #include <drm/drm_gem_framebuffer_helper.h>
> #include <drm/drm_fb_helper.h>
> @@ -500,12 +498,6 @@ static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
> .create_handle = drm_gem_fb_create_handle,
> };
>
> -static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = {
> - .destroy = drm_gem_fb_destroy,
> - .create_handle = drm_gem_fb_create_handle,
> - .dirty = drm_atomic_helper_dirtyfb,
> -};
> -
> uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
> uint64_t bo_flags)
> {
> @@ -1108,10 +1100,8 @@ static int amdgpu_display_gem_fb_verify_and_init(struct drm_device *dev,
> if (ret)
> goto err;
>
> - if (drm_drv_uses_atomic_modeset(dev))
> - ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs_atomic);
> - else
> - ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
> + ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
> +
> if (ret)
> goto err;
>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ftrace_boot_snapshot and alloc_snapshot cmdline options allocate the
snapshot buffer at boot up for use later. The ftrace_boot_snapshot in
particular requires the snapshot to be allocated because it will take a
snapshot at the end of boot up, so that the traces that happened during
boot can be seen and are not lost when user space takes over.
When a tracer is registered (started) there's a path that checks if it
requires the snapshot buffer or not, and if it does not and it was
allocated it will do a synchronization and free the snapshot buffer.
The synchronization is only required if the previous tracer was using the
buffer for "max latency" snapshots (like the irqsoff tracer and friends),
as it needs to make sure all max snapshots are complete before freeing.
But it does not make sense to free the buffer if the previous tracer was
not using it and the snapshot was allocated by the cmdline parameters.
This basically takes away the point of allocating it in the first place!
Note, the allocated snapshot worked fine for just trace events, but fails
when a tracer is enabled on the cmdline.
Further investigation shows that this goes back even further and does not
require a tracer on the cmdline to fail. Simply enable snapshots and then enable a
tracer, and it will remove the snapshot.
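The intended behaviour can be modelled in userspace roughly as follows.
This is only a sketch: struct tracer, struct trace_array_model and
set_tracer_model() are simplified stand-ins, and the real
tracing_set_tracer() also does the RCU synchronization and error handling
that are left out here.

```c
#include <stdbool.h>

/* Simplified stand-in for struct tracer. */
struct tracer {
	bool use_max_tr; /* tracer takes "max latency" snapshots */
};

/* Simplified stand-in for struct trace_array. */
struct trace_array_model {
	const struct tracer *current_trace;
	bool allocated_snapshot;
};

/*
 * Switch to tracer t. Returns true if the snapshot buffer is allocated
 * afterwards. The buffer is only freed when the previous tracer was a
 * max-latency user, not merely because a snapshot happened to exist.
 */
bool set_tracer_model(struct trace_array_model *tr, const struct tracer *t)
{
	bool had_max_tr = tr->current_trace->use_max_tr;

	if (had_max_tr && !t->use_max_tr)
		tr->allocated_snapshot = false; /* models free_snapshot() */

	if (t->use_max_tr && !tr->allocated_snapshot)
		tr->allocated_snapshot = true;  /* models the allocation */

	tr->current_trace = t;
	return tr->allocated_snapshot;
}
```

Starting from a snapshot allocated on the cmdline with the nop tracer,
switching to a tracer that does not use max latency snapshots keeps the
buffer in this model, which is the case that used to fail.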
Link: https://lkml.kernel.org/r/20221005113757.041df7fe@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: 45ad21ca5530 ("tracing: Have trace_array keep track if snapshot buffer is allocated")
Reported-by: Ross Zwisler <zwisler(a)kernel.org>
Tested-by: Ross Zwisler <zwisler(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index def721de68a0..47a44b055a1d 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6428,12 +6428,12 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
if (tr->current_trace->reset)
tr->current_trace->reset(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE
+ had_max_tr = tr->current_trace->use_max_tr;
+
/* Current trace needs to be nop_trace before synchronize_rcu */
tr->current_trace = &nop_trace;
-#ifdef CONFIG_TRACER_MAX_TRACE
- had_max_tr = tr->allocated_snapshot;
-
if (had_max_tr && !t->use_max_tr) {
/*
* We need to make sure that the update_max_tr sees that
@@ -6446,11 +6446,13 @@ int tracing_set_tracer(struct trace_array *tr, const char *buf)
free_snapshot(tr);
}
- if (t->use_max_tr && !had_max_tr) {
+ if (t->use_max_tr && !tr->allocated_snapshot) {
ret = tracing_alloc_snapshot_instance(tr);
if (ret < 0)
goto out;
}
+#else
+ tr->current_trace = &nop_trace;
#endif
if (t->init) {
--
2.35.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Weak functions started causing havoc as they showed up in the
"available_filter_functions" and this confused people as to why some
functions marked as "notrace" were listed, but when enabled they did
nothing. This was because weak functions can still have fentry calls, and
these addresses get added to the "available_filter_functions" file.
kallsyms is what converts those addresses to names, and since the weak
functions are not listed in kallsyms, it would just pick the function
before that.
To solve this, there was a trick to detect weak functions listed, and
these records would be marked as DISABLED so that they do not get enabled
and are mostly ignored. As the processing of the list of all functions to
figure out what is weak or not can take a long time, this process is put
off into a kernel thread and run in parallel with the rest of start up.
Now the issue happens when function tracing is enabled via the kernel
command line. As it starts very early in boot up, it can be enabled before
the records that are weak are marked to be disabled. This causes an issue
in the accounting, as the weak records are enabled by the command line
function tracing, but after boot up, they are not disabled.
The ftrace records have several accounting flags and a ref count. The
DISABLED flag is just one. If the record is enabled before it is marked
DISABLED it will get an ENABLED flag and also have its ref counter
incremented. After it is marked for DISABLED, neither the ENABLED flag nor
the ref counter is cleared. There are sanity checks on the records that are
performed after an ftrace function is registered or unregistered, and these
detected that there were records marked as ENABLED with a ref count that
should not have been.
Note, the module loading code uses the DISABLED flag as well to keep its
functions from being modified while it's being loaded and some of these
flags may get set in this process. So changing the verification code to
ignore DISABLED records is a no go, as it still needs to verify that the
module records are working too.
Also, the weak functions still are calling a trampoline. Even though they
should never be called, it is dangerous to leave these weak functions
calling a trampoline that is freed, so they should still be set back to
nops.
There are two places that need to not skip records that have the ENABLED
and the DISABLED flags set. That is where the ftrace_ops is processed and
sets the records ref counts, and then later when the function itself is to
be updated, and the ENABLED flag gets removed. Add a helper function
"skip_record()" that returns true if the record has the DISABLED flag set
but not the ENABLED flag.
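A small userspace model of the helper and of the flag clearing, for
illustration only. The FL_* bit values and the *_model() names are
assumptions of this sketch; the kernel's FTRACE_FL_* definitions differ.

```c
#include <stdbool.h>

/* Hypothetical bit values; the kernel's FTRACE_FL_* bits differ. */
#define FL_ENABLED  (1UL << 0)
#define FL_DISABLED (1UL << 1)

/* Skip only records that are DISABLED and were never ENABLED. */
bool skip_record_model(unsigned long flags)
{
	return (flags & FL_DISABLED) && !(flags & FL_ENABLED);
}

/* When the last user goes away, clear everything except DISABLED. */
unsigned long clear_flags_model(unsigned long flags)
{
	return flags & FL_DISABLED;
}
```

A weak function that was enabled early has both bits set, so it is still
processed and can be set back to a nop; once its flags are cleared only
DISABLED remains and the record is skipped from then on.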
Link: https://lkml.kernel.org/r/20221005003809.27d2b97b@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: b39181f7c6907 ("ftrace: Add FTRACE_MCOUNT_MAX_OFFSET to avoid adding weak function")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ftrace.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 406d0597c409..83362a155791 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -1644,6 +1644,18 @@ ftrace_find_tramp_ops_any_other(struct dyn_ftrace *rec, struct ftrace_ops *op_ex
static struct ftrace_ops *
ftrace_find_tramp_ops_next(struct dyn_ftrace *rec, struct ftrace_ops *ops);
+static bool skip_record(struct dyn_ftrace *rec)
+{
+ /*
+ * At boot up, weak functions are set to disable. Function tracing
+ * can be enabled before they are, and they still need to be disabled now.
+ * If the record is disabled, still continue if it is marked as already
+ * enabled (this is needed to keep the accounting working).
+ */
+ return rec->flags & FTRACE_FL_DISABLED &&
+ !(rec->flags & FTRACE_FL_ENABLED);
+}
+
static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
int filter_hash,
bool inc)
@@ -1693,7 +1705,7 @@ static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
int in_hash = 0;
int match = 0;
- if (rec->flags & FTRACE_FL_DISABLED)
+ if (skip_record(rec))
continue;
if (all) {
@@ -2126,7 +2138,7 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
ftrace_bug_type = FTRACE_BUG_UNKNOWN;
- if (rec->flags & FTRACE_FL_DISABLED)
+ if (skip_record(rec))
return FTRACE_UPDATE_IGNORE;
/*
@@ -2241,7 +2253,7 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
if (update) {
/* If there's no more users, clear all flags */
if (!ftrace_rec_count(rec))
- rec->flags = 0;
+ rec->flags &= FTRACE_FL_DISABLED;
else
/*
* Just disable the record, but keep the ops TRAMP
@@ -2634,7 +2646,7 @@ void __weak ftrace_replace_code(int mod_flags)
do_for_each_ftrace_rec(pg, rec) {
- if (rec->flags & FTRACE_FL_DISABLED)
+ if (skip_record(rec))
continue;
failed = __ftrace_replace_code(rec, enable);
--
2.35.1
Commit c7b79a752871 ("mfd: intel-lpss: Add Intel Alder Lake PCH-S PCI
IDs") caused a regression on certain Gigabyte motherboards for Intel
Alder Lake-S where the system crashes with a NULL pointer dereference in
i2c_dw_xfer_msg() when it resumes from the S3 sleep state ("deep").
I was able to debug the issue on a Gigabyte Z690 AORUS ELITE and made the
following notes:
- Issue happens when resuming from S3 but not when resuming from
"s2idle"
- PCI device 00:15.0 == i2c_designware.0 is already in D0 state when
system enters into pci_pm_resume_noirq() while all other i2c_designware
PCI devices are in D3. Devices were runtime suspended and in D3 prior
to entering suspend
- Interrupt comes after pci_pm_resume_noirq() when device interrupts are
re-enabled
- According to the register dump the interrupt really comes from
i2c_designware.0. The controller is enabled, the I2C target address
register points to the one detectable I2C device at address 0x60, and the
DW_IC_RAW_INTR_STAT register START_DET, STOP_DET, ACTIVITY and
TX_EMPTY bits are set, indicating a completed I2C transaction.
My guess is that the firmware uses this controller to communicate with
an on-board I2C device during resume but does not disable the controller
before giving control to an operating system.
I was told the UEFI update fixes this, but nevertheless it revealed that the
driver is not ready to handle a TX_EMPTY (or RX_FULL) interrupt when the
device is supposed to be idle and state variables are not set (especially
the dev->msgs pointer which may point to NULL or stale old data).
Introduce a new software status flag STATUS_ACTIVE indicating when the
controller is active from the driver's point of view. Now treat all interrupts
that occur when it is not set as unexpected and mask all interrupts from
the controller.
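The idea can be sketched in userspace as below. struct dw_model and the
model_*() functions are hypothetical and only mimic the flag handling;
the real driver goes through regmap and has far more state.

```c
#include <stdbool.h>
#include <stdint.h>

#define STATUS_IDLE   0x0
#define STATUS_ACTIVE 0x1

/* Hypothetical, heavily reduced model of struct dw_i2c_dev. */
struct dw_model {
	uint32_t status;    /* software status flags */
	uint32_t intr_mask; /* models the DW_IC_INTR_MASK register */
	int handled;        /* interrupts processed as part of a transfer */
};

void model_enable(struct dw_model *d)
{
	d->status |= STATUS_ACTIVE;
}

void model_disable(struct dw_model *d)
{
	d->status &= ~(uint32_t)STATUS_ACTIVE;
}

/* Returns true if the interrupt was processed as part of a transfer. */
bool model_irq(struct dw_model *d)
{
	if (!(d->status & STATUS_ACTIVE)) {
		/* Unexpected: state is unset or stale, so just mask. */
		d->intr_mask = 0;
		return false;
	}
	d->handled++;
	return true;
}
```

An interrupt that arrives before the driver has enabled the controller
(for example one left behind by firmware) only masks further interrupts
and never touches the transfer state.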
Fixes: c7b79a752871 ("mfd: intel-lpss: Add Intel Alder Lake PCH-S PCI IDs")
Reported-by: Samuel Clark <slc2015(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215907
Cc: stable(a)vger.kernel.org # v5.12+
Signed-off-by: Jarkko Nikula <jarkko.nikula(a)linux.intel.com>
---
Hans: Are you able to test this on your Baytrail collection with shared
I2C controller and PUNIT so that patch doesn't break anything? I believe
even if the Linux interrupt for such shared I2C controller is shared e.g.
with the i2c-i801 the PUNIT access should not get affected by this
interrupt masking. I think PUNIT access won't use interrupts but you
never know... We have a MRD7 that has shared I2C controller with PUNIT
but Linux interrupt is not shared. I.e. unexpected interrupt case and
masking doesn't get hit.
i2c-designware-slave.c is not fully in sync with this new status flag since
STATUS_ACTIVE gets overwritten there and the unexpected interrupt case is
needlessly hit when the interrupt is shared and this controller is
suspended, but both of those can go into another patchset.
---
drivers/i2c/busses/i2c-designware-core.h | 7 +++++--
drivers/i2c/busses/i2c-designware-master.c | 13 +++++++++++++
2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-designware-core.h b/drivers/i2c/busses/i2c-designware-core.h
index 70b80e710990..4d3a3b464ecd 100644
--- a/drivers/i2c/busses/i2c-designware-core.h
+++ b/drivers/i2c/busses/i2c-designware-core.h
@@ -126,8 +126,9 @@
* status codes
*/
#define STATUS_IDLE 0x0
-#define STATUS_WRITE_IN_PROGRESS 0x1
-#define STATUS_READ_IN_PROGRESS 0x2
+#define STATUS_ACTIVE 0x1
+#define STATUS_WRITE_IN_PROGRESS 0x2
+#define STATUS_READ_IN_PROGRESS 0x4
/*
* operation modes
@@ -334,12 +335,14 @@ void i2c_dw_disable_int(struct dw_i2c_dev *dev);
static inline void __i2c_dw_enable(struct dw_i2c_dev *dev)
{
+ dev->status |= STATUS_ACTIVE;
regmap_write(dev->map, DW_IC_ENABLE, 1);
}
static inline void __i2c_dw_disable_nowait(struct dw_i2c_dev *dev)
{
regmap_write(dev->map, DW_IC_ENABLE, 0);
+ dev->status &= ~STATUS_ACTIVE;
}
void __i2c_dw_disable(struct dw_i2c_dev *dev);
diff --git a/drivers/i2c/busses/i2c-designware-master.c b/drivers/i2c/busses/i2c-designware-master.c
index 44a94b225ed8..dc3c5a15a95b 100644
--- a/drivers/i2c/busses/i2c-designware-master.c
+++ b/drivers/i2c/busses/i2c-designware-master.c
@@ -716,6 +716,19 @@ static int i2c_dw_irq_handler_master(struct dw_i2c_dev *dev)
u32 stat;
stat = i2c_dw_read_clear_intrbits(dev);
+
+ if (!(dev->status & STATUS_ACTIVE)) {
+ /*
+ * Unexpected interrupt in driver point of view. State
+ * variables are either unset or stale so acknowledge and
+ * disable interrupts for suppressing further interrupts if
+ * interrupt really came from this HW (E.g. firmware has left
+ * the HW active).
+ */
+ regmap_write(dev->map, DW_IC_INTR_MASK, 0);
+ return 0;
+ }
+
if (stat & DW_IC_INTR_TX_ABRT) {
dev->cmd_err |= DW_IC_ERR_TX_ABRT;
dev->status = STATUS_IDLE;
--
2.35.1
Danilo Cezar Zanella reported a broken function graph tracer in the v4.19
tree. This was caused by the backport of commit
f9b58e8c7d031 ("ARM: 8800/1: use choice for kernel unwinders")
which ended in the v4.19-stable tree as of v4.19.222.
It is also part of v4.14-stable since v4.14.259.
It is also part of v4.9-stable since v4.9.299.
------->8---------
From: Russell King <rmk+kernel(a)armlinux.org.uk>
Date: Tue, 23 Apr 2019 17:09:38 +0100
Subject: [PATCH] ARM: fix function graph tracer and unwinder dependencies
Upstream commit 503621628b32782a07b2318e4112bd4372aa3401
Naresh Kamboju recently reported that the function-graph tracer crashes
on ARM. The function-graph tracer assumes that the kernel is built with
frame pointers.
We explicitly disabled the function-graph tracer when building Thumb2,
since the Thumb2 ABI doesn't have frame pointers.
We recently changed the way the unwinder method was selected, which
seems to have made it more likely that we can end up with the function-
graph tracer enabled but without the kernel built with frame pointers.
Fix up the function graph tracer dependencies so the option is not
available when we have no possibility of having frame pointers, and
adjust the dependencies on the unwinder option to hide the non-frame
pointer unwinder options if the function-graph tracer is enabled.
Reviewed-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Tested-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Russell King <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Reported-by: Danilo Cezar Zanella <danilo.zanella(a)iag.usp.br>
---
arch/arm/Kconfig | 2 +-
arch/arm/Kconfig.debug | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index d89d013f586cb..fce7e85f3ef57 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -68,7 +68,7 @@ config ARM
select HAVE_EFFICIENT_UNALIGNED_ACCESS if (CPU_V6 || CPU_V6K || CPU_V7) && MMU
select HAVE_EXIT_THREAD
select HAVE_FTRACE_MCOUNT_RECORD if (!XIP_KERNEL)
- select HAVE_FUNCTION_GRAPH_TRACER if (!THUMB2_KERNEL)
+ select HAVE_FUNCTION_GRAPH_TRACER if (!THUMB2_KERNEL && !CC_IS_CLANG)
select HAVE_FUNCTION_TRACER if (!XIP_KERNEL)
select HAVE_FUTEX_CMPXCHG if FUTEX
select HAVE_GCC_PLUGINS
diff --git a/arch/arm/Kconfig.debug b/arch/arm/Kconfig.debug
index 01c760929c9e4..b931fac129a1b 100644
--- a/arch/arm/Kconfig.debug
+++ b/arch/arm/Kconfig.debug
@@ -47,8 +47,8 @@ config DEBUG_WX
choice
prompt "Choose kernel unwinder"
- default UNWINDER_ARM if AEABI && !FUNCTION_GRAPH_TRACER
- default UNWINDER_FRAME_POINTER if !AEABI || FUNCTION_GRAPH_TRACER
+ default UNWINDER_ARM if AEABI
+ default UNWINDER_FRAME_POINTER if !AEABI
help
This determines which method will be used for unwinding kernel stack
traces for panics, oopses, bugs, warnings, perf, /proc/<pid>/stack,
@@ -65,7 +65,7 @@ config UNWINDER_FRAME_POINTER
config UNWINDER_ARM
bool "ARM EABI stack unwinder"
- depends on AEABI
+ depends on AEABI && !FUNCTION_GRAPH_TRACER
select ARM_UNWIND
help
This option enables stack unwinding support in the kernel
--
2.37.2
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ring buffer is broken up into sub buffers (currently of page size).
Each sub buffer has a pointer to its "tail" (the last event written to the
sub buffer). When a new event is requested, the tail is locally
incremented to cover the size of the new event. This is done in a way that
there is no need for locking.
If the tail goes past the end of the sub buffer, the process of moving to
the next sub buffer takes place. After setting the current sub buffer to
the next one, the previous one that had the tail go past the end of the
sub buffer needs to be reset back to the original tail location (before
the new event was requested) and the rest of the sub buffer needs to be
"padded".
The race happens when a reader takes control of the sub buffer. As readers
do a "swap" of sub buffers from the ring buffer to get exclusive access to
the sub buffer, it replaces the "head" sub buffer with an empty sub buffer
that goes back into the writable portion of the ring buffer. This swap can
happen as soon as the writer moves to the next sub buffer and before it
updates the last sub buffer with padding.
Because the sub buffer can be released to the reader while the writer is
still updating the padding, it is possible for the reader to see the event
that goes past the end of the sub buffer. This can cause obvious issues.
To fix this, add a few memory barriers so that the reader definitely sees
the updates to the sub buffer, and also waits until the writer has put
the "tail" of the sub buffer back to the last event that was written
on it.
To be paranoid, it will only spin for 1 second, otherwise it will
warn and shut down the ring buffer code. 1 second should be enough as
the writer does have preemption disabled. If the writer doesn't move
within 1 second (with preemption disabled) something is horribly
wrong. No interrupt should last 1 second!
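The bounded wait can be illustrated with a userspace sketch using C11
atomics. This is an analogy under stated assumptions, not the kernel
code: the acquire load stands in for the smp_rmb() pairing with the
writer's smp_wmb(), PAGE_SIZE_MODEL stands in for BUF_PAGE_SIZE, and the
1 usec delay per iteration is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define PAGE_SIZE_MODEL 4096UL /* stand-in for BUF_PAGE_SIZE */

/*
 * Poll *write until it is back within the page, at most max_loops times.
 * Returns true when the page is safe to hand to the reader, false when
 * the writer did not make progress within the loop budget.
 */
bool wait_for_writer(atomic_ulong *write, unsigned long max_loops)
{
	unsigned long i;

	for (i = 0; i < max_loops; i++) {
		/* A write index past the page end means a writer is mid-update. */
		if (atomic_load_explicit(write, memory_order_acquire) <=
		    PAGE_SIZE_MODEL)
			return true;
		/* The kernel code delays 1 usec here; omitted in this sketch. */
	}
	return false;
}
```

A false return corresponds to the RB_WARN_ON() path in the patch, where
the reader gives up instead of spinning forever.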
Link: https://lore.kernel.org/all/20220830120854.7545-1-jiazi.li@transsion.com/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216369
Link: https://lkml.kernel.org/r/20220929104909.0650a36c@gandalf.local.home
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Fixes: c7b0930857e22 ("ring-buffer: prevent adding write in discarded area")
Reported-by: Jiazi.Li <jiazi.li(a)transsion.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3046deacf7b3..c3f354cfc5ba 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2648,6 +2648,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
/* Mark the rest of the page with padding */
rb_event_set_padding(event);
+ /* Make sure the padding is visible before the write update */
+ smp_wmb();
+
/* Set the write back to the previous setting */
local_sub(length, &tail_page->write);
return;
@@ -2659,6 +2662,9 @@ rb_reset_tail(struct ring_buffer_per_cpu *cpu_buffer,
/* time delta must be non zero */
event->time_delta = 1;
+ /* Make sure the padding is visible before the tail_page->write update */
+ smp_wmb();
+
/* Set write to end of buffer */
length = (tail + length) - BUF_PAGE_SIZE;
local_sub(length, &tail_page->write);
@@ -4627,6 +4633,33 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags);
+ /*
+ * The writer has preempt disable, wait for it. But not forever
+ * Although, 1 second is pretty much "forever"
+ */
+#define USECS_WAIT 1000000
+ for (nr_loops = 0; nr_loops < USECS_WAIT; nr_loops++) {
+ /* If the write is past the end of page, a writer is still updating it */
+ if (likely(!reader || rb_page_write(reader) <= BUF_PAGE_SIZE))
+ break;
+
+ udelay(1);
+
+ /* Get the latest version of the reader write value */
+ smp_rmb();
+ }
+
+ /* The writer is not moving forward? Something is wrong */
+ if (RB_WARN_ON(cpu_buffer, nr_loops == USECS_WAIT))
+ reader = NULL;
+
+ /*
+ * Make sure we see any padding after the write update
+ * (see rb_reset_tail())
+ */
+ smp_rmb();
+
+
return reader;
}
--
2.35.1
This reverts dbf8e63f48af ("f2fs: remove device type check for direct IO")
and applies the first version below, since that commit contributed to
out-of-order DIO writes.
For zoned devices, f2fs forbids direct IO and forces buffered IO
to serialize write IOs. However, the constraint does not apply to
read IOs.
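Reduced to just the zoned-device rule, the check looks like the sketch
below. enum rw_dir and force_buffered_io_zoned() are illustrative names;
the real f2fs_force_buffered_io() has several other conditions.

```c
#include <stdbool.h>

/* Illustrative direction type; the kernel uses READ/WRITE constants. */
enum rw_dir { RW_READ, RW_WRITE };

/* Direct writes on a zoned device fall back to buffered IO; reads do not. */
bool force_buffered_io_zoned(bool has_blkzoned, enum rw_dir rw)
{
	return has_blkzoned && rw == RW_WRITE;
}
```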
Cc: stable(a)vger.kernel.org
Fixes: dbf8e63f48af ("f2fs: remove device type check for direct IO")
Signed-off-by: Eunhee Rho <eunhee83.rho(a)samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
---
fs/f2fs/f2fs.h | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 2ed00111a399..a0b2c8626a75 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -4535,7 +4535,12 @@ static inline bool f2fs_force_buffered_io(struct inode *inode,
/* disallow direct IO if any of devices has unaligned blksize */
if (f2fs_is_multi_device(sbi) && !sbi->aligned_blksize)
return true;
-
+ /*
+ * for blkzoned device, fallback direct IO to buffered IO, so
+ * all IOs can be serialized by log-structured write.
+ */
+ if (f2fs_sb_has_blkzoned(sbi) && (rw == WRITE))
+ return true;
if (f2fs_lfs_mode(sbi) && (rw == WRITE)) {
if (block_unaligned_IO(inode, iocb, iter))
return true;
--
2.38.0.rc1.362.ged0d419d3c-goog
Hey there,
On 10/5/22 at 9:47 AM, Вячеслав Сальников wrote:
> Hi.
>
> I apologize if I wrote in the wrong mail list. I have not found
> linux-netdev for questions
I've Cc'd the netdev and stable lists.
>
> I switched from kernel versions 4.9 to 5.15 and found that the MTU on
> the interfaces in the bridge does not change.
> For example:
> I have the following bridge:
> bridge interface
> br0 sw1
> sw2
> sw3
>
> And I change with ifconfig MTU.
> I see that br0 sw1..sw3 has changed MTU from 1500 -> 1982.
>
> But if i send a ping through these interfaces, I get 1500(I added
> prints for output)
> I investigated the code and found the reason:
> The following commit came in the new kernel:
> https://github.com/torvalds/linux/commit/ac6627a28dbfb5d96736544a00c3938fa7…
>
> And the behavior of the MTU setting has changed:
>>
>> Kernel 4.9:
>> if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
>> ip_mtu_locked(dst) ||
>> !forwarding) <--- True
>> return dst_mtu(dst) <--- 1982
>>
>>
>> /* 'forwarding = true' case should always honour route mtu */
>> mtu = dst_metric_raw(dst, RTAX_MTU);
>> if (mtu)
>> return mtu;
>
>
>
> Kernel 5.15:
>>
>> if (READ_ONCE(net->ipv4.sysctl_ip_fwd_use_pmtu) ||
>> ip_mtu_locked(dst) ||
>> !forwarding) { <--- True
>> mtu = rt->rt_pmtu; <--- 0
>> if (mtu && time_before(jiffies, rt->dst.expires)) <-- False
>> goto out;
>> }
>>
>> /* 'forwarding = true' case should always honour route mtu */
>> mtu = dst_metric_raw(dst, RTAX_MTU); <---- 1500
>> if (mtu) <--- True
>> goto out;
>
>
> Why is rt_pmtu now used instead of dst_mtu?
> Why is forwarding = False called with dst_metric_raw?
> Maybe we should add processing when mtu = rt->rt_pmtu == 0?
> Could this be an error?
>
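The difference described in the quoted report can be modelled in
userspace roughly as follows. This is only a sketch of the two quoted
snippets: struct route_model and both functions are hypothetical, the
values 1982 and 1500 come from the annotations in the report, and
whatever follows the quoted code in the kernel is not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of the fields the quoted snippets look at. */
struct route_model {
	unsigned int dst_mtu;    /* what dst_mtu(dst) would return */
	unsigned int rt_pmtu;    /* learned path MTU, 0 if none */
	bool pmtu_valid;         /* models time_before(jiffies, expires) */
	unsigned int metric_mtu; /* models dst_metric_raw(dst, RTAX_MTU) */
};

/* first_branch models: use_pmtu || ip_mtu_locked(dst) || !forwarding */
unsigned int mtu_4_9(const struct route_model *rt, bool first_branch)
{
	if (first_branch)
		return rt->dst_mtu;
	if (rt->metric_mtu)
		return rt->metric_mtu;
	return 0; /* later fallback not shown in the report */
}

unsigned int mtu_5_15(const struct route_model *rt, bool first_branch)
{
	if (first_branch && rt->rt_pmtu && rt->pmtu_valid)
		return rt->rt_pmtu;
	if (rt->metric_mtu)
		return rt->metric_mtu;
	return 0; /* later fallback not shown in the report */
}
```

With dst_mtu = 1982, rt_pmtu = 0 and metric_mtu = 1500 the first
function yields 1982 and the second 1500, which is the behaviour change
the report describes.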
Cheers,
-srw
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
df5b035b5683 ("x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant")
66558b730f25 ("sched: Add cluster scheduler level for x86")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df5b035b5683d6a25f077af889fb88e09827f8bc Mon Sep 17 00:00:00 2001
From: Borislav Petkov <bp(a)suse.de>
Date: Fri, 19 Aug 2022 19:47:44 +0200
Subject: [PATCH] x86/cacheinfo: Add a cpu_llc_shared_mask() UP variant
On a CONFIG_SMP=n kernel, the LLC shared mask is 0, which prevents
__cache_amd_cpumap_setup() from doing the L3 masks setup, and more
specifically from setting up the shared_cpu_map and shared_cpu_list
files in sysfs, leading to lscpu from util-linux getting confused and
segfaulting.
Add a cpu_llc_shared_mask() UP variant which returns a mask with a
single bit set, i.e., for CPU0.
Fixes: 2b83809a5e6d ("x86/cpu/amd: Derive L3 shared_cpu_map from cpu_llc_shared_mask")
Reported-by: Saurabh Sengar <ssengar(a)linux.microsoft.com>
Signed-off-by: Borislav Petkov <bp(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1660148115-302-1-git-send-email-ssengar@linux.mic…
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 81a0211a372d..a73bced40e24 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -21,16 +21,6 @@ DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id);
DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id);
DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number);
-static inline struct cpumask *cpu_llc_shared_mask(int cpu)
-{
- return per_cpu(cpu_llc_shared_map, cpu);
-}
-
-static inline struct cpumask *cpu_l2c_shared_mask(int cpu)
-{
- return per_cpu(cpu_l2c_shared_map, cpu);
-}
-
DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid);
DECLARE_EARLY_PER_CPU_READ_MOSTLY(u32, x86_cpu_to_acpiid);
DECLARE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_bios_cpu_apicid);
@@ -172,6 +162,16 @@ extern int safe_smp_processor_id(void);
# define safe_smp_processor_id() smp_processor_id()
#endif
+static inline struct cpumask *cpu_llc_shared_mask(int cpu)
+{
+ return per_cpu(cpu_llc_shared_map, cpu);
+}
+
+static inline struct cpumask *cpu_l2c_shared_mask(int cpu)
+{
+ return per_cpu(cpu_l2c_shared_map, cpu);
+}
+
#else /* !CONFIG_SMP */
#define wbinvd_on_cpu(cpu) wbinvd()
static inline int wbinvd_on_all_cpus(void)
@@ -179,6 +179,11 @@ static inline int wbinvd_on_all_cpus(void)
wbinvd();
return 0;
}
+
+static inline struct cpumask *cpu_llc_shared_mask(int cpu)
+{
+ return (struct cpumask *)cpumask_of(0);
+}
#endif /* CONFIG_SMP */
extern unsigned disabled_cpus;
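The idea behind the UP variant can be illustrated outside the kernel. The
following is a minimal sketch, assuming a one-word toy mask type; struct
toy_cpumask and the toy_* helpers are hypothetical and are not the kernel's
cpumask API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-word CPU mask, enough for a uniprocessor illustration. */
struct toy_cpumask {
	unsigned long bits;
};

/* A constant mask with only bit 0 set, i.e. "CPU0 only". */
static const struct toy_cpumask toy_cpu0_mask = { 1UL << 0 };

/* UP variant: every query yields the CPU0-only mask rather than an empty one. */
static inline const struct toy_cpumask *toy_llc_shared_mask_up(int cpu)
{
	(void)cpu; /* only CPU0 exists on a uniprocessor build */
	return &toy_cpu0_mask;
}

/* True if the given CPU number is set in the mask. */
static inline bool toy_mask_test_cpu(unsigned int cpu, const struct toy_cpumask *m)
{
	return (m->bits >> cpu) & 1UL;
}

/* Usage: the mask is non-empty and contains exactly CPU0. */
static inline void toy_llc_demo(void)
{
	const struct toy_cpumask *m = toy_llc_shared_mask_up(0);

	assert(m->bits != 0);
	assert(toy_mask_test_cpu(0, m));
	assert(!toy_mask_test_cpu(1, m));
}
```

A consumer that skips its setup when the mask is empty would proceed with
this variant, which is the effect the commit message describes.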
The return value of xdp_convert_buff_to_frame() is dereferenced without
a NULL check, although callers of this function usually check it.
This was fixed upstream in commit <e8223eeff02> as part of a refactoring,
so a simpler patch is offered here for the stable version.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
This is the start of the stable review cycle for the 5.4.216 release.
There are 30 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 05 Oct 2022 07:07:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.216-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.216-rc1
Florian Fainelli <f.fainelli(a)gmail.com>
clk: iproc: Do not rely on node name for correct PLL setup
Han Xu <han.xu(a)nxp.com>
clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks
Wang Yufen <wangyufen(a)huawei.com>
selftests: Fix the if conditions of in test_extra_filter()
Michael Kelley <mikelley(a)microsoft.com>
nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
Chaitanya Kulkarni <chaitanya.kulkarni(a)wdc.com>
nvme: add new line after variable declatation
Peilin Ye <peilin.ye(a)bytedance.com>
usbnet: Fix memory leak in usbnet_disconnect()
Yang Yingliang <yangyingliang(a)huawei.com>
Input: melfas_mip4 - fix return value check in mip4_probe()
Brian Norris <briannorris(a)chromium.org>
Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix debugfs info for A64 SRAM C
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Fix probe function ordering issues
Cai Huoqing <caihuoqing(a)baidu.com>
soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource()
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Prevent the driver from being unbound
Samuel Holland <samuel(a)sholland.org>
soc: sunxi: sram: Actually claim SRAM regions
YuTong Chang <mtwget(a)gmail.com>
ARM: dts: am33xx: Fix MMCHS0 dma properties
Faiz Abbas <faiz_abbas(a)ti.com>
ARM: dts: Move am33xx and am43xx mmc nodes to sdhci-omap driver
Hangyu Hua <hbh25y(a)gmail.com>
media: dvb_vb2: fix possible out of bound access
Minchan Kim <minchan(a)kernel.org>
mm: fix madivse_pageout mishandling on non-LRU page
Alistair Popple <apopple(a)nvidia.com>
mm/migrate_device.c: flush TLB while holding PTL
Maurizio Lombardi <mlombard(a)redhat.com>
mm: prevent page_frag_alloc() from corrupting the memory
Mel Gorman <mgorman(a)techsingularity.net>
mm/page_alloc: fix race condition between build_all_zonelists and page allocation
Sergei Antonov <saproj(a)gmail.com>
mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
Niklas Cassel <niklas.cassel(a)wdc.com>
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Sasha Levin <sashal(a)kernel.org>
Revert "net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()"
ChenXiaoSong <chenxiaosong2(a)huawei.com>
ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
Linus Walleij <linus.walleij(a)linaro.org>
ARM: dts: integrator: Tag PCI host with device_type
Aidan MacDonald <aidanmacdonald.0x0(a)gmail.com>
clk: ingenic-tcu: Properly enable registers before accessing timers
Frank Wunderlich <frank-w(a)public-files.de>
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: ignore UAS for Thinkplus chips
Hongling Zeng <zenghongling(a)kylinos.cn>
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
Hongling Zeng <zenghongling(a)kylinos.cn>
uas: add no-uas quirk for Hiksemi usb_disk
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/am335x-baltos.dtsi | 2 +-
arch/arm/boot/dts/am335x-boneblack-common.dtsi | 1 +
arch/arm/boot/dts/am335x-boneblack-wireless.dts | 1 -
arch/arm/boot/dts/am335x-boneblue.dts | 1 -
arch/arm/boot/dts/am335x-bonegreen-wireless.dts | 1 -
arch/arm/boot/dts/am335x-evm.dts | 3 +-
arch/arm/boot/dts/am335x-evmsk.dts | 2 +-
arch/arm/boot/dts/am335x-lxm.dts | 2 +-
arch/arm/boot/dts/am335x-moxa-uc-2100-common.dtsi | 2 +-
arch/arm/boot/dts/am335x-moxa-uc-8100-me-t.dts | 2 +-
arch/arm/boot/dts/am335x-pepper.dts | 4 +-
arch/arm/boot/dts/am335x-phycore-som.dtsi | 2 +-
arch/arm/boot/dts/am33xx-l4.dtsi | 9 +--
arch/arm/boot/dts/am33xx.dtsi | 3 +-
arch/arm/boot/dts/am4372.dtsi | 3 +-
arch/arm/boot/dts/am437x-cm-t43.dts | 2 +-
arch/arm/boot/dts/am437x-gp-evm.dts | 4 +-
arch/arm/boot/dts/am437x-l4.dtsi | 5 +-
arch/arm/boot/dts/am437x-sk-evm.dts | 2 +-
arch/arm/boot/dts/integratorap.dts | 1 +
drivers/ata/libata-core.c | 4 ++
drivers/clk/bcm/clk-iproc-pll.c | 12 ++--
drivers/clk/imx/clk-imx6sx.c | 4 +-
drivers/clk/ingenic/tcu.c | 15 ++---
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 -----
drivers/input/touchscreen/melfas_mip4.c | 2 +-
drivers/media/dvb-core/dvb_vb2.c | 11 ++++
drivers/mmc/host/moxart-mmc.c | 17 +-----
drivers/net/ethernet/marvell/mvpp2/mvpp2_debugfs.c | 4 +-
drivers/net/usb/qmi_wwan.c | 1 +
drivers/net/usb/usbnet.c | 7 ++-
drivers/nvme/host/core.c | 9 ++-
drivers/soc/sunxi/sunxi_sram.c | 27 ++++-----
drivers/usb/storage/unusual_uas.h | 21 +++++++
fs/ntfs/super.c | 3 +-
mm/madvise.c | 7 ++-
mm/migrate.c | 5 +-
mm/page_alloc.c | 65 ++++++++++++++++++----
tools/testing/selftests/net/reuseport_bpf.c | 2 +-
40 files changed, 173 insertions(+), 112 deletions(-)
This backport introduces IBRS support to 5.4.y in order to mitigate Retbleed on
Intel parts. Though some very small pieces for AMD have been picked up as well,
"UNRET" mitigations are not backported, nor IBPB. It is expected, though, that
the backport will report AMD systems as vulnerable or not affected, depending
on the parts and the BTC_NO bit.
One note here is that the PBRSB mitigation was backported previously to the 5.4
series, and this would have made things a little bit more complicated. So, I
reverted it and applied it later on.
This has been boot-tested and smoke-tested on a bunch of AMD and Intel systems.
Alexandre Chartre (2):
x86/bugs: Report AMD retbleed vulnerability
x86/bugs: Add AMD retbleed= boot parameter
Andrew Cooper (1):
x86/cpu/amd: Enumerate BTC_NO
Daniel Sneddon (1):
x86/speculation: Add RSB VM Exit protections
Josh Poimboeuf (9):
x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n
x86/speculation: Fix firmware entry SPEC_CTRL handling
x86/speculation: Fix SPEC_CTRL write on SMT state change
x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit
x86/speculation: Remove x86_spec_ctrl_mask
KVM: VMX: Flatten __vmx_vcpu_run()
KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS
KVM: VMX: Fix IBRS handling after vmexit
x86/speculation: Fill RSB on vmexit for IBRS
Mark Gross (1):
x86/cpu: Add a steppings field to struct x86_cpu_id
Nathan Chancellor (1):
x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_current
Pawan Gupta (4):
x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS
x86/bugs: Add Cannon lake to RETBleed affected CPU list
x86/speculation: Disable RRSBA behavior
x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS
parts
Peter Zijlstra (11):
x86/kvm/vmx: Make noinstr clean
x86/cpufeatures: Move RETPOLINE flags to word 11
x86/bugs: Keep a per-CPU IA32_SPEC_CTRL value
x86/entry: Remove skip_r11rcx
x86/entry: Add kernel IBRS implementation
x86/bugs: Optimize SPEC_CTRL MSR writes
x86/bugs: Split spectre_v2_select_mitigation() and
spectre_v2_user_select_mitigation()
x86/bugs: Report Intel retbleed vulnerability
intel_idle: Disable IBRS during long idle
x86/speculation: Change FILL_RETURN_BUFFER to work with objtool
x86/common: Stamp out the stepping madness
Thadeu Lima de Souza Cascardo (3):
Revert "x86/speculation: Add RSB VM Exit protections"
Revert "x86/cpu: Add a steppings field to struct x86_cpu_id"
KVM: VMX: Convert launched argument to flags
Thomas Gleixner (2):
x86/devicetable: Move x86 specific macro out of generic code
x86/cpu: Add consistent CPU match macros
Uros Bizjak (2):
KVM/VMX: Use TEST %REG,%REG instead of CMP $0,%REG in vmenter.S
KVM/nVMX: Use __vmx_vcpu_run in nested_vmx_check_vmentry_hw
.../admin-guide/kernel-parameters.txt | 13 +
arch/x86/entry/calling.h | 68 +++-
arch/x86/entry/entry_32.S | 2 -
arch/x86/entry/entry_64.S | 34 +-
arch/x86/entry/entry_64_compat.S | 11 +-
arch/x86/include/asm/cpu_device_id.h | 132 ++++++-
arch/x86/include/asm/cpufeatures.h | 13 +-
arch/x86/include/asm/intel-family.h | 6 +
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/nospec-branch.h | 54 +--
arch/x86/kernel/cpu/amd.c | 21 +-
arch/x86/kernel/cpu/bugs.c | 365 ++++++++++++++----
arch/x86/kernel/cpu/common.c | 61 +--
arch/x86/kernel/cpu/match.c | 13 +-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/process.c | 2 +-
arch/x86/kvm/svm.c | 1 +
arch/x86/kvm/vmx/nested.c | 32 +-
arch/x86/kvm/vmx/run_flags.h | 8 +
arch/x86/kvm/vmx/vmenter.S | 161 ++++----
arch/x86/kvm/vmx/vmx.c | 72 ++--
arch/x86/kvm/vmx/vmx.h | 5 +
arch/x86/kvm/x86.c | 4 +-
drivers/base/cpu.c | 8 +
drivers/cpufreq/acpi-cpufreq.c | 1 +
drivers/cpufreq/amd_freq_sensitivity.c | 1 +
drivers/idle/intel_idle.c | 43 ++-
include/linux/cpu.h | 2 +
include/linux/kvm_host.h | 2 +-
include/linux/mod_devicetable.h | 4 +-
tools/arch/x86/include/asm/cpufeatures.h | 2 +-
31 files changed, 840 insertions(+), 312 deletions(-)
create mode 100644 arch/x86/kvm/vmx/run_flags.h
--
2.34.1
I'm announcing the release of the 5.10.147 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/boot/dts/am33xx-l4.dtsi | 3
arch/arm/boot/dts/integratorap.dts | 1
arch/x86/kernel/alternative.c | 45 +++++-----
arch/x86/kvm/cpuid.c | 2
drivers/ata/libata-core.c | 4
drivers/clk/bcm/clk-iproc-pll.c | 12 +-
drivers/clk/imx/clk-imx6sx.c | 4
drivers/clk/ingenic/tcu.c | 15 +--
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 ---
drivers/input/keyboard/snvs_pwrkey.c | 2
drivers/input/touchscreen/melfas_mip4.c | 2
drivers/media/dvb-core/dvb_vb2.c | 11 ++
drivers/mmc/host/mmc_hsq.c | 2
drivers/mmc/host/moxart-mmc.c | 17 ----
drivers/net/dsa/mt7530.c | 15 ++-
drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 28 ++++--
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 23 +++--
drivers/net/usb/qmi_wwan.c | 1
drivers/net/usb/usbnet.c | 7 +
drivers/nvme/host/core.c | 9 +-
drivers/reset/reset-imx7.c | 1
drivers/scsi/hisi_sas/hisi_sas_v3_hw.c | 7 -
drivers/soc/sunxi/sunxi_sram.c | 27 ++----
drivers/staging/media/rkvdec/rkvdec-h264.c | 4
drivers/thunderbolt/icm.c | 12 ++
drivers/thunderbolt/nhi.h | 2
drivers/thunderbolt/switch.c | 1
drivers/usb/storage/unusual_uas.h | 21 +++++
drivers/usb/typec/ucsi/ucsi.c | 2
fs/btrfs/disk-io.c | 25 +++++
fs/ntfs/super.c | 3
kernel/dma/swiotlb.c | 13 ++-
mm/madvise.c | 7 +
mm/migrate.c | 5 -
mm/page_alloc.c | 65 +++++++++++++--
net/sched/act_ct.c | 5 -
sound/pci/hda/hda_tegra.c | 88 ++++++---------------
sound/pci/hda/patch_hdmi.c | 47 +++++++++--
sound/soc/codecs/tas2770.c | 3
tools/testing/selftests/net/reuseport_bpf.c | 2
41 files changed, 345 insertions(+), 213 deletions(-)
Aidan MacDonald (1):
clk: ingenic-tcu: Properly enable registers before accessing timers
Alexander Couzens (1):
net: mt7531: only do PLL once after the reset
Alistair Popple (1):
mm/migrate_device.c: flush TLB while holding PTL
Brian Norris (1):
Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
Cai Huoqing (1):
soc: sunxi_sram: Make use of the helper function devm_platform_ioremap_resource()
Chaitanya Kulkarni (1):
nvme: add new line after variable declatation
ChenXiaoSong (1):
ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
Dmitry Osipenko (2):
ALSA: hda/tegra: Use clk_bulk helpers
ALSA: hda/tegra: Reset hardware
Filipe Manana (1):
btrfs: fix hang during unmount when stopping a space reclaim worker
Florian Fainelli (1):
clk: iproc: Do not rely on node name for correct PLL setup
Frank Wunderlich (1):
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
Gil Fine (1):
thunderbolt: Add support for Intel Maple Ridge single port controller
Greg Kroah-Hartman (1):
Linux 5.10.147
Han Xu (1):
clk: imx: imx6sx: remove the SET_RATE_PARENT flag for QSPI clocks
Hangyu Hua (2):
media: dvb_vb2: fix possible out of bound access
net: sched: act_ct: fix possible refcount leak in tcf_ct_init()
Heikki Krogerus (1):
usb: typec: ucsi: Remove incorrect warning
Hongling Zeng (3):
uas: add no-uas quirk for Hiksemi usb_disk
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
uas: ignore UAS for Thinkplus chips
Hui Wang (1):
ALSA: hda/hdmi: let new platforms assign the pcm slot dynamically
Jim Mattson (1):
KVM: x86: Hide IA32_PLATFORM_DCA_CAP[31:0] from the guest
Junxiao Chang (1):
net: stmmac: power up/down serdes in stmmac_open/release
Kai Vehmanen (1):
ALSA: hda/hdmi: fix warning about PCM count when used with SOF
Linus Walleij (1):
ARM: dts: integrator: Tag PCI host with device_type
Mario Limonciello (1):
thunderbolt: Explicitly reset plug events delay back to USB4 spec value
Martin Povišer (1):
ASoC: tas2770: Reinit regcache on reset
Maurizio Lombardi (1):
mm: prevent page_frag_alloc() from corrupting the memory
Mel Gorman (1):
mm/page_alloc: fix race condition between build_all_zonelists and page allocation
Michael Kelley (1):
nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
Mika Westerberg (1):
thunderbolt: Add support for Intel Maple Ridge
Minchan Kim (1):
mm: fix madivse_pageout mishandling on non-LRU page
Mohan Kumar (1):
ALSA: hda: Fix Nvidia dp infoframe
Nadav Amit (1):
x86/alternative: Fix race in try_get_desc()
Nicolas Dufresne (1):
media: rkvdec: Disable H.264 error detection
Niklas Cassel (1):
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Peilin Ye (1):
usbnet: Fix memory leak in usbnet_disconnect()
Rafael Mendonca (1):
cxgb4: fix missing unlock on ETHOFLD desc collect fail path
Richard Zhu (1):
reset: imx7: Fix the iMX8MP PCIe PHY PERST support
Samuel Holland (4):
soc: sunxi: sram: Actually claim SRAM regions
soc: sunxi: sram: Prevent the driver from being unbound
soc: sunxi: sram: Fix probe function ordering issues
soc: sunxi: sram: Fix debugfs info for A64 SRAM C
Sebastian Krzyszkowiak (1):
Input: snvs_pwrkey - fix SNVS_HPVIDR1 register address
Sergei Antonov (1):
mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
Tianyu Lan (1):
swiotlb: max mapping size takes min align mask into account
Wang Yufen (1):
selftests: Fix the if conditions of in test_extra_filter()
Wenchao Chen (1):
mmc: hsq: Fix data stomping during mmc recovery
Yang Yingliang (1):
Input: melfas_mip4 - fix return value check in mip4_probe()
Yu Kuai (1):
scsi: hisi_sas: Revert "scsi: hisi_sas: Limit max hw sectors for v3 HW"
YuTong Chang (1):
ARM: dts: am33xx: Fix MMCHS0 dma properties
I'm announcing the release of the 4.19.261 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/boot/dts/integratorap.dts | 1
drivers/ata/libata-core.c | 4 +
drivers/clk/bcm/clk-iproc-pll.c | 12 ++-
drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 13 ----
drivers/input/touchscreen/melfas_mip4.c | 2
drivers/mmc/host/moxart-mmc.c | 17 -----
drivers/net/usb/qmi_wwan.c | 1
drivers/net/usb/usbnet.c | 7 +-
drivers/nvme/host/core.c | 9 +-
drivers/soc/sunxi/sunxi_sram.c | 23 +++----
drivers/usb/storage/unusual_uas.h | 21 ++++++
fs/ntfs/super.c | 3
mm/migrate.c | 5 -
mm/page_alloc.c | 65 +++++++++++++++++----
security/integrity/ima/ima.h | 5 +
security/integrity/ima/ima_policy.c | 24 +++++--
tools/testing/selftests/net/reuseport_bpf.c | 2
18 files changed, 146 insertions(+), 70 deletions(-)
Alistair Popple (1):
mm/migrate_device.c: flush TLB while holding PTL
Brian Norris (1):
Revert "drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time"
Chaitanya Kulkarni (1):
nvme: add new line after variable declatation
ChenXiaoSong (1):
ntfs: fix BUG_ON in ntfs_lookup_inode_by_name()
Florian Fainelli (1):
clk: iproc: Do not rely on node name for correct PLL setup
Frank Wunderlich (1):
net: usb: qmi_wwan: Add new usb-id for Dell branded EM7455
Greg Kroah-Hartman (1):
Linux 4.19.261
Hongling Zeng (3):
uas: add no-uas quirk for Hiksemi usb_disk
usb-storage: Add Hiksemi USB3-FW to IGNORE_UAS
uas: ignore UAS for Thinkplus chips
Linus Walleij (1):
ARM: dts: integrator: Tag PCI host with device_type
Maurizio Lombardi (1):
mm: prevent page_frag_alloc() from corrupting the memory
Mel Gorman (1):
mm/page_alloc: fix race condition between build_all_zonelists and page allocation
Michael Kelley (1):
nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices
Niklas Cassel (1):
libata: add ATA_HORKAGE_NOLPM for Pioneer BDR-207M and BDR-205
Peilin Ye (1):
usbnet: Fix memory leak in usbnet_disconnect()
Samuel Holland (4):
soc: sunxi: sram: Actually claim SRAM regions
soc: sunxi: sram: Prevent the driver from being unbound
soc: sunxi: sram: Fix probe function ordering issues
soc: sunxi: sram: Fix debugfs info for A64 SRAM C
Sergei Antonov (1):
mmc: moxart: fix 4-bit bus width and remove 8-bit bus width
Tyler Hicks (3):
ima: Have the LSM free its audit rule
ima: Free the entire rule when deleting a list of rules
ima: Free the entire rule if it fails to parse
Wang Yufen (1):
selftests: Fix the if conditions of in test_extra_filter()
Yang Yingliang (1):
Input: melfas_mip4 - fix return value check in mip4_probe()
On Wed, Oct 05, 2022 at 10:10:42AM +0200, Ferry Toth wrote:
> On 05-10-2022 04:39, Andrey Smirnov wrote:
> > On Tue, Oct 4, 2022 at 7:12 PM Thinh Nguyen<Thinh.Nguyen(a)synopsys.com> wrote:
...
> > FWIW, I just got the same HW Ferry has last week and am planning to
> > work on this over the weekend.
> I can help you set up; we have binary images available on github as well as
> Yocto recipes to build them.
Also you can build all components (U-Boot, kernel, Buildroot initrd) separately
as explained here:
https://edison-fw.github.io/edison-wiki/u-boot-update
https://edison-fw.github.io/edison-wiki/vanilla
https://github.com/andy-shev/buildroot/tree/intel/board/intel/common
--
With Best Regards,
Andy Shevchenko
On 10/3/22 at 9:35 AM, Carl Dasantas wrote:
>
> I was wondering if a new longterm kernel will be made prior to 6.1
> being released with Rust support added? As the kernel.org page
> https://www.kernel.org/category/releases.html states "Longterm kernels
> are picked based on various factors -- major new features...". In my
> opinion, adding Rust support is a major new feature. Of course it goes
> on to say new longterm kernels are dependent on time, etc so I thought
> I would inquire. No harm in that, right? I'm sure there are a lot of
> others in our community that are hesitant as I am with Rust and want
> to see where it goes. It would be nice to have a recent longterm
> kernel so we can see how this Rust stuff plays out. Possibly from 6.0
> or 5.19?
Not sure if a decision has been made yet. IMHO, it may be _a bit_ before
we see this year's longterm release picked.
+stable list
All the best,
-srw
The patch titled
Subject: mm: /proc/pid/smaps_rollup: fix no vma's null-deref
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-proc-pid-smaps_rollup-fix-no-vmas-null-deref.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Seth Jenkins <sethjenkins(a)google.com>
Subject: mm: /proc/pid/smaps_rollup: fix no vma's null-deref
Date: Mon, 3 Oct 2022 18:45:31 -0400
Commit 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value
seq_file") introduced a null-deref if there are no vma's in the task in
show_smaps_rollup.
Link: https://lkml.kernel.org/r/20221003224531.1930646-1-sethjenkins@google.com
Fixes: 258f669e7e88 ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
Signed-off-by: Seth Jenkins <sethjenkins(a)google.com>
Reviewed-by: Alexey Dobriyan <adobriyan(a)gmail.com>
Tested-by: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/task_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/task_mmu.c~mm-proc-pid-smaps_rollup-fix-no-vmas-null-deref
+++ a/fs/proc/task_mmu.c
@@ -969,7 +969,7 @@ static int show_smaps_rollup(struct seq_
vma = vma->vm_next;
}
- show_vma_header_prefix(m, priv->mm->mmap->vm_start,
+ show_vma_header_prefix(m, priv->mm->mmap ? priv->mm->mmap->vm_start : 0,
last_vma_end, 0, 0, 0, 0);
seq_pad(m, ' ');
seq_puts(m, "[rollup]\n");
_
Patches currently in -mm which might be from sethjenkins(a)google.com are
mm-proc-pid-smaps_rollup-fix-no-vmas-null-deref.patch
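The guard added by this patch can be shown in isolation. This is a minimal
user-space sketch: struct toy_vma, struct toy_mm and toy_rollup_start() are
hypothetical, trimmed stand-ins for the kernel structures and are not taken
from fs/proc/task_mmu.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct toy_vma {
	unsigned long vm_start;
	struct toy_vma *vm_next;
};

struct toy_mm {
	struct toy_vma *mmap; /* head of the VMA list, NULL when there are no VMAs */
};

/* Same shape as the patched expression: fall back to 0 without dereferencing NULL. */
static inline unsigned long toy_rollup_start(const struct toy_mm *mm)
{
	return mm->mmap ? mm->mmap->vm_start : 0;
}

/* Usage: an empty address space and one with a single mapping. */
static inline void toy_rollup_demo(void)
{
	struct toy_mm empty = { NULL };
	struct toy_vma v = { 0x1000UL, NULL };
	struct toy_mm one = { &v };

	assert(toy_rollup_start(&empty) == 0);
	assert(toy_rollup_start(&one) == 0x1000UL);
}
```

Without the conditional, the empty case would read vm_start through a NULL
pointer, which is the dereference the commit message refers to.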
On Tue, Oct 04, 2022 at 06:46:10AM -0500, David Matthew Mattli wrote:
> Thorsten Leemhuis writes:
>
> > On 03.10.22 19:48, Ville Syrjälä wrote:
> >> On Mon, Oct 03, 2022 at 08:45:18PM +0300, Ville Syrjälä wrote:
> >>> On Sat, Oct 01, 2022 at 12:07:39PM +0200, Thorsten Leemhuis wrote:
> >>>> On 30.09.22 14:26, Jerry Ling wrote:
> >>>>>
> >>>>> looks like someone has done it:
> >>>>> https://bbs.archlinux.org/viewtopic.php?pid=2059823#p2059823
> >>>>>
> >>>>> and the bisect points to:
> >>>>>
> >>>>> # first bad commit: [fc6aff984b1c63d6b9e54f5eff9cc5ac5840bc8c]
> >>>>> drm/i915/bios: Split VBT data into per-panel vs. global parts
> >>>>>
> >>>>> Best, Jerry
> >>>>
> >>>> FWIW, that's 3cf050762534 in mainline. Adding Ville, its author to the
> >>>> list of recipients.
> >>>
> >>> I definitely had no plans to backport any of that stuff,
> >>> but I guess the automagics did it anyway.
> >>>
> >>> Looks like stable is at least missing this pile of stuff:
> >>> 50759c13735d drm/i915/pps: Keep VDD enabled during eDP probe
> >>> 67090801489d drm/i915/pps: Reinit PPS delays after VBT has been fully parsed
> >>> 8e75e8f573e1 drm/i915/pps: Split PPS init+sanitize in two
> >>> 586294c3c186 drm/i915/pps: Stash away original BIOS programmed PPS delays
> >>> 89fcdf430599 drm/i915/pps: Don't apply quirks/etc. to the VBT PPS
> >>> delays if they haven't been initialized
> >>> 60b02a09598f drm/i915/pps: Introduce pps_delays_valid()
> >>>
> >>> But dunno if even that is enough.
> >
> > If you need testers: David (now CCed) apparently has an affected machine
> > and offered to test patches in a different subthread of this thread.
> >
>
> I cherry-picked the six commits Thorsten listed onto 5.19.12 and it
> resolved the issue on my Framework laptop.
Thanks for testing, but I'm just going to revert the offending commits
as they probably shouldn't all be added to 5.19.y
thanks,
greg k-h
I'm announcing the release of the 5.19.13 kernel.
This release is to resolve a regression on some Intel graphics systems that had
problems with 5.19.12. If you do not have this problem with 5.19.12, there is
no need to upgrade.
The updated 5.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/gpu/drm/i915/display/g4x_dp.c | 22
drivers/gpu/drm/i915/display/icl_dsi.c | 18
drivers/gpu/drm/i915/display/intel_backlight.c | 23 -
drivers/gpu/drm/i915/display/intel_bios.c | 384 +++++++----------
drivers/gpu/drm/i915/display/intel_bios.h | 4
drivers/gpu/drm/i915/display/intel_ddi.c | 22
drivers/gpu/drm/i915/display/intel_ddi_buf_trans.c | 9
drivers/gpu/drm/i915/display/intel_display_types.h | 69 ---
drivers/gpu/drm/i915/display/intel_dp.c | 40 -
drivers/gpu/drm/i915/display/intel_dp.h | 2
drivers/gpu/drm/i915/display/intel_dp_aux_backlight.c | 6
drivers/gpu/drm/i915/display/intel_drrs.c | 3
drivers/gpu/drm/i915/display/intel_dsi.c | 2
drivers/gpu/drm/i915/display/intel_dsi_dcs_backlight.c | 9
drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 56 +-
drivers/gpu/drm/i915/display/intel_lvds.c | 6
drivers/gpu/drm/i915/display/intel_panel.c | 13
drivers/gpu/drm/i915/display/intel_pps.c | 70 ---
drivers/gpu/drm/i915/display/intel_psr.c | 35 -
drivers/gpu/drm/i915/display/intel_sdvo.c | 3
drivers/gpu/drm/i915/display/vlv_dsi.c | 21
drivers/gpu/drm/i915/i915_drv.h | 63 ++
23 files changed, 385 insertions(+), 497 deletions(-)
Greg Kroah-Hartman (9):
Revert "drm/i915/display: Fix handling of enable_psr parameter"
Revert "drm/i915/dsi: fix dual-link DSI backlight and CABC ports for display 11+"
Revert "drm/i915/dsi: filter invalid backlight and CABC ports"
Revert "drm/i915/bios: Split VBT data into per-panel vs. global parts"
Revert "drm/i915/bios: Split VBT parsing to global vs. panel specific parts"
Revert "drm/i915/bios: Split parse_driver_features() into two parts"
Revert "drm/i915/pps: Split pps_init_delays() into distinct parts"
Revert "drm/i915: Extract intel_edp_fixup_vbt_bpp()"
Linux 5.19.13
When growing an array with a bitmap to 4 TiB, the bitmap chunksize
becomes:
head /sys/block/md0/md/bitmap/chunksize <==
18446744071562067968
With 8 TiB, the chunksize is 4, which leads to assembly failure.
The root cause is a left shift whose count is >= the width of the type,
which overflows.
The fix is simple: widen the operand before the shift. The bug is pretty
old, present since at least kernel 4.0.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jack Wang <jinpu.wang(a)ionos.com>
---
drivers/md/md-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index bf6dffadbe6f..b4d7a606a9d8 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2150,7 +2150,7 @@ int md_bitmap_resize(struct bitmap *bitmap, sector_t blocks,
bitmap->counts.missing_pages = pages;
bitmap->counts.chunkshift = chunkshift;
bitmap->counts.chunks = chunks;
- bitmap->mddev->bitmap_info.chunksize = 1 << (chunkshift +
+ bitmap->mddev->bitmap_info.chunksize = 1UL << (chunkshift +
BITMAP_BLOCK_SHIFT);
blocks = min(old_counts.chunks << old_counts.chunkshift,
--
2.34.1
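The reported chunksize value can be checked with plain arithmetic. The
sketch below is an illustration only, with hypothetical helper names; it
assumes the shift count reached 31 in the 4 TiB case, and it uses uint64_t
where the patch uses 1UL so that it behaves the same on any host.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Value that a 32-bit signed "1 << 31" typically has once it is widened to
 * 64 bits. It is built from INT32_MIN so that this sketch does not itself
 * perform the undefined shift.
 */
static inline uint64_t narrow_shift31_widened(void)
{
	int32_t narrow = INT32_MIN;       /* bit pattern 0x80000000 */

	return (uint64_t)(int64_t)narrow; /* sign extension sets the top 32 bits */
}

/* Patched shape: make the constant 64 bits wide first, then shift. */
static inline uint64_t wide_shift(unsigned int shift)
{
	assert(shift < 64); /* a count of 64 or more would still be undefined */
	return UINT64_C(1) << shift;
}

/* Usage: compare the widened narrow result with the wide shift. */
static inline void shift_demo(void)
{
	assert(narrow_shift31_widened() == UINT64_C(18446744071562067968));
	assert(wide_shift(31) == UINT64_C(2147483648));
	assert(wide_shift(32) == UINT64_C(4294967296));
}
```

The first assertion matches the number shown in the commit message, which
is consistent with a sign-extended 32-bit result being stored in a wider
field.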
The quilt patch titled
Subject: mm: hugetlb: fix UAF in hugetlb_handle_userfault
has been removed from the -mm tree. Its filename was
mm-hugetlb-fix-uaf-in-hugetlb_handle_userfault.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm: hugetlb: fix UAF in hugetlb_handle_userfault
Date: Fri, 23 Sep 2022 12:21:13 +0800
The vma_lock and hugetlb_fault_mutex are dropped before handling userfault
and reacquired after handle_userfault(), but reacquiring the vma_lock can
lead to a UAF [1,2] due to the following race:
hugetlb_fault
hugetlb_no_page
/*unlock vma_lock */
hugetlb_handle_userfault
handle_userfault
/* unlock mm->mmap_lock*/
vm_mmap_pgoff
do_mmap
mmap_region
munmap_vma_range
/* clean old vma */
/* lock vma_lock again <--- UAF */
/* unlock vma_lock */
Since the vma_lock is unlocked immediately after
hugetlb_handle_userfault() returns, let's drop the unneeded lock and
unlock in hugetlb_handle_userfault() to fix the issue.
[1] https://lore.kernel.org/linux-mm/000000000000d5e00a05e834962e@google.com/
[2] https://lore.kernel.org/linux-mm/20220921014457.1668-1-liuzixian4@huawei.co…
Link: https://lkml.kernel.org/r/20220923042113.137273-1-liushixin2@huawei.com
Fixes: 1a1aad8a9b7b ("userfaultfd: hugetlbfs: add userfaultfd hugetlb hook")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Reported-by: syzbot+193f9cee8638750b23cf(a)syzkaller.appspotmail.com
Reported-by: Liu Zixian <liuzixian4(a)huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: <stable(a)vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-uaf-in-hugetlb_handle_userfault
+++ a/mm/hugetlb.c
@@ -5489,7 +5489,6 @@ static inline vm_fault_t hugetlb_handle_
unsigned long addr,
unsigned long reason)
{
- vm_fault_t ret;
u32 hash;
struct vm_fault vmf = {
.vma = vma,
@@ -5507,18 +5506,14 @@ static inline vm_fault_t hugetlb_handle_
};
/*
- * vma_lock and hugetlb_fault_mutex must be
- * dropped before handling userfault. Reacquire
- * after handling fault to make calling code simpler.
+ * vma_lock and hugetlb_fault_mutex must be dropped before handling
+ * userfault. Also mmap_lock could be dropped due to handling
+ * userfault, any vma operation should be careful from here.
*/
hugetlb_vma_unlock_read(vma);
hash = hugetlb_fault_mutex_hash(mapping, idx);
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
- ret = handle_userfault(&vmf, reason);
- mutex_lock(&hugetlb_fault_mutex_table[hash]);
- hugetlb_vma_lock_read(vma);
-
- return ret;
+ return handle_userfault(&vmf, reason);
}
static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
@@ -5536,6 +5531,7 @@ static vm_fault_t hugetlb_no_page(struct
spinlock_t *ptl;
unsigned long haddr = address & huge_page_mask(h);
bool new_page, new_pagecache_page = false;
+ u32 hash = hugetlb_fault_mutex_hash(mapping, idx);
/*
* Currently, we are forced to kill the process in the event the
@@ -5546,7 +5542,7 @@ static vm_fault_t hugetlb_no_page(struct
if (is_vma_resv_set(vma, HPAGE_RESV_UNMAPPED)) {
pr_warn_ratelimited("PID %d killed due to inadequate hugepage pool\n",
current->pid);
- return ret;
+ goto out;
}
/*
@@ -5560,12 +5556,10 @@ static vm_fault_t hugetlb_no_page(struct
if (idx >= size)
goto out;
/* Check for page in userfault range */
- if (userfaultfd_missing(vma)) {
- ret = hugetlb_handle_userfault(vma, mapping, idx,
+ if (userfaultfd_missing(vma))
+ return hugetlb_handle_userfault(vma, mapping, idx,
flags, haddr, address,
VM_UFFD_MISSING);
- goto out;
- }
page = alloc_huge_page(vma, haddr, 0);
if (IS_ERR(page)) {
@@ -5631,10 +5625,9 @@ static vm_fault_t hugetlb_no_page(struct
if (userfaultfd_minor(vma)) {
unlock_page(page);
put_page(page);
- ret = hugetlb_handle_userfault(vma, mapping, idx,
+ return hugetlb_handle_userfault(vma, mapping, idx,
flags, haddr, address,
VM_UFFD_MINOR);
- goto out;
}
}
@@ -5692,6 +5685,8 @@ static vm_fault_t hugetlb_no_page(struct
unlock_page(page);
out:
+ hugetlb_vma_unlock_read(vma);
+ mutex_unlock(&hugetlb_fault_mutex_table[hash]);
return ret;
backout:
@@ -5789,11 +5784,13 @@ vm_fault_t hugetlb_fault(struct mm_struc
entry = huge_ptep_get(ptep);
/* PTE markers should be handled the same way as none pte */
- if (huge_pte_none_mostly(entry)) {
- ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+ if (huge_pte_none_mostly(entry))
+ /*
+ * hugetlb_no_page will drop vma lock and hugetlb fault
+ * mutex internally, which make us return immediately.
+ */
+ return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
entry, flags);
- goto out_mutex;
- }
ret = 0;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
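
The locking convention that the hugetlb patch above introduces can be modelled in user space. The following is a minimal sketch, not kernel code: `struct fault_locks`, `drop_fault_locks()`, `handle_userfault_path()`, `no_page()` and `fault()` are hypothetical stand-ins for the vma_lock / hugetlb_fault_mutex pair and for the three functions touched by the diff. It only illustrates that the callee releases both locks on every exit path and that the caller never touches them again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the vma_lock / hugetlb_fault_mutex pair. */
struct fault_locks {
	int vma_lock_held;
	int fault_mutex_held;
};

static void drop_fault_locks(struct fault_locks *l)
{
	/* Both locks must be held exactly once when we get here. */
	assert(l->vma_lock_held && l->fault_mutex_held);
	l->vma_lock_held = 0;
	l->fault_mutex_held = 0;
}

/*
 * Models hugetlb_handle_userfault() after the fix: the locks are dropped
 * before the userfault is handled and are NOT retaken afterwards, because
 * the object they protect may already be gone.
 */
static int handle_userfault_path(struct fault_locks *l)
{
	drop_fault_locks(l);
	return 1;	/* stand-in for the result of handle_userfault() */
}

/*
 * Models hugetlb_no_page(): every exit path leaves both locks released,
 * either through the userfault path or through the common exit.
 */
static int no_page(struct fault_locks *l, bool userfault_missing)
{
	int ret = 0;

	if (userfault_missing)
		return handle_userfault_path(l);

	/* ... normal page allocation would happen here ... */

	drop_fault_locks(l);	/* corresponds to the "out:" label */
	return ret;
}

/* Models hugetlb_fault(): takes the locks, then returns immediately. */
static int fault(struct fault_locks *l, bool userfault_missing)
{
	l->fault_mutex_held = 1;
	l->vma_lock_held = 1;
	return no_page(l, userfault_missing);
}
```

Calling `fault()` with either value of `userfault_missing` leaves both fields at zero, which is the invariant the real patch relies on when hugetlb_fault() returns straight from hugetlb_no_page().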
Now that Clang's -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
option is no longer required, remove it from the command line. Clang 16
and later will warn when it is used, which will cause Kconfig to think
it can't use -ftrivial-auto-var-init=zero at all. Check for whether it
is required and only use it when so.
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: linux-kbuild(a)vger.kernel.org
Cc: llvm(a)lists.linux.dev
Cc: stable(a)vger.kernel.org
Fixes: f02003c860d9 ("hardening: Avoid harmless Clang option under CONFIG_INIT_STACK_ALL_ZERO")
Signed-off-by: Kees Cook <keescook(a)chromium.org>
---
Makefile | 4 ++--
security/Kconfig.hardening | 14 ++++++++++----
2 files changed, 12 insertions(+), 6 deletions(-)
diff --git a/Makefile b/Makefile
index c7705f749601..02c857e2243c 100644
--- a/Makefile
+++ b/Makefile
@@ -831,8 +831,8 @@ endif
# Initialize all stack variables with a zero value.
ifdef CONFIG_INIT_STACK_ALL_ZERO
KBUILD_CFLAGS += -ftrivial-auto-var-init=zero
-ifdef CONFIG_CC_IS_CLANG
-# https://bugs.llvm.org/show_bug.cgi?id=45497
+ifdef CONFIG_CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
+# https://github.com/llvm/llvm-project/issues/44842
KBUILD_CFLAGS += -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang
endif
endif
diff --git a/security/Kconfig.hardening b/security/Kconfig.hardening
index bd2aabb2c60f..995bc42003e6 100644
--- a/security/Kconfig.hardening
+++ b/security/Kconfig.hardening
@@ -22,11 +22,17 @@ menu "Memory initialization"
config CC_HAS_AUTO_VAR_INIT_PATTERN
def_bool $(cc-option,-ftrivial-auto-var-init=pattern)
-config CC_HAS_AUTO_VAR_INIT_ZERO
- # GCC ignores the -enable flag, so we can test for the feature with
- # a single invocation using the flag, but drop it as appropriate in
- # the Makefile, depending on the presence of Clang.
+config CC_HAS_AUTO_VAR_INIT_ZERO_BARE
+ def_bool $(cc-option,-ftrivial-auto-var-init=zero)
+
+config CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
+ # Clang 16 and later warn about using the -enable flag, but it
+ # is required before then.
def_bool $(cc-option,-ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang)
+ depends on !CC_HAS_AUTO_VAR_INIT_ZERO_BARE
+
+config CC_HAS_AUTO_VAR_INIT_ZERO
+ def_bool CC_HAS_AUTO_VAR_INIT_ZERO_BARE || CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER
choice
prompt "Initialize kernel stack variables at function entry"
--
2.34.1
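
The dependency between the three Kconfig symbols in the hardening patch above is plain boolean logic, which can be written out as a small C model. This is only a sketch: `probe()` and its two parameters are hypothetical and stand for the results of the two `cc-option` tests; nothing here invokes a compiler.

```c
#include <assert.h>
#include <stdbool.h>

struct auto_var_init_zero {
	bool bare;	/* CC_HAS_AUTO_VAR_INIT_ZERO_BARE */
	bool enabler;	/* CC_HAS_AUTO_VAR_INIT_ZERO_ENABLER */
	bool zero;	/* CC_HAS_AUTO_VAR_INIT_ZERO */
};

/*
 * accepts_bare:         compiler accepts -ftrivial-auto-var-init=zero alone
 * accepts_with_enabler: compiler accepts it together with the -enable flag
 */
static struct auto_var_init_zero probe(bool accepts_bare,
				       bool accepts_with_enabler)
{
	struct auto_var_init_zero r;

	r.bare = accepts_bare;
	/* "depends on !CC_HAS_AUTO_VAR_INIT_ZERO_BARE" */
	r.enabler = !r.bare && accepts_with_enabler;
	r.zero = r.bare || r.enabler;

	/* The two helper symbols are mutually exclusive by construction. */
	assert(!(r.bare && r.enabler));
	return r;
}
```

With this model, a compiler that accepts the bare option never gets the enabler flag added in the Makefile, even if it would still tolerate it, while a compiler that needs the enabler still ends up with `zero` set.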
From: Kyle Huey <me(a)kylehuey.com>
When management of the PKRU register was moved away from XSTATE, emulation
of PKRU's existence in XSTATE was added for reading PKRU through ptrace,
but not for writing PKRU through ptrace. This can be seen by running gdb
and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`. On affected
kernels (5.14+) the write to the PKRU register (which gdb performs through
ptrace) is ignored.
There are three APIs that write PKRU: sigreturn, PTRACE_SETREGSET with
NT_X86_XSTATE, and KVM_SET_XSAVE. sigreturn still uses XRSTOR to write to
PKRU. KVM_SET_XSAVE has its own special handling to make PKRU writes take
effect (in fpu_copy_uabi_to_guest_fpstate). Push that down into
copy_uabi_to_xstate and have PTRACE_SETREGSET with NT_X86_XSTATE pass in
a pointer to the appropriate PKRU slot. copy_sigframe_from_user_to_xstate
depends on copy_uabi_to_xstate populating the PKRU field in the task's
XSTATE so that __fpu_restore_sig can do a XRSTOR from it, so continue doing
that.
This also adds code to initialize the PKRU value to the hardware init value
(namely 0) if the PKRU bit is not set in the XSTATE header provided to
ptrace, to match XRSTOR.
Fixes: e84ba47e313d ("x86/fpu: Hook up PKRU into ptrace()")
Signed-off-by: Kyle Huey <me(a)kylehuey.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)suse.de>
Cc: stable(a)vger.kernel.org # 5.14+
---
arch/x86/kernel/fpu/core.c | 20 +++++++++-----------
arch/x86/kernel/fpu/regset.c | 2 +-
arch/x86/kernel/fpu/signal.c | 2 +-
arch/x86/kernel/fpu/xstate.c | 25 ++++++++++++++++++++-----
arch/x86/kernel/fpu/xstate.h | 4 ++--
5 files changed, 33 insertions(+), 20 deletions(-)
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 3b28c5b25e12..c273669e8a00 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -391,8 +391,6 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
{
struct fpstate *kstate = gfpu->fpstate;
const union fpregs_state *ustate = buf;
- struct pkru_state *xpkru;
- int ret;
if (!cpu_feature_enabled(X86_FEATURE_XSAVE)) {
if (ustate->xsave.header.xfeatures & ~XFEATURE_MASK_FPSSE)
@@ -406,16 +404,16 @@ int fpu_copy_uabi_to_guest_fpstate(struct fpu_guest *gfpu, const void *buf,
if (ustate->xsave.header.xfeatures & ~xcr0)
return -EINVAL;
- ret = copy_uabi_from_kernel_to_xstate(kstate, ustate);
- if (ret)
- return ret;
+ /*
+ * Nullify @vpkru to preserve its current value if PKRU's bit isn't set
+ * in the header. KVM's odd ABI is to leave PKRU untouched in this
+ * case (all other components are eventually re-initialized).
+ * (Not clear that this is actually necessary for compat).
+ */
+ if (!(ustate->xsave.header.xfeatures & XFEATURE_MASK_PKRU))
+ vpkru = NULL;
- /* Retrieve PKRU if not in init state */
- if (kstate->regs.xsave.header.xfeatures & XFEATURE_MASK_PKRU) {
- xpkru = get_xsave_addr(&kstate->regs.xsave, XFEATURE_PKRU);
- *vpkru = xpkru->pkru;
- }
- return 0;
+ return copy_uabi_from_kernel_to_xstate(kstate, ustate, vpkru);
}
EXPORT_SYMBOL_GPL(fpu_copy_uabi_to_guest_fpstate);
#endif /* CONFIG_KVM */
diff --git a/arch/x86/kernel/fpu/regset.c b/arch/x86/kernel/fpu/regset.c
index 75ffaef8c299..6d056b68f4ed 100644
--- a/arch/x86/kernel/fpu/regset.c
+++ b/arch/x86/kernel/fpu/regset.c
@@ -167,7 +167,7 @@ int xstateregs_set(struct task_struct *target, const struct user_regset *regset,
}
fpu_force_restore(fpu);
- ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf);
+ ret = copy_uabi_from_kernel_to_xstate(fpu->fpstate, kbuf ?: tmpbuf, &target->thread.pkru);
out:
vfree(tmpbuf);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 91d4b6de58ab..558076dbde5b 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -396,7 +396,7 @@ static bool __fpu_restore_sig(void __user *buf, void __user *buf_fx,
fpregs = &fpu->fpstate->regs;
if (use_xsave() && !fx_only) {
- if (copy_sigframe_from_user_to_xstate(fpu->fpstate, buf_fx))
+ if (copy_sigframe_from_user_to_xstate(tsk, buf_fx))
return false;
} else {
if (__copy_from_user(&fpregs->fxsave, buf_fx,
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index c8340156bfd2..8f14981a3936 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -1197,7 +1197,7 @@ static int copy_from_buffer(void *dst, unsigned int offset, unsigned int size,
static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
- const void __user *ubuf)
+ const void __user *ubuf, u32 *pkru)
{
struct xregs_state *xsave = &fpstate->regs.xsave;
unsigned int offset, size;
@@ -1246,6 +1246,21 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
}
}
+ /*
+ * Update the user protection key storage. Allow KVM to
+ * pass in a NULL pkru pointer if the mask bit is unset
+ * for its legacy ABI behavior.
+ */
+ if (pkru)
+ *pkru = 0;
+
+ if (hdr.xfeatures & XFEATURE_MASK_PKRU) {
+ struct pkru_state *xpkru;
+
+ xpkru = __raw_xsave_addr(xsave, XFEATURE_PKRU);
+ *pkru = xpkru->pkru;
+ }
+
/*
* The state that came in from userspace was user-state only.
* Mask all the user states out of 'xfeatures':
@@ -1264,9 +1279,9 @@ static int copy_uabi_to_xstate(struct fpstate *fpstate, const void *kbuf,
* Convert from a ptrace standard-format kernel buffer to kernel XSAVE[S]
* format and copy to the target thread. Used by ptrace and KVM.
*/
-int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
+int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru)
{
- return copy_uabi_to_xstate(fpstate, kbuf, NULL);
+ return copy_uabi_to_xstate(fpstate, kbuf, NULL, pkru);
}
/*
@@ -1274,10 +1289,10 @@ int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf)
* XSAVE[S] format and copy to the target thread. This is called from the
* sigreturn() and rt_sigreturn() system calls.
*/
-int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate,
+int copy_sigframe_from_user_to_xstate(struct task_struct *tsk,
const void __user *ubuf)
{
- return copy_uabi_to_xstate(fpstate, NULL, ubuf);
+ return copy_uabi_to_xstate(tsk->thread.fpu.fpstate, NULL, ubuf, &tsk->thread.pkru);
}
static bool validate_independent_components(u64 mask)
diff --git a/arch/x86/kernel/fpu/xstate.h b/arch/x86/kernel/fpu/xstate.h
index 5ad47031383b..a4ecb04d8d64 100644
--- a/arch/x86/kernel/fpu/xstate.h
+++ b/arch/x86/kernel/fpu/xstate.h
@@ -46,8 +46,8 @@ extern void __copy_xstate_to_uabi_buf(struct membuf to, struct fpstate *fpstate,
u32 pkru_val, enum xstate_copy_mode copy_mode);
extern void copy_xstate_to_uabi_buf(struct membuf to, struct task_struct *tsk,
enum xstate_copy_mode mode);
-extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf);
-extern int copy_sigframe_from_user_to_xstate(struct fpstate *fpstate, const void __user *ubuf);
+extern int copy_uabi_from_kernel_to_xstate(struct fpstate *fpstate, const void *kbuf, u32 *pkru);
+extern int copy_sigframe_from_user_to_xstate(struct task_struct *tsk, const void __user *ubuf);
extern void fpu__init_cpu_xstate(void);
--
2.37.2
Changelog since v5:
- Avoids a second copy from the uabi buffer as suggested.
- Preserves old KVM_SET_XSAVE behavior where leaving the PKRU bit unset in
the XSTATE header results in PKRU remaining unchanged instead of
reinitializing it.
- Fixed up patch metadata as requested.
Changelog since v4:
- Selftest additionally checks PKRU readbacks through ptrace.
- Selftest flips all PKRU bits (except the default key).
Changelog since v3:
- The v3 patch is now part 1 of 2.
- Adds a selftest in part 2 of 2.
Changelog since v2:
- Removed now unused variables in fpu_copy_uabi_to_guest_fpstate
Changelog since v1:
- Handles the error case of copy_to_buffer().
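
The PKRU semantics described in the x86/fpu patch above (ptrace resets PKRU to 0 when the header bit is clear, KVM leaves it untouched) can be illustrated with a small user-space model. This is a sketch under assumptions: `struct uabi_buf`, `update_pkru()`, `ptrace_set()` and `kvm_set()` are hypothetical simplifications of the uabi buffer and of the call paths in the diff, and the buffer layout is not the real XSAVE layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFEATURE_MASK_PKRU (1ULL << 9)	/* PKRU is XSAVE state component 9 */

/* Hypothetical, simplified uabi buffer: header bits plus the PKRU value. */
struct uabi_buf {
	uint64_t xfeatures;
	uint32_t pkru;
};

/* Models the PKRU handling added to copy_uabi_to_xstate(). */
static void update_pkru(const struct uabi_buf *buf, uint32_t *pkru)
{
	/* Start from the hardware init value, as XRSTOR would. */
	if (pkru)
		*pkru = 0;

	if (buf->xfeatures & XFEATURE_MASK_PKRU) {
		/* Callers only pass NULL when the header bit is clear. */
		assert(pkru != NULL);
		*pkru = buf->pkru;
	}
}

/* ptrace path: always passes the task's PKRU slot. */
static void ptrace_set(const struct uabi_buf *buf, uint32_t *thread_pkru)
{
	update_pkru(buf, thread_pkru);
}

/* KVM path: nullifies the pointer so an unset bit preserves the old value. */
static void kvm_set(const struct uabi_buf *buf, uint32_t *vpkru)
{
	if (!(buf->xfeatures & XFEATURE_MASK_PKRU))
		vpkru = NULL;
	update_pkru(buf, vpkru);
}
```

The only difference between the two callers is whether a NULL pointer is passed when the header bit is clear, which is how the patch keeps the legacy KVM_SET_XSAVE behavior while making ptrace match XRSTOR.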
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings. Also, error return codes don't mean anything to
obsolete_checksetup() -- only non-zero (usually 1) or zero.
So return 1 from vdso_setup().
Fixes: 9a08862a5d2e ("vDSO for sparc")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reported-by: Igor Zhbanov <izh1979(a)gmail.com>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: sparclinux(a)vger.kernel.org
Cc: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: Nick Alcock <nick.alcock(a)oracle.com>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Arnd Bergmann <arnd(a)arndb.de>
---
v2: correct the Fixes: tag (Dan Carpenter)
v3: add more Cc's;
correct Igor's email address;
change From: Igor to Reported-by: Igor;
v4: add Arnd to Cc: list
arch/sparc/vdso/vma.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
--- a/arch/sparc/vdso/vma.c
+++ b/arch/sparc/vdso/vma.c
@@ -449,9 +449,8 @@ static __init int vdso_setup(char *s)
unsigned long val;
err = kstrtoul(s, 10, &val);
- if (err)
- return err;
- vdso_enabled = val;
- return 0;
+ if (!err)
+ vdso_enabled = val;
+ return 1;
}
__setup("vdso=", vdso_setup);
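
The return-value convention described in the commit message above can be tried out in user space. The sketch below is an approximation, not the code from init/main.c: `struct setup_entry` and `check_setup()` are hypothetical simplifications of the `__setup()` table and of obsolete_checksetup(), and `strtoul()` only roughly stands in for kstrtoul().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct setup_entry {
	const char *prefix;
	int (*handler)(char *arg);
};

static unsigned long vdso_enabled = 1;

/* Old behavior: error code on failure, 0 on success. */
static int vdso_setup_old(char *s)
{
	char *end;
	unsigned long val = strtoul(s, &end, 10);

	if (end == s || *end != '\0')
		return -EINVAL;	/* non-zero, so it looks "handled" */
	vdso_enabled = val;
	return 0;		/* zero, so the option looks unknown */
}

/* New behavior: the option is always reported as handled. */
static int vdso_setup_new(char *s)
{
	char *end;
	unsigned long val = strtoul(s, &end, 10);

	if (end != s && *end == '\0')
		vdso_enabled = val;
	return 1;
}

/* Returns true when some handler claimed the option (non-zero return). */
static bool check_setup(const struct setup_entry *tbl, size_t n, char *line)
{
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].prefix);

		assert(tbl[i].handler != NULL);
		if (strncmp(line, tbl[i].prefix, len) == 0 &&
		    tbl[i].handler(line + len))
			return true;
	}
	return false;
}
```

With the old handler, a valid `vdso=0` is applied but still reported as unhandled, while an invalid value is reported as handled only because the error code happens to be non-zero. The new handler returns 1 in both cases.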
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings. Also, error return codes don't mean anything to
obsolete_checksetup() -- only non-zero (usually 1) or zero.
So return 1 from setup_nmi_watchdog().
Fixes: e5553a6d0442 ("sparc64: Implement NMI watchdog on capable cpus.")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reported-by: Igor Zhbanov <izh1979(a)gmail.com>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: sparclinux(a)vger.kernel.org
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Arnd Bergmann <arnd(a)arndb.de>
---
v2: change From: Igor to Reported-by:
add more Cc's
v3: use Igor's current email address
v4: add Arnd to Cc: list
arch/sparc/kernel/nmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/sparc/kernel/nmi.c
+++ b/arch/sparc/kernel/nmi.c
@@ -274,7 +274,7 @@ static int __init setup_nmi_watchdog(cha
if (!strncmp(str, "panic", 5))
panic_on_timeout = 1;
- return 0;
+ return 1;
}
__setup("nmi_watchdog=", setup_nmi_watchdog);
__setup() handlers should return 1 to obsolete_checksetup() in
init/main.c to indicate that the boot option has been handled.
A return of 0 causes the boot option/value to be listed as an Unknown
kernel parameter and added to init's (limited) argument or environment
strings. Also, error return codes don't mean anything to
obsolete_checksetup() -- only non-zero (usually 1) or zero.
So return 1 from nmi_debug_setup().
Fixes: 1e1030dccb10 ("sh: nmi_debug support.")
Signed-off-by: Randy Dunlap <rdunlap(a)infradead.org>
Reported-by: Igor Zhbanov <izh1979(a)gmail.com>
Cc: Yoshinori Sato <ysato(a)users.sourceforge.jp>
Cc: Rich Felker <dalias(a)libc.org>
Cc: linux-sh(a)vger.kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: stable(a)vger.kernel.org
Cc: Arnd Bergmann <arnd(a)arndb.de>
---
v2: add more Cc's;
refresh and resend;
v3: add Arnd to Cc: list
arch/sh/kernel/nmi_debug.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/sh/kernel/nmi_debug.c
+++ b/arch/sh/kernel/nmi_debug.c
@@ -49,7 +49,7 @@ static int __init nmi_debug_setup(char *
register_die_notifier(&nmi_debug_nb);
if (*str != '=')
- return 0;
+ return 1;
for (p = str + 1; *p; p = sep + 1) {
sep = strchr(p, ',');
@@ -70,6 +70,6 @@ static int __init nmi_debug_setup(char *
break;
}
- return 0;
+ return 1;
}
__setup("nmi_debug", nmi_debug_setup);
The patch titled
Subject: nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix NULL pointer dereference at nilfs_bmap_lookup_at_level()
Date: Sun, 2 Oct 2022 12:08:04 +0900
If the i_mode field in the inode of a metadata file is corrupted on disk,
the bmap structure initialization that nilfs_read_inode_common() should
perform can be skipped. This causes a lockdep warning followed by a NULL
pointer dereference at nilfs_bmap_lookup_at_level().
This patch fixes these issues by adding a missing sanity check for the
i_mode field of the metadata file's inode.
Link: https://lkml.kernel.org/r/20221002030804.29978-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+2b32eb36c1a825b7a74c(a)syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/nilfs2/inode.c~nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level
+++ a/fs/nilfs2/inode.c
@@ -455,6 +455,8 @@ int nilfs_read_inode_common(struct inode
inode->i_atime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
inode->i_ctime.tv_nsec = le32_to_cpu(raw_inode->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(raw_inode->i_mtime_nsec);
+ if (nilfs_is_metadata_file_inode(inode) && !S_ISREG(inode->i_mode))
+ return -EIO; /* this inode is for metadata and corrupted */
if (inode->i_nlink == 0)
return -ESTALE; /* this inode is deleted */
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-bug-of-struct-nilfs_root.patch
nilfs2-fix-null-pointer-dereference-at-nilfs_bmap_lookup_at_level.patch
nilfs2-replace-warn_ons-by-nilfs_error-for-checkpoint-acquisition-failure.patch
The patch titled
Subject: nilfs2: fix use-after-free bug of struct nilfs_root
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-use-after-free-bug-of-struct-nilfs_root.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix use-after-free bug of struct nilfs_root
Date: Tue, 4 Oct 2022 00:05:19 +0900
If the beginning of the inode bitmap area is corrupted on disk, an inode
with the same inode number as the root inode can be allocated and fail
soon after. In this case, the subsequent call to nilfs_clear_inode() on
that bogus root inode will wrongly decrement the reference counter of
struct nilfs_root, and this will erroneously free struct nilfs_root,
causing kernel oopses.
This fixes the problem by changing nilfs_new_inode() to skip reserved
inode numbers while repairing the inode bitmap.
Link: https://lkml.kernel.org/r/20221003150519.39789-1-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+b8c672b0e22615c80fe0(a)syzkaller.appspotmail.com
Reported-by: Khalid Masum <khalid.masum.92(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
--- a/fs/nilfs2/inode.c~nilfs2-fix-use-after-free-bug-of-struct-nilfs_root
+++ a/fs/nilfs2/inode.c
@@ -328,6 +328,7 @@ struct inode *nilfs_new_inode(struct ino
struct inode *inode;
struct nilfs_inode_info *ii;
struct nilfs_root *root;
+ struct buffer_head *bh;
int err = -ENOMEM;
ino_t ino;
@@ -343,11 +344,25 @@ struct inode *nilfs_new_inode(struct ino
ii->i_state = BIT(NILFS_I_NEW);
ii->i_root = root;
- err = nilfs_ifile_create_inode(root->ifile, &ino, &ii->i_bh);
+ err = nilfs_ifile_create_inode(root->ifile, &ino, &bh);
if (unlikely(err))
goto failed_ifile_create_inode;
/* reference count of i_bh inherits from nilfs_mdt_read_block() */
+ if (unlikely(ino < NILFS_USER_INO)) {
+ nilfs_warn(sb,
+ "inode bitmap is inconsistent for reserved inodes");
+ do {
+ brelse(bh);
+ err = nilfs_ifile_create_inode(root->ifile, &ino, &bh);
+ if (unlikely(err))
+ goto failed_ifile_create_inode;
+ } while (ino < NILFS_USER_INO);
+
+ nilfs_info(sb, "repaired inode bitmap for reserved inodes");
+ }
+ ii->i_bh = bh;
+
atomic64_inc(&root->inodes_count);
inode_init_owner(&init_user_ns, inode, dir, mode);
inode->i_ino = ino;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-bug-of-struct-nilfs_root.patch
nilfs2-replace-warn_ons-by-nilfs_error-for-checkpoint-acquisition-failure.patch
commit 70cbc3cc78a997d8247b50389d37c4e1736019da upstream
Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases; it only handles
traditional IPI-based GUP-fast correctly. On architectures that send an
IPI broadcast on TLB flush, it works as expected. But on
architectures that do not use IPIs to broadcast TLB flushes, the
race below is possible:
        CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                                  gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                                      pin the page
                                                      check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page
The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE. If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.
Also update the stale comment about serializing against fast GUP in
khugepaged.
Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 34 ++++++++++++++++++++++++++++------
mm/khugepaged.c | 10 ++++++----
2 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 6cb7d8ae56f6..b47c751df069 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2128,8 +2128,28 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
- unsigned int flags, struct page **pages, int *nr)
+/*
+ * Fast-gup relies on pte change detection to avoid concurrent pgtable
+ * operations.
+ *
+ * To pin the page, fast-gup needs to do below in order:
+ * (1) pin the page (by prefetching pte), then (2) check pte not changed.
+ *
+ * For the rest of pgtable operations where pgtable updates can be racy
+ * with fast-gup, we need to do (1) clear pte, then (2) check whether page
+ * is pinned.
+ *
+ * Above will work for all pte-level operations, including THP split.
+ *
+ * For THP collapse, it's a bit more complicated because fast-gup may be
+ * walking a pgtable page that is being freed (pte is still valid but pmd
+ * can be cleared already). To avoid race in such condition, we need to
+ * also check pmd here to make sure pmd doesn't change (corresponds to
+ * pmdp_collapse_flush() in the THP collapse code path).
+ */
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
struct dev_pagemap *pgmap = NULL;
int nr_start = *nr, ret = 0;
@@ -2169,7 +2189,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
if (!head)
goto pte_unmap;
- if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
+ unlikely(pte_val(pte) != pte_val(*ptep))) {
put_compound_head(head, 1, flags);
goto pte_unmap;
}
@@ -2214,8 +2235,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
* get_user_pages_fast_only implementation that can pin pages. Thus it's still
* useful to have gup_huge_pmd even if we can't operate on ptes.
*/
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
- unsigned int flags, struct page **pages, int *nr)
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
return 0;
}
@@ -2522,7 +2544,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
if (!gup_huge_pd(__hugepd(pmd_val(pmd)), addr,
PMD_SHIFT, next, flags, pages, nr))
return 0;
- } else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
+ } else if (!gup_pte_range(pmd, pmdp, addr, next, flags, pages, nr))
return 0;
} while (pmdp++, addr = next, addr != end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 969e57dde65f..cf4dceb9682b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1144,10 +1144,12 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
/*
- * After this gup_fast can't run anymore. This also removes
- * any huge TLB entry from the CPU so we won't allow
- * huge and small TLB entries for the same virtual address
- * to avoid the risk of CPU bugs in that area.
+ * This removes any huge TLB entry from the CPU so we won't allow
+ * huge and small TLB entries for the same virtual address to
+ * avoid the risk of CPU bugs in that area.
+ *
+ * Parallel fast GUP is fine since fast GUP will back off when
+ * it detects PMD is changed.
*/
_pmd = pmdp_collapse_flush(vma, address, pmd);
spin_unlock(pmd_ptl);
--
2.26.3
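
The "pin, then recheck both PMD and PTE" ordering that the GUP patch above adds can be shown with a single-threaded model. This is a sketch, not kernel code: `struct walk_state`, `gup_fast_one()` and `collapse_clears_pmd()` are hypothetical, page-table entries are plain integers, and the `race_window` hook merely simulates the other CPU at a fixed point. Memory ordering and lockless reads are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fake_page {
	int refcount;
};

struct walk_state {
	uint64_t *pmdp;		/* live PMD entry */
	uint64_t *ptep;		/* live PTE entry */
	struct fake_page *page;
	/* Hook simulating the other CPU after the snapshots are taken. */
	void (*race_window)(struct walk_state *ws);
};

/* Returns true if the page stays pinned, false if GUP backed off. */
static bool gup_fast_one(struct walk_state *ws)
{
	uint64_t pmd, pte;

	assert(ws->pmdp && ws->ptep && ws->page);

	pmd = *ws->pmdp;	/* value seen by gup_pmd_range() */
	pte = *ws->ptep;	/* value seen by gup_pte_range() */

	if (ws->race_window)
		ws->race_window(ws);

	ws->page->refcount++;	/* (1) pin the page */

	/* (2) recheck: the patch adds the PMD comparison to the PTE one */
	if (pmd != *ws->pmdp || pte != *ws->ptep) {
		ws->page->refcount--;	/* undo the pin and back off */
		return false;
	}
	return true;
}

/* Models what pmdp_collapse_flush() does to the PMD entry. */
static void collapse_clears_pmd(struct walk_state *ws)
{
	*ws->pmdp = 0;
}
```

In the simulated race the PTE value is unchanged, so a PTE-only comparison would let the pin succeed. Comparing the PMD as well is what makes `gup_fast_one()` drop the reference and return false.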
commit 70cbc3cc78a997d8247b50389d37c4e1736019da upstream
Since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm:
introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer
sufficient to handle concurrent GUP-fast in all cases; it only handles
traditional IPI-based GUP-fast correctly. On architectures that send an
IPI broadcast on TLB flush, it works as expected. But on
architectures that do not use IPIs to broadcast TLB flushes, the
race below is possible:
        CPU A                                          CPU B
THP collapse                                     fast GUP
                                              gup_pmd_range() <-- see valid pmd
                                                  gup_pte_range() <-- work on pte
pmdp_collapse_flush() <-- clear pmd and flush
__collapse_huge_page_isolate()
    check page pinned <-- before GUP bump refcount
                                                      pin the page
                                                      check PTE <-- no change
__collapse_huge_page_copy()
    copy data to huge page
    ptep_clear()
install huge pmd for the huge page
                                                      return the stale page
discard the stale page
The race can be fixed by checking whether PMD is changed or not after
taking the page pin in fast GUP, just like what it does for PTE. If the
PMD is changed it means there may be parallel THP collapse, so GUP should
back off.
Also update the stale comment about serializing against fast GUP in
khugepaged.
Link: https://lkml.kernel.org/r/20220907180144.555485-1-shy828301@gmail.com
Fixes: 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()")
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 34 ++++++++++++++++++++++++++++------
mm/khugepaged.c | 10 ++++++----
2 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 05068d3d2557..1a23cd0b4fba 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2266,8 +2266,28 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
- unsigned int flags, struct page **pages, int *nr)
+/*
+ * Fast-gup relies on pte change detection to avoid concurrent pgtable
+ * operations.
+ *
+ * To pin the page, fast-gup needs to do below in order:
+ * (1) pin the page (by prefetching pte), then (2) check pte not changed.
+ *
+ * For the rest of pgtable operations where pgtable updates can be racy
+ * with fast-gup, we need to do (1) clear pte, then (2) check whether page
+ * is pinned.
+ *
+ * Above will work for all pte-level operations, including THP split.
+ *
+ * For THP collapse, it's a bit more complicated because fast-gup may be
+ * walking a pgtable page that is being freed (pte is still valid but pmd
+ * can be cleared already). To avoid race in such condition, we need to
+ * also check pmd here to make sure pmd doesn't change (corresponds to
+ * pmdp_collapse_flush() in the THP collapse code path).
+ */
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
struct dev_pagemap *pgmap = NULL;
int nr_start = *nr, ret = 0;
@@ -2312,7 +2332,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
goto pte_unmap;
}
- if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ if (unlikely(pmd_val(pmd) != pmd_val(*pmdp)) ||
+ unlikely(pte_val(pte) != pte_val(*ptep))) {
put_compound_head(head, 1, flags);
goto pte_unmap;
}
@@ -2357,8 +2378,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
* get_user_pages_fast_only implementation that can pin pages. Thus it's still
* useful to have gup_huge_pmd even if we can't operate on ptes.
*/
-static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
- unsigned int flags, struct page **pages, int *nr)
+static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
return 0;
}
@@ -2667,7 +2689,7 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
if (!gup_huge_pd(__hugepd(pmd_val(pmd)), addr,
PMD_SHIFT, next, flags, pages, nr))
return 0;
- } else if (!gup_pte_range(pmd, addr, next, flags, pages, nr))
+ } else if (!gup_pte_range(pmd, pmdp, addr, next, flags, pages, nr))
return 0;
} while (pmdp++, addr = next, addr != end);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 8a8b3aa92937..dd069afd9cb9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1146,10 +1146,12 @@ static void collapse_huge_page(struct mm_struct *mm,
pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
/*
- * After this gup_fast can't run anymore. This also removes
- * any huge TLB entry from the CPU so we won't allow
- * huge and small TLB entries for the same virtual address
- * to avoid the risk of CPU bugs in that area.
+ * This removes any huge TLB entry from the CPU so we won't allow
+ * huge and small TLB entries for the same virtual address to
+ * avoid the risk of CPU bugs in that area.
+ *
+ * Parallel fast GUP is fine since fast GUP will back off when
+ * it detects PMD is changed.
*/
_pmd = pmdp_collapse_flush(vma, address, pmd);
spin_unlock(pmd_ptl);
--
2.26.3
On Fri, Sep 30, 2022 at 10:14:50AM +0800, Yaxiong Tian wrote:
> From: Yaxiong Tian <tianyaxiong(a)kylinos.cn>
>
Hi Tian,
> In shmem_tx_prepare() ,it use spin_until_cond() to wait the chanel to be
> free. Though it can avoid risk about overwriting the old command.But
> when the platform has some problem in setting the chanel to free(such as
> Improper initialization ),it can lead to the chanel never to be free.So
> the os is sticked in it.
On a transport based on a shared memory area, the busy/free bitflag is indeed
used, on the TX channel, to 'pass' ownership of the channel from the agent
(Linux) to the platform SCMI server: as stated in the inline comment block,
once the Linux agent writes its command into the shmem area and sets the
channel busy, it HAS TO wait for the channel to be set free again by the
platform, to avoid possible corruption caused by a reply being delivered by
the platform so late that it could overwrite the next new command
freshly written into the shared memory by the Linux agent.
In other words, you CANNOT forcibly grab back the channel ownership on
timeout like you are doing here, because you cannot assume anything about
the status of the SCMI server entity: a misbehaving SCMI server could
deliver a very late reply at any time (since it still thinks it has
ownership) and the possible subsequent corruption in the next queued
TX message would probably generate issues that are very difficult to spot
and debug.
Indeed, in the SCMI Kernel stack we never forcibly free a TX transport
channel on timeout. scmi_clear_channel() is called only on the RX channel
for notifications and delayed responses, because there the ownership relation
is just the opposite: we are releasing the channel to the platform
after we have processed its messages.
We do the same on a different transport like virtio, in which a virtio buffer
related to a message without a reply is just lost if the SCMI server never
sends back anything: the only difference is that virtio usually has more
free buffers, so we can carry on anyway using some of the other
remaining free buffers.
Moreover, when the SCMI server or a transport ends up in such a broken
state that it does not even release the channel, it means the SCMI
server stack is in serious trouble, and we (Kernel) do not really want
to do any business with a backend in such a misbehaving state: the SCMI
server should be fixed. Instead, the kind of approach you propose could
even hide some classes of transient server-side problems by just ignoring
this condition and grabbing back the channel forcibly.
> In addition when shmem_tx_prepare() called,this
> indicates that the previous transfer has commpleted or timed out.It
> unsuitable for unconditional waiting the chanel to be free.
Note that there could also be the case in which many other
threads of execution on different cores are trying to send a bunch of SCMI
commands (there are indeed many SCMI drivers currently in the stack), so
all these requests HAVE TO be queued while waiting for the shared
memory area to be free again. So I don't think you can assume in general
that shmem_tx_prepare() is called only after a successful command completion
or timeout: it is probably true when an SCMI Mailbox transport is
used, but it is not necessarily the case when other shmem based transports
like optee/smc are used. You could have N threads trying to send an SCMI
command, queued, spinning on that loop, while waiting for the channel to
be free: in such a case note that your TX has not even completed, you are
still waiting for the channel, and the SCMI upper layer TX timeout has NOT
even been started since the transmission itself is still pending (i.e.
send_message has still not returned...)
>
> So for system stablility,we can add timeout condition in waiting the
> chanel to be free.
In the end, I could agree that it is unfortunate that, if the SCMI server
gets stuck, the SCMI Linux agent also gets stuck while waiting for a free
channel, so it could be sensible to time out as you proposed, BUT after
the timeout you should NOT carry on, BUT FAIL the whole transmission.
In this scenario, though, it would be tricky to choose a proper timeout
value because, as said above, the allowed timeout for the spin would
depend on the number of queued in-flight transactions: as an
example, if you had N previous pending transmissions and a transport with
an X ms timeout, I would spin for at least N * X ms before timing out and
failing. (Additionally, the current tx_prepare() helpers return void, so
you'll need to figure out a way to report an error and stop the transaction
by 'tunnelling' these errors back to .send_message().)
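To make the "time out, but FAIL" behaviour concrete, here is a minimal
userspace sketch, not kernel code and not a proposed patch. struct fake_shmem,
CHAN_FREE, CHAN_ERROR, now_ms() and tx_prepare_or_fail() are all hypothetical
names invented for illustration; a real change would have to use the kernel
time helpers and find a way to propagate the error up to .send_message().

```c
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-ins for the channel status bits. */
#define CHAN_FREE  0x1u
#define CHAN_ERROR 0x2u

/* Hypothetical, simplified stand-in for the shared memory header. */
struct fake_shmem {
	volatile uint32_t channel_status;
};

/* Current time in milliseconds (C11 timespec_get, wall clock). */
static int64_t now_ms(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Spin until the platform sets the channel free or timeout_ms expires.
 * On timeout, return -ETIMEDOUT and leave the channel untouched: the
 * platform still owns it. Only on success is the channel marked busy.
 */
static int tx_prepare_or_fail(struct fake_shmem *shmem, int64_t timeout_ms)
{
	int64_t stop = now_ms() + timeout_ms;

	while (!(shmem->channel_status & CHAN_FREE)) {
		if (now_ms() > stop)
			return -ETIMEDOUT;
	}

	/* Mark channel busy + clear error. */
	shmem->channel_status = 0;
	return 0;
}
```

The point of the sketch is only the control flow: the timed-out path never
writes to the shared memory, so a late reply from the server cannot collide
with a new command. With N queued senders and an X ms transport timeout, the
caller would pass a budget of at least N * X ms.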
I am not sure that all of this work would be worth it just to address some,
maybe transient, error conditions due to a broken SCMI server, BUT in any
case, any kind of timeout you want to introduce in the spin loop MUST
result in a failed transmission until the FREE bitflag is set again by the
SCMI server; i.e. if that flag is NEVER set by the server, you
have to end up with a sequence of timed-out spinloops and transmission
failures, you definitely cannot recover forcibly like this.
Fix the SCMI server instead.
Thanks,
Cristian
>
> Fixes: 9dc34d635c67 ("firmware: arm_scmi: Check if platform has released shmem before using")
> Signed-off-by: Yaxiong Tian <tianyaxiong(a)kylinos.cn>
> Cc: stable(a)vger.kernel.org
> Cc: Sudeep Holla <sudeep.holla(a)arm.com>
> Cc: Jim Quinlan <james.quinlan(a)broadcom.com>
> ---
> drivers/firmware/arm_scmi/shmem.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/arm_scmi/shmem.c b/drivers/firmware/arm_scmi/shmem.c
> index 0e3eaea5d852..ae6110a81855 100644
> --- a/drivers/firmware/arm_scmi/shmem.c
> +++ b/drivers/firmware/arm_scmi/shmem.c
> @@ -8,6 +8,7 @@
> #include <linux/io.h>
> #include <linux/processor.h>
> #include <linux/types.h>
> +#include <linux/ktime.h>
>
> #include "common.h"
>
> @@ -29,17 +30,27 @@ struct scmi_shared_mem {
> u8 msg_payload[];
> };
>
> +#define SCMI_MAX_TX_PREPARE_TIMEOUT_MS 30
> +
> void shmem_tx_prepare(struct scmi_shared_mem __iomem *shmem,
> struct scmi_xfer *xfer)
> {
> + ktime_t stop = ktime_add_ms(ktime_get(), SCMI_MAX_TX_PREPARE_TIMEOUT_MS);
> /*
> * Ideally channel must be free by now unless OS timeout last
> * request and platform continued to process the same, wait
> * until it releases the shared memory, otherwise we may endup
> * overwriting its response with new message payload or vice-versa
> */
> - spin_until_cond(ioread32(&shmem->channel_status) &
> - SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE);
> + spin_until_cond((ioread32(&shmem->channel_status) &
> + SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE) ||
> + ktime_after(ktime_get(), stop));
> +
> + if (unlikely(ktime_after(ktime_get(), stop))) {
> + pr_err("timed out in shmem_tx_prepare(caller: %pS).\n",
> + (void *)_RET_IP_);
> + }
> +
> /* Mark channel busy + clear error */
> iowrite32(0x0, &shmem->channel_status);
> iowrite32(xfer->hdr.poll_completion ? 0 : SCMI_SHMEM_FLAG_INTR_ENABLED,
> --
> 2.25.1
>
This backport patch addresses a NULL pointer dereference bug in 4.14.y:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: i2cdev_ioctl_rdwr.isra.2+0xe4/0x360
PGD 13af50067 P4D 13af50067 PUD 13504c067 PMD 0
Oops: 0000 [#1] PREEMPT SMP
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 17421 Comm: rep Not tainted 4.14.295 #7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
task: ffff88807c43a080 task.stack: ffffc90000d0c000
RIP: 0010:i2cdev_ioctl_rdwr.isra.2+0xe4/0x360
RSP: 0018:ffffc90000d0fdf0 EFLAGS: 00010297
RAX: ffff88807c43a080 RBX: 0000000000000000 RCX: 0000000000000000
....
Call Trace:
i2cdev_ioctl+0x1a5/0x2a0
? i2cdev_ioctl_rdwr.isra.2+0x360/0x360
do_vfs_ioctl+0xac/0x840
? syscall_trace_enter+0x159/0x4a0
SyS_ioctl+0x7e/0xb0
do_syscall_64+0x8d/0x220
....
RIP: i2cdev_ioctl_rdwr.isra.2+0xe4/0x360 RSP: ffffc90000d0fdf0
....
Kernel panic - not syncing: Fatal exception
Rebooting in 86400 seconds..
Accessing rdwr_pa[i].buf[0] triggers the NULL pointer dereference when
len=0, so to avoid dereferencing a zero-length buffer the patch adds a
check on len before the dereference.
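The idea of the check can be sketched in userspace as follows. This is an
illustration only, not the backported hunk: struct msg and
first_byte_valid() are hypothetical, simplified names, and the real code
operates on struct i2c_msg inside i2cdev_ioctl_rdwr().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an I2C message descriptor. */
struct msg {
	uint16_t len;
	uint8_t *buf;
};

/*
 * Return true only if the first byte of the buffer can be read and is
 * nonzero. len is tested first, so the short-circuit evaluation of &&
 * guarantees buf is never dereferenced for a zero-length message.
 */
static bool first_byte_valid(const struct msg *m)
{
	return m->len >= 1 && m->buf[0] >= 1;
}
```

Ordering matters here: swapping the two operands of && would reintroduce
the dereference of a zero-length buffer.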
I have tested only with the reproducer, and the bug does not occur
after this patch.
This patch is only made for 4.14.y as other higher LTS branches
(>=4.19.y) already have the fix.
Thanks,
Harshit
Alexander Popov (1):
i2c: dev: prevent ZERO_SIZE_PTR deref in i2cdev_ioctl_rdwr()
drivers/i2c/i2c-dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.37.1