This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.7.8-rc1
Peter Jones pjones@redhat.com efi: Make it possible to disable efivar_ssdt entirely
Hou Tao houtao1@huawei.com dm zoned: assign max_io_len correctly
Barry Song song.bao.hua@hisilicon.com mm/cma.c: use exact_nid true to fix possible per-numa cma leak
Mike Kravetz mike.kravetz@oracle.com mm/hugetlb.c: fix pages per hugetlb calculation
Marc Zyngier maz@kernel.org irqchip/gic: Atomically update affinity
Sumit Semwal sumit.semwal@linaro.org dma-buf: Move dma_buf_release() from fops to dentry_ops
Alex Deucher alexander.deucher@amd.com drm/amdgpu/atomfirmware: fix vram_info fetching for renoir
Alex Deucher alexander.deucher@amd.com drm/amdgpu: use %u rather than %d for sclk/mclk
Nicholas Kazlauskas nicholas.kazlauskas@amd.com drm/amd/display: Only revalidate bandwidth on medium and fast updates
Ivan Mironov mironov.ivan@gmail.com drm/amd/powerplay: Fix NULL dereference in lock_bus() on Vega20 w/o RAS
Rodrigo Vivi rodrigo.vivi@intel.com drm/i915: Include asm sources for {ivb, hsw}_clear_kernel.c
Hauke Mehrtens hauke@hauke-m.de MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
Martin Blumenstingl martin.blumenstingl@googlemail.com MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
Sean Christopherson sean.j.christopherson@intel.com x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
Bob Peterson rpeterso@redhat.com gfs2: fix trans slab error when withdraw occurs inside log_flush
Sumeet Pawnikar sumeet.r.pawnikar@intel.com ACPI: fan: Fix Tiger Lake ACPI device ID
Finley Xiao finley.xiao@rock-chips.com thermal/drivers/cpufreq_cooling: Fix wrong frequency converted from power
Jan Kundrát jan.kundrat@cesnet.cz hwmon: (pmbus) Fix page vs. register when accessing fans
Joseph Salisbury joseph.salisbury@microsoft.com Drivers: hv: Change flag to write log level in panic msg to false
Zhang Xiaoxu zhangxiaoxu5@huawei.com cifs: Fix the target file was deleted when rename failed.
Paul Aurich paul@darkrain42.org SMB3: Honor 'handletimeout' flag for multiuser mounts
Paul Aurich paul@darkrain42.org SMB3: Honor lease disabling for multiuser mounts
Paul Aurich paul@darkrain42.org SMB3: Honor persistent/resilient handle flags for multiuser mounts
Paul Aurich paul@darkrain42.org SMB3: Honor 'seal' flag for multiuser mounts
Daniel Jordan daniel.m.jordan@oracle.com padata: upgrade smp_mb__after_atomic to smp_mb in padata_do_serial
Greg Kroah-Hartman gregkh@linuxfoundation.org Revert "ALSA: usb-audio: Improve frames size computation"
J. Bruce Fields bfields@redhat.com nfsd: apply umask on fs without ACL support
Krzysztof Kozlowski krzk@kernel.org spi: spi-fsl-dspi: Fix external abort on interrupt in resume or exit paths
Jens Axboe axboe@kernel.dk io_uring: fix regression with always ignoring signals in io_cqring_wait()
Wolfram Sang wsa+renesas@sang-engineering.com i2c: mlxcpld: check correct size of maximum RECV_LEN packet
Ricardo Ribalda ribalda@kernel.org i2c: designware: platdrv: Set class based on DMI
Chris Packham chris.packham@alliedtelesis.co.nz i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
Kees Cook keescook@chromium.org samples/vfs: avoid warning in statx override
David Gibson david@gibson.dropbear.id.au tpm: ibmvtpm: Wait for ready buffer before probing for TPM2 attributes
Christoph Hellwig hch@lst.de nvme: fix a crash in nvme_mpath_add_disk
Sagi Grimberg sagi@grimberg.me nvme: fix identify error status silent ignore
Paul Aurich paul@darkrain42.org SMB3: Honor 'posix' flag for multiuser mounts
Hou Tao houtao1@huawei.com virtio-blk: free vblk-vqs in error path of virtblk_probe()
Chen-Yu Tsai wens@csie.org drm: sun4i: hdmi: Remove extra HPD polling
J. Bruce Fields bfields@redhat.com nfsd: fix nfsdfs inode reference count leak
J. Bruce Fields bfields@redhat.com nfsd4: fix nfsdfs reference count loop
J. Bruce Fields bfields@redhat.com nfsd: clients don't need to break their own delegations
J. Bruce Fields bfields@redhat.com kthread: save thread function
Dien Pham dien.pham.ry@renesas.com thermal/drivers/rcar_gen3: Fix undefined temperature if negative
Tiezhu Yang yangtiezhu@loongson.cn thermal/drivers/sprd: Fix return value of sprd_thm_probe()
Michael Kao michael.kao@mediatek.com thermal/drivers/mediatek: Fix bank number settings on mt8183
Dan Carpenter dan.carpenter@oracle.com scsi: qla2xxx: Fix a condition in qla2x00_find_all_fabric_devs()
Misono Tomohiro misono.tomohiro@jp.fujitsu.com hwmon: (acpi_power_meter) Fix potential memory leak in acpi_power_meter_add()
Chu Lin linchuyuan@google.com hwmon: (max6697) Make sure the OVERT mask is set correctly
KP Singh kpsingh@google.com security: Fix hook iteration and default value for inode_copy_up_xattr
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: fix SGE queue dump destination buffer context
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: use correct type for all-mask IP address comparison
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: fix endian conversions for L4 ports in filters
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: parse TC-U32 key values and masks natively
Rahul Lakkireddy rahul.lakkireddy@chelsio.com cxgb4: use unaligned conversion for fetching timestamp
Taehee Yoo ap420073@gmail.com hsr: avoid to create proc file after unregister
Taehee Yoo ap420073@gmail.com hsr: remove hsr interface if all slaves are removed
Dave Chinner dchinner@redhat.com xfs: fix use-after-free on CIL context on shutdown
Mark Zhang markz@mellanox.com RDMA/counter: Query a counter before release
Zenghui Yu yuzenghui@huawei.com irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic
Claudiu Manoil claudiu.manoil@nxp.com enetc: Fix HW_VLAN_CTAG_TX|RX toggling
Po Liu Po.Liu@nxp.com net: enetc: add hw tc hw offload features for PSPF capability
Paolo Abeni pabeni@redhat.com mptcp: drop MP_JOIN request sock on syn cookies
David Howells dhowells@redhat.com rxrpc: Fix afs large storage transmission performance drop
Filipe Manana fdmanana@suse.com btrfs: fix RWF_NOWAIT writes blocking on extent locks and waiting for IO
Chen Tao chentao107@huawei.com drm/msm/dpu: fix error return code in dpu_encoder_init
Jens Axboe axboe@kernel.dk io_uring: use signal based task_work running
Oleg Nesterov oleg@redhat.com task_work: teach task_work_add() to do signal_wake_up()
Herbert Xu herbert@gondor.apana.org.au crypto: af_alg - fix use-after-free in af_alg_accept() due to bh_lock_sock()
James Bottomley James.Bottomley@HansenPartnership.com tpm: Fix TIS locality timeout problems
Jarkko Sakkinen jarkko.sakkinen@linux.intel.com selftests: tpm: Use /bin/sh instead of /bin/bash
Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Revert "tpm: selftest: cleanup after unseal with wrong auth/policy test"
Douglas Anderson dianders@chromium.org kgdb: Avoid suspicious RCU usage warning
Pavel Begunkov asml.silence@gmail.com io_uring: fix current->mm NULL dereference on exit
Sagi Grimberg sagi@grimberg.me nvme-multipath: fix bogus request queue reference put
Anton Eidelman anton@lightbitslabs.com nvme-multipath: fix deadlock due to head->lock
Anton Eidelman anton@lightbitslabs.com nvme-multipath: fix deadlock between ana_work and scan_work
Sagi Grimberg sagi@grimberg.me nvme: fix possible deadlock when I/O is blocked
Keith Busch kbusch@kernel.org nvme-multipath: set bdi capabilities once
Xuan Zhuo xuanzhuo@linux.alibaba.com io_uring: fix io_sq_thread no schedule when busy
Christian Borntraeger borntraeger@de.ibm.com s390/debug: avoid kernel warning on too large number of pages
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Handle __attribute__((user)) in field names
Steven Rostedt (VMware) rostedt@goodmis.org tools lib traceevent: Add append() function helper for appending strings
Zqiang qiang.zhang@windriver.com usb: usbtest: fix missing kfree(dev->buf) in usbtest_disconnect
David Howells dhowells@redhat.com rxrpc: Fix race between incoming ACK parser and retransmitter
Pavel Begunkov asml.silence@gmail.com io_uring: fix {SQ,IO}POLL with unsupported opcodes
Vlastimil Babka vbabka@suse.cz mm, dump_page(): do not crash with invalid mapping pointer
Qian Cai cai@lca.pw mm/slub: fix stack overruns with SLUB_STATS
Dongli Zhang dongli.zhang@oracle.com mm/slub.c: fix corrupted freechain in deactivate_slab()
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/book3s64/kvm: Fix secondary page table walk warning during migration
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/kvm/book3s: Add helper to walk partition scoped linux page table.
Tero Kristo t-kristo@ti.com soc: ti: omap-prm: use atomic iopoll instead of sleeping one
Valentin Schneider valentin.schneider@arm.com sched/debug: Make sd->flags sysctl read-only
Guchun Chen guchun.chen@amd.com drm/amdgpu: fix kernel page fault issue by ras recovery on sGPU
Evan Quan evan.quan@amd.com drm/amdgpu: fix non-pointer dereference for non-RAS supported
John Clements john.clements@amd.com drm/amdgpu: disable ras query and iject during gpu reset
Chris Wilson chris@chris-wilson.co.uk drm/i915/gt: Mark timeline->cacheline as destroyed after rcu grace period
YueHaibing yuehaibing@huawei.com tipc: Fix NULL pointer dereference in __tipc_sendstream()
Tuomas Tynkkynen tuomas.tynkkynen@iki.fi usbnet: smsc95xx: Fix use-after-free after removal
Tuong Lien tuong.t.lien@dektech.com.au tipc: fix kernel WARNING in tipc_msg_append()
Tuong Lien tuong.t.lien@dektech.com.au tipc: add test for Nagle algorithm effectiveness
Ahmed Abdelsalam ahabdels@gmail.com seg6: fix seg6_validate_srh() to avoid slab-out-of-bounds
Stylon Wang stylon.wang@amd.com drm/amd/display: Fix ineffective setting of max bpc property
Stylon Wang stylon.wang@amd.com drm/amd/display: Fix incorrectly pruned modes with deep color
Hugh Dickins hughd@google.com mm: fix swap cache node allocation mask
Filipe Manana fdmanana@suse.com btrfs: fix race between block group removal and block group creation
Qu Wenruo wqu@suse.com btrfs: block-group: refactor how we delete one block group item
Sungjong Seo sj1557.seo@samsung.com exfat: flush dirty metadata in fsync
Namjae Jeon namjae.jeon@samsung.com exfat: move setting VOL_DIRTY over exfat_remove_entries()
Hyunchul Lee hyc.lee@gmail.com exfat: call sync_filesystem for read-only remount
Dan Carpenter dan.carpenter@oracle.com exfat: add missing brelse() calls on error paths
Hyeongseok.Kim Hyeongseok@gmail.com exfat: Set the unused characters of FileName field to the value 0000h
-------------
Diffstat:
Documentation/filesystems/locking.rst | 2 + Makefile | 4 +- arch/mips/kernel/traps.c | 1 + arch/mips/lantiq/xway/sysctrl.c | 8 +- arch/powerpc/include/asm/kvm_book3s_64.h | 23 ++++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 45 +++++-- arch/powerpc/kvm/book3s_hv_nested.c | 2 +- arch/s390/kernel/debug.c | 3 +- arch/x86/kernel/cpu/intel.c | 11 +- crypto/af_alg.c | 26 ++-- crypto/algif_aead.c | 9 +- crypto/algif_hash.c | 9 +- crypto/algif_skcipher.c | 9 +- drivers/acpi/fan.c | 2 +- drivers/block/virtio_blk.c | 1 + drivers/char/tpm/tpm-dev-common.c | 19 ++- drivers/char/tpm/tpm_ibmvtpm.c | 14 +-- drivers/dma-buf/dma-buf.c | 54 ++++----- drivers/firmware/efi/Kconfig | 11 ++ drivers/firmware/efi/efi.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 29 ++++- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 103 ++++++++++------ drivers/gpu/drm/amd/display/dc/core/dc.c | 10 +- .../gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +- drivers/gpu/drm/i915/gt/intel_timeline.c | 12 +- drivers/gpu/drm/i915/gt/shaders/README | 46 +++++++ .../gpu/drm/i915/gt/shaders/clear_kernel/hsw.asm | 119 ++++++++++++++++++ .../gpu/drm/i915/gt/shaders/clear_kernel/ivb.asm | 117 ++++++++++++++++++ drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 2 +- drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c | 5 +- drivers/hv/vmbus_drv.c | 2 +- drivers/hwmon/acpi_power_meter.c | 4 +- drivers/hwmon/max6697.c | 7 +- drivers/hwmon/pmbus/pmbus_core.c | 8 +- drivers/i2c/algos/i2c-algo-pca.c | 3 +- drivers/i2c/busses/i2c-designware-platdrv.c | 14 ++- drivers/i2c/busses/i2c-mlxcpld.c | 4 +- drivers/infiniband/core/counters.c | 4 +- drivers/irqchip/irq-gic-v3-its.c | 8 +- drivers/irqchip/irq-gic.c | 14 +-- drivers/md/dm-zoned-target.c | 2 +- drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 6 +- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 25 ++-- drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 30 ++--- drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c | 18 +-- .../ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h | 122 ++++++++++++------- drivers/net/ethernet/chelsio/cxgb4/sge.c | 2 +- drivers/net/ethernet/freescale/enetc/enetc.c | 49 ++++++++ drivers/net/ethernet/freescale/enetc/enetc.h | 48 ++++++++ drivers/net/ethernet/freescale/enetc/enetc_hw.h | 33 +++-- drivers/net/ethernet/freescale/enetc/enetc_pf.c | 17 ++- drivers/net/usb/smsc95xx.c | 2 +- drivers/nvme/host/core.c | 20 ++-- drivers/nvme/host/multipath.c | 45 +++++-- drivers/nvme/host/nvme.h | 2 + drivers/scsi/qla2xxx/qla_init.c | 2 +- drivers/soc/ti/omap_prm.c | 8 +- drivers/spi/spi-fsl-dspi.c | 17 ++- drivers/thermal/cpufreq_cooling.c | 6 +- drivers/thermal/mtk_thermal.c | 5 +- drivers/thermal/rcar_gen3_thermal.c | 2 +- drivers/thermal/sprd_thermal.c | 4 +- drivers/usb/misc/usbtest.c | 1 + fs/btrfs/block-group.c | 60 +++++++--- fs/btrfs/file.c | 37 ++++-- fs/cifs/connect.c | 10 +- fs/cifs/inode.c | 10 +- fs/exfat/dir.c | 12 +- fs/exfat/exfat_fs.h | 1 + fs/exfat/file.c | 19 ++- fs/exfat/namei.c | 14 ++- fs/exfat/super.c | 10 ++ fs/gfs2/log.c | 10 ++ fs/io_uring.c | 74 ++++++++++-- fs/locks.c | 3 + fs/nfsd/nfs4proc.c | 2 + fs/nfsd/nfs4state.c | 22 +++- fs/nfsd/nfsctl.c | 23 ++-- fs/nfsd/nfsd.h | 5 + fs/nfsd/nfssvc.c | 6 + fs/nfsd/vfs.c | 6 + fs/xfs/xfs_log_cil.c | 10 +- fs/xfs/xfs_log_priv.h | 2 +- include/crypto/if_alg.h | 4 +- include/linux/fs.h | 1 + include/linux/kthread.h | 1 + include/linux/lsm_hook_defs.h | 2 +- include/linux/sched/jobctl.h | 4 +- include/linux/sunrpc/svc.h | 1 + include/linux/task_work.h | 5 +- include/net/seg6.h | 2 +- kernel/debug/debug_core.c | 4 + kernel/kthread.c | 17 +++ kernel/padata.c | 4 +- kernel/sched/debug.c | 2 +- kernel/signal.c | 10 +- kernel/task_work.c | 16 ++- mm/cma.c | 4 +- mm/debug.c | 56 ++++++++- mm/hugetlb.c | 2 +- mm/slub.c | 30 ++++- mm/swap_state.c | 4 +- net/core/filter.c | 2 +- net/hsr/hsr_device.c | 21 +--- net/hsr/hsr_device.h | 2 +- net/hsr/hsr_main.c | 27 ++++- net/hsr/hsr_netlink.c | 17 +++ net/ipv6/ipv6_sockglue.c | 2 +- net/ipv6/seg6.c | 16 ++- net/ipv6/seg6_iptunnel.c | 2 +- net/ipv6/seg6_local.c | 6 +- net/mptcp/subflow.c | 18 +-- net/rxrpc/call_event.c | 29 ++--- net/tipc/msg.c | 7 +- net/tipc/msg.h | 14 ++- net/tipc/socket.c | 69 +++++++++-- samples/vfs/test-statx.c | 2 + security/security.c | 17 ++- sound/usb/card.h | 4 - sound/usb/endpoint.c | 43 +------ sound/usb/endpoint.h | 1 - sound/usb/pcm.c | 2 - tools/lib/traceevent/event-parse.c | 133 ++++++++++++--------- tools/testing/selftests/tpm2/test_smoke.sh | 7 +- tools/testing/selftests/tpm2/test_space.sh | 2 +- 130 files changed, 1593 insertions(+), 612 deletions(-)
From: Hyeongseok.Kim Hyeongseok@gmail.com
[ Upstream commit 4ba6ccd695f5ed3ae851e59b443b757bbe4557fe ]
Some fsck tool complain that padding part of the FileName field is not set to the value 0000h. So let's maintain filesystem cleaner, as exfat's spec. recommendation.
Signed-off-by: Hyeongseok.Kim Hyeongseok@gmail.com Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/dir.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c index 4b91afb0f0515..349ca0c282c2c 100644 --- a/fs/exfat/dir.c +++ b/fs/exfat/dir.c @@ -430,10 +430,12 @@ static void exfat_init_name_entry(struct exfat_dentry *ep, ep->dentry.name.flags = 0x0;
for (i = 0; i < EXFAT_FILE_NAME_LEN; i++) { - ep->dentry.name.unicode_0_14[i] = cpu_to_le16(*uniname); - if (*uniname == 0x0) - break; - uniname++; + if (*uniname != 0x0) { + ep->dentry.name.unicode_0_14[i] = cpu_to_le16(*uniname); + uniname++; + } else { + ep->dentry.name.unicode_0_14[i] = 0x0; + } } }
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit e8dd3cda8667118b70d9fe527f61fe22623de04d ]
If the second exfat_get_dentry() call fails then we need to release "old_bh" before returning. There is a similar bug in exfat_move_file().
Fixes: 5f2aa075070c ("exfat: add inode operations") Reported-by: Markus Elfring Markus.Elfring@web.de Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/namei.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c index a2659a8a68a14..3bf1dbadab691 100644 --- a/fs/exfat/namei.c +++ b/fs/exfat/namei.c @@ -1089,10 +1089,14 @@ static int exfat_rename_file(struct inode *inode, struct exfat_chain *p_dir,
epold = exfat_get_dentry(sb, p_dir, oldentry + 1, &old_bh, §or_old); + if (!epold) + return -EIO; epnew = exfat_get_dentry(sb, p_dir, newentry + 1, &new_bh, §or_new); - if (!epold || !epnew) + if (!epnew) { + brelse(old_bh); return -EIO; + }
memcpy(epnew, epold, DENTRY_SIZE); exfat_update_bh(sb, new_bh, sync); @@ -1173,10 +1177,14 @@ static int exfat_move_file(struct inode *inode, struct exfat_chain *p_olddir,
epmov = exfat_get_dentry(sb, p_olddir, oldentry + 1, &mov_bh, §or_mov); + if (!epmov) + return -EIO; epnew = exfat_get_dentry(sb, p_newdir, newentry + 1, &new_bh, §or_new); - if (!epmov || !epnew) + if (!epnew) { + brelse(mov_bh); return -EIO; + }
memcpy(epnew, epmov, DENTRY_SIZE); exfat_update_bh(sb, new_bh, IS_DIRSYNC(inode));
From: Hyunchul Lee hyc.lee@gmail.com
[ Upstream commit a0271a15cf2cf907ea5b0f2ba611123f1b7935ec ]
We need to commit dirty metadata and pages to disk before remounting exfat as read-only.
This fixes a failure in xfstests generic/452
generic/452 does the following: cp something <exfat>/ mount -o remount,ro <exfat>
the <exfat>/something is corrupted. because while exfat is remounted as read-only, exfat doesn't have a chance to commit metadata and vfs invalidates page caches in a block device.
Signed-off-by: Hyunchul Lee hyc.lee@gmail.com Acked-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/super.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/fs/exfat/super.c b/fs/exfat/super.c index c1b1ed306a485..e879801533980 100644 --- a/fs/exfat/super.c +++ b/fs/exfat/super.c @@ -637,10 +637,20 @@ static void exfat_free(struct fs_context *fc) } }
+static int exfat_reconfigure(struct fs_context *fc) +{ + fc->sb_flags |= SB_NODIRATIME; + + /* volume flag will be updated in exfat_sync_fs */ + sync_filesystem(fc->root->d_sb); + return 0; +} + static const struct fs_context_operations exfat_context_ops = { .parse_param = exfat_parse_param, .get_tree = exfat_get_tree, .free = exfat_free, + .reconfigure = exfat_reconfigure, };
static int exfat_init_fs_context(struct fs_context *fc)
From: Namjae Jeon namjae.jeon@samsung.com
[ Upstream commit 3bcfb701099acf96b0e883bf5544f96af473aa1d ]
Move setting VOL_DIRTY over exfat_remove_entries() to avoid unneeded leaving VOL_DIRTY on -ENOTEMPTY.
Fixes: 5f2aa075070c ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7 Reported-by: Tetsuhiro Kohada kohada.t2@gmail.com Reviewed-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/exfat/namei.c b/fs/exfat/namei.c index 3bf1dbadab691..2c9c783177213 100644 --- a/fs/exfat/namei.c +++ b/fs/exfat/namei.c @@ -984,7 +984,6 @@ static int exfat_rmdir(struct inode *dir, struct dentry *dentry) goto unlock; }
- exfat_set_vol_flags(sb, VOL_DIRTY); exfat_chain_set(&clu_to_free, ei->start_clu, EXFAT_B_TO_CLU_ROUND_UP(i_size_read(inode), sbi), ei->flags);
@@ -1012,6 +1011,7 @@ static int exfat_rmdir(struct inode *dir, struct dentry *dentry) num_entries++; brelse(bh);
+ exfat_set_vol_flags(sb, VOL_DIRTY); err = exfat_remove_entries(dir, &cdir, entry, 0, num_entries); if (err) { exfat_msg(sb, KERN_ERR,
From: Sungjong Seo sj1557.seo@samsung.com
[ Upstream commit 5267456e953fd8c5abd8e278b1cc6a9f9027ac0a ]
generic_file_fsync() exfat used could not guarantee the consistency of a file because it has flushed not dirty metadata but only dirty data pages for a file.
Instead of that, use exfat_file_fsync() for files and directories so that it guarantees to commit both the metadata and data pages for a file.
Signed-off-by: Sungjong Seo sj1557.seo@samsung.com Signed-off-by: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/exfat/dir.c | 2 +- fs/exfat/exfat_fs.h | 1 + fs/exfat/file.c | 19 ++++++++++++++++++- 3 files changed, 20 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c index 349ca0c282c2c..6db302d76d4c1 100644 --- a/fs/exfat/dir.c +++ b/fs/exfat/dir.c @@ -314,7 +314,7 @@ const struct file_operations exfat_dir_operations = { .llseek = generic_file_llseek, .read = generic_read_dir, .iterate = exfat_iterate, - .fsync = generic_file_fsync, + .fsync = exfat_file_fsync, };
int exfat_alloc_new_dir(struct inode *inode, struct exfat_chain *clu) diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h index d67fb8a6f770c..d865050fa6cd7 100644 --- a/fs/exfat/exfat_fs.h +++ b/fs/exfat/exfat_fs.h @@ -424,6 +424,7 @@ void exfat_truncate(struct inode *inode, loff_t size); int exfat_setattr(struct dentry *dentry, struct iattr *attr); int exfat_getattr(const struct path *path, struct kstat *stat, unsigned int request_mask, unsigned int query_flags); +int exfat_file_fsync(struct file *file, loff_t start, loff_t end, int datasync);
/* namei.c */ extern const struct dentry_operations exfat_dentry_ops; diff --git a/fs/exfat/file.c b/fs/exfat/file.c index 5b4ddff187312..b93aa9e6cb16c 100644 --- a/fs/exfat/file.c +++ b/fs/exfat/file.c @@ -6,6 +6,7 @@ #include <linux/slab.h> #include <linux/cred.h> #include <linux/buffer_head.h> +#include <linux/blkdev.h>
#include "exfat_raw.h" #include "exfat_fs.h" @@ -347,12 +348,28 @@ int exfat_setattr(struct dentry *dentry, struct iattr *attr) return error; }
+int exfat_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync) +{ + struct inode *inode = filp->f_mapping->host; + int err; + + err = __generic_file_fsync(filp, start, end, datasync); + if (err) + return err; + + err = sync_blockdev(inode->i_sb->s_bdev); + if (err) + return err; + + return blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); +} + const struct file_operations exfat_file_operations = { .llseek = generic_file_llseek, .read_iter = generic_file_read_iter, .write_iter = generic_file_write_iter, .mmap = generic_file_mmap, - .fsync = generic_file_fsync, + .fsync = exfat_file_fsync, .splice_read = generic_file_splice_read, .splice_write = iter_file_splice_write, };
From: Qu Wenruo wqu@suse.com
[ Upstream commit 7357623a7f4beb4ac76005f8fac9fc0230f9a67e ]
When deleting a block group item, it's pretty straight forward, just delete the item pointed by the key. However it will not be that straight-forward for incoming skinny block group item.
So refactor the block group item deletion into a new function, remove_block_group_item(), also to make the already lengthy btrfs_remove_block_group() a little shorter.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/block-group.c | 37 +++++++++++++++++++++++++------------ 1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 0c17f18b47940..d80857d00b0fb 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -863,11 +863,34 @@ static void clear_incompat_bg_bits(struct btrfs_fs_info *fs_info, u64 flags) } }
+static int remove_block_group_item(struct btrfs_trans_handle *trans, + struct btrfs_path *path, + struct btrfs_block_group *block_group) +{ + struct btrfs_fs_info *fs_info = trans->fs_info; + struct btrfs_root *root; + struct btrfs_key key; + int ret; + + root = fs_info->extent_root; + key.objectid = block_group->start; + key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; + key.offset = block_group->length; + + ret = btrfs_search_slot(trans, root, &key, path, -1, 1); + if (ret > 0) + ret = -ENOENT; + if (ret < 0) + return ret; + + ret = btrfs_del_item(trans, root, path); + return ret; +} + int btrfs_remove_block_group(struct btrfs_trans_handle *trans, u64 group_start, struct extent_map *em) { struct btrfs_fs_info *fs_info = trans->fs_info; - struct btrfs_root *root = fs_info->extent_root; struct btrfs_path *path; struct btrfs_block_group *block_group; struct btrfs_free_cluster *cluster; @@ -1068,10 +1091,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
spin_unlock(&block_group->space_info->lock);
- key.objectid = block_group->start; - key.type = BTRFS_BLOCK_GROUP_ITEM_KEY; - key.offset = block_group->length; - mutex_lock(&fs_info->chunk_mutex); spin_lock(&block_group->lock); block_group->removed = 1; @@ -1107,16 +1126,10 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans, if (ret) goto out;
- ret = btrfs_search_slot(trans, root, &key, path, -1, 1); - if (ret > 0) - ret = -EIO; + ret = remove_block_group_item(trans, path, block_group); if (ret < 0) goto out;
- ret = btrfs_del_item(trans, root, path); - if (ret) - goto out; - if (remove_em) { struct extent_map_tree *em_tree;
From: Filipe Manana fdmanana@suse.com
[ Upstream commit ffcb9d44572afbaf8fa6dbf5115bff6dab7b299e ]
There is a race between block group removal and block group creation when the removal is completed by a task running fitrim or scrub. When this happens we end up failing the block group creation with an error -EEXIST since we attempt to insert a duplicate block group item key in the extent tree. That results in a transaction abort.
The race happens like this:
1) Task A is doing a fitrim, and at btrfs_trim_block_group() it freezes block group X with btrfs_freeze_block_group() (until very recently that was named btrfs_get_block_group_trimming());
2) Task B starts removing block group X, either because it's now unused or due to relocation for example. So at btrfs_remove_block_group(), while holding the chunk mutex and the block group's lock, it sets the 'removed' flag of the block group and it sets the local variable 'remove_em' to false, because the block group is currently frozen (its 'frozen' counter is > 0, until very recently this counter was named 'trimming');
3) Task B unlocks the block group and the chunk mutex;
4) Task A is done trimming the block group and unfreezes the block group by calling btrfs_unfreeze_block_group() (until very recently this was named btrfs_put_block_group_trimming()). In this function we lock the block group and set the local variable 'cleanup' to true because we were able to decrement the block group's 'frozen' counter down to 0 and the flag 'removed' is set in the block group.
Since 'cleanup' is set to true, it locks the chunk mutex and removes the extent mapping representing the block group from the mapping tree;
5) Task C allocates a new block group Y and it picks up the logical address that block group X had as the logical address for Y, because X was the block group with the highest logical address and now the second block group with the highest logical address, the last in the fs mapping tree, ends at an offset corresponding to block group X's logical address (this logical address selection is done at volumes.c:find_next_chunk()).
At this point the new block group Y does not have yet its item added to the extent tree (nor the corresponding device extent items and chunk item in the device and chunk trees). The new group Y is added to the list of pending block groups in the transaction handle;
6) Before task B proceeds to removing the block group item for block group X from the extent tree, which has a key matching:
(X logical offset, BTRFS_BLOCK_GROUP_ITEM_KEY, length)
task C while ending its transaction handle calls btrfs_create_pending_block_groups(), which finds block group Y and tries to insert the block group item for Y into the exten tree, which fails with -EEXIST since logical offset is the same that X had and task B hasn't yet deleted the key from the extent tree. This failure results in a transaction abort, producing a stack like the following:
------------[ cut here ]------------ BTRFS: Transaction aborted (error -17) WARNING: CPU: 2 PID: 19736 at fs/btrfs/block-group.c:2074 btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs] Modules linked in: btrfs blake2b_generic xor raid6_pq (...) CPU: 2 PID: 19736 Comm: fsstress Tainted: G W 5.6.0-rc7-btrfs-next-58 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 RIP: 0010:btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs] Code: ff ff ff 48 8b 55 50 f0 48 (...) RSP: 0018:ffffa4160a1c7d58 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff961581909d98 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffffb3d63990 RDI: 0000000000000001 RBP: ffff9614f3356a58 R08: 0000000000000000 R09: 0000000000000001 R10: ffff9615b65b0040 R11: 0000000000000000 R12: ffff961581909c10 R13: ffff9615b0c32000 R14: ffff9614f3356ab0 R15: ffff9614be779000 FS: 00007f2ce2841e80(0000) GS:ffff9615bae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555f18780000 CR3: 0000000131d34005 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_start_dirty_block_groups+0x398/0x4e0 [btrfs] btrfs_commit_transaction+0xd0/0xc50 [btrfs] ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs] ? __ia32_sys_fdatasync+0x20/0x20 iterate_supers+0xdb/0x180 ksys_sync+0x60/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x5c/0x280 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f2ce1d4d5b7 Code: 83 c4 08 48 3d 01 (...) RSP: 002b:00007ffd8b558c58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2 RAX: ffffffffffffffda RBX: 000000000000002c RCX: 00007f2ce1d4d5b7 RDX: 00000000ffffffff RSI: 00000000186ba07b RDI: 000000000000002c RBP: 0000555f17b9e520 R08: 0000000000000012 R09: 000000000000ce00 R10: 0000000000000078 R11: 0000000000000202 R12: 0000000000000032 R13: 0000000051eb851f R14: 00007ffd8b558cd0 R15: 0000555f1798ec20 irq event stamp: 0 hardirqs last enabled at (0): [<0000000000000000>] 0x0 hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020 softirqs last disabled at (0): [<0000000000000000>] 0x0 ---[ end trace bd7c03622e0b0a9c ]---
Fix this simply by making btrfs_remove_block_group() remove the block group's item from the extent tree before it flags the block group as removed. Also make the free space deletion from the free space tree before flagging the block group as removed, to avoid a similar race with adding and removing free space entries for the free space tree.
Fixes: 04216820fe83d5 ("Btrfs: fix race between fs trimming and block group remove/allocation") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/block-group.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index d80857d00b0fb..1b1c869530088 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -1091,6 +1091,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
spin_unlock(&block_group->space_info->lock);
+ /* + * Remove the free space for the block group from the free space tree + * and the block group's item from the extent tree before marking the + * block group as removed. This is to prevent races with tasks that + * freeze and unfreeze a block group, this task and another task + * allocating a new block group - the unfreeze task ends up removing + * the block group's extent map before the task calling this function + * deletes the block group item from the extent tree, allowing for + * another task to attempt to create another block group with the same + * item key (and failing with -EEXIST and a transaction abort). + */ + ret = remove_block_group_free_space(trans, block_group); + if (ret) + goto out; + + ret = remove_block_group_item(trans, path, block_group); + if (ret < 0) + goto out; + mutex_lock(&fs_info->chunk_mutex); spin_lock(&block_group->lock); block_group->removed = 1; @@ -1122,14 +1141,6 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
mutex_unlock(&fs_info->chunk_mutex);
- ret = remove_block_group_free_space(trans, block_group); - if (ret) - goto out; - - ret = remove_block_group_item(trans, path, block_group); - if (ret < 0) - goto out; - if (remove_em) { struct extent_map_tree *em_tree;
From: Hugh Dickins hughd@google.com
[ Upstream commit 243bce09c91b0145aeaedd5afba799d81841c030 ]
Chris Murphy reports that a slightly overcommitted load, testing swap and zram along with i915, splats and keeps on splatting, when it had better fail less noisily:
gnome-shell: page allocation failure: order:0, mode:0x400d0(__GFP_IO|__GFP_FS|__GFP_COMP|__GFP_RECLAIMABLE), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 2 PID: 1155 Comm: gnome-shell Not tainted 5.7.0-1.fc33.x86_64 #1 Call Trace: dump_stack+0x64/0x88 warn_alloc.cold+0x75/0xd9 __alloc_pages_slowpath.constprop.0+0xcfa/0xd30 __alloc_pages_nodemask+0x2df/0x320 alloc_slab_page+0x195/0x310 allocate_slab+0x3c5/0x440 ___slab_alloc+0x40c/0x5f0 __slab_alloc+0x1c/0x30 kmem_cache_alloc+0x20e/0x220 xas_nomem+0x28/0x70 add_to_swap_cache+0x321/0x400 __read_swap_cache_async+0x105/0x240 swap_cluster_readahead+0x22c/0x2e0 shmem_swapin+0x8e/0xc0 shmem_swapin_page+0x196/0x740 shmem_getpage_gfp+0x3a2/0xa60 shmem_read_mapping_page_gfp+0x32/0x60 shmem_get_pages+0x155/0x5e0 [i915] __i915_gem_object_get_pages+0x68/0xa0 [i915] i915_vma_pin+0x3fe/0x6c0 [i915] eb_add_vma+0x10b/0x2c0 [i915] i915_gem_do_execbuffer+0x704/0x3430 [i915] i915_gem_execbuffer2_ioctl+0x1ea/0x3e0 [i915] drm_ioctl_kernel+0x86/0xd0 [drm] drm_ioctl+0x206/0x390 [drm] ksys_ioctl+0x82/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported on 5.7, but it goes back really to 3.1: when shmem_read_mapping_page_gfp() was implemented for use by i915, and allowed for __GFP_NORETRY and __GFP_NOWARN flags in most places, but missed swapin's "& GFP_KERNEL" mask for page tree node allocation in __read_swap_cache_async() - that was to mask off HIGHUSER_MOVABLE bits from what page cache uses, but GFP_RECLAIM_MASK is now what's needed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=208085 Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2006151330070.11064@eggly.anvils Fixes: 68da9f055755 ("tmpfs: pass gfp to shmem_getpage_gfp") Signed-off-by: Hugh Dickins hughd@google.com Reviewed-by: Vlastimil Babka vbabka@suse.cz Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Reported-by: Chris Murphy lists@colorremedies.com Analyzed-by: Vlastimil Babka vbabka@suse.cz Analyzed-by: Matthew Wilcox willy@infradead.org Tested-by: Chris Murphy lists@colorremedies.com Cc: stable@vger.kernel.org [3.1+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/swap_state.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/swap_state.c b/mm/swap_state.c index ebed37bbf7a39..e3d36776c08bf 100644 --- a/mm/swap_state.c +++ b/mm/swap_state.c @@ -23,6 +23,7 @@ #include <linux/huge_mm.h>
#include <asm/pgtable.h> +#include "internal.h"
/* * swapper_space is a fiction, retained to simplify the path through @@ -418,7 +419,8 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask, /* May fail (-ENOMEM) if XArray node allocation failed. */ __SetPageLocked(new_page); __SetPageSwapBacked(new_page); - err = add_to_swap_cache(new_page, entry, gfp_mask & GFP_KERNEL); + err = add_to_swap_cache(new_page, entry, + gfp_mask & GFP_RECLAIM_MASK); if (likely(!err)) { /* Initiate read into locked page */ SetPageWorkingset(new_page);
From: Stylon Wang stylon.wang@amd.com
[ Upstream commit cbd14ae7ea934fd9d9f95103a0601a7fea243573 ]
[Why] When "max bpc" is set to enable deep color, some modes are removed from the list if they fail validation on max bpc. These modes should be kept if they validates fine with lower bpc.
[How] - Retry with lower bpc in mode validation. - Same in atomic commit to apply working bpc, not necessarily max bpc.
Signed-off-by: Stylon Wang stylon.wang@amd.com Reviewed-by: Nicholas Kazlauskas Nicholas.Kazlauskas@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 102 +++++++++++------- 1 file changed, 64 insertions(+), 38 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index f9f02e08054bc..b7e161f2a47d4 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -3797,8 +3797,7 @@ static void update_stream_scaling_settings(const struct drm_display_mode *mode,
static enum dc_color_depth convert_color_depth_from_display_info(const struct drm_connector *connector, - const struct drm_connector_state *state, - bool is_y420) + bool is_y420, int requested_bpc) { uint8_t bpc;
@@ -3818,10 +3817,7 @@ convert_color_depth_from_display_info(const struct drm_connector *connector, bpc = bpc ? bpc : 8; }
- if (!state) - state = connector->state; - - if (state) { + if (requested_bpc > 0) { /* * Cap display bpc based on the user requested value. * @@ -3830,7 +3826,7 @@ convert_color_depth_from_display_info(const struct drm_connector *connector, * or if this was called outside of atomic check, so it * can't be used directly. */ - bpc = min(bpc, state->max_requested_bpc); + bpc = min_t(u8, bpc, requested_bpc);
/* Round down to the nearest even number. */ bpc = bpc - (bpc & 1); @@ -3952,7 +3948,8 @@ static void fill_stream_properties_from_drm_display_mode( const struct drm_display_mode *mode_in, const struct drm_connector *connector, const struct drm_connector_state *connector_state, - const struct dc_stream_state *old_stream) + const struct dc_stream_state *old_stream, + int requested_bpc) { struct dc_crtc_timing *timing_out = &stream->timing; const struct drm_display_info *info = &connector->display_info; @@ -3982,8 +3979,9 @@ static void fill_stream_properties_from_drm_display_mode(
timing_out->timing_3d_format = TIMING_3D_FORMAT_NONE; timing_out->display_color_depth = convert_color_depth_from_display_info( - connector, connector_state, - (timing_out->pixel_encoding == PIXEL_ENCODING_YCBCR420)); + connector, + (timing_out->pixel_encoding == PIXEL_ENCODING_YCBCR420), + requested_bpc); timing_out->scan_type = SCANNING_TYPE_NODATA; timing_out->hdmi_vic = 0;
@@ -4189,7 +4187,8 @@ static struct dc_stream_state * create_stream_for_sink(struct amdgpu_dm_connector *aconnector, const struct drm_display_mode *drm_mode, const struct dm_connector_state *dm_state, - const struct dc_stream_state *old_stream) + const struct dc_stream_state *old_stream, + int requested_bpc) { struct drm_display_mode *preferred_mode = NULL; struct drm_connector *drm_connector; @@ -4274,10 +4273,10 @@ create_stream_for_sink(struct amdgpu_dm_connector *aconnector, */ if (!scale || mode_refresh != preferred_refresh) fill_stream_properties_from_drm_display_mode(stream, - &mode, &aconnector->base, con_state, NULL); + &mode, &aconnector->base, con_state, NULL, requested_bpc); else fill_stream_properties_from_drm_display_mode(stream, - &mode, &aconnector->base, con_state, old_stream); + &mode, &aconnector->base, con_state, old_stream, requested_bpc);
stream->timing.flags.DSC = 0;
@@ -4800,16 +4799,54 @@ static void handle_edid_mgmt(struct amdgpu_dm_connector *aconnector) create_eml_sink(aconnector); }
+static struct dc_stream_state * +create_validate_stream_for_sink(struct amdgpu_dm_connector *aconnector, + const struct drm_display_mode *drm_mode, + const struct dm_connector_state *dm_state, + const struct dc_stream_state *old_stream) +{ + struct drm_connector *connector = &aconnector->base; + struct amdgpu_device *adev = connector->dev->dev_private; + struct dc_stream_state *stream; + int requested_bpc = connector->state ? connector->state->max_requested_bpc : 8; + enum dc_status dc_result = DC_OK; + + do { + stream = create_stream_for_sink(aconnector, drm_mode, + dm_state, old_stream, + requested_bpc); + if (stream == NULL) { + DRM_ERROR("Failed to create stream for sink!\n"); + break; + } + + dc_result = dc_validate_stream(adev->dm.dc, stream); + + if (dc_result != DC_OK) { + DRM_DEBUG_KMS("Mode %dx%d (clk %d) failed DC validation with error %d\n", + drm_mode->hdisplay, + drm_mode->vdisplay, + drm_mode->clock, + dc_result); + + dc_stream_release(stream); + stream = NULL; + requested_bpc -= 2; /* lower bpc to retry validation */ + } + + } while (stream == NULL && requested_bpc >= 6); + + return stream; +} + enum drm_mode_status amdgpu_dm_connector_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { int result = MODE_ERROR; struct dc_sink *dc_sink; - struct amdgpu_device *adev = connector->dev->dev_private; /* TODO: Unhardcode stream count */ struct dc_stream_state *stream; struct amdgpu_dm_connector *aconnector = to_amdgpu_dm_connector(connector); - enum dc_status dc_result = DC_OK;
if ((mode->flags & DRM_MODE_FLAG_INTERLACE) || (mode->flags & DRM_MODE_FLAG_DBLSCAN)) @@ -4830,24 +4867,11 @@ enum drm_mode_status amdgpu_dm_connector_mode_valid(struct drm_connector *connec goto fail; }
- stream = create_stream_for_sink(aconnector, mode, NULL, NULL); - if (stream == NULL) { - DRM_ERROR("Failed to create stream for sink!\n"); - goto fail; - } - - dc_result = dc_validate_stream(adev->dm.dc, stream); - - if (dc_result == DC_OK) + stream = create_validate_stream_for_sink(aconnector, mode, NULL, NULL); + if (stream) { + dc_stream_release(stream); result = MODE_OK; - else - DRM_DEBUG_KMS("Mode %dx%d (clk %d) failed DC validation with error %d\n", - mode->hdisplay, - mode->vdisplay, - mode->clock, - dc_result); - - dc_stream_release(stream); + }
fail: /* TODO: error handling*/ @@ -5170,10 +5194,12 @@ static int dm_encoder_helper_atomic_check(struct drm_encoder *encoder, return 0;
if (!state->duplicated) { + int max_bpc = conn_state->max_requested_bpc; is_y420 = drm_mode_is_420_also(&connector->display_info, adjusted_mode) && aconnector->force_yuv420_output; - color_depth = convert_color_depth_from_display_info(connector, conn_state, - is_y420); + color_depth = convert_color_depth_from_display_info(connector, + is_y420, + max_bpc); bpp = convert_dc_color_depth_into_bpc(color_depth) * 3; clock = adjusted_mode->clock; dm_new_connector_state->pbn = drm_dp_calc_pbn_mode(clock, bpp, false); @@ -7589,10 +7615,10 @@ static int dm_update_crtc_state(struct amdgpu_display_manager *dm, if (!drm_atomic_crtc_needs_modeset(new_crtc_state)) goto skip_modeset;
- new_stream = create_stream_for_sink(aconnector, - &new_crtc_state->mode, - dm_new_conn_state, - dm_old_crtc_state->stream); + new_stream = create_validate_stream_for_sink(aconnector, + &new_crtc_state->mode, + dm_new_conn_state, + dm_old_crtc_state->stream);
/* * we can have no stream on ACTION_SET if a display
From: Stylon Wang stylon.wang@amd.com
[ Upstream commit fa7041d9d2fc7401cece43f305eb5b87b7017fc4 ]
[Why] Regression was introduced where setting max bpc property has no effect on the atomic check and final commit. It has the same effect as max bpc being stuck at 8.
[How] Correctly propagate max bpc with the new connector state.
Signed-off-by: Stylon Wang stylon.wang@amd.com Reviewed-by: Nicholas Kazlauskas Nicholas.Kazlauskas@amd.com Acked-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index b7e161f2a47d4..69b1f61928eff 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -4808,7 +4808,8 @@ create_validate_stream_for_sink(struct amdgpu_dm_connector *aconnector, struct drm_connector *connector = &aconnector->base; struct amdgpu_device *adev = connector->dev->dev_private; struct dc_stream_state *stream; - int requested_bpc = connector->state ? connector->state->max_requested_bpc : 8; + const struct drm_connector_state *drm_state = dm_state ? &dm_state->base : NULL; + int requested_bpc = drm_state ? drm_state->max_requested_bpc : 8; enum dc_status dc_result = DC_OK;
do {
From: Ahmed Abdelsalam ahabdels@gmail.com
[ Upstream commit bb986a50421a11bf31a81afb15b9b8f45a4a3a11 ]
The seg6_validate_srh() is used to validate SRH for three cases:
case1: SRH of data-plane SRv6 packets to be processed by the Linux kernel. Case2: SRH of the netlink message received from user-space (iproute2) Case3: SRH injected into packets through setsockopt
In case1, the SRH can be encoded in the Reduced way (i.e., first SID is carried in DA only and not represented as SID in the SRH) and the seg6_validate_srh() now handles this case correctly.
In case2 and case3, the SRH shouldn’t be encoded in the Reduced way otherwise we lose the first segment (i.e., the first hop).
The current implementation of the seg6_validate_srh() allow SRH of case2 and case3 to be encoded in the Reduced way. This leads a slab-out-of-bounds problem.
This patch verifies SRH of case1, case2 and case3. Allowing case1 to be reduced while preventing SRH of case2 and case3 from being reduced .
Reported-by: syzbot+e8c028b62439eac42073@syzkaller.appspotmail.com Reported-by: YueHaibing yuehaibing@huawei.com Fixes: 0cb7498f234e ("seg6: fix SRH processing to comply with RFC8754") Signed-off-by: Ahmed Abdelsalam ahabdels@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/seg6.h | 2 +- net/core/filter.c | 2 +- net/ipv6/ipv6_sockglue.c | 2 +- net/ipv6/seg6.c | 16 ++++++++++------ net/ipv6/seg6_iptunnel.c | 2 +- net/ipv6/seg6_local.c | 6 +++--- 6 files changed, 17 insertions(+), 13 deletions(-)
diff --git a/include/net/seg6.h b/include/net/seg6.h index 640724b352731..9d19c15e8545c 100644 --- a/include/net/seg6.h +++ b/include/net/seg6.h @@ -57,7 +57,7 @@ extern void seg6_iptunnel_exit(void); extern int seg6_local_init(void); extern void seg6_local_exit(void);
-extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len); +extern bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced); extern int seg6_do_srh_encap(struct sk_buff *skb, struct ipv6_sr_hdr *osrh, int proto); extern int seg6_do_srh_inline(struct sk_buff *skb, struct ipv6_sr_hdr *osrh); diff --git a/net/core/filter.c b/net/core/filter.c index 9512a9772d691..45fa65a289833 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4920,7 +4920,7 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len int err; struct ipv6_sr_hdr *srh = (struct ipv6_sr_hdr *)hdr;
- if (!seg6_validate_srh(srh, len)) + if (!seg6_validate_srh(srh, len, false)) return -EINVAL;
switch (type) { diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 5af97b4f5df30..ff187fd2083ff 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -458,7 +458,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, struct ipv6_sr_hdr *srh = (struct ipv6_sr_hdr *) opt->srcrt;
- if (!seg6_validate_srh(srh, optlen)) + if (!seg6_validate_srh(srh, optlen, false)) goto sticky_done; break; } diff --git a/net/ipv6/seg6.c b/net/ipv6/seg6.c index 37b434293bda3..d2f8138e5a73a 100644 --- a/net/ipv6/seg6.c +++ b/net/ipv6/seg6.c @@ -25,7 +25,7 @@ #include <net/seg6_hmac.h> #endif
-bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len) +bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len, bool reduced) { unsigned int tlv_offset; int max_last_entry; @@ -37,13 +37,17 @@ bool seg6_validate_srh(struct ipv6_sr_hdr *srh, int len) if (((srh->hdrlen + 1) << 3) != len) return false;
- max_last_entry = (srh->hdrlen / 2) - 1; - - if (srh->first_segment > max_last_entry) + if (!reduced && srh->segments_left > srh->first_segment) { return false; + } else { + max_last_entry = (srh->hdrlen / 2) - 1;
- if (srh->segments_left > srh->first_segment + 1) - return false; + if (srh->first_segment > max_last_entry) + return false; + + if (srh->segments_left > srh->first_segment + 1) + return false; + }
tlv_offset = sizeof(*srh) + ((srh->first_segment + 1) << 4);
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index c7cbfeae94f5e..e0e9f48ab14fe 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -426,7 +426,7 @@ static int seg6_build_state(struct net *net, struct nlattr *nla, }
/* verify that SRH is consistent */ - if (!seg6_validate_srh(tuninfo->srh, tuninfo_len - sizeof(*tuninfo))) + if (!seg6_validate_srh(tuninfo->srh, tuninfo_len - sizeof(*tuninfo), false)) return -EINVAL;
newts = lwtunnel_state_alloc(tuninfo_len + sizeof(*slwt)); diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 52493423f3299..eba23279912df 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -87,7 +87,7 @@ static struct ipv6_sr_hdr *get_srh(struct sk_buff *skb) */ srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
- if (!seg6_validate_srh(srh, len)) + if (!seg6_validate_srh(srh, len, true)) return NULL;
return srh; @@ -495,7 +495,7 @@ bool seg6_bpf_has_valid_srh(struct sk_buff *skb) return false;
srh->hdrlen = (u8)(srh_state->hdrlen >> 3); - if (!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3)) + if (!seg6_validate_srh(srh, (srh->hdrlen + 1) << 3, true)) return false;
srh_state->valid = true; @@ -670,7 +670,7 @@ static int parse_nla_srh(struct nlattr **attrs, struct seg6_local_lwt *slwt) if (len < sizeof(*srh) + sizeof(struct in6_addr)) return -EINVAL;
- if (!seg6_validate_srh(srh, len)) + if (!seg6_validate_srh(srh, len, false)) return -EINVAL;
slwt->srh = kmemdup(srh, len, GFP_KERNEL);
From: Tuong Lien tuong.t.lien@dektech.com.au
[ Upstream commit 0a3e060f340dbe232ffa290c40f879b7f7db595b ]
When streaming in Nagle mode, we try to bundle small messages from user as many as possible if there is one outstanding buffer, i.e. not ACK-ed by the receiving side, which helps boost up the overall throughput. So, the algorithm's effectiveness really depends on when Nagle ACK comes or what the specific network latency (RTT) is, compared to the user's message sending rate.
In a bad case, the user's sending rate is low or the network latency is small, there will not be many bundles, so making a Nagle ACK or waiting for it is not meaningful. For example: a user sends its messages every 100ms and the RTT is 50ms, then for each messages, we require one Nagle ACK but then there is only one user message sent without any bundles.
In a better case, even if we have a few bundles (e.g. the RTT = 300ms), but now the user sends messages in medium size, then there will not be any difference at all, that says 3 x 1000-byte data messages if bundled will still result in 3 bundles with MTU = 1500.
When Nagle is ineffective, the delay in user message sending is clearly wasted instead of sending directly.
Besides, adding Nagle ACKs will consume some processor load on both the sending and receiving sides.
This commit adds a test on the effectiveness of the Nagle algorithm for an individual connection in the network on which it actually runs. Particularly, upon receipt of a Nagle ACK we will compare the number of bundles in the backlog queue to the number of user messages which would be sent directly without Nagle. If the ratio is good (e.g. >= 2), Nagle mode will be kept for further message sending. Otherwise, we will leave Nagle and put a 'penalty' on the connection, so it will have to spend more 'one-way' messages before being able to re-enter Nagle.
In addition, the 'ack-required' bit is only set when really needed that the number of Nagle ACKs will be reduced during Nagle mode.
Testing with benchmark showed that with the patch, there was not much difference in throughput for small messages since the tool continuously sends messages without a break, so Nagle would still take in effect.
Acked-by: Ying Xue ying.xue@windriver.com Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Tuong Lien tuong.t.lien@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/msg.c | 3 --- net/tipc/msg.h | 14 +++++++++-- net/tipc/socket.c | 64 ++++++++++++++++++++++++++++++++++++++--------- 3 files changed, 64 insertions(+), 17 deletions(-)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 3ad411884e6c0..93966321f8929 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -235,9 +235,6 @@ int tipc_msg_append(struct tipc_msg *_hdr, struct msghdr *m, int dlen, msg_set_size(hdr, MIN_H_SIZE); __skb_queue_tail(txq, skb); total += 1; - if (prev) - msg_set_ack_required(buf_msg(prev), 0); - msg_set_ack_required(hdr, 1); } hdr = buf_msg(skb); curr = msg_blocks(hdr); diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 871feadbbc191..a4e2029170b1b 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -321,9 +321,19 @@ static inline int msg_ack_required(struct tipc_msg *m) return msg_bits(m, 0, 18, 1); }
-static inline void msg_set_ack_required(struct tipc_msg *m, u32 d) +static inline void msg_set_ack_required(struct tipc_msg *m) { - msg_set_bits(m, 0, 18, 1, d); + msg_set_bits(m, 0, 18, 1, 1); +} + +static inline int msg_nagle_ack(struct tipc_msg *m) +{ + return msg_bits(m, 0, 18, 1); +} + +static inline void msg_set_nagle_ack(struct tipc_msg *m) +{ + msg_set_bits(m, 0, 18, 1, 1); }
static inline bool msg_is_rcast(struct tipc_msg *m) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index e370ad0edd768..d6b67d07d22ec 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -48,6 +48,8 @@ #include "group.h" #include "trace.h"
+#define NAGLE_START_INIT 4 +#define NAGLE_START_MAX 1024 #define CONN_TIMEOUT_DEFAULT 8000 /* default connect timeout = 8s */ #define CONN_PROBING_INTV msecs_to_jiffies(3600000) /* [ms] => 1 h */ #define TIPC_FWD_MSG 1 @@ -119,7 +121,10 @@ struct tipc_sock { struct rcu_head rcu; struct tipc_group *group; u32 oneway; + u32 nagle_start; u16 snd_backlog; + u16 msg_acc; + u16 pkt_cnt; bool expect_ack; bool nodelay; bool group_is_open; @@ -143,7 +148,7 @@ static int tipc_sk_insert(struct tipc_sock *tsk); static void tipc_sk_remove(struct tipc_sock *tsk); static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dsz); static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dsz); -static void tipc_sk_push_backlog(struct tipc_sock *tsk); +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack);
static const struct proto_ops packet_ops; static const struct proto_ops stream_ops; @@ -474,6 +479,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock, tsk = tipc_sk(sk); tsk->max_pkt = MAX_PKT_DEFAULT; tsk->maxnagle = 0; + tsk->nagle_start = NAGLE_START_INIT; INIT_LIST_HEAD(&tsk->publications); INIT_LIST_HEAD(&tsk->cong_links); msg = &tsk->phdr; @@ -541,7 +547,7 @@ static void __tipc_shutdown(struct socket *sock, int error) !tsk_conn_cong(tsk)));
/* Push out delayed messages if in Nagle mode */ - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); /* Remove pending SYN */ __skb_queue_purge(&sk->sk_write_queue);
@@ -1252,14 +1258,37 @@ void tipc_sk_mcast_rcv(struct net *net, struct sk_buff_head *arrvq, /* tipc_sk_push_backlog(): send accumulated buffers in socket write queue * when socket is in Nagle mode */ -static void tipc_sk_push_backlog(struct tipc_sock *tsk) +static void tipc_sk_push_backlog(struct tipc_sock *tsk, bool nagle_ack) { struct sk_buff_head *txq = &tsk->sk.sk_write_queue; + struct sk_buff *skb = skb_peek_tail(txq); struct net *net = sock_net(&tsk->sk); u32 dnode = tsk_peer_node(tsk); - struct sk_buff *skb = skb_peek(txq); int rc;
+ if (nagle_ack) { + tsk->pkt_cnt += skb_queue_len(txq); + if (!tsk->pkt_cnt || tsk->msg_acc / tsk->pkt_cnt < 2) { + tsk->oneway = 0; + if (tsk->nagle_start < NAGLE_START_MAX) + tsk->nagle_start *= 2; + tsk->expect_ack = false; + pr_debug("tsk %10u: bad nagle %u -> %u, next start %u!\n", + tsk->portid, tsk->msg_acc, tsk->pkt_cnt, + tsk->nagle_start); + } else { + tsk->nagle_start = NAGLE_START_INIT; + if (skb) { + msg_set_ack_required(buf_msg(skb)); + tsk->expect_ack = true; + } else { + tsk->expect_ack = false; + } + } + tsk->msg_acc = 0; + tsk->pkt_cnt = 0; + } + if (!skb || tsk->cong_link_cnt) return;
@@ -1267,9 +1296,10 @@ static void tipc_sk_push_backlog(struct tipc_sock *tsk) if (msg_is_syn(buf_msg(skb))) return;
+ if (tsk->msg_acc) + tsk->pkt_cnt += skb_queue_len(txq); tsk->snt_unacked += tsk->snd_backlog; tsk->snd_backlog = 0; - tsk->expect_ack = true; rc = tipc_node_xmit(net, txq, dnode, tsk->portid); if (rc == -ELINKCONG) tsk->cong_link_cnt = 1; @@ -1322,8 +1352,7 @@ static void tipc_sk_conn_proto_rcv(struct tipc_sock *tsk, struct sk_buff *skb, return; } else if (mtyp == CONN_ACK) { was_cong = tsk_conn_cong(tsk); - tsk->expect_ack = false; - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, msg_nagle_ack(hdr)); tsk->snt_unacked -= msg_conn_ack(hdr); if (tsk->peer_caps & TIPC_BLOCK_FLOWCTL) tsk->snd_win = msg_adv_win(hdr); @@ -1516,6 +1545,7 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) struct tipc_sock *tsk = tipc_sk(sk); struct tipc_msg *hdr = &tsk->phdr; struct net *net = sock_net(sk); + struct sk_buff *skb; u32 dnode = tsk_peer_node(tsk); int maxnagle = tsk->maxnagle; int maxpkt = tsk->max_pkt; @@ -1544,17 +1574,25 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) break; send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE); blocks = tsk->snd_backlog; - if (tsk->oneway++ >= 4 && send <= maxnagle) { + if (tsk->oneway++ >= tsk->nagle_start && send <= maxnagle) { rc = tipc_msg_append(hdr, m, send, maxnagle, txq); if (unlikely(rc < 0)) break; blocks += rc; + tsk->msg_acc++; if (blocks <= 64 && tsk->expect_ack) { tsk->snd_backlog = blocks; sent += send; break; + } else if (blocks > 64) { + tsk->pkt_cnt += skb_queue_len(txq); + } else { + skb = skb_peek_tail(txq); + msg_set_ack_required(buf_msg(skb)); + tsk->expect_ack = true; + tsk->msg_acc = 0; + tsk->pkt_cnt = 0; } - tsk->expect_ack = true; } else { rc = tipc_msg_build(hdr, m, sent, send, maxpkt, txq); if (unlikely(rc != send)) @@ -2091,7 +2129,7 @@ static void tipc_sk_proto_rcv(struct sock *sk, smp_wmb(); tsk->cong_link_cnt--; wakeup = true; - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); break; case GROUP_PROTOCOL: tipc_group_proto_rcv(grp, &wakeup, hdr, inputq, xmitq); @@ -2180,7 +2218,7 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb, return false; case TIPC_ESTABLISHED: if (!skb_queue_empty(&sk->sk_write_queue)) - tipc_sk_push_backlog(tsk); + tipc_sk_push_backlog(tsk, false); /* Accept only connection-based messages sent by peer */ if (likely(con_msg && !err && pport == oport && pnode == onode)) { @@ -2188,8 +2226,10 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb, struct sk_buff *skb;
skb = tipc_sk_build_ack(tsk); - if (skb) + if (skb) { + msg_set_nagle_ack(buf_msg(skb)); __skb_queue_tail(xmitq, skb); + } } return true; }
From: Tuong Lien tuong.t.lien@dektech.com.au
[ Upstream commit c9aa81faf19115fc2e732e7f210b37bb316987ff ]
syzbot found the following issue:
WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 check_copy_size include/linux/thread_info.h:150 [inline] WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 copy_from_iter include/linux/uio.h:144 [inline] WARNING: CPU: 0 PID: 6808 at include/linux/thread_info.h:150 tipc_msg_append+0x49a/0x5e0 net/tipc/msg.c:242 Kernel panic - not syncing: panic_on_warn set ...
This happens after commit 5e9eeccc58f3 ("tipc: fix NULL pointer dereference in streaming") that tried to build at least one buffer even when the message data length is zero... However, it now exposes another bug that the 'mss' can be zero and the 'cpy' will be negative, thus the above kernel WARNING will appear! The zero value of 'mss' is never expected because it means Nagle is not enabled for the socket (actually the socket type was 'SOCK_SEQPACKET'), so the function 'tipc_msg_append()' must not be called at all. But that was in this particular case since the message data length was zero, and the 'send <= maxnagle' check became true.
We resolve the issue by explicitly checking if Nagle is enabled for the socket, i.e. 'maxnagle != 0' before calling the 'tipc_msg_append()'. We also reinforce the function to against such a negative values if any.
Reported-by: syzbot+75139a7d2605236b0b7f@syzkaller.appspotmail.com Fixes: c0bceb97db9e ("tipc: add smart nagle feature") Acked-by: Jon Maloy jmaloy@redhat.com Signed-off-by: Tuong Lien tuong.t.lien@dektech.com.au Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/msg.c | 4 ++-- net/tipc/socket.c | 3 ++- 2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/tipc/msg.c b/net/tipc/msg.c index 93966321f8929..560d7a4c0ffff 100644 --- a/net/tipc/msg.c +++ b/net/tipc/msg.c @@ -239,14 +239,14 @@ int tipc_msg_append(struct tipc_msg *_hdr, struct msghdr *m, int dlen, hdr = buf_msg(skb); curr = msg_blocks(hdr); mlen = msg_size(hdr); - cpy = min_t(int, rem, mss - mlen); + cpy = min_t(size_t, rem, mss - mlen); if (cpy != copy_from_iter(skb->data + mlen, cpy, &m->msg_iter)) return -EFAULT; msg_set_size(hdr, mlen + cpy); skb_put(skb, cpy); rem -= cpy; total += msg_blocks(hdr) - curr; - } while (rem); + } while (rem > 0); return total - accounted; }
diff --git a/net/tipc/socket.c b/net/tipc/socket.c index d6b67d07d22ec..62fc871a8d673 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1574,7 +1574,8 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) break; send = min_t(size_t, dlen - sent, TIPC_MAX_USER_MSG_SIZE); blocks = tsk->snd_backlog; - if (tsk->oneway++ >= tsk->nagle_start && send <= maxnagle) { + if (tsk->oneway++ >= tsk->nagle_start && maxnagle && + send <= maxnagle) { rc = tipc_msg_append(hdr, m, send, maxnagle, txq); if (unlikely(rc < 0)) break;
From: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi
[ Upstream commit b835a71ef64a61383c414d6bf2896d2c0161deca ]
Syzbot reports an use-after-free in workqueue context:
BUG: KASAN: use-after-free in mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 mutex_unlock+0x19/0x40 kernel/locking/mutex.c:737 __smsc95xx_mdio_read drivers/net/usb/smsc95xx.c:217 [inline] smsc95xx_mdio_read+0x583/0x870 drivers/net/usb/smsc95xx.c:278 check_carrier+0xd1/0x2e0 drivers/net/usb/smsc95xx.c:644 process_one_work+0x777/0xf90 kernel/workqueue.c:2274 worker_thread+0xa8f/0x1430 kernel/workqueue.c:2420 kthread+0x2df/0x300 kernel/kthread.c:255
It looks like that smsc95xx_unbind() is freeing the structures that are still in use by the concurrently running workqueue callback. Thus switch to using cancel_delayed_work_sync() to ensure the work callback really is no longer active.
Reported-by: syzbot+29dc7d4ae19b703ff947@syzkaller.appspotmail.com Signed-off-by: Tuomas Tynkkynen tuomas.tynkkynen@iki.fi Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/smsc95xx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c index 355be77f42418..3cf4dc3433f91 100644 --- a/drivers/net/usb/smsc95xx.c +++ b/drivers/net/usb/smsc95xx.c @@ -1324,7 +1324,7 @@ static void smsc95xx_unbind(struct usbnet *dev, struct usb_interface *intf) struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
if (pdata) { - cancel_delayed_work(&pdata->carrier_check); + cancel_delayed_work_sync(&pdata->carrier_check); netif_dbg(dev, ifdown, dev->net, "free pdata\n"); kfree(pdata); pdata = NULL;
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit 4c21daae3dbc9f8536cc18e6e53627821fa2c90c ]
tipc_sendstream() may send zero length packet, then tipc_msg_append() do not alloc skb, skb_peek_tail() will get NULL, msg_set_ack_required will trigger NULL pointer dereference.
Reported-by: syzbot+8eac6d030e7807c21d32@syzkaller.appspotmail.com Fixes: 0a3e060f340d ("tipc: add test for Nagle algorithm effectiveness") Signed-off-by: YueHaibing yuehaibing@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/tipc/socket.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/tipc/socket.c b/net/tipc/socket.c index 62fc871a8d673..f02f2abf6e3c0 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -1589,8 +1589,12 @@ static int __tipc_sendstream(struct socket *sock, struct msghdr *m, size_t dlen) tsk->pkt_cnt += skb_queue_len(txq); } else { skb = skb_peek_tail(txq); - msg_set_ack_required(buf_msg(skb)); - tsk->expect_ack = true; + if (skb) { + msg_set_ack_required(buf_msg(skb)); + tsk->expect_ack = true; + } else { + tsk->expect_ack = false; + } tsk->msg_acc = 0; tsk->pkt_cnt = 0; }
From: Chris Wilson chris@chris-wilson.co.uk
[ Upstream commit 8e87e0139aff59c5961347ab1ef06814f092c439 ]
Since we take advantage of RCU for some i915_active objects, like the intel_timeline_cacheline, we need to delay the i915_active_fini until after the RCU grace period and we perform the kfree -- that is until after all RCU protected readers.
<3> [108.204873] ODEBUG: assert_init not available (active state 0) object type: i915_active hint: __cacheline_active+0x0/0x80 [i915] <4> [108.207377] WARNING: CPU: 3 PID: 2342 at lib/debugobjects.c:488 debug_print_object+0x67/0x90 <4> [108.207400] Modules linked in: vgem snd_hda_codec_hdmi x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel ghash_clmulni_intel snd_intel_dspcfg snd_hda_codec ax88179_178a snd_hwdep usbnet btusb snd_hda_core btrtl mii btbcm btintel snd_pcm bluetooth ecdh_generic ecc i915 i2c_hid pinctrl_sunrisepoint pinctrl_intel intel_lpss_pci prime_numbers <4> [108.207587] CPU: 3 PID: 2342 Comm: gem_exec_parall Tainted: G U 5.6.0-rc6-CI-Patchwork_17047+ #1 <4> [108.207609] Hardware name: Google Soraka/Soraka, BIOS MrChromebox-4.10 08/25/2019 <4> [108.207639] RIP: 0010:debug_print_object+0x67/0x90 <4> [108.207668] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 87 d2 8a 02 8b 53 10 4c 89 e6 48 c7 c7 38 2b 32 82 48 8b 14 d5 80 2f 07 82 e8 49 d5 b7 ff <0f> 0b 5b 83 05 c3 f6 22 01 01 5d 41 5c c3 83 05 b8 f6 22 01 01 c3 <4> [108.207692] RSP: 0018:ffffc90000e7f890 EFLAGS: 00010282 <4> [108.207723] RAX: 0000000000000000 RBX: ffffc90000e7f8b0 RCX: 0000000000000001 <4> [108.207747] RDX: 0000000080000001 RSI: ffff88817ada8cb8 RDI: 00000000ffffffff <4> [108.207770] RBP: ffffffffa0341cc0 R08: ffff88816b5a8948 R09: 0000000000000000 <4> [108.207792] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff82322d54 <4> [108.207814] R13: ffffffffa0341cc0 R14: ffffffff83df9568 R15: ffff88816064f400 <4> [108.207839] FS: 00007f437d753700(0000) GS:ffff88817ad80000(0000) knlGS:0000000000000000 <4> [108.207863] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [108.207887] CR2: 00007f2ad1fb5000 CR3: 00000001725d8004 CR4: 00000000003606e0 <4> [108.207907] Call Trace: <4> [108.207959] debug_object_assert_init+0x15c/0x180 <4> [108.208475] ? i915_active_acquire_if_busy+0x10/0x50 [i915] <4> [108.208513] ? rcu_read_lock_held+0x4d/0x60 <4> [108.208970] i915_active_acquire_if_busy+0x10/0x50 [i915] <4> [108.209380] intel_timeline_read_hwsp+0x81/0x540 [i915] <4> [108.210262] __emit_semaphore_wait+0x45/0x1b0 [i915] <4> [108.210726] ? i915_request_await_dma_fence+0x143/0x560 [i915] <4> [108.211156] i915_request_await_dma_fence+0x28a/0x560 [i915] <4> [108.211633] i915_request_await_object+0x24a/0x3f0 [i915] <4> [108.212102] eb_submit.isra.47+0x58f/0x920 [i915] <4> [108.212622] i915_gem_do_execbuffer+0x1706/0x2c70 [i915] <4> [108.213071] ? i915_gem_execbuffer2_ioctl+0xc0/0x470 [i915]
Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Reviewed-by: Matthew Auld matthew.auld@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200323092841.22240-1-chris@c... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_timeline.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index 08b56d7ab4f45..92da746f01c1e 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -119,6 +119,15 @@ static void __idle_hwsp_free(struct intel_timeline_hwsp *hwsp, int cacheline) spin_unlock_irqrestore(>->hwsp_lock, flags); }
+static void __rcu_cacheline_free(struct rcu_head *rcu) +{ + struct intel_timeline_cacheline *cl = + container_of(rcu, typeof(*cl), rcu); + + i915_active_fini(&cl->active); + kfree(cl); +} + static void __idle_cacheline_free(struct intel_timeline_cacheline *cl) { GEM_BUG_ON(!i915_active_is_idle(&cl->active)); @@ -127,8 +136,7 @@ static void __idle_cacheline_free(struct intel_timeline_cacheline *cl) i915_vma_put(cl->hwsp->vma); __idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
- i915_active_fini(&cl->active); - kfree_rcu(cl, rcu); + call_rcu(&cl->rcu, __rcu_cacheline_free); }
__i915_active_call
From: John Clements john.clements@amd.com
[ Upstream commit 61380faa4b4cc577df8a7ff5db5859bac6b351f7 ]
added flag to ras context to indicate if ras query functionality is ready
Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: John Clements john.clements@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 24 +++++++++++++++++++--- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 4 ++++ 3 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index affde2de2a0db..59288653412db 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4091,6 +4091,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, need_full_reset = job_signaled = false; INIT_LIST_HEAD(&device_list);
+ amdgpu_ras_set_error_query_ready(adev, false); + dev_info(adev->dev, "GPU %s begin!\n", (in_ras_intr && !use_baco) ? "jobs stop":"reset");
@@ -4147,6 +4149,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev, /* block all schedulers and reset given job's ring */ list_for_each_entry(tmp_adev, device_list_handle, gmc.xgmi.head) { if (tmp_adev != adev) { + amdgpu_ras_set_error_query_ready(tmp_adev, false); amdgpu_device_lock_adev(tmp_adev, false); if (!amdgpu_sriov_vf(tmp_adev)) amdgpu_amdkfd_pre_reset(tmp_adev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index ab379b44679cc..aa6148d12d5a4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -80,6 +80,20 @@ atomic_t amdgpu_ras_in_intr = ATOMIC_INIT(0); static bool amdgpu_ras_check_bad_page(struct amdgpu_device *adev, uint64_t addr);
+void amdgpu_ras_set_error_query_ready(struct amdgpu_device *adev, bool ready) +{ + if (adev) + amdgpu_ras_get_context(adev)->error_query_ready = ready; +} + +bool amdgpu_ras_get_error_query_ready(struct amdgpu_device *adev) +{ + if (adev) + return amdgpu_ras_get_context(adev)->error_query_ready; + + return false; +} + static ssize_t amdgpu_ras_debugfs_read(struct file *f, char __user *buf, size_t size, loff_t *pos) { @@ -281,7 +295,7 @@ static ssize_t amdgpu_ras_debugfs_ctrl_write(struct file *f, const char __user * struct ras_debug_if data; int ret = 0;
- if (amdgpu_ras_intr_triggered()) { + if (!amdgpu_ras_get_error_query_ready(adev)) { DRM_WARN("RAS WARN: error injection currently inaccessible\n"); return size; } @@ -399,7 +413,7 @@ static ssize_t amdgpu_ras_sysfs_read(struct device *dev, .head = obj->head, };
- if (amdgpu_ras_intr_triggered()) + if (!amdgpu_ras_get_error_query_ready(obj->adev)) return snprintf(buf, PAGE_SIZE, "Query currently inaccessible\n");
@@ -1896,8 +1910,10 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev, }
/* in resume phase, no need to create ras fs node */ - if (adev->in_suspend || adev->in_gpu_reset) + if (adev->in_suspend || adev->in_gpu_reset) { + amdgpu_ras_set_error_query_ready(adev, true); return 0; + }
if (ih_info->cb) { r = amdgpu_ras_interrupt_add_handler(adev, ih_info); @@ -1909,6 +1925,8 @@ int amdgpu_ras_late_init(struct amdgpu_device *adev, if (r) goto sysfs;
+ amdgpu_ras_set_error_query_ready(adev, true); + return 0; cleanup: amdgpu_ras_sysfs_remove(adev, ras_block); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h index 55c3eceb390d4..e7df5d8429f82 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h @@ -334,6 +334,8 @@ struct amdgpu_ras { uint32_t flags; bool reboot; struct amdgpu_ras_eeprom_control eeprom_control; + + bool error_query_ready; };
struct ras_fs_data { @@ -629,4 +631,6 @@ static inline void amdgpu_ras_intr_cleared(void)
void amdgpu_ras_global_ras_isr(struct amdgpu_device *adev);
+void amdgpu_ras_set_error_query_ready(struct amdgpu_device *adev, bool ready); + #endif
From: Evan Quan evan.quan@amd.com
[ Upstream commit a9d82d2f91297679cfafd7e61c4bccdca6cd550d ]
Backtrace on gpu recover test on Navi10.
[ 1324.516681] RIP: 0010:amdgpu_ras_set_error_query_ready+0x15/0x20 [amdgpu] [ 1324.523778] Code: 4c 89 f7 e8 cd a2 a0 d8 e9 99 fe ff ff 45 31 ff e9 91 fe ff ff 0f 1f 44 00 00 55 48 85 ff 48 89 e5 74 0e 48 8b 87 d8 2b 01 00 <40> 88 b0 38 01 00 00 5d c3 66 90 0f 1f 44 00 00 55 31 c0 48 85 ff [ 1324.543452] RSP: 0018:ffffaa1040e4bd28 EFLAGS: 00010286 [ 1324.549025] RAX: 0000000000000000 RBX: ffff911198b20000 RCX: 0000000000000000 [ 1324.556217] RDX: 00000000000c0a01 RSI: 0000000000000000 RDI: ffff911198b20000 [ 1324.563514] RBP: ffffaa1040e4bd28 R08: 0000000000001000 R09: ffff91119d0028c0 [ 1324.570804] R10: ffffffff9a606b40 R11: 0000000000000000 R12: 0000000000000000 [ 1324.578413] R13: ffffaa1040e4bd70 R14: ffff911198b20000 R15: 0000000000000000 [ 1324.586464] FS: 00007f4441cbf540(0000) GS:ffff91119ed80000(0000) knlGS:0000000000000000 [ 1324.595434] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1324.601345] CR2: 0000000000000138 CR3: 00000003fcdf8004 CR4: 00000000003606e0 [ 1324.608694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1324.616303] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1324.623678] Call Trace: [ 1324.626270] amdgpu_device_gpu_recover+0x6e7/0xc50 [amdgpu] [ 1324.632018] ? seq_printf+0x4e/0x70 [ 1324.636652] amdgpu_debugfs_gpu_recover+0x50/0x80 [amdgpu] [ 1324.643371] seq_read+0xda/0x420 [ 1324.647601] full_proxy_read+0x5c/0x90 [ 1324.652426] __vfs_read+0x1b/0x40 [ 1324.656734] vfs_read+0x8e/0x130 [ 1324.660981] ksys_read+0xa7/0xe0 [ 1324.665201] __x64_sys_read+0x1a/0x20 [ 1324.669907] do_syscall_64+0x57/0x1c0 [ 1324.674517] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 1324.680654] RIP: 0033:0x7f44417cf081
Signed-off-by: Evan Quan evan.quan@amd.com Reviewed-by: John Clements John.Clements@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index aa6148d12d5a4..b0aa4e1ed4df7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -82,13 +82,13 @@ static bool amdgpu_ras_check_bad_page(struct amdgpu_device *adev,
void amdgpu_ras_set_error_query_ready(struct amdgpu_device *adev, bool ready) { - if (adev) + if (adev && amdgpu_ras_get_context(adev)) amdgpu_ras_get_context(adev)->error_query_ready = ready; }
bool amdgpu_ras_get_error_query_ready(struct amdgpu_device *adev) { - if (adev) + if (adev && amdgpu_ras_get_context(adev)) return amdgpu_ras_get_context(adev)->error_query_ready;
return false;
From: Guchun Chen guchun.chen@amd.com
[ Upstream commit 12c17b9d62663c14a5343d6742682b3e67280754 ]
When running ras uncorrectable error injection and triggering GPU reset on sGPU, below issue is observed. It's caused by the list uninitialized when accessing.
[ 80.047227] BUG: unable to handle page fault for address: ffffffffc0f4f750 [ 80.047300] #PF: supervisor write access in kernel mode [ 80.047351] #PF: error_code(0x0003) - permissions violation [ 80.047404] PGD 12c20e067 P4D 12c20e067 PUD 12c210067 PMD 41c4ee067 PTE 404316061 [ 80.047477] Oops: 0003 [#1] SMP PTI [ 80.047516] CPU: 7 PID: 377 Comm: kworker/7:2 Tainted: G OE 5.4.0-rc7-guchchen #1 [ 80.047594] Hardware name: System manufacturer System Product Name/TUF Z370-PLUS GAMING II, BIOS 0411 09/21/2018 [ 80.047888] Workqueue: events amdgpu_ras_do_recovery [amdgpu]
Signed-off-by: Guchun Chen guchun.chen@amd.com Reviewed-by: John Clements John.Clements@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c index b0aa4e1ed4df7..cd18596b47d33 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c @@ -1444,9 +1444,10 @@ static void amdgpu_ras_do_recovery(struct work_struct *work) struct amdgpu_hive_info *hive = amdgpu_get_xgmi_hive(adev, false);
/* Build list of devices to query RAS related errors */ - if (hive && adev->gmc.xgmi.num_physical_nodes > 1) { + if (hive && adev->gmc.xgmi.num_physical_nodes > 1) device_list_handle = &hive->device_list; - } else { + else { + INIT_LIST_HEAD(&device_list); list_add_tail(&adev->gmc.xgmi.head, &device_list); device_list_handle = &device_list; }
From: Valentin Schneider valentin.schneider@arm.com
[ Upstream commit 9818427c6270a9ce8c52c8621026fe9cebae0f92 ]
Writing to the sysctl of a sched_domain->flags directly updates the value of the field, and goes nowhere near update_top_cache_domain(). This means that the cached domain pointers can end up containing stale data (e.g. the domain pointed to doesn't have the relevant flag set anymore).
Explicit domain walks that check for flags will be affected by the write, but this won't be in sync with the cached pointers which will still point to the domains that were cached at the last sched_domain build.
In other words, writing to this interface is playing a dangerous game. It could be made to trigger an update of the cached sched_domain pointers when written to, but this does not seem to be worth the trouble. Make it read-only.
Signed-off-by: Valentin Schneider valentin.schneider@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20200415210512.805-3-valentin.schneider@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index 239970b991c03..0f4aaad236a9d 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -258,7 +258,7 @@ sd_alloc_ctl_domain_table(struct sched_domain *sd) set_table_entry(&table[2], "busy_factor", &sd->busy_factor, sizeof(int), 0644, proc_dointvec_minmax); set_table_entry(&table[3], "imbalance_pct", &sd->imbalance_pct, sizeof(int), 0644, proc_dointvec_minmax); set_table_entry(&table[4], "cache_nice_tries", &sd->cache_nice_tries, sizeof(int), 0644, proc_dointvec_minmax); - set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0644, proc_dointvec_minmax); + set_table_entry(&table[5], "flags", &sd->flags, sizeof(int), 0444, proc_dointvec_minmax); set_table_entry(&table[6], "max_newidle_lb_cost", &sd->max_newidle_lb_cost, sizeof(long), 0644, proc_doulongvec_minmax); set_table_entry(&table[7], "name", sd->name, CORENAME_MAX_SIZE, 0444, proc_dostring); /* &table[8] is terminator */
From: Tero Kristo t-kristo@ti.com
[ Upstream commit 98ece19f247159a51003796ede7112fef2df5d7f ]
The reset handling APIs for omap-prm can be invoked PM runtime which runs in atomic context. For this to work properly, switch to atomic iopoll version instead of the current which can sleep. Otherwise, this throws a "BUG: scheduling while atomic" warning. Issue is seen rather easily when CONFIG_PREEMPT is enabled.
Signed-off-by: Tero Kristo t-kristo@ti.com Acked-by: Santosh Shilimkar ssantosh@kernel.org Signed-off-by: Tony Lindgren tony@atomide.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soc/ti/omap_prm.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c index 96c6f777519c0..c9b3f9ebf0bbf 100644 --- a/drivers/soc/ti/omap_prm.c +++ b/drivers/soc/ti/omap_prm.c @@ -256,10 +256,10 @@ static int omap_reset_deassert(struct reset_controller_dev *rcdev, goto exit;
/* wait for the status to be set */ - ret = readl_relaxed_poll_timeout(reset->prm->base + - reset->prm->data->rstst, - v, v & BIT(st_bit), 1, - OMAP_RESET_MAX_WAIT); + ret = readl_relaxed_poll_timeout_atomic(reset->prm->base + + reset->prm->data->rstst, + v, v & BIT(st_bit), 1, + OMAP_RESET_MAX_WAIT); if (ret) pr_err("%s: timedout waiting for %s:%lu\n", __func__, reset->prm->data->name, id);
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 4b99412ed6972cc77c1f16009e1d00323fcef9ab ]
The locking rules for walking partition scoped table is different from process scoped table. Hence add a helper for secondary linux page table walk and also add check whether we are holding the right locks.
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20200505071729.54912-10-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/kvm_book3s_64.h | 13 +++++++++++++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 12 ++++++------ arch/powerpc/kvm/book3s_hv_nested.c | 2 +- 3 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 04b2b927bb5ae..2c2635967d6e0 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -14,6 +14,7 @@ #include <asm/book3s/64/mmu-hash.h> #include <asm/cpu_has_feature.h> #include <asm/ppc-opcode.h> +#include <asm/pte-walk.h>
#ifdef CONFIG_PPC_PSERIES static inline bool kvmhv_on_pseries(void) @@ -634,6 +635,18 @@ extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm, unsigned long gpa, unsigned long hpa, unsigned long nbytes);
+static inline pte_t *find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea, + unsigned *hshift) +{ + pte_t *pte; + + VM_WARN(!spin_is_locked(&kvm->mmu_lock), + "%s called with kvm mmu_lock not held \n", __func__); + pte = __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift); + + return pte; +} + #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#endif /* __ASM_KVM_BOOK3S_64_H__ */ diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index bc6c1aa3d0e92..e9b3622405b1d 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -993,11 +993,11 @@ int kvm_unmap_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, return 0; }
- ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + ptep = find_kvm_secondary_pte(kvm, gpa, &shift); if (ptep && pte_present(*ptep)) kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot, kvm->arch.lpid); - return 0; + return 0; }
/* Called with kvm->mmu_lock held */ @@ -1013,7 +1013,7 @@ int kvm_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) return ref;
- ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + ptep = find_kvm_secondary_pte(kvm, gpa, &shift); if (ptep && pte_present(*ptep) && pte_young(*ptep)) { old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_ACCESSED, 0, gpa, shift); @@ -1040,7 +1040,7 @@ int kvm_test_age_radix(struct kvm *kvm, struct kvm_memory_slot *memslot, if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) return ref;
- ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + ptep = find_kvm_secondary_pte(kvm, gpa, &shift); if (ptep && pte_present(*ptep) && pte_young(*ptep)) ref = 1; return ref; @@ -1060,7 +1060,7 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm, if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) return ret;
- ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + ptep = find_kvm_secondary_pte(kvm, gpa, &shift); if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) { ret = 1; if (shift) @@ -1121,7 +1121,7 @@ void kvmppc_radix_flush_memslot(struct kvm *kvm, gpa = memslot->base_gfn << PAGE_SHIFT; spin_lock(&kvm->mmu_lock); for (n = memslot->npages; n; --n) { - ptep = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + ptep = find_kvm_secondary_pte(kvm, gpa, &shift); if (ptep && pte_present(*ptep)) kvmppc_unmap_pte(kvm, ptep, gpa, shift, memslot, kvm->arch.lpid); diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c index dc97e5be76f61..7f1fc5db13eab 100644 --- a/arch/powerpc/kvm/book3s_hv_nested.c +++ b/arch/powerpc/kvm/book3s_hv_nested.c @@ -1362,7 +1362,7 @@ static long int __kvmhv_nested_page_fault(struct kvm_run *run, /* See if can find translation in our partition scoped tables for L1 */ pte = __pte(0); spin_lock(&kvm->mmu_lock); - pte_p = __find_linux_pte(kvm->arch.pgtable, gpa, NULL, &shift); + pte_p = find_kvm_secondary_pte(kvm, gpa, &shift); if (!shift) shift = PAGE_SHIFT; if (pte_p)
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit bf8036a4098d1548cdccf9ed5c523ef4e83e3c68 ]
This patch fixes the below warning reported during migration:
find_kvm_secondary_pte called with kvm mmu_lock not held CPU: 23 PID: 5341 Comm: qemu-system-ppc Tainted: G W 5.7.0-rc5-kvm-00211-g9ccf10d6d088 #432 NIP: c008000000fe848c LR: c008000000fe8488 CTR: 0000000000000000 REGS: c000001e19f077e0 TRAP: 0700 Tainted: G W (5.7.0-rc5-kvm-00211-g9ccf10d6d088) MSR: 9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE> CR: 42222422 XER: 20040000 CFAR: c00000000012f5ac IRQMASK: 0 GPR00: c008000000fe8488 c000001e19f07a70 c008000000ffe200 0000000000000039 GPR04: 0000000000000001 c000001ffc8b4900 0000000000018840 0000000000000007 GPR08: 0000000000000003 0000000000000001 0000000000000007 0000000000000001 GPR12: 0000000000002000 c000001fff6d9400 000000011f884678 00007fff70b70000 GPR16: 00007fff7137cb90 00007fff7dcb4410 0000000000000001 0000000000000000 GPR20: 000000000ffe0000 0000000000000000 0000000000000001 0000000000000000 GPR24: 8000000000000000 0000000000000001 c000001e1f67e600 c000001e1fd82410 GPR28: 0000000000001000 c000001e2e410000 0000000000000fff 0000000000000ffe NIP [c008000000fe848c] kvmppc_hv_get_dirty_log_radix+0x2e4/0x340 [kvm_hv] LR [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] Call Trace: [c000001e19f07a70] [c008000000fe8488] kvmppc_hv_get_dirty_log_radix+0x2e0/0x340 [kvm_hv] (unreliable) [c000001e19f07b50] [c008000000fd42e4] kvm_vm_ioctl_get_dirty_log_hv+0x33c/0x3c0 [kvm_hv] [c000001e19f07be0] [c008000000eea878] kvm_vm_ioctl_get_dirty_log+0x30/0x50 [kvm] [c000001e19f07c00] [c008000000edc818] kvm_vm_ioctl+0x2b0/0xc00 [kvm] [c000001e19f07d50] [c00000000046e148] ksys_ioctl+0xf8/0x150 [c000001e19f07da0] [c00000000046e1c8] sys_ioctl+0x28/0x80 [c000001e19f07dc0] [c00000000003652c] system_call_exception+0x16c/0x240 [c000001e19f07e20] [c00000000000d070] system_call_common+0xf0/0x278 Instruction dump: 7d3a512a 4200ffd0 7ffefb78 4bfffdc4 60000000 3c820000 e8848468 3c620000 e86384a8 38840010 4800673d e8410018 <0fe00000> 4bfffdd4 60000000 60000000
Reported-by: Paul Mackerras paulus@ozlabs.org Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20200528080456.87797-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/kvm_book3s_64.h | 10 +++++++ arch/powerpc/kvm/book3s_64_mmu_radix.c | 35 ++++++++++++++++++++---- 2 files changed, 39 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h index 2c2635967d6e0..0431db7b82af7 100644 --- a/arch/powerpc/include/asm/kvm_book3s_64.h +++ b/arch/powerpc/include/asm/kvm_book3s_64.h @@ -635,6 +635,16 @@ extern void kvmhv_remove_nest_rmap_range(struct kvm *kvm, unsigned long gpa, unsigned long hpa, unsigned long nbytes);
+static inline pte_t * +find_kvm_secondary_pte_unlocked(struct kvm *kvm, unsigned long ea, + unsigned *hshift) +{ + pte_t *pte; + + pte = __find_linux_pte(kvm->arch.pgtable, ea, NULL, hshift); + return pte; +} + static inline pte_t *find_kvm_secondary_pte(struct kvm *kvm, unsigned long ea, unsigned *hshift) { diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c index e9b3622405b1d..d4e532a63f08e 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c @@ -1052,7 +1052,7 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm, { unsigned long gfn = memslot->base_gfn + pagenum; unsigned long gpa = gfn << PAGE_SHIFT; - pte_t *ptep; + pte_t *ptep, pte; unsigned int shift; int ret = 0; unsigned long old, *rmapp; @@ -1060,12 +1060,35 @@ static int kvm_radix_test_clear_dirty(struct kvm *kvm, if (kvm->arch.secure_guest & KVMPPC_SECURE_INIT_DONE) return ret;
- ptep = find_kvm_secondary_pte(kvm, gpa, &shift); - if (ptep && pte_present(*ptep) && pte_dirty(*ptep)) { - ret = 1; - if (shift) - ret = 1 << (shift - PAGE_SHIFT); + /* + * For performance reasons we don't hold kvm->mmu_lock while walking the + * partition scoped table. + */ + ptep = find_kvm_secondary_pte_unlocked(kvm, gpa, &shift); + if (!ptep) + return 0; + + pte = READ_ONCE(*ptep); + if (pte_present(pte) && pte_dirty(pte)) { spin_lock(&kvm->mmu_lock); + /* + * Recheck the pte again + */ + if (pte_val(pte) != pte_val(*ptep)) { + /* + * We have KVM_MEM_LOG_DIRTY_PAGES enabled. Hence we can + * only find PAGE_SIZE pte entries here. We can continue + * to use the pte addr returned by above page table + * walk. + */ + if (!pte_present(*ptep) || !pte_dirty(*ptep)) { + spin_unlock(&kvm->mmu_lock); + return 0; + } + } + + ret = 1; + VM_BUG_ON(shift); old = kvmppc_radix_update_pte(kvm, ptep, _PAGE_DIRTY, 0, gpa, shift); kvmppc_radix_tlbie_page(kvm, gpa, shift, kvm->arch.lpid);
From: Dongli Zhang dongli.zhang@oracle.com
[ Upstream commit 52f23478081ae0dcdb95d1650ea1e7d52d586829 ]
The slub_debug is able to fix the corrupted slab freelist/page. However, alloc_debug_processing() only checks the validity of current and next freepointer during allocation path. As a result, once some objects have their freepointers corrupted, deactivate_slab() may lead to page fault.
Below is from a test kernel module when 'slub_debug=PUF,kmalloc-128 slub_nomerge'. The test kernel corrupts the freepointer of one free object on purpose. Unfortunately, deactivate_slab() does not detect it when iterating the freechain.
BUG: unable to handle page fault for address: 00000000123456f8 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI ... ... RIP: 0010:deactivate_slab.isra.92+0xed/0x490 ... ... Call Trace: ___slab_alloc+0x536/0x570 __slab_alloc+0x17/0x30 __kmalloc+0x1d9/0x200 ext4_htree_store_dirent+0x30/0xf0 htree_dirblock_to_tree+0xcb/0x1c0 ext4_htree_fill_tree+0x1bc/0x2d0 ext4_readdir+0x54f/0x920 iterate_dir+0x88/0x190 __x64_sys_getdents+0xa6/0x140 do_syscall_64+0x49/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Therefore, this patch adds extra consistency check in deactivate_slab(). Once an object's freepointer is corrupted, all following objects starting at this object are isolated.
[akpm@linux-foundation.org: fix build with CONFIG_SLAB_DEBUG=n] Signed-off-by: Dongli Zhang dongli.zhang@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Cc: Joe Jin joe.jin@oracle.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/20200331031450.12182-1-dongli.zhang@oracle.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/slub.c | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+)
diff --git a/mm/slub.c b/mm/slub.c index 63bd39c476431..63f372366ec59 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -679,6 +679,20 @@ static void slab_fix(struct kmem_cache *s, char *fmt, ...) va_end(args); }
+static bool freelist_corrupted(struct kmem_cache *s, struct page *page, + void *freelist, void *nextfree) +{ + if ((s->flags & SLAB_CONSISTENCY_CHECKS) && + !check_valid_pointer(s, page, nextfree)) { + object_err(s, page, freelist, "Freechain corrupt"); + freelist = NULL; + slab_fix(s, "Isolate corrupted freechain"); + return true; + } + + return false; +} + static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p) { unsigned int off; /* Offset of last byte */ @@ -1410,6 +1424,11 @@ static inline void inc_slabs_node(struct kmem_cache *s, int node, static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects) {}
+static bool freelist_corrupted(struct kmem_cache *s, struct page *page, + void *freelist, void *nextfree) +{ + return false; +} #endif /* CONFIG_SLUB_DEBUG */
/* @@ -2093,6 +2112,14 @@ static void deactivate_slab(struct kmem_cache *s, struct page *page, void *prior; unsigned long counters;
+ /* + * If 'nextfree' is invalid, it is possible that the object at + * 'freelist' is already corrupted. So isolate all objects + * starting at 'freelist'. + */ + if (freelist_corrupted(s, page, freelist, nextfree)) + break; + do { prior = page->freelist; counters = page->counters;
From: Qian Cai cai@lca.pw
[ Upstream commit a68ee0573991e90af2f1785db309206408bad3e5 ]
There is no need to copy SLUB_STATS items from root memcg cache to new memcg cache copies. Doing so could result in stack overruns because the store function only accepts 0 to clear the stat and returns an error for everything else while the show method would print out the whole stat.
Then, the mismatch of the lengths returns from show and store methods happens in memcg_propagate_slab_attrs():
else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) buf = mbuf;
max_attr_size is only 2 from slab_attr_store(), then, it uses mbuf[64] in show_stat() later where a bounch of sprintf() would overrun the stack variable. Fix it by always allocating a page of buffer to be used in show_stat() if SLUB_STATS=y which should only be used for debug purpose.
# echo 1 > /sys/kernel/slab/fs_cache/shrink BUG: KASAN: stack-out-of-bounds in number+0x421/0x6e0 Write of size 1 at addr ffffc900256cfde0 by task kworker/76:0/53251
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: number+0x421/0x6e0 vsnprintf+0x451/0x8e0 sprintf+0x9e/0xd0 show_stat+0x124/0x1d0 alloc_slowpath_show+0x13/0x20 __kmem_cache_create+0x47a/0x6b0
addr ffffc900256cfde0 is located in stack of task kworker/76:0/53251 at offset 0 in frame: process_one_work+0x0/0xb90
this frame has 1 object: [32, 72) 'lockdep_map'
Memory state around the buggy address: ffffc900256cfc80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffc900256cfd00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffc900256cfd80: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
^ ffffc900256cfe00: 00 00 00 00 00 f2 f2 f2 00 00 00 00 00 00 00 00 ffffc900256cfe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ================================================================== Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: __kmem_cache_create+0x6ac/0x6b0 Workqueue: memcg_kmem_cache memcg_kmem_cache_create_func Call Trace: __kmem_cache_create+0x6ac/0x6b0
Fixes: 107dab5c92d5 ("slub: slub-specific propagation changes") Signed-off-by: Qian Cai cai@lca.pw Signed-off-by: Andrew Morton akpm@linux-foundation.org Cc: Glauber Costa glauber@scylladb.com Cc: Christoph Lameter cl@linux.com Cc: Pekka Enberg penberg@kernel.org Cc: David Rientjes rientjes@google.com Cc: Joonsoo Kim iamjoonsoo.kim@lge.com Link: http://lkml.kernel.org/r/20200429222356.4322-1-cai@lca.pw Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/slub.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/slub.c b/mm/slub.c index 63f372366ec59..660f4324c0972 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5681,7 +5681,8 @@ static void memcg_propagate_slab_attrs(struct kmem_cache *s) */ if (buffer) buf = buffer; - else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf)) + else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) && + !IS_ENABLED(CONFIG_SLUB_STATS)) buf = mbuf; else { buffer = (char *) get_zeroed_page(GFP_KERNEL);
From: Vlastimil Babka vbabka@suse.cz
[ Upstream commit 002ae7057069538aa3afd500f6f60a429cb948b2 ]
We have seen a following problem on a RPi4 with 1G RAM:
BUG: Bad page state in process systemd-hwdb pfn:35601 page:ffff7e0000d58040 refcount:15 mapcount:131221 mapping:efd8fe765bc80080 index:0x1 compound_mapcount: -32767 Unable to handle kernel paging request at virtual address efd8fe765bc80080 Mem abort info: ESR = 0x96000004 Exception class = DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 Data abort info: ISV = 0, ISS = 0x00000004 CM = 0, WnR = 0 [efd8fe765bc80080] address between user and kernel address ranges Internal error: Oops: 96000004 [#1] SMP Modules linked in: btrfs libcrc32c xor xor_neon zlib_deflate raid6_pq mmc_block xhci_pci xhci_hcd usbcore sdhci_iproc sdhci_pltfm sdhci mmc_core clk_raspberrypi gpio_raspberrypi_exp pcie_brcmstb bcm2835_dma gpio_regulator phy_generic fixed sg scsi_mod efivarfs Supported: No, Unreleased kernel CPU: 3 PID: 408 Comm: systemd-hwdb Not tainted 5.3.18-8-default #1 SLE15-SP2 (unreleased) Hardware name: raspberrypi rpi/rpi, BIOS 2020.01 02/21/2020 pstate: 40000085 (nZcv daIf -PAN -UAO) pc : __dump_page+0x268/0x368 lr : __dump_page+0xc4/0x368 sp : ffff000012563860 x29: ffff000012563860 x28: ffff80003ddc4300 x27: 0000000000000010 x26: 000000000000003f x25: ffff7e0000d58040 x24: 000000000000000f x23: efd8fe765bc80080 x22: 0000000000020095 x21: efd8fe765bc80080 x20: ffff000010ede8b0 x19: ffff7e0000d58040 x18: ffffffffffffffff x17: 0000000000000001 x16: 0000000000000007 x15: ffff000011689708 x14: 3030386362353637 x13: 6566386466653a67 x12: 6e697070616d2031 x11: 32323133313a746e x10: 756f6370616d2035 x9 : ffff00001168a840 x8 : ffff00001077a670 x7 : 000000000000013d x6 : ffff0000118a43b5 x5 : 0000000000000001 x4 : ffff80003dd9e2c8 x3 : ffff80003dd9e2c8 x2 : 911c8d7c2f483500 x1 : dead000000000100 x0 : efd8fe765bc80080 Call trace: __dump_page+0x268/0x368 bad_page+0xd4/0x168 check_new_page_bad+0x80/0xb8 rmqueue_bulk.constprop.26+0x4d8/0x788 get_page_from_freelist+0x4d4/0x1228 __alloc_pages_nodemask+0x134/0xe48 alloc_pages_vma+0x198/0x1c0 do_anonymous_page+0x1a4/0x4d8 __handle_mm_fault+0x4e8/0x560 handle_mm_fault+0x104/0x1e0 do_page_fault+0x1e8/0x4c0 do_translation_fault+0xb0/0xc0 do_mem_abort+0x50/0xb0 el0_da+0x24/0x28 Code: f9401025 8b8018a0 9a851005 17ffffca (f94002a0)
Besides the underlying issue with page->mapping containing a bogus value for some reason, we can see that __dump_page() crashed by trying to read the pointer at mapping->host, turning a recoverable warning into full Oops.
It can be expected that when page is reported as bad state for some reason, the pointers there should not be trusted blindly.
So this patch treats all data in __dump_page() that depends on page->mapping as lava, using probe_kernel_read_strict(). Ideally this would include the dentry->d_parent recursively, but that would mean changing printk handler for %pd. Chances of reaching the dentry printing part with an initially bogus mapping pointer should be rather low, though.
Also prefix printing mapping->a_ops with a description of what is being printed. In case the value is bogus, %ps will print raw value instead of the symbol name and then it's not obvious at all that it's printing a_ops.
Reported-by: Petr Tesarik ptesarik@suse.cz Signed-off-by: Vlastimil Babka vbabka@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Acked-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Matthew Wilcox willy@infradead.org Cc: John Hubbard jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20200331165454.12263-1-vbabka@suse.cz Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/debug.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 50 insertions(+), 6 deletions(-)
diff --git a/mm/debug.c b/mm/debug.c index 2189357f09871..f2ede2df585a9 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -110,13 +110,57 @@ void __dump_page(struct page *page, const char *reason) else if (PageAnon(page)) type = "anon "; else if (mapping) { - if (mapping->host && mapping->host->i_dentry.first) { - struct dentry *dentry; - dentry = container_of(mapping->host->i_dentry.first, struct dentry, d_u.d_alias); - pr_warn("%ps name:"%pd"\n", mapping->a_ops, dentry); - } else - pr_warn("%ps\n", mapping->a_ops); + const struct inode *host; + const struct address_space_operations *a_ops; + const struct hlist_node *dentry_first; + const struct dentry *dentry_ptr; + struct dentry dentry; + + /* + * mapping can be invalid pointer and we don't want to crash + * accessing it, so probe everything depending on it carefully + */ + if (probe_kernel_read_strict(&host, &mapping->host, + sizeof(struct inode *)) || + probe_kernel_read_strict(&a_ops, &mapping->a_ops, + sizeof(struct address_space_operations *))) { + pr_warn("failed to read mapping->host or a_ops, mapping not a valid kernel address?\n"); + goto out_mapping; + } + + if (!host) { + pr_warn("mapping->a_ops:%ps\n", a_ops); + goto out_mapping; + } + + if (probe_kernel_read_strict(&dentry_first, + &host->i_dentry.first, sizeof(struct hlist_node *))) { + pr_warn("mapping->a_ops:%ps with invalid mapping->host inode address %px\n", + a_ops, host); + goto out_mapping; + } + + if (!dentry_first) { + pr_warn("mapping->a_ops:%ps\n", a_ops); + goto out_mapping; + } + + dentry_ptr = container_of(dentry_first, struct dentry, d_u.d_alias); + if (probe_kernel_read_strict(&dentry, dentry_ptr, + sizeof(struct dentry))) { + pr_warn("mapping->aops:%ps with invalid mapping->host->i_dentry.first %px\n", + a_ops, dentry_ptr); + } else { + /* + * if dentry is corrupted, the %pd handler may still + * crash, but it's unlikely that we reach here with a + * corrupted struct page + */ + pr_warn("mapping->aops:%ps dentry name:"%pd"\n", + a_ops, &dentry); + } } +out_mapping: BUILD_BUG_ON(ARRAY_SIZE(pageflag_names) != __NR_PAGEFLAGS + 1);
pr_warn("%sflags: %#lx(%pGp)%s\n", type, page->flags, &page->flags,
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 3232dd02af65f2d01be641120d2a710176b0c7a7 ]
IORING_SETUP_IOPOLL is defined only for read/write, other opcodes should be disallowed, otherwise it'll get an error as below. Also refuse open/close with SQPOLL, as the polling thread wouldn't know which file table to use.
RIP: 0010:io_iopoll_getevents+0x111/0x5a0 Call Trace: ? _raw_spin_unlock_irqrestore+0x24/0x40 ? do_send_sig_info+0x64/0x90 io_iopoll_reap_events.part.0+0x5e/0xa0 io_ring_ctx_wait_and_kill+0x132/0x1c0 io_uring_release+0x20/0x30 __fput+0xcd/0x230 ____fput+0xe/0x10 task_work_run+0x67/0xa0 do_exit+0x353/0xb10 ? handle_mm_fault+0xd4/0x200 ? syscall_trace_enter+0x18c/0x2c0 do_group_exit+0x43/0xa0 __x64_sys_exit_group+0x18/0x20 do_syscall_64+0x60/0x1e0 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Pavel Begunkov asml.silence@gmail.com [axboe: allow provide/remove buffers and files update] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 4ab1728de247c..bb74e45941af2 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2748,6 +2748,8 @@ static int io_splice_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (req->flags & REQ_F_NEED_CLEANUP) return 0; + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL;
sp->file_in = NULL; sp->off_in = READ_ONCE(sqe->splice_off_in); @@ -2910,6 +2912,8 @@ static int io_fallocate_prep(struct io_kiocb *req, { if (sqe->ioprio || sqe->buf_index || sqe->rw_flags) return -EINVAL; + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL;
req->sync.off = READ_ONCE(sqe->off); req->sync.len = READ_ONCE(sqe->addr); @@ -2935,6 +2939,8 @@ static int io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) const char __user *fname; int ret;
+ if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) + return -EINVAL; if (sqe->ioprio || sqe->buf_index) return -EINVAL; if (req->flags & REQ_F_FIXED_FILE) @@ -2968,6 +2974,8 @@ static int io_openat2_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) size_t len; int ret;
+ if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) + return -EINVAL; if (sqe->ioprio || sqe->buf_index) return -EINVAL; if (req->flags & REQ_F_FIXED_FILE) @@ -3207,6 +3215,8 @@ static int io_epoll_ctl_prep(struct io_kiocb *req, #if defined(CONFIG_EPOLL) if (sqe->ioprio || sqe->buf_index) return -EINVAL; + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL;
req->epoll.epfd = READ_ONCE(sqe->fd); req->epoll.op = READ_ONCE(sqe->len); @@ -3251,6 +3261,8 @@ static int io_madvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) #if defined(CONFIG_ADVISE_SYSCALLS) && defined(CONFIG_MMU) if (sqe->ioprio || sqe->buf_index || sqe->off) return -EINVAL; + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL;
req->madvise.addr = READ_ONCE(sqe->addr); req->madvise.len = READ_ONCE(sqe->len); @@ -3285,6 +3297,8 @@ static int io_fadvise_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { if (sqe->ioprio || sqe->buf_index || sqe->addr) return -EINVAL; + if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL;
req->fadvise.offset = READ_ONCE(sqe->off); req->fadvise.len = READ_ONCE(sqe->len); @@ -3322,6 +3336,8 @@ static int io_statx_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) unsigned lookup_flags; int ret;
+ if (unlikely(req->ctx->flags & IORING_SETUP_IOPOLL)) + return -EINVAL; if (sqe->ioprio || sqe->buf_index) return -EINVAL; if (req->flags & REQ_F_FIXED_FILE) @@ -3402,6 +3418,8 @@ static int io_close_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) */ req->work.flags |= IO_WQ_WORK_NO_CANCEL;
+ if (unlikely(req->ctx->flags & (IORING_SETUP_IOPOLL|IORING_SETUP_SQPOLL))) + return -EINVAL; if (sqe->ioprio || sqe->off || sqe->addr || sqe->len || sqe->rw_flags || sqe->buf_index) return -EINVAL;
From: David Howells dhowells@redhat.com
[ Upstream commit 2ad6691d988c0c611362ddc2aad89e0fb50e3261 ]
There's a race between the retransmission code and the received ACK parser. The problem is that the retransmission loop has to drop the lock under which it is iterating through the transmission buffer in order to transmit a packet, but whilst the lock is dropped, the ACK parser can crank the Tx window round and discard the packets from the buffer.
The retransmission code then updated the annotations for the wrong packet and a later retransmission thought it had to retransmit a packet that wasn't there, leading to a NULL pointer dereference.
Fix this by:
(1) Moving the annotation change to before we drop the lock prior to transmission. This means we can't vary the annotation depending on the outcome of the transmission, but that's fine - we'll retransmit again later if it failed now.
(2) Skipping the packet if the skb pointer is NULL.
The following oops was seen:
BUG: kernel NULL pointer dereference, address: 000000000000002d Workqueue: krxrpcd rxrpc_process_call RIP: 0010:rxrpc_get_skb+0x14/0x8a ... Call Trace: rxrpc_resend+0x331/0x41e ? get_vtime_delta+0x13/0x20 rxrpc_process_call+0x3c0/0x4ac process_one_work+0x18f/0x27f worker_thread+0x1a3/0x247 ? create_worker+0x17d/0x17d kthread+0xe6/0xeb ? kthread_delayed_work_timer_fn+0x83/0x83 ret_from_fork+0x1f/0x30
Fixes: 248f219cb8bc ("rxrpc: Rewrite the data and ack handling code") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/call_event.c | 29 +++++++++++------------------ 1 file changed, 11 insertions(+), 18 deletions(-)
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 2a65ac41055f5..985fb89202d0c 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -248,7 +248,18 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) if (anno_type != RXRPC_TX_ANNO_RETRANS) continue;
+ /* We need to reset the retransmission state, but we need to do + * so before we drop the lock as a new ACK/NAK may come in and + * confuse things + */ + annotation &= ~RXRPC_TX_ANNO_MASK; + annotation |= RXRPC_TX_ANNO_RESENT; + call->rxtx_annotations[ix] = annotation; + skb = call->rxtx_buffer[ix]; + if (!skb) + continue; + rxrpc_get_skb(skb, rxrpc_skb_got); spin_unlock_bh(&call->lock);
@@ -262,24 +273,6 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j)
rxrpc_free_skb(skb, rxrpc_skb_freed); spin_lock_bh(&call->lock); - - /* We need to clear the retransmit state, but there are two - * things we need to be aware of: A new ACK/NAK might have been - * received and the packet might have been hard-ACK'd (in which - * case it will no longer be in the buffer). - */ - if (after(seq, call->tx_hard_ack)) { - annotation = call->rxtx_annotations[ix]; - anno_type = annotation & RXRPC_TX_ANNO_MASK; - if (anno_type == RXRPC_TX_ANNO_RETRANS || - anno_type == RXRPC_TX_ANNO_NAK) { - annotation &= ~RXRPC_TX_ANNO_MASK; - annotation |= RXRPC_TX_ANNO_UNACK; - } - annotation |= RXRPC_TX_ANNO_RESENT; - call->rxtx_annotations[ix] = annotation; - } - if (after(call->tx_hard_ack, seq)) seq = call->tx_hard_ack; }
From: Zqiang qiang.zhang@windriver.com
[ Upstream commit 28ebeb8db77035e058a510ce9bd17c2b9a009dba ]
BUG: memory leak unreferenced object 0xffff888055046e00 (size 256): comm "kworker/2:9", pid 2570, jiffies 4294942129 (age 1095.500s) hex dump (first 32 bytes): 00 70 04 55 80 88 ff ff 18 bb 5a 81 ff ff ff ff .p.U......Z..... f5 96 78 81 ff ff ff ff 37 de 8e 81 ff ff ff ff ..x.....7....... backtrace: [<00000000d121dccf>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000d121dccf>] slab_post_alloc_hook mm/slab.h:586 [inline] [<00000000d121dccf>] slab_alloc_node mm/slub.c:2786 [inline] [<00000000d121dccf>] slab_alloc mm/slub.c:2794 [inline] [<00000000d121dccf>] kmem_cache_alloc_trace+0x15e/0x2d0 mm/slub.c:2811 [<000000005c3c3381>] kmalloc include/linux/slab.h:555 [inline] [<000000005c3c3381>] usbtest_probe+0x286/0x19d0 drivers/usb/misc/usbtest.c:2790 [<000000001cec6910>] usb_probe_interface+0x2bd/0x870 drivers/usb/core/driver.c:361 [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551 [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724 [<000000003ef66004>] __device_attach_driver+0x1b6/0x240 drivers/base/dd.c:831 [<00000000eee53e97>] bus_for_each_drv+0x14e/0x1e0 drivers/base/bus.c:431 [<00000000bb0648d0>] __device_attach+0x1f9/0x350 drivers/base/dd.c:897 [<00000000838b324a>] device_initial_probe+0x1a/0x20 drivers/base/dd.c:944 [<0000000030d501c1>] bus_probe_device+0x1e1/0x280 drivers/base/bus.c:491 [<000000005bd7adef>] device_add+0x131d/0x1c40 drivers/base/core.c:2504 [<00000000a0937814>] usb_set_configuration+0xe84/0x1ab0 drivers/usb/core/message.c:2030 [<00000000e3934741>] generic_probe+0x6a/0xe0 drivers/usb/core/generic.c:210 [<0000000098ade0f1>] usb_probe_device+0x90/0xd0 drivers/usb/core/driver.c:266 [<000000007806c118>] really_probe+0x48d/0x8f0 drivers/base/dd.c:551 [<00000000a3308c3e>] driver_probe_device+0xfc/0x2a0 drivers/base/dd.c:724
Acked-by: Alan Stern stern@rowland.harvard.edu Reported-by: Kyungtae Kim kt0755@gmail.com Signed-off-by: Zqiang qiang.zhang@windriver.com Link: https://lore.kernel.org/r/20200612035210.20494-1-qiang.zhang@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/misc/usbtest.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/usb/misc/usbtest.c b/drivers/usb/misc/usbtest.c index 98ada1a3425c6..bae88893ee8e3 100644 --- a/drivers/usb/misc/usbtest.c +++ b/drivers/usb/misc/usbtest.c @@ -2873,6 +2873,7 @@ static void usbtest_disconnect(struct usb_interface *intf)
usb_set_intfdata(intf, NULL); dev_dbg(&intf->dev, "disconnect\n"); + kfree(dev->buf); kfree(dev); }
From: Steven Rostedt (VMware) rostedt@goodmis.org
[ Upstream commit 27d4d336f2872193e90ee5450559e1699fae0f6d ]
There's several locations that open code realloc and strcat() to append text to strings. Add an append() function that takes a delimiter and a string to append to another string.
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jaewon Lim jaewon31.kim@samsung.com Cc: Jiri Olsa jolsa@redhat.com Cc: Kees Kook keescook@chromium.org Cc: linux-mm@kvack.org Cc: linux-trace-devel@vger.kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Link: http://lore.kernel.org/lkml/20200324200956.515118403@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/traceevent/event-parse.c | 98 ++++++++++++------------------ 1 file changed, 40 insertions(+), 58 deletions(-)
diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c index e1bd2a93c6db8..eec96c31ea9e5 100644 --- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -1425,6 +1425,19 @@ static unsigned int type_size(const char *name) return 0; }
+static int append(char **buf, const char *delim, const char *str) +{ + char *new_buf; + + new_buf = realloc(*buf, strlen(*buf) + strlen(delim) + strlen(str) + 1); + if (!new_buf) + return -1; + strcat(new_buf, delim); + strcat(new_buf, str); + *buf = new_buf; + return 0; +} + static int event_read_fields(struct tep_event *event, struct tep_format_field **fields) { struct tep_format_field *field = NULL; @@ -1432,6 +1445,7 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** char *token; char *last_token; int count = 0; + int ret;
do { unsigned int size_dynamic = 0; @@ -1490,24 +1504,15 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** field->flags |= TEP_FIELD_IS_POINTER;
if (field->type) { - char *new_type; - new_type = realloc(field->type, - strlen(field->type) + - strlen(last_token) + 2); - if (!new_type) { - free(last_token); - goto fail; - } - field->type = new_type; - strcat(field->type, " "); - strcat(field->type, last_token); + ret = append(&field->type, " ", last_token); free(last_token); + if (ret < 0) + goto fail; } else field->type = last_token; last_token = token; continue; } - break; }
@@ -1523,8 +1528,6 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** if (strcmp(token, "[") == 0) { enum tep_event_type last_type = type; char *brackets = token; - char *new_brackets; - int len;
field->flags |= TEP_FIELD_IS_ARRAY;
@@ -1536,29 +1539,27 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** field->arraylen = 0;
while (strcmp(token, "]") != 0) { + const char *delim; + if (last_type == TEP_EVENT_ITEM && type == TEP_EVENT_ITEM) - len = 2; + delim = " "; else - len = 1; + delim = ""; + last_type = type;
- new_brackets = realloc(brackets, - strlen(brackets) + - strlen(token) + len); - if (!new_brackets) { + ret = append(&brackets, delim, token); + if (ret < 0) { free(brackets); goto fail; } - brackets = new_brackets; - if (len == 2) - strcat(brackets, " "); - strcat(brackets, token); /* We only care about the last token */ field->arraylen = strtoul(token, NULL, 0); free_token(token); type = read_token(&token); if (type == TEP_EVENT_NONE) { + free(brackets); do_warning_event(event, "failed to find token"); goto fail; } @@ -1566,13 +1567,11 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field **
free_token(token);
- new_brackets = realloc(brackets, strlen(brackets) + 2); - if (!new_brackets) { + ret = append(&brackets, "", "]"); + if (ret < 0) { free(brackets); goto fail; } - brackets = new_brackets; - strcat(brackets, "]");
/* add brackets to type */
@@ -1582,34 +1581,23 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** * the format: type [] item; */ if (type == TEP_EVENT_ITEM) { - char *new_type; - new_type = realloc(field->type, - strlen(field->type) + - strlen(field->name) + - strlen(brackets) + 2); - if (!new_type) { + ret = append(&field->type, " ", field->name); + if (ret < 0) { free(brackets); goto fail; } - field->type = new_type; - strcat(field->type, " "); - strcat(field->type, field->name); + ret = append(&field->type, "", brackets); + size_dynamic = type_size(field->name); free_token(field->name); - strcat(field->type, brackets); field->name = field->alias = token; type = read_token(&token); } else { - char *new_type; - new_type = realloc(field->type, - strlen(field->type) + - strlen(brackets) + 1); - if (!new_type) { + ret = append(&field->type, "", brackets); + if (ret < 0) { free(brackets); goto fail; } - field->type = new_type; - strcat(field->type, brackets); } free(brackets); } @@ -2046,19 +2034,16 @@ process_op(struct tep_event *event, struct tep_print_arg *arg, char **tok) /* could just be a type pointer */ if ((strcmp(arg->op.op, "*") == 0) && type == TEP_EVENT_DELIM && (strcmp(token, ")") == 0)) { - char *new_atom; + int ret;
if (left->type != TEP_PRINT_ATOM) { do_warning_event(event, "bad pointer type"); goto out_free; } - new_atom = realloc(left->atom.atom, - strlen(left->atom.atom) + 3); - if (!new_atom) + ret = append(&left->atom.atom, " ", "*"); + if (ret < 0) goto out_warn_free;
- left->atom.atom = new_atom; - strcat(left->atom.atom, " *"); free(arg->op.op); *arg = *left; free(left); @@ -3151,18 +3136,15 @@ process_arg_token(struct tep_event *event, struct tep_print_arg *arg, } /* atoms can be more than one token long */ while (type == TEP_EVENT_ITEM) { - char *new_atom; - new_atom = realloc(atom, - strlen(atom) + strlen(token) + 2); - if (!new_atom) { + int ret; + + ret = append(&atom, " ", token); + if (ret < 0) { free(atom); *tok = NULL; free_token(token); return TEP_EVENT_ERROR; } - atom = new_atom; - strcat(atom, " "); - strcat(atom, token); free_token(token); type = read_token_item(&token); }
From: Steven Rostedt (VMware) rostedt@goodmis.org
[ Upstream commit 74621d929d944529a5e2878a84f48bfa6fb69a66 ]
Commit c61f13eaa1ee1 ("gcc-plugins: Add structleak for more stack initialization") added "__attribute__((user))" to the user when stackleak detector is enabled. This now appears in the field format of system call trace events for system calls that have user buffers. The "__attribute__((user))" breaks the parsing in libtraceevent. That needs to be handled.
Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Jaewon Kim jaewon31.kim@samsung.com Cc: Jiri Olsa jolsa@redhat.com Cc: Kees Kook keescook@chromium.org Cc: Namhyung Kim namhyung@kernel.org Cc: Vlastimil Babka vbabka@suse.cz Cc: linux-mm@kvack.org Cc: linux-trace-devel@vger.kernel.org Link: http://lore.kernel.org/lkml/20200324200956.663647256@goodmis.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/traceevent/event-parse.c | 39 +++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-)
diff --git a/tools/lib/traceevent/event-parse.c b/tools/lib/traceevent/event-parse.c index eec96c31ea9e5..010e60d5a0817 100644 --- a/tools/lib/traceevent/event-parse.c +++ b/tools/lib/traceevent/event-parse.c @@ -1444,6 +1444,7 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** enum tep_event_type type; char *token; char *last_token; + char *delim = " "; int count = 0; int ret;
@@ -1504,13 +1505,49 @@ static int event_read_fields(struct tep_event *event, struct tep_format_field ** field->flags |= TEP_FIELD_IS_POINTER;
if (field->type) { - ret = append(&field->type, " ", last_token); + ret = append(&field->type, delim, last_token); free(last_token); if (ret < 0) goto fail; } else field->type = last_token; last_token = token; + delim = " "; + continue; + } + + /* Handle __attribute__((user)) */ + if ((type == TEP_EVENT_DELIM) && + strcmp("__attribute__", last_token) == 0 && + token[0] == '(') { + int depth = 1; + int ret; + + ret = append(&field->type, " ", last_token); + ret |= append(&field->type, "", "("); + if (ret < 0) + goto fail; + + delim = " "; + while ((type = read_token(&token)) != TEP_EVENT_NONE) { + if (type == TEP_EVENT_DELIM) { + if (token[0] == '(') + depth++; + else if (token[0] == ')') + depth--; + if (!depth) + break; + ret = append(&field->type, "", token); + delim = ""; + } else { + ret = append(&field->type, delim, token); + delim = " "; + } + if (ret < 0) + goto fail; + free(last_token); + last_token = token; + } continue; } break;
From: Christian Borntraeger borntraeger@de.ibm.com
[ Upstream commit 827c4913923e0b441ba07ba4cc41e01181102303 ]
When specifying insanely large debug buffers a kernel warning is printed. The debug code does handle the error gracefully, though. Instead of duplicating the check let us silence the warning to avoid crashes when panic_on_warn is used.
Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Reviewed-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Heiko Carstens heiko.carstens@de.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kernel/debug.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kernel/debug.c b/arch/s390/kernel/debug.c index 6d321f5f101d6..7184d55d87aae 100644 --- a/arch/s390/kernel/debug.c +++ b/arch/s390/kernel/debug.c @@ -198,9 +198,10 @@ static debug_entry_t ***debug_areas_alloc(int pages_per_area, int nr_areas) if (!areas) goto fail_malloc_areas; for (i = 0; i < nr_areas; i++) { + /* GFP_NOWARN to avoid user triggerable WARN, we handle fails */ areas[i] = kmalloc_array(pages_per_area, sizeof(debug_entry_t *), - GFP_KERNEL); + GFP_KERNEL | __GFP_NOWARN); if (!areas[i]) goto fail_malloc_areas2; for (j = 0; j < pages_per_area; j++) {
From: Xuan Zhuo xuanzhuo@linux.alibaba.com
[ Upstream commit b772f07add1c0b22e02c0f1e96f647560679d3a9 ]
When the user consumes and generates sqe at a fast rate, io_sqring_entries can always get sqe, and ret will not be equal to -EBUSY, so that io_sq_thread will never call cond_resched or schedule, and then we will get the following system error prompt:
rcu: INFO: rcu_sched self-detected stall on CPU or watchdog: BUG: soft lockup-CPU#23 stuck for 112s! [io_uring-sq:1863]
This patch checks whether need to call cond_resched() by checking the need_resched() function every cycle.
Suggested-by: Jens Axboe axboe@kernel.dk Signed-off-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index bb74e45941af2..63a456921903e 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -6084,7 +6084,7 @@ static int io_sq_thread(void *data) * If submit got -EBUSY, flag us as needing the application * to enter the kernel to reap and flush events. */ - if (!to_submit || ret == -EBUSY) { + if (!to_submit || ret == -EBUSY || need_resched()) { /* * Drop cur_mm before scheduling, we can't hold it for * long periods (or over schedule()). Do this before @@ -6100,7 +6100,7 @@ static int io_sq_thread(void *data) * more IO, we should wait for the application to * reap events and wake us up. */ - if (!list_empty(&ctx->poll_list) || + if (!list_empty(&ctx->poll_list) || need_resched() || (!time_after(jiffies, timeout) && ret != -EBUSY && !percpu_ref_is_dying(&ctx->refs))) { if (current->task_works)
From: Keith Busch kbusch@kernel.org
[ Upstream commit b2ce4d90690bd29ce5b554e203cd03682dd59697 ]
The queues' backing device info capabilities don't change with each namespace revalidation. Set it only when each path's request_queue is initially added to a multipath queue.
Signed-off-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 7 ------- drivers/nvme/host/multipath.c | 8 ++++++++ 2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 7b4cbe2c69541..887139f8fa53b 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1910,13 +1910,6 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id) if (ns->head->disk) { nvme_update_disk_info(ns->head->disk, ns, id); blk_queue_stack_limits(ns->head->disk->queue, ns->queue); - if (bdi_cap_stable_pages_required(ns->queue->backing_dev_info)) { - struct backing_dev_info *info = - ns->head->disk->queue->backing_dev_info; - - info->capabilities |= BDI_CAP_STABLE_WRITES; - } - revalidate_disk(ns->head->disk); } #endif diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 17f172cf456ad..91a8b1ce5a3a2 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -3,6 +3,7 @@ * Copyright (c) 2017-2018 Christoph Hellwig. */
+#include <linux/backing-dev.h> #include <linux/moduleparam.h> #include <trace/events/block.h> #include "nvme.h" @@ -662,6 +663,13 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id) ns->ana_state = NVME_ANA_OPTIMIZED; nvme_mpath_set_live(ns); } + + if (bdi_cap_stable_pages_required(ns->queue->backing_dev_info)) { + struct backing_dev_info *info = + ns->head->disk->queue->backing_dev_info; + + info->capabilities |= BDI_CAP_STABLE_WRITES; + } }
void nvme_mpath_remove_disk(struct nvme_ns_head *head)
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit 3b4b19721ec652ad2c4fe51dfbe5124212b5f581 ]
Revert fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns")
When adding a new namespace to the head disk (via nvme_mpath_set_live) we will see partition scan which triggers I/O on the mpath device node. This process will usually be triggered from the scan_work which holds the scan_lock. If I/O blocks (if we got ana change currently have only available paths but none are accessible) this can deadlock on the head disk bd_mutex as both partition scan I/O takes it, and head disk revalidation takes it to check for resize (also triggered from scan_work on a different path). See trace [1].
The mpath disk revalidation was originally added to detect online disk size change, but this is no longer needed since commit cb224c3af4df ("nvme: Convert to use set_capacity_revalidate_and_notify") which already updates resize info without unnecessarily revalidating the disk (the mpath disk doesn't even implement .revalidate_disk fop).
[1]: -- kernel: INFO: task kworker/u65:9:494 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:9 D 0 494 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: revalidate_disk+0x63/0xa0 kernel: __nvme_revalidate_disk+0xfe/0x110 [nvme_core] kernel: nvme_revalidate_disk+0xa4/0x160 [nvme_core] kernel: ? evict+0x14c/0x1b0 kernel: revalidate_disk+0x2b/0xa0 kernel: nvme_validate_ns+0x49/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: ? __nvme_submit_sync_cmd+0xbe/0x1e0 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 ... kernel: INFO: task kworker/u65:1:2630 blocked for more than 241 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u65:1 D 0 2630 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? file_fdatawait_range+0x30/0x30 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: ? kmem_cache_alloc_trace+0x19c/0x230 kernel: efi_partition+0x1e6/0x708 kernel: ? vsnprintf+0x39e/0x4e0 kernel: ? snprintf+0x49/0x60 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: ? nvme_update_ns_ana_state+0x60/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: ? blk_mq_free_request+0xd2/0x100 kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x1f/0x40 --
Fixes: fab7772bfbcf ("nvme-multipath: revalidate nvme_ns_head gendisk in nvme_validate_ns") Signed-off-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 887139f8fa53b..85ce6c682849e 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1910,7 +1910,6 @@ static void __nvme_revalidate_disk(struct gendisk *disk, struct nvme_id_ns *id) if (ns->head->disk) { nvme_update_disk_info(ns->head->disk, ns, id); blk_queue_stack_limits(ns->head->disk->queue, ns->queue); - revalidate_disk(ns->head->disk); } #endif }
From: Anton Eidelman anton@lightbitslabs.com
[ Upstream commit 489dd102a2c7c94d783a35f9412eb085b8da1aa4 ]
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path. While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO.
In order to recover and complete the IO ana_work on the same ctrl should be able to update the path state and remove NVME_NS_ANA_PENDING.
The deadlock occurs because scan_work keeps holding ana_lock, so ana_work hangs [1].
Fix: Now nvme_mpath_add_disk() uses nvme_parse_ana_log() to obtain a copy of the ANA group desc, and then calls nvme_update_ns_ana_state() without holding ana_lock.
[1]: kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_set_ns_ana_state+0x1e/0x30 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_mpath_add_disk+0x47/0x90 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x24f/0x380 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x249/0x400 kernel: kthread+0x104/0x140
kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: ? __switch_to_asm+0x34/0x70 kernel: ? select_task_rq_fair+0x1aa/0x5c0 kernel: ? kvm_sched_clock_read+0x11/0x20 kernel: ? sched_clock+0x9/0x10 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_read_ana_log+0x3a/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ? process_one_work+0x380/0x380 kernel: ? kthread_park+0x80/0x80 kernel: ret_from_fork+0x35/0x40
Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/multipath.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 91a8b1ce5a3a2..f4287d8550a9f 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -639,26 +639,34 @@ static ssize_t ana_state_show(struct device *dev, struct device_attribute *attr, } DEVICE_ATTR_RO(ana_state);
-static int nvme_set_ns_ana_state(struct nvme_ctrl *ctrl, +static int nvme_lookup_ana_group_desc(struct nvme_ctrl *ctrl, struct nvme_ana_group_desc *desc, void *data) { - struct nvme_ns *ns = data; + struct nvme_ana_group_desc *dst = data;
- if (ns->ana_grpid == le32_to_cpu(desc->grpid)) { - nvme_update_ns_ana_state(desc, ns); - return -ENXIO; /* just break out of the loop */ - } + if (desc->grpid != dst->grpid) + return 0;
- return 0; + *dst = *desc; + return -ENXIO; /* just break out of the loop */ }
void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id) { if (nvme_ctrl_use_ana(ns->ctrl)) { + struct nvme_ana_group_desc desc = { + .grpid = id->anagrpid, + .state = 0, + }; + mutex_lock(&ns->ctrl->ana_lock); ns->ana_grpid = le32_to_cpu(id->anagrpid); - nvme_parse_ana_log(ns->ctrl, ns, nvme_set_ns_ana_state); + nvme_parse_ana_log(ns->ctrl, &desc, nvme_lookup_ana_group_desc); mutex_unlock(&ns->ctrl->ana_lock); + if (desc.state) { + /* found the group desc: update */ + nvme_update_ns_ana_state(&desc, ns); + } } else { ns->ana_state = NVME_ANA_OPTIMIZED; nvme_mpath_set_live(ns);
From: Anton Eidelman anton@lightbitslabs.com
[ Upstream commit d8a22f85609fadb46ba699e0136cc3ebdeebff79 ]
In the following scenario scan_work and ana_work will deadlock:
When scan_work calls nvme_mpath_add_disk() this holds ana_lock and invokes nvme_parse_ana_log(), which may issue IO in device_add_disk() and hang waiting for an accessible path.
While nvme_mpath_set_live() only called when nvme_state_is_live(), a transition may cause NVME_SC_ANA_TRANSITION and requeue the IO.
Since nvme_mpath_set_live() holds ns->head->lock, an ana_work on ANY ctrl will not be able to complete nvme_mpath_set_live() on the same ns->head, which is required in order to update the new accessible path and remove NVME_NS_ANA_PENDING.. Therefore IO never completes: deadlock [1].
Fix: Move device_add_disk out of the head->lock and protect it with an atomic test_and_set for a new NVME_NS_HEAD_HAS_DISK bit.
[1]: kernel: INFO: task kworker/u8:2:160 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:2 D 0 160 2 0x80004000 kernel: Workqueue: nvme-wq nvme_ana_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: schedule_preempt_disabled+0xe/0x10 kernel: __mutex_lock.isra.0+0x182/0x4f0 kernel: __mutex_lock_slowpath+0x13/0x20 kernel: mutex_lock+0x2e/0x40 kernel: nvme_update_ns_ana_state+0x22/0x60 [nvme_core] kernel: nvme_update_ana_state+0xca/0xe0 [nvme_core] kernel: nvme_parse_ana_log+0xa1/0x180 [nvme_core] kernel: nvme_read_ana_log+0x76/0x100 [nvme_core] kernel: nvme_ana_work+0x15/0x20 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40 kernel: INFO: task kworker/u8:4:439 blocked for more than 120 seconds. kernel: Tainted: G OE 5.3.5-050305-generic #201910071830 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: kworker/u8:4 D 0 439 2 0x80004000 kernel: Workqueue: nvme-wq nvme_scan_work [nvme_core] kernel: Call Trace: kernel: __schedule+0x2b9/0x6c0 kernel: schedule+0x42/0xb0 kernel: io_schedule+0x16/0x40 kernel: do_read_cache_page+0x438/0x830 kernel: read_cache_page+0x12/0x20 kernel: read_dev_sector+0x27/0xc0 kernel: read_lba+0xc1/0x220 kernel: efi_partition+0x1e6/0x708 kernel: check_partition+0x154/0x244 kernel: rescan_partitions+0xae/0x280 kernel: __blkdev_get+0x40f/0x560 kernel: blkdev_get+0x3d/0x140 kernel: __device_add_disk+0x388/0x480 kernel: device_add_disk+0x13/0x20 kernel: nvme_mpath_set_live+0x119/0x140 [nvme_core] kernel: nvme_update_ns_ana_state+0x5c/0x60 [nvme_core] kernel: nvme_mpath_add_disk+0xbe/0x100 [nvme_core] kernel: nvme_validate_ns+0x396/0x940 [nvme_core] kernel: nvme_scan_work+0x256/0x390 [nvme_core] kernel: process_one_work+0x1db/0x380 kernel: worker_thread+0x4d/0x400 kernel: kthread+0x104/0x140 kernel: ret_from_fork+0x35/0x40
Fixes: 0d0b660f214d ("nvme: add ANA support") Signed-off-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/multipath.c | 4 ++-- drivers/nvme/host/nvme.h | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index f4287d8550a9f..d1cb65698288b 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -413,11 +413,11 @@ static void nvme_mpath_set_live(struct nvme_ns *ns) if (!head->disk) return;
- mutex_lock(&head->lock); - if (!(head->disk->flags & GENHD_FL_UP)) + if (!test_and_set_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) device_add_disk(&head->subsys->dev, head->disk, nvme_ns_id_attr_groups);
+ mutex_lock(&head->lock); if (nvme_path_is_optimized(ns)) { int node, srcu_idx;
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 2e04a36296d95..719342600be62 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -359,6 +359,8 @@ struct nvme_ns_head { spinlock_t requeue_lock; struct work_struct requeue_work; struct mutex lock; + unsigned long flags; +#define NVME_NSHEAD_DISK_LIVE 0 struct nvme_ns __rcu *current_path[]; #endif };
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit c31244669f57963b6ce133a5555b118fc50aec95 ]
The mpath disk node takes a reference on the request mpath request queue when adding live path to the mpath gendisk. However if we connected to an inaccessible path device_add_disk is not called, so if we disconnect and remove the mpath gendisk we endup putting an reference on the request queue that was never taken [1].
Fix that to check if we ever added a live path (using NVME_NS_HEAD_HAS_DISK flag) and if not, clear the disk->queue reference.
[1]: ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 1 PID: 1372 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0 CPU: 1 PID: 1372 Comm: nvme Tainted: G O 5.7.0-rc2+ #3 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1 04/01/2014 RIP: 0010:refcount_warn_saturate+0xa6/0xf0 RSP: 0018:ffffb29e8053bdc0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff8b7a2f4fc060 RCX: 0000000000000007 RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8b7a3ec99980 RBP: ffff8b7a2f4fc000 R08: 00000000000002e1 R09: 0000000000000004 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000 R13: fffffffffffffff2 R14: ffffb29e8053bf08 R15: ffff8b7a320e2da0 FS: 00007f135d4ca800(0000) GS:ffff8b7a3ec80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005651178c0c30 CR3: 000000003b650005 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: disk_release+0xa2/0xc0 device_release+0x28/0x80 kobject_put+0xa5/0x1b0 nvme_put_ns_head+0x26/0x70 [nvme_core] nvme_put_ns+0x30/0x60 [nvme_core] nvme_remove_namespaces+0x9b/0xe0 [nvme_core] nvme_do_delete_ctrl+0x43/0x5c [nvme_core] nvme_sysfs_delete.cold+0x8/0xd [nvme_core] kernfs_fop_write+0xc1/0x1a0 vfs_write+0xb6/0x1a0 ksys_write+0x5f/0xe0 do_syscall_64+0x52/0x1a0 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reported-by: Anton Eidelman anton@lightbitslabs.com Tested-by: Anton Eidelman anton@lightbitslabs.com Signed-off-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/multipath.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index d1cb65698288b..03bc3aba09871 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -691,6 +691,14 @@ void nvme_mpath_remove_disk(struct nvme_ns_head *head) kblockd_schedule_work(&head->requeue_work); flush_work(&head->requeue_work); blk_cleanup_queue(head->disk->queue); + if (!test_bit(NVME_NSHEAD_DISK_LIVE, &head->flags)) { + /* + * if device_add_disk wasn't called, prevent + * disk release to put a bogus reference on the + * request queue + */ + head->disk->queue = NULL; + } put_disk(head->disk); }
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit d60b5fbc1ce8210759b568da49d149b868e7c6d3 ]
Don't reissue requests from io_iopoll_reap_events(), the task may not have mm, which ends up with NULL. It's better to kill everything off on exit anyway.
[ 677.734670] RIP: 0010:io_iopoll_complete+0x27e/0x630 ... [ 677.734679] Call Trace: [ 677.734695] ? __send_signal+0x1f2/0x420 [ 677.734698] ? _raw_spin_unlock_irqrestore+0x24/0x40 [ 677.734699] ? send_signal+0xf5/0x140 [ 677.734700] io_iopoll_getevents+0x12f/0x1a0 [ 677.734702] io_iopoll_reap_events.part.0+0x5e/0xa0 [ 677.734703] io_ring_ctx_wait_and_kill+0x132/0x1c0 [ 677.734704] io_uring_release+0x20/0x30 [ 677.734706] __fput+0xcd/0x230 [ 677.734707] ____fput+0xe/0x10 [ 677.734709] task_work_run+0x67/0xa0 [ 677.734710] do_exit+0x35d/0xb70 [ 677.734712] do_group_exit+0x43/0xa0 [ 677.734713] get_signal+0x140/0x900 [ 677.734715] do_signal+0x37/0x780 [ 677.734717] ? enqueue_hrtimer+0x41/0xb0 [ 677.734718] ? recalibrate_cpu_khz+0x10/0x10 [ 677.734720] ? ktime_get+0x3e/0xa0 [ 677.734721] ? lapic_next_deadline+0x26/0x30 [ 677.734723] ? tick_program_event+0x4d/0x90 [ 677.734724] ? __hrtimer_get_next_event+0x4d/0x80 [ 677.734726] __prepare_exit_to_usermode+0x126/0x1c0 [ 677.734741] prepare_exit_to_usermode+0x9/0x40 [ 677.734742] idtentry_exit_cond_rcu+0x4c/0x60 [ 677.734743] sysvec_reschedule_ipi+0x92/0x160 [ 677.734744] ? asm_sysvec_reschedule_ipi+0xa/0x20 [ 677.734745] asm_sysvec_reschedule_ipi+0x12/0x20
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 63a456921903e..71d281f68ed83 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -858,6 +858,7 @@ static int __io_sqe_files_update(struct io_ring_ctx *ctx, struct io_uring_files_update *ip, unsigned nr_args); static int io_grab_files(struct io_kiocb *req); +static void io_complete_rw_common(struct kiocb *kiocb, long res); static void io_cleanup_req(struct io_kiocb *req); static int io_file_get(struct io_submit_state *state, struct io_kiocb *req, int fd, struct file **out_file, bool fixed); @@ -1697,6 +1698,14 @@ static void io_iopoll_queue(struct list_head *again) do { req = list_first_entry(again, struct io_kiocb, list); list_del(&req->list); + + /* shouldn't happen unless io_uring is dying, cancel reqs */ + if (unlikely(!current->mm)) { + io_complete_rw_common(&req->rw.kiocb, -EAGAIN); + io_put_req(req); + continue; + } + refcount_inc(&req->refs); io_queue_async_work(req); } while (!list_empty(again));
From: Douglas Anderson dianders@chromium.org
[ Upstream commit 440ab9e10e2e6e5fd677473ee6f9e3af0f6904d6 ]
At times when I'm using kgdb I see a splat on my console about suspicious RCU usage. I managed to come up with a case that could reproduce this that looked like this:
WARNING: suspicious RCU usage 5.7.0-rc4+ #609 Not tainted ----------------------------- kernel/pid.c:395 find_task_by_pid_ns() needs rcu_read_lock() protection!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1 3 locks held by swapper/0/1: #0: ffffff81b6b8e988 (&dev->mutex){....}-{3:3}, at: __device_attach+0x40/0x13c #1: ffffffd01109e9e8 (dbg_master_lock){....}-{2:2}, at: kgdb_cpu_enter+0x20c/0x7ac #2: ffffffd01109ea90 (dbg_slave_lock){....}-{2:2}, at: kgdb_cpu_enter+0x3ec/0x7ac
stack backtrace: CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.7.0-rc4+ #609 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x1b8 show_stack+0x1c/0x24 dump_stack+0xd4/0x134 lockdep_rcu_suspicious+0xf0/0x100 find_task_by_pid_ns+0x5c/0x80 getthread+0x8c/0xb0 gdb_serial_stub+0x9d4/0xd04 kgdb_cpu_enter+0x284/0x7ac kgdb_handle_exception+0x174/0x20c kgdb_brk_fn+0x24/0x30 call_break_hook+0x6c/0x7c brk_handler+0x20/0x5c do_debug_exception+0x1c8/0x22c el1_sync_handler+0x3c/0xe4 el1_sync+0x7c/0x100 rpmh_rsc_probe+0x38/0x420 platform_drv_probe+0x94/0xb4 really_probe+0x134/0x300 driver_probe_device+0x68/0x100 __device_attach_driver+0x90/0xa8 bus_for_each_drv+0x84/0xcc __device_attach+0xb4/0x13c device_initial_probe+0x18/0x20 bus_probe_device+0x38/0x98 device_add+0x38c/0x420
If I understand properly we should just be able to blanket kgdb under one big RCU read lock and the problem should go away. We'll add it to the beast-of-a-function known as kgdb_cpu_enter().
With this I no longer get any splats and things seem to work fine.
Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://lore.kernel.org/r/20200602154729.v2.1.I70e0d4fd46d5ed2aaf0c98a355e8e... Signed-off-by: Daniel Thompson daniel.thompson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/debug/debug_core.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c index d47c7d6656cd3..9be6accf8fe3d 100644 --- a/kernel/debug/debug_core.c +++ b/kernel/debug/debug_core.c @@ -577,6 +577,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs, arch_kgdb_ops.disable_hw_break(regs);
acquirelock: + rcu_read_lock(); /* * Interrupts will be restored by the 'trap return' code, except when * single stepping. @@ -636,6 +637,7 @@ return_normal: atomic_dec(&slaves_in_kgdb); dbg_touch_watchdogs(); local_irq_restore(flags); + rcu_read_unlock(); return 0; } cpu_relax(); @@ -654,6 +656,7 @@ return_normal: raw_spin_unlock(&dbg_master_lock); dbg_touch_watchdogs(); local_irq_restore(flags); + rcu_read_unlock();
goto acquirelock; } @@ -777,6 +780,7 @@ kgdb_restore: raw_spin_unlock(&dbg_master_lock); dbg_touch_watchdogs(); local_irq_restore(flags); + rcu_read_unlock();
return kgdb_info[cpu].ret_state; }
From: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
commit 5be206eaac9a68992fc3b06fb5dd5634e323de86 upstream.
The reverted commit illegitly uses tpm2-tools. External dependencies are absolutely forbidden from these tests. There is also the problem that clearing is not necessarily wanted behavior if the test/target computer is not used only solely for testing.
Fixes: a9920d3bad40 ("tpm: selftest: cleanup after unseal with wrong auth/policy test") Cc: Tadeusz Struk tadeusz.struk@intel.com Cc: stable@vger.kernel.org Cc: linux-integrity@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/tpm2/test_smoke.sh | 5 ----- 1 file changed, 5 deletions(-)
--- a/tools/testing/selftests/tpm2/test_smoke.sh +++ b/tools/testing/selftests/tpm2/test_smoke.sh @@ -3,8 +3,3 @@
python -m unittest -v tpm2_tests.SmokeTest python -m unittest -v tpm2_tests.AsyncTest - -CLEAR_CMD=$(which tpm2_clear) -if [ -n $CLEAR_CMD ]; then - tpm2_clear -T device -fi
From: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com
commit 377ff83083c953dd58c5a030b3c9b5b85d8cc727 upstream.
It's better to use /bin/sh instead of /bin/bash in order to run the tests in the BusyBox shell.
Fixes: 6ea3dfe1e073 ("selftests: add TPM 2.0 tests") Cc: stable@vger.kernel.org Cc: linux-integrity@vger.kernel.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/testing/selftests/tpm2/test_smoke.sh | 2 +- tools/testing/selftests/tpm2/test_space.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/tools/testing/selftests/tpm2/test_smoke.sh +++ b/tools/testing/selftests/tpm2/test_smoke.sh @@ -1,4 +1,4 @@ -#!/bin/bash +#!/bin/sh # SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
python -m unittest -v tpm2_tests.SmokeTest --- a/tools/testing/selftests/tpm2/test_space.sh +++ b/tools/testing/selftests/tpm2/test_space.sh @@ -1,4 +1,4 @@ -#!/bin/bash +#!/bin/sh # SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause)
python -m unittest -v tpm2_tests.SpaceTest
From: James Bottomley James.Bottomley@HansenPartnership.com
commit 7862840219058436b80029a0263fd1ef065fb1b3 upstream.
It has been reported that some TIS based TPMs are giving unexpected errors when using the O_NONBLOCK path of the TPM device. The problem is that some TPMs don't like it when you get and then relinquish a locality (as the tpm_try_get_ops()/tpm_put_ops() pair does) without sending a command. This currently happens all the time in the O_NONBLOCK write path. Fix this by moving the tpm_try_get_ops() further down the code to after the O_NONBLOCK determination is made. This is safe because the priv->buffer_mutex still protects the priv state being modified.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206275 Fixes: d23d12484307 ("tpm: fix invalid locking in NONBLOCKING mode") Reported-by: Mario Limonciello Mario.Limonciello@dell.com Tested-by: Alex Guzman alex@guzman.io Cc: stable@vger.kernel.org Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Signed-off-by: James Bottomley James.Bottomley@HansenPartnership.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/tpm/tpm-dev-common.c | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-)
--- a/drivers/char/tpm/tpm-dev-common.c +++ b/drivers/char/tpm/tpm-dev-common.c @@ -189,15 +189,6 @@ ssize_t tpm_common_write(struct file *fi goto out; }
- /* atomic tpm command send and result receive. We only hold the ops - * lock during this period so that the tpm can be unregistered even if - * the char dev is held open. - */ - if (tpm_try_get_ops(priv->chip)) { - ret = -EPIPE; - goto out; - } - priv->response_length = 0; priv->response_read = false; *off = 0; @@ -211,11 +202,19 @@ ssize_t tpm_common_write(struct file *fi if (file->f_flags & O_NONBLOCK) { priv->command_enqueued = true; queue_work(tpm_dev_wq, &priv->async_work); - tpm_put_ops(priv->chip); mutex_unlock(&priv->buffer_mutex); return size; }
+ /* atomic tpm command send and result receive. We only hold the ops + * lock during this period so that the tpm can be unregistered even if + * the char dev is held open. + */ + if (tpm_try_get_ops(priv->chip)) { + ret = -EPIPE; + goto out; + } + ret = tpm_dev_transmit(priv->chip, priv->space, priv->data_buffer, sizeof(priv->data_buffer)); tpm_put_ops(priv->chip);
From: Herbert Xu herbert@gondor.apana.org.au
commit 34c86f4c4a7be3b3e35aa48bd18299d4c756064d upstream.
The locking in af_alg_release_parent is broken as the BH socket lock can only be taken if there is a code-path to handle the case where the lock is owned by process-context. Instead of adding such handling, we can fix this by changing the ref counts to atomic_t.
This patch also modifies the main refcnt to include both normal and nokey sockets. This way we don't have to fudge the nokey ref count when a socket changes from nokey to normal.
Credits go to Mauricio Faria de Oliveira who diagnosed this bug and sent a patch for it:
https://lore.kernel.org/linux-crypto/20200605161657.535043-1-mfo@canonical.c...
Reported-by: Brian Moyles bmoyles@netflix.com Reported-by: Mauricio Faria de Oliveira mfo@canonical.com Fixes: 37f96694cf73 ("crypto: af_alg - Use bh_lock_sock in...") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/af_alg.c | 26 +++++++++++--------------- crypto/algif_aead.c | 9 +++------ crypto/algif_hash.c | 9 +++------ crypto/algif_skcipher.c | 9 +++------ include/crypto/if_alg.h | 4 ++-- 5 files changed, 22 insertions(+), 35 deletions(-)
--- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -128,21 +128,15 @@ EXPORT_SYMBOL_GPL(af_alg_release); void af_alg_release_parent(struct sock *sk) { struct alg_sock *ask = alg_sk(sk); - unsigned int nokey = ask->nokey_refcnt; - bool last = nokey && !ask->refcnt; + unsigned int nokey = atomic_read(&ask->nokey_refcnt);
sk = ask->parent; ask = alg_sk(sk);
- local_bh_disable(); - bh_lock_sock(sk); - ask->nokey_refcnt -= nokey; - if (!last) - last = !--ask->refcnt; - bh_unlock_sock(sk); - local_bh_enable(); + if (nokey) + atomic_dec(&ask->nokey_refcnt);
- if (last) + if (atomic_dec_and_test(&ask->refcnt)) sock_put(sk); } EXPORT_SYMBOL_GPL(af_alg_release_parent); @@ -187,7 +181,7 @@ static int alg_bind(struct socket *sock,
err = -EBUSY; lock_sock(sk); - if (ask->refcnt | ask->nokey_refcnt) + if (atomic_read(&ask->refcnt)) goto unlock;
swap(ask->type, type); @@ -236,7 +230,7 @@ static int alg_setsockopt(struct socket int err = -EBUSY;
lock_sock(sk); - if (ask->refcnt) + if (atomic_read(&ask->refcnt) != atomic_read(&ask->nokey_refcnt)) goto unlock;
type = ask->type; @@ -301,12 +295,14 @@ int af_alg_accept(struct sock *sk, struc if (err) goto unlock;
- if (nokey || !ask->refcnt++) + if (atomic_inc_return_relaxed(&ask->refcnt) == 1) sock_hold(sk); - ask->nokey_refcnt += nokey; + if (nokey) { + atomic_inc(&ask->nokey_refcnt); + atomic_set(&alg_sk(sk2)->nokey_refcnt, 1); + } alg_sk(sk2)->parent = sk; alg_sk(sk2)->type = type; - alg_sk(sk2)->nokey_refcnt = nokey;
newsock->ops = type->ops; newsock->state = SS_CONNECTED; --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -384,7 +384,7 @@ static int aead_check_key(struct socket struct alg_sock *ask = alg_sk(sk);
lock_sock(sk); - if (ask->refcnt) + if (!atomic_read(&ask->nokey_refcnt)) goto unlock_child;
psk = ask->parent; @@ -396,11 +396,8 @@ static int aead_check_key(struct socket if (crypto_aead_get_flags(tfm->aead) & CRYPTO_TFM_NEED_KEY) goto unlock;
- if (!pask->refcnt++) - sock_hold(psk); - - ask->refcnt = 1; - sock_put(psk); + atomic_dec(&pask->nokey_refcnt); + atomic_set(&ask->nokey_refcnt, 0);
err = 0;
--- a/crypto/algif_hash.c +++ b/crypto/algif_hash.c @@ -301,7 +301,7 @@ static int hash_check_key(struct socket struct alg_sock *ask = alg_sk(sk);
lock_sock(sk); - if (ask->refcnt) + if (!atomic_read(&ask->nokey_refcnt)) goto unlock_child;
psk = ask->parent; @@ -313,11 +313,8 @@ static int hash_check_key(struct socket if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) goto unlock;
- if (!pask->refcnt++) - sock_hold(psk); - - ask->refcnt = 1; - sock_put(psk); + atomic_dec(&pask->nokey_refcnt); + atomic_set(&ask->nokey_refcnt, 0);
err = 0;
--- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -211,7 +211,7 @@ static int skcipher_check_key(struct soc struct alg_sock *ask = alg_sk(sk);
lock_sock(sk); - if (ask->refcnt) + if (!atomic_read(&ask->nokey_refcnt)) goto unlock_child;
psk = ask->parent; @@ -223,11 +223,8 @@ static int skcipher_check_key(struct soc if (crypto_skcipher_get_flags(tfm) & CRYPTO_TFM_NEED_KEY) goto unlock;
- if (!pask->refcnt++) - sock_hold(psk); - - ask->refcnt = 1; - sock_put(psk); + atomic_dec(&pask->nokey_refcnt); + atomic_set(&ask->nokey_refcnt, 0);
err = 0;
--- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -29,8 +29,8 @@ struct alg_sock {
struct sock *parent;
- unsigned int refcnt; - unsigned int nokey_refcnt; + atomic_t refcnt; + atomic_t nokey_refcnt;
const struct af_alg_type *type; void *private;
From: Oleg Nesterov oleg@redhat.com
[ Upstream commit e91b48162332480f5840902268108bb7fb7a44c7 ]
So that the target task will exit the wait_event_interruptible-like loop and call task_work_run() asap.
The patch turns "bool notify" into 0,TWA_RESUME,TWA_SIGNAL enum, the new TWA_SIGNAL flag implies signal_wake_up(). However, it needs to avoid the race with recalc_sigpending(), so the patch also adds the new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK.
TODO: once this patch is merged we need to change all current users of task_work_add(notify = true) to use TWA_RESUME.
Cc: stable@vger.kernel.org # v5.7 Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sched/jobctl.h | 4 +++- include/linux/task_work.h | 5 ++++- kernel/signal.c | 10 +++++++--- kernel/task_work.c | 16 ++++++++++++++-- 4 files changed, 28 insertions(+), 7 deletions(-)
diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h index fa067de9f1a94..d2b4204ba4d34 100644 --- a/include/linux/sched/jobctl.h +++ b/include/linux/sched/jobctl.h @@ -19,6 +19,7 @@ struct task_struct; #define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */ #define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */ #define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */ +#define JOBCTL_TASK_WORK_BIT 24 /* set by TWA_SIGNAL */
#define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT) #define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT) @@ -28,9 +29,10 @@ struct task_struct; #define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT) #define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT) #define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT) +#define JOBCTL_TASK_WORK (1UL << JOBCTL_TASK_WORK_BIT)
#define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY) -#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK) +#define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK | JOBCTL_TASK_WORK)
extern bool task_set_jobctl_pending(struct task_struct *task, unsigned long mask); extern void task_clear_jobctl_trapping(struct task_struct *task); diff --git a/include/linux/task_work.h b/include/linux/task_work.h index bd9a6a91c097e..0fb93aafa4785 100644 --- a/include/linux/task_work.h +++ b/include/linux/task_work.h @@ -13,7 +13,10 @@ init_task_work(struct callback_head *twork, task_work_func_t func) twork->func = func; }
-int task_work_add(struct task_struct *task, struct callback_head *twork, bool); +#define TWA_RESUME 1 +#define TWA_SIGNAL 2 +int task_work_add(struct task_struct *task, struct callback_head *twork, int); + struct callback_head *task_work_cancel(struct task_struct *, task_work_func_t); void task_work_run(void);
diff --git a/kernel/signal.c b/kernel/signal.c index 284fc1600063b..d5feb34b5e158 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -2529,9 +2529,6 @@ bool get_signal(struct ksignal *ksig) struct signal_struct *signal = current->signal; int signr;
- if (unlikely(current->task_works)) - task_work_run(); - if (unlikely(uprobe_deny_signal())) return false;
@@ -2544,6 +2541,13 @@ bool get_signal(struct ksignal *ksig)
relock: spin_lock_irq(&sighand->siglock); + current->jobctl &= ~JOBCTL_TASK_WORK; + if (unlikely(current->task_works)) { + spin_unlock_irq(&sighand->siglock); + task_work_run(); + goto relock; + } + /* * Every stopped thread goes here after wakeup. Check to see if * we should notify the parent, prepare_signal(SIGCONT) encodes diff --git a/kernel/task_work.c b/kernel/task_work.c index 825f28259a19a..5c0848ca1287d 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -25,9 +25,10 @@ static struct callback_head work_exited; /* all we need is ->next == NULL */ * 0 if succeeds or -ESRCH. */ int -task_work_add(struct task_struct *task, struct callback_head *work, bool notify) +task_work_add(struct task_struct *task, struct callback_head *work, int notify) { struct callback_head *head; + unsigned long flags;
do { head = READ_ONCE(task->task_works); @@ -36,8 +37,19 @@ task_work_add(struct task_struct *task, struct callback_head *work, bool notify) work->next = head; } while (cmpxchg(&task->task_works, head, work) != head);
- if (notify) + switch (notify) { + case TWA_RESUME: set_notify_resume(task); + break; + case TWA_SIGNAL: + if (lock_task_sighand(task, &flags)) { + task->jobctl |= JOBCTL_TASK_WORK; + signal_wake_up(task, 0); + unlock_task_sighand(task, &flags); + } + break; + } + return 0; }
From: Jens Axboe axboe@kernel.dk
[ Upstream commit ce593a6c480a22acba08795be313c0c6d49dd35d ]
Since 5.7, we've been using task_work to trigger async running of requests in the context of the original task. This generally works great, but there's a case where if the task is currently blocked in the kernel waiting on a condition to become true, it won't process task_work. Even though the task is woken, it just checks whatever condition it's waiting on, and goes back to sleep if it's still false.
This is a problem if that very condition only becomes true when that task_work is run. An example of that is the task registering an eventfd with io_uring, and it's now blocked waiting on an eventfd read. That read could depend on a completion event, and that completion event won't get trigged until task_work has been run.
Use the TWA_SIGNAL notification for task_work, so that we ensure that the task always runs the work when queued.
Cc: stable@vger.kernel.org # v5.7 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 71d281f68ed83..51362a619fd50 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4136,6 +4136,21 @@ struct io_poll_table { int error; };
+static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb, + int notify) +{ + struct task_struct *tsk = req->task; + int ret; + + if (req->ctx->flags & IORING_SETUP_SQPOLL) + notify = 0; + + ret = task_work_add(tsk, cb, notify); + if (!ret) + wake_up_process(tsk); + return ret; +} + static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll, __poll_t mask, task_work_func_t func) { @@ -4159,13 +4174,13 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll, * of executing it. We can't safely execute it anyway, as we may not * have the needed state needed for it anyway. */ - ret = task_work_add(tsk, &req->task_work, true); + ret = io_req_task_work_add(req, &req->task_work, TWA_SIGNAL); if (unlikely(ret)) { WRITE_ONCE(poll->canceled, true); tsk = io_wq_get_task(req->ctx->io_wq); - task_work_add(tsk, &req->task_work, true); + task_work_add(tsk, &req->task_work, 0); + wake_up_process(tsk); } - wake_up_process(tsk); return 1; }
@@ -6260,19 +6275,20 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, do { prepare_to_wait_exclusive(&ctx->wait, &iowq.wq, TASK_INTERRUPTIBLE); + /* make sure we run task_work before checking for signals */ if (current->task_works) task_work_run(); - if (io_should_wake(&iowq, false)) - break; - schedule(); if (signal_pending(current)) { - ret = -EINTR; + ret = -ERESTARTSYS; break; } + if (io_should_wake(&iowq, false)) + break; + schedule(); } while (1); finish_wait(&ctx->wait, &iowq.wq);
- restore_saved_sigmask_unless(ret == -EINTR); + restore_saved_sigmask_unless(ret == -ERESTARTSYS);
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; }
From: Chen Tao chentao107@huawei.com
[ Upstream commit aa472721c8dbe1713cf510f56ffbc56ae9e14247 ]
Fix to return negative error code -ENOMEM with the use of ERR_PTR from dpu_encoder_init.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Chen Tao chentao107@huawei.com Signed-off-by: Rob Clark robdclark@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c index a1b79ee2bd9d5..a2f6b688a9768 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c @@ -2173,7 +2173,7 @@ struct drm_encoder *dpu_encoder_init(struct drm_device *dev,
dpu_enc = devm_kzalloc(dev->dev, sizeof(*dpu_enc), GFP_KERNEL); if (!dpu_enc) - return ERR_PTR(ENOMEM); + return ERR_PTR(-ENOMEM);
rc = drm_encoder_init(dev, &dpu_enc->base, &dpu_encoder_funcs, drm_enc_mode, NULL);
From: Filipe Manana fdmanana@suse.com
[ Upstream commit 5dbb75ed6900048e146247b6325742d92c892548 ]
A RWF_NOWAIT write is not supposed to wait on filesystem locks that can be held for a long time or for ongoing IO to complete.
However when calling check_can_nocow(), if the inode has prealloc extents or has the NOCOW flag set, we can block on extent (file range) locks through the call to btrfs_lock_and_flush_ordered_range(). Such lock can take a significant amount of time to be available. For example, a fiemap task may be running, and iterating through the entire file range checking all extents and doing backref walking to determine if they are shared, or a readpage operation may be in progress.
Also at btrfs_lock_and_flush_ordered_range(), called by check_can_nocow(), after locking the file range we wait for any existing ordered extent that is in progress to complete. Another operation that can take a significant amount of time and defeat the purpose of RWF_NOWAIT.
So fix this by trying to lock the file range and if it's currently locked return -EAGAIN to user space. If we are able to lock the file range without waiting and there is an ordered extent in the range, return -EAGAIN as well, instead of waiting for it to complete. Finally, don't bother trying to lock the snapshot lock of the root when attempting a RWF_NOWAIT write, as that is only important for buffered writes.
Fixes: edf064e7c6fec3 ("btrfs: nowait aio support") Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/file.c | 37 ++++++++++++++++++++++++++----------- 1 file changed, 26 insertions(+), 11 deletions(-)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c index 52d565ff66e2d..93244934d4f92 100644 --- a/fs/btrfs/file.c +++ b/fs/btrfs/file.c @@ -1541,7 +1541,7 @@ lock_and_cleanup_extent_if_need(struct btrfs_inode *inode, struct page **pages, }
static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, - size_t *write_bytes) + size_t *write_bytes, bool nowait) { struct btrfs_fs_info *fs_info = inode->root->fs_info; struct btrfs_root *root = inode->root; @@ -1549,27 +1549,43 @@ static noinline int check_can_nocow(struct btrfs_inode *inode, loff_t pos, u64 num_bytes; int ret;
- if (!btrfs_drew_try_write_lock(&root->snapshot_lock)) + if (!nowait && !btrfs_drew_try_write_lock(&root->snapshot_lock)) return -EAGAIN;
lockstart = round_down(pos, fs_info->sectorsize); lockend = round_up(pos + *write_bytes, fs_info->sectorsize) - 1; + num_bytes = lockend - lockstart + 1;
- btrfs_lock_and_flush_ordered_range(inode, lockstart, - lockend, NULL); + if (nowait) { + struct btrfs_ordered_extent *ordered; + + if (!try_lock_extent(&inode->io_tree, lockstart, lockend)) + return -EAGAIN; + + ordered = btrfs_lookup_ordered_range(inode, lockstart, + num_bytes); + if (ordered) { + btrfs_put_ordered_extent(ordered); + ret = -EAGAIN; + goto out_unlock; + } + } else { + btrfs_lock_and_flush_ordered_range(inode, lockstart, + lockend, NULL); + }
- num_bytes = lockend - lockstart + 1; ret = can_nocow_extent(&inode->vfs_inode, lockstart, &num_bytes, NULL, NULL, NULL); if (ret <= 0) { ret = 0; - btrfs_drew_write_unlock(&root->snapshot_lock); + if (!nowait) + btrfs_drew_write_unlock(&root->snapshot_lock); } else { *write_bytes = min_t(size_t, *write_bytes , num_bytes - pos + lockstart); } - +out_unlock: unlock_extent(&inode->io_tree, lockstart, lockend);
return ret; @@ -1641,7 +1657,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb, if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)) && check_can_nocow(BTRFS_I(inode), pos, - &write_bytes) > 0) { + &write_bytes, false) > 0) { /* * For nodata cow case, no need to reserve * data space. @@ -1920,12 +1936,11 @@ static ssize_t btrfs_file_write_iter(struct kiocb *iocb, */ if (!(BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)) || - check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes) <= 0) { + check_can_nocow(BTRFS_I(inode), pos, &nocow_bytes, + true) <= 0) { inode_unlock(inode); return -EAGAIN; } - /* check_can_nocow() locks the snapshot lock on success */ - btrfs_drew_write_unlock(&root->snapshot_lock); /* * There are holes in the range or parts of the range that must * be COWed (shared extents, RO block groups, etc), so just bail
From: David Howells dhowells@redhat.com
[ Upstream commit 02c28dffb13abbaaedece1e4a6493b48ad3f913a ]
Commit 2ad6691d988c, which moved the modification of the status annotation for a packet in the Tx buffer prior to the retransmission moved the state clearance, but managed to lose the bit that set it to UNACK.
Consequently, if a retransmission occurs, the packet is accidentally changed to the ACK state (ie. 0) by masking it off, which means that the packet isn't counted towards the tally of newly-ACK'd packets if it gets hard-ACK'd. This then prevents the congestion control algorithm from recovering properly.
Fix by reinstating the change of state to UNACK.
Spotted by the generic/460 xfstest.
Fixes: 2ad6691d988c ("rxrpc: Fix race between incoming ACK parser and retransmitter") Signed-off-by: David Howells dhowells@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/rxrpc/call_event.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c index 985fb89202d0c..9ff85ee8337cd 100644 --- a/net/rxrpc/call_event.c +++ b/net/rxrpc/call_event.c @@ -253,7 +253,7 @@ static void rxrpc_resend(struct rxrpc_call *call, unsigned long now_j) * confuse things */ annotation &= ~RXRPC_TX_ANNO_MASK; - annotation |= RXRPC_TX_ANNO_RESENT; + annotation |= RXRPC_TX_ANNO_UNACK | RXRPC_TX_ANNO_RESENT; call->rxtx_annotations[ix] = annotation;
skb = call->rxtx_buffer[ix];
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 9e365ff576b7c1623bbc5ef31ec652c533e2f65e ]
Currently any MPTCP socket using syn cookies will fallback to TCP at 3rd ack time. In case of MP_JOIN requests, the RFC mandate closing the child and sockets, but the existing error paths do not handle the syncookie scenario correctly.
Address the issue always forcing the child shutdown in case of MP_JOIN fallback.
Fixes: ae2dd7164943 ("mptcp: handle tcp fallback when using syn cookies") Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/subflow.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index db3e4e74e7857..0112ead58fd8b 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -424,22 +424,25 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, struct mptcp_subflow_context *listener = mptcp_subflow_ctx(sk); struct mptcp_subflow_request_sock *subflow_req; struct mptcp_options_received mp_opt; - bool fallback_is_fatal = false; + bool fallback, fallback_is_fatal; struct sock *new_msk = NULL; - bool fallback = false; struct sock *child;
pr_debug("listener=%p, req=%p, conn=%p", listener, req, listener->conn);
- /* we need later a valid 'mp_capable' value even when options are not - * parsed + /* After child creation we must look for 'mp_capable' even when options + * are not parsed */ mp_opt.mp_capable = 0; - if (tcp_rsk(req)->is_mptcp == 0) + + /* hopefully temporary handling for MP_JOIN+syncookie */ + subflow_req = mptcp_subflow_rsk(req); + fallback_is_fatal = subflow_req->mp_join; + fallback = !tcp_rsk(req)->is_mptcp; + if (fallback) goto create_child;
/* if the sk is MP_CAPABLE, we try to fetch the client key */ - subflow_req = mptcp_subflow_rsk(req); if (subflow_req->mp_capable) { if (TCP_SKB_CB(skb)->seq != subflow_req->ssn_offset + 1) { /* here we can receive and accept an in-window, @@ -460,12 +463,11 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk, if (!new_msk) fallback = true; } else if (subflow_req->mp_join) { - fallback_is_fatal = true; mptcp_get_options(skb, &mp_opt); if (!mp_opt.mp_join || !subflow_hmac_valid(req, &mp_opt)) { SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_JOINACKMAC); - return NULL; + fallback = true; } }
From: Po Liu Po.Liu@nxp.com
[ Upstream commit 79e499829f3ff5b8f70c87baf1b03ebb3401a3e4 ]
This patch is to let ethtool enable/disable the tc flower offload features. Hardware ENETC has the feature of PSFP which is for per-stream policing. When enable the tc hw offloading feature, driver would enable the IEEE 802.1Qci feature. It is only set the register enable bit for this feature not enable for any entry of per stream filtering and stream gate or stream identify but get how much capabilities for each feature.
Signed-off-by: Po Liu Po.Liu@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/enetc/enetc.c | 23 +++++++++ drivers/net/ethernet/freescale/enetc/enetc.h | 48 +++++++++++++++++++ .../net/ethernet/freescale/enetc/enetc_hw.h | 17 +++++++ .../net/ethernet/freescale/enetc/enetc_pf.c | 8 ++++ 4 files changed, 96 insertions(+)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 4486a0db8ef0c..9ac5cccfe0204 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)
if (val & ENETC_SIPCAPR0_QBV) si->hw_features |= ENETC_SI_F_QBV; + + if (val & ENETC_SIPCAPR0_PSFP) + si->hw_features |= ENETC_SI_F_PSFP; }
static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size) @@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en) return 0; }
+static int enetc_set_psfp(struct net_device *ndev, int en) +{ + struct enetc_ndev_priv *priv = netdev_priv(ndev); + + if (en) { + priv->active_offloads |= ENETC_F_QCI; + enetc_get_max_cap(priv); + enetc_psfp_enable(&priv->si->hw); + } else { + priv->active_offloads &= ~ENETC_F_QCI; + memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap)); + enetc_psfp_disable(&priv->si->hw); + } + + return 0; +} + int enetc_set_features(struct net_device *ndev, netdev_features_t features) { @@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev, if (changed & NETIF_F_RXHASH) enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));
+ if (changed & NETIF_F_HW_TC) + enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC)); + return 0; }
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h index 56c43f35b633b..2cfe877c37786 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.h +++ b/drivers/net/ethernet/freescale/enetc/enetc.h @@ -151,6 +151,7 @@ enum enetc_errata { };
#define ENETC_SI_F_QBV BIT(0) +#define ENETC_SI_F_PSFP BIT(1)
/* PCI IEP device data */ struct enetc_si { @@ -203,12 +204,20 @@ struct enetc_cls_rule { };
#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */ +struct psfp_cap { + u32 max_streamid; + u32 max_psfp_filter; + u32 max_psfp_gate; + u32 max_psfp_gatelist; + u32 max_psfp_meter; +};
/* TODO: more hardware offloads */ enum enetc_active_offloads { ENETC_F_RX_TSTAMP = BIT(0), ENETC_F_TX_TSTAMP = BIT(1), ENETC_F_QBV = BIT(2), + ENETC_F_QCI = BIT(3), };
struct enetc_ndev_priv { @@ -231,6 +240,8 @@ struct enetc_ndev_priv {
struct enetc_cls_rule *cls_rules;
+ struct psfp_cap psfp_cap; + struct device_node *phy_node; phy_interface_t if_mode; }; @@ -289,9 +300,46 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data); void enetc_sched_speed_set(struct net_device *ndev); int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data); int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data); + +static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv) +{ + u32 reg; + + reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR); + priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK; + /* Port stream filter capability */ + reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR); + priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK; + /* Port stream gate capability */ + reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR); + priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK); + priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16; + /* Port flow meter capability */ + reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR); + priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK; +} + +static inline void enetc_psfp_enable(struct enetc_hw *hw) +{ + enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) | + ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS | + ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC); +} + +static inline void enetc_psfp_disable(struct enetc_hw *hw) +{ + enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) & + ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS & + ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC); +} #else #define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP #define enetc_sched_speed_set(ndev) (void)0 #define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP #define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP +#define enetc_get_max_cap(p) \ + memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap)) + +#define enetc_psfp_enable(hw) (void)0 +#define enetc_psfp_disable(hw) (void)0 #endif diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h index 2a6523136947d..587974862f488 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h @@ -19,6 +19,7 @@ #define ENETC_SICTR1 0x1c #define ENETC_SIPCAPR0 0x20 #define ENETC_SIPCAPR0_QBV BIT(4) +#define ENETC_SIPCAPR0_PSFP BIT(9) #define ENETC_SIPCAPR0_RSS BIT(8) #define ENETC_SIPCAPR1 0x24 #define ENETC_SITGTGR 0x30 @@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX}; #define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11)) #define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1)) #define ENETC_PM0_IFM_XGMII BIT(12) +#define ENETC_PSIDCAPR 0x1b08 +#define ENETC_PSIDCAPR_MSK GENMASK(15, 0) +#define ENETC_PSFCAPR 0x1b18 +#define ENETC_PSFCAPR_MSK GENMASK(15, 0) +#define ENETC_PSGCAPR 0x1b28 +#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16) +#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0) +#define ENETC_PFMCAPR 0x1b38 +#define ENETC_PFMCAPR_MSK GENMASK(15, 0)
/* MAC counters */ #define ENETC_PM0_REOCT 0x8100 @@ -621,3 +631,10 @@ struct enetc_cbd { /* Port time specific departure */ #define ENETC_PTCTSDR(n) (0x1210 + 4 * (n)) #define ENETC_TSDE BIT(31) + +/* PSFP setting */ +#define ENETC_PPSFPMR 0x11b00 +#define ENETC_PPSFPMR_PSFPEN BIT(0) +#define ENETC_PPSFPMR_VS BIT(1) +#define ENETC_PPSFPMR_PVC BIT(2) +#define ENETC_PPSFPMR_PVZC BIT(3) diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index 85e2b741df414..eacd597b55f22 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -739,6 +739,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev, if (si->hw_features & ENETC_SI_F_QBV) priv->active_offloads |= ENETC_F_QBV;
+ if (si->hw_features & ENETC_SI_F_PSFP) { + priv->active_offloads |= ENETC_F_QCI; + ndev->features |= NETIF_F_HW_TC; + ndev->hw_features |= NETIF_F_HW_TC; + enetc_get_max_cap(priv); + enetc_psfp_enable(&si->hw); + } + /* pick up primary MAC address from SI */ enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr); }
From: Claudiu Manoil claudiu.manoil@nxp.com
[ Upstream commit 9deba33f1b7266a3870c9da31f787b605748fc0c ]
VLAN tag insertion/extraction offload is correctly activated at probe time but deactivation of this feature (i.e. via ethtool) is broken. Toggling works only for Tx/Rx ring 0 of a PF, and is ignored for the other rings, including the VF rings. To fix this, the existing VLAN offload toggling code was extended to all the rings assigned to a netdevice, instead of the default ring 0 (likely a leftover from the early validation days of this feature). And the code was moved to the common set_features() function to fix toggling for the VF driver too.
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers") Signed-off-by: Claudiu Manoil claudiu.manoil@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/enetc/enetc.c | 26 +++++++++++++++++++ .../net/ethernet/freescale/enetc/enetc_hw.h | 16 ++++++------ .../net/ethernet/freescale/enetc/enetc_pf.c | 9 ------- 3 files changed, 34 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c index 9ac5cccfe0204..a7e4274d3f402 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc.c +++ b/drivers/net/ethernet/freescale/enetc/enetc.c @@ -1587,6 +1587,24 @@ static int enetc_set_psfp(struct net_device *ndev, int en) return 0; }
+static void enetc_enable_rxvlan(struct net_device *ndev, bool en) +{ + struct enetc_ndev_priv *priv = netdev_priv(ndev); + int i; + + for (i = 0; i < priv->num_rx_rings; i++) + enetc_bdr_enable_rxvlan(&priv->si->hw, i, en); +} + +static void enetc_enable_txvlan(struct net_device *ndev, bool en) +{ + struct enetc_ndev_priv *priv = netdev_priv(ndev); + int i; + + for (i = 0; i < priv->num_tx_rings; i++) + enetc_bdr_enable_txvlan(&priv->si->hw, i, en); +} + int enetc_set_features(struct net_device *ndev, netdev_features_t features) { @@ -1595,6 +1613,14 @@ int enetc_set_features(struct net_device *ndev, if (changed & NETIF_F_RXHASH) enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));
+ if (changed & NETIF_F_HW_VLAN_CTAG_RX) + enetc_enable_rxvlan(ndev, + !!(features & NETIF_F_HW_VLAN_CTAG_RX)); + + if (changed & NETIF_F_HW_VLAN_CTAG_TX) + enetc_enable_txvlan(ndev, + !!(features & NETIF_F_HW_VLAN_CTAG_TX)); + if (changed & NETIF_F_HW_TC) enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h index 587974862f488..02efda266c468 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h @@ -531,22 +531,22 @@ struct enetc_msg_cmd_header {
/* Common H/W utility functions */
-static inline void enetc_enable_rxvlan(struct enetc_hw *hw, int si_idx, - bool en) +static inline void enetc_bdr_enable_rxvlan(struct enetc_hw *hw, int idx, + bool en) { - u32 val = enetc_rxbdr_rd(hw, si_idx, ENETC_RBMR); + u32 val = enetc_rxbdr_rd(hw, idx, ENETC_RBMR);
val = (val & ~ENETC_RBMR_VTE) | (en ? ENETC_RBMR_VTE : 0); - enetc_rxbdr_wr(hw, si_idx, ENETC_RBMR, val); + enetc_rxbdr_wr(hw, idx, ENETC_RBMR, val); }
-static inline void enetc_enable_txvlan(struct enetc_hw *hw, int si_idx, - bool en) +static inline void enetc_bdr_enable_txvlan(struct enetc_hw *hw, int idx, + bool en) { - u32 val = enetc_txbdr_rd(hw, si_idx, ENETC_TBMR); + u32 val = enetc_txbdr_rd(hw, idx, ENETC_TBMR);
val = (val & ~ENETC_TBMR_VIH) | (en ? ENETC_TBMR_VIH : 0); - enetc_txbdr_wr(hw, si_idx, ENETC_TBMR, val); + enetc_txbdr_wr(hw, idx, ENETC_TBMR, val); }
static inline void enetc_set_bdr_prio(struct enetc_hw *hw, int bdr_idx, diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c index eacd597b55f22..438648a06f2ae 100644 --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c @@ -667,15 +667,6 @@ static int enetc_pf_set_features(struct net_device *ndev, netdev_features_t features) { netdev_features_t changed = ndev->features ^ features; - struct enetc_ndev_priv *priv = netdev_priv(ndev); - - if (changed & NETIF_F_HW_VLAN_CTAG_RX) - enetc_enable_rxvlan(&priv->si->hw, 0, - !!(features & NETIF_F_HW_VLAN_CTAG_RX)); - - if (changed & NETIF_F_HW_VLAN_CTAG_TX) - enetc_enable_txvlan(&priv->si->hw, 0, - !!(features & NETIF_F_HW_VLAN_CTAG_TX));
if (changed & NETIF_F_LOOPBACK) enetc_set_loopback(ndev, !!(features & NETIF_F_LOOPBACK));
From: Zenghui Yu yuzenghui@huawei.com
[ Upstream commit 31dbb6b1d025506b3b8b8b74e9b697df47b9f696 ]
readx_poll_timeout() can sleep if @sleep_us is specified by the caller, and is therefore unsafe to be used inside the atomic context, which is this case when we use it to poll the GICR_VPENDBASER.Dirty bit in irq_set_vcpu_affinity() callback.
Let's convert to its atomic version instead which helps to get the v4.1 board back to life!
Fixes: 96806229ca03 ("irqchip/gic-v4.1: Add support for VPENDBASER's Dirty+Valid signaling") Signed-off-by: Zenghui Yu yuzenghui@huawei.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20200605052345.1494-1-yuzenghui@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-gic-v3-its.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c index 124251b0ccbae..b3e16a06c13b7 100644 --- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -3681,10 +3681,10 @@ static void its_wait_vpt_parse_complete(void) if (!gic_rdists->has_vpend_valid_dirty) return;
- WARN_ON_ONCE(readq_relaxed_poll_timeout(vlpi_base + GICR_VPENDBASER, - val, - !(val & GICR_VPENDBASER_Dirty), - 10, 500)); + WARN_ON_ONCE(readq_relaxed_poll_timeout_atomic(vlpi_base + GICR_VPENDBASER, + val, + !(val & GICR_VPENDBASER_Dirty), + 10, 500)); }
static void its_vpe_schedule(struct its_vpe *vpe)
From: Mark Zhang markz@mellanox.com
[ Upstream commit c1d869d64a1955817c4d6fff08ecbbe8e59d36f8 ]
Query a dynamically-allocated counter before release it, to update it's hwcounters and log all of them into history data. Otherwise all values of these hwcounters will be lost.
Fixes: f34a55e497e8 ("RDMA/core: Get sum value of all counters when perform a sysfs stat read") Link: https://lore.kernel.org/r/20200621110000.56059-1-leon@kernel.org Signed-off-by: Mark Zhang markz@mellanox.com Reviewed-by: Maor Gottlieb maorg@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/counters.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/core/counters.c b/drivers/infiniband/core/counters.c index 2257d7f7810fd..738d1faf4bba5 100644 --- a/drivers/infiniband/core/counters.c +++ b/drivers/infiniband/core/counters.c @@ -202,7 +202,7 @@ static int __rdma_counter_unbind_qp(struct ib_qp *qp) return ret; }
-static void counter_history_stat_update(const struct rdma_counter *counter) +static void counter_history_stat_update(struct rdma_counter *counter) { struct ib_device *dev = counter->device; struct rdma_port_counter *port_counter; @@ -212,6 +212,8 @@ static void counter_history_stat_update(const struct rdma_counter *counter) if (!port_counter->hstats) return;
+ rdma_counter_query_stats(counter); + for (i = 0; i < counter->stats->num_counters; i++) port_counter->hstats->value[i] += counter->stats->value[i]; }
From: Dave Chinner dchinner@redhat.com
[ Upstream commit c7f87f3984cfa1e6d32806a715f35c5947ad9c09 ]
xlog_wait() on the CIL context can reference a freed context if the waiter doesn't get scheduled before the CIL context is freed. This can happen when a task is on the hard throttle and the CIL push aborts due to a shutdown. This was detected by generic/019:
thread 1 thread 2
__xfs_trans_commit xfs_log_commit_cil <CIL size over hard throttle limit> xlog_wait schedule xlog_cil_push_work wake_up_all <shutdown aborts commit> xlog_cil_committed kmem_free
remove_wait_queue spin_lock_irqsave --> UAF
Fix it by moving the wait queue to the CIL rather than keeping it in in the CIL context that gets freed on push completion. Because the wait queue is now independent of the CIL context and we might have multiple contexts in flight at once, only wake the waiters on the push throttle when the context we are pushing is over the hard throttle size threshold.
Fixes: 0e7ab7efe7745 ("xfs: Throttle commits on delayed background CIL push") Reported-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_log_cil.c | 10 +++++----- fs/xfs/xfs_log_priv.h | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index b43f0e8f43f2e..9ed90368ab311 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -671,7 +671,8 @@ xlog_cil_push_work( /* * Wake up any background push waiters now this context is being pushed. */ - wake_up_all(&ctx->push_wait); + if (ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) + wake_up_all(&cil->xc_push_wait);
/* * Check if we've anything to push. If there is nothing, then we don't @@ -743,13 +744,12 @@ xlog_cil_push_work(
/* * initialise the new context and attach it to the CIL. Then attach - * the current context to the CIL committing lsit so it can be found + * the current context to the CIL committing list so it can be found * during log forces to extract the commit lsn of the sequence that * needs to be forced. */ INIT_LIST_HEAD(&new_ctx->committing); INIT_LIST_HEAD(&new_ctx->busy_extents); - init_waitqueue_head(&new_ctx->push_wait); new_ctx->sequence = ctx->sequence + 1; new_ctx->cil = cil; cil->xc_ctx = new_ctx; @@ -937,7 +937,7 @@ xlog_cil_push_background( if (cil->xc_ctx->space_used >= XLOG_CIL_BLOCKING_SPACE_LIMIT(log)) { trace_xfs_log_cil_wait(log, cil->xc_ctx->ticket); ASSERT(cil->xc_ctx->space_used < log->l_logsize); - xlog_wait(&cil->xc_ctx->push_wait, &cil->xc_push_lock); + xlog_wait(&cil->xc_push_wait, &cil->xc_push_lock); return; }
@@ -1216,12 +1216,12 @@ xlog_cil_init( INIT_LIST_HEAD(&cil->xc_committing); spin_lock_init(&cil->xc_cil_lock); spin_lock_init(&cil->xc_push_lock); + init_waitqueue_head(&cil->xc_push_wait); init_rwsem(&cil->xc_ctx_lock); init_waitqueue_head(&cil->xc_commit_wait);
INIT_LIST_HEAD(&ctx->committing); INIT_LIST_HEAD(&ctx->busy_extents); - init_waitqueue_head(&ctx->push_wait); ctx->sequence = 1; ctx->cil = cil; cil->xc_ctx = ctx; diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h index ec22c7a3867f1..75a62870b63af 100644 --- a/fs/xfs/xfs_log_priv.h +++ b/fs/xfs/xfs_log_priv.h @@ -240,7 +240,6 @@ struct xfs_cil_ctx { struct xfs_log_vec *lv_chain; /* logvecs being pushed */ struct list_head iclog_entry; struct list_head committing; /* ctx committing list */ - wait_queue_head_t push_wait; /* background push throttle */ struct work_struct discard_endio_work; };
@@ -274,6 +273,7 @@ struct xfs_cil { wait_queue_head_t xc_commit_wait; xfs_lsn_t xc_current_sequence; struct work_struct xc_push_work; + wait_queue_head_t xc_push_wait; /* background push throttle */ } ____cacheline_aligned_in_smp;
/*
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 34a9c361dd480041d790fff3d6ea58513c8769e8 ]
When all hsr slave interfaces are removed, hsr interface doesn't work. At that moment, it's fine to remove an unused hsr interface automatically for saving resources. That's a common behavior of virtual interfaces.
Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_main.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c index 26d6c39f24e16..e2564de676038 100644 --- a/net/hsr/hsr_main.c +++ b/net/hsr/hsr_main.c @@ -15,12 +15,23 @@ #include "hsr_framereg.h" #include "hsr_slave.h"
+static bool hsr_slave_empty(struct hsr_priv *hsr) +{ + struct hsr_port *port; + + hsr_for_each_port(hsr, port) + if (port->type != HSR_PT_MASTER) + return false; + return true; +} + static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event, void *ptr) { - struct net_device *dev; struct hsr_port *port, *master; + struct net_device *dev; struct hsr_priv *hsr; + LIST_HEAD(list_kill); int mtu_max; int res;
@@ -85,8 +96,15 @@ static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event, master->dev->mtu = mtu_max; break; case NETDEV_UNREGISTER: - if (!is_hsr_master(dev)) + if (!is_hsr_master(dev)) { + master = hsr_port_get_hsr(port->hsr, HSR_PT_MASTER); hsr_del_port(port); + if (hsr_slave_empty(master->hsr)) { + unregister_netdevice_queue(master->dev, + &list_kill); + unregister_netdevice_many(&list_kill); + } + } break; case NETDEV_PRE_TYPE_CHANGE: /* HSR works only on Ethernet devices. Refuse slave to change
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit de0083c7ed7dba036d1ed6e012157649d45313c8 ]
When an interface is being deleted, "/proc/net/dev_snmp6/<interface name>" is deleted. The function for this is addrconf_ifdown() in the addrconf_notify() and it is called by notification, which is NETDEV_UNREGISTER. But, if NETDEV_CHANGEMTU is triggered after NETDEV_UNREGISTER, this proc file will be created again. This recreated proc file will be deleted by netdev_wati_allrefs(). Before netdev_wait_allrefs() is called, creating a new HSR interface routine can be executed and It tries to create a proc file but it will find an un-deleted proc file. At this point, it warns about it.
To avoid this situation, it can use ->dellink() instead of ->ndo_uninit() to release resources because ->dellink() is called before NETDEV_UNREGISTER. So, a proc file will not be recreated.
Test commands ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 mtu 1300
#SHELL1 while : do ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 done
#SHELL2 while : do ip link del hsr0 done
Splat looks like: [ 9888.980852][ T2752] proc_dir_entry 'dev_snmp6/hsr0' already registered [ 9888.981797][ C2] WARNING: CPU: 2 PID: 2752 at fs/proc/generic.c:372 proc_register+0x2d5/0x430 [ 9888.981798][ C2] Modules linked in: hsr dummy veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6x [ 9888.981814][ C2] CPU: 2 PID: 2752 Comm: ip Tainted: G W 5.8.0-rc1+ #616 [ 9888.981815][ C2] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 9888.981816][ C2] RIP: 0010:proc_register+0x2d5/0x430 [ 9888.981818][ C2] Code: fc ff df 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 65 01 00 00 49 8b b5 e0 00 00 00 48 89 ea 40 [ 9888.981819][ C2] RSP: 0018:ffff8880628dedf0 EFLAGS: 00010286 [ 9888.981821][ C2] RAX: dffffc0000000008 RBX: ffff888028c69170 RCX: ffffffffaae09a62 [ 9888.981822][ C2] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88806c9f75ac [ 9888.981823][ C2] RBP: ffff888028c693f4 R08: ffffed100d9401bd R09: ffffed100d9401bd [ 9888.981824][ C2] R10: ffffffffaddf406f R11: 0000000000000001 R12: ffff888028c69308 [ 9888.981825][ C2] R13: ffff8880663584c8 R14: dffffc0000000000 R15: ffffed100518d27e [ 9888.981827][ C2] FS: 00007f3876b3b0c0(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000 [ 9888.981828][ C2] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 9888.981829][ C2] CR2: 00007f387601a8c0 CR3: 000000004101a002 CR4: 00000000000606e0 [ 9888.981830][ C2] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 9888.981831][ C2] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 9888.981832][ C2] Call Trace: [ 9888.981833][ C2] ? snmp6_seq_show+0x180/0x180 [ 9888.981834][ C2] proc_create_single_data+0x7c/0xa0 [ 9888.981835][ C2] snmp6_register_dev+0xb0/0x130 [ 9888.981836][ C2] ipv6_add_dev+0x4b7/0xf60 [ 9888.981837][ C2] addrconf_notify+0x684/0x1ca0 [ 9888.981838][ C2] ? __mutex_unlock_slowpath+0xd0/0x670 [ 9888.981839][ C2] ? kasan_unpoison_shadow+0x30/0x40 [ 9888.981840][ C2] ? wait_for_completion+0x250/0x250 [ 9888.981841][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981842][ C2] ? dropmon_net_event+0x227/0x410 [ 9888.981843][ C2] ? notifier_call_chain+0x90/0x160 [ 9888.981844][ C2] ? inet6_ifinfo_notify+0x100/0x100 [ 9888.981845][ C2] notifier_call_chain+0x90/0x160 [ 9888.981846][ C2] register_netdevice+0xbe5/0x1070 [ ... ]
Reported-by: syzbot+1d51c8b74efa4c44adeb@syzkaller.appspotmail.com Fixes: e0a4b99773d3 ("hsr: use upper/lower device infrastructure") Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/hsr/hsr_device.c | 21 +-------------------- net/hsr/hsr_device.h | 2 +- net/hsr/hsr_main.c | 9 ++++++--- net/hsr/hsr_netlink.c | 17 +++++++++++++++++ 4 files changed, 25 insertions(+), 24 deletions(-)
diff --git a/net/hsr/hsr_device.c b/net/hsr/hsr_device.c index fc7027314ad80..ef100cfd2ac1b 100644 --- a/net/hsr/hsr_device.c +++ b/net/hsr/hsr_device.c @@ -341,7 +341,7 @@ static void hsr_announce(struct timer_list *t) rcu_read_unlock(); }
-static void hsr_del_ports(struct hsr_priv *hsr) +void hsr_del_ports(struct hsr_priv *hsr) { struct hsr_port *port;
@@ -358,31 +358,12 @@ static void hsr_del_ports(struct hsr_priv *hsr) hsr_del_port(port); }
-/* This has to be called after all the readers are gone. - * Otherwise we would have to check the return value of - * hsr_port_get_hsr(). - */ -static void hsr_dev_destroy(struct net_device *hsr_dev) -{ - struct hsr_priv *hsr = netdev_priv(hsr_dev); - - hsr_debugfs_term(hsr); - hsr_del_ports(hsr); - - del_timer_sync(&hsr->prune_timer); - del_timer_sync(&hsr->announce_timer); - - hsr_del_self_node(hsr); - hsr_del_nodes(&hsr->node_db); -} - static const struct net_device_ops hsr_device_ops = { .ndo_change_mtu = hsr_dev_change_mtu, .ndo_open = hsr_dev_open, .ndo_stop = hsr_dev_close, .ndo_start_xmit = hsr_dev_xmit, .ndo_fix_features = hsr_fix_features, - .ndo_uninit = hsr_dev_destroy, };
static struct device_type hsr_type = { diff --git a/net/hsr/hsr_device.h b/net/hsr/hsr_device.h index a099d7de7e790..b8f9262ed101a 100644 --- a/net/hsr/hsr_device.h +++ b/net/hsr/hsr_device.h @@ -11,6 +11,7 @@ #include <linux/netdevice.h> #include "hsr_main.h"
+void hsr_del_ports(struct hsr_priv *hsr); void hsr_dev_setup(struct net_device *dev); int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], unsigned char multicast_spec, u8 protocol_version, @@ -18,5 +19,4 @@ int hsr_dev_finalize(struct net_device *hsr_dev, struct net_device *slave[2], void hsr_check_carrier_and_operstate(struct hsr_priv *hsr); bool is_hsr_master(struct net_device *dev); int hsr_get_max_mtu(struct hsr_priv *hsr); - #endif /* __HSR_DEVICE_H */ diff --git a/net/hsr/hsr_main.c b/net/hsr/hsr_main.c index e2564de676038..144da15f0a817 100644 --- a/net/hsr/hsr_main.c +++ b/net/hsr/hsr_main.c @@ -6,6 +6,7 @@ */
#include <linux/netdevice.h> +#include <net/rtnetlink.h> #include <linux/rculist.h> #include <linux/timer.h> #include <linux/etherdevice.h> @@ -100,8 +101,10 @@ static int hsr_netdev_notify(struct notifier_block *nb, unsigned long event, master = hsr_port_get_hsr(port->hsr, HSR_PT_MASTER); hsr_del_port(port); if (hsr_slave_empty(master->hsr)) { - unregister_netdevice_queue(master->dev, - &list_kill); + const struct rtnl_link_ops *ops; + + ops = master->dev->rtnl_link_ops; + ops->dellink(master->dev, &list_kill); unregister_netdevice_many(&list_kill); } } @@ -144,9 +147,9 @@ static int __init hsr_init(void)
static void __exit hsr_exit(void) { - unregister_netdevice_notifier(&hsr_nb); hsr_netlink_exit(); hsr_debugfs_remove_root(); + unregister_netdevice_notifier(&hsr_nb); }
module_init(hsr_init); diff --git a/net/hsr/hsr_netlink.c b/net/hsr/hsr_netlink.c index 1decb25f6764a..6e14b7d226399 100644 --- a/net/hsr/hsr_netlink.c +++ b/net/hsr/hsr_netlink.c @@ -83,6 +83,22 @@ static int hsr_newlink(struct net *src_net, struct net_device *dev, return hsr_dev_finalize(dev, link, multicast_spec, hsr_version, extack); }
+static void hsr_dellink(struct net_device *dev, struct list_head *head) +{ + struct hsr_priv *hsr = netdev_priv(dev); + + del_timer_sync(&hsr->prune_timer); + del_timer_sync(&hsr->announce_timer); + + hsr_debugfs_term(hsr); + hsr_del_ports(hsr); + + hsr_del_self_node(hsr); + hsr_del_nodes(&hsr->node_db); + + unregister_netdevice_queue(dev, head); +} + static int hsr_fill_info(struct sk_buff *skb, const struct net_device *dev) { struct hsr_priv *hsr = netdev_priv(dev); @@ -118,6 +134,7 @@ static struct rtnl_link_ops hsr_link_ops __read_mostly = { .priv_size = sizeof(struct hsr_priv), .setup = hsr_dev_setup, .newlink = hsr_newlink, + .dellink = hsr_dellink, .fill_info = hsr_fill_info, };
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit 589b1c9c166dce120e27b32a83a78f55464a7ef9 ]
Use get_unaligned_be64() to fetch the timestamp needed for ns_to_ktime() conversion.
Fixes following sparse warning: sge.c:3282:43: warning: cast to restricted __be64
Fixes: a456950445a0 ("cxgb4: time stamping interface for PTP") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/sge.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c index db8106d9d6edf..28ce9856a0784 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/sge.c +++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c @@ -3300,7 +3300,7 @@ static noinline int t4_systim_to_hwstamp(struct adapter *adapter,
hwtstamps = skb_hwtstamps(skb); memset(hwtstamps, 0, sizeof(*hwtstamps)); - hwtstamps->hwtstamp = ns_to_ktime(be64_to_cpu(*((u64 *)data))); + hwtstamps->hwtstamp = ns_to_ktime(get_unaligned_be64(data));
return RX_PTP_PKT_SUC; }
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit 27f78cb245abdb86735529c13b0a579f57829e71 ]
TC-U32 passes all keys values and masks in __be32 format. The parser already expects this and hence pass the value and masks in __be32 natively to the parser.
Fixes following sparse warnings in several places: cxgb4_tc_u32.c:57:21: warning: incorrect type in assignment (different base types) cxgb4_tc_u32.c:57:21: expected unsigned int [usertype] val cxgb4_tc_u32.c:57:21: got restricted __be32 [usertype] val cxgb4_tc_u32_parse.h:48:24: warning: cast to restricted __be32
Fixes: 2e8aad7bf203 ("cxgb4: add parser to translate u32 filters to internal spec") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c | 18 +-- .../chelsio/cxgb4/cxgb4_tc_u32_parse.h | 122 ++++++++++++------ 2 files changed, 91 insertions(+), 49 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c index 3f3c11e54d970..dede02505ceb5 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32.c @@ -48,7 +48,7 @@ static int fill_match_fields(struct adapter *adap, bool next_header) { unsigned int i, j; - u32 val, mask; + __be32 val, mask; int off, err; bool found;
@@ -228,7 +228,7 @@ int cxgb4_config_knode(struct net_device *dev, struct tc_cls_u32_offload *cls) const struct cxgb4_next_header *next; bool found = false; unsigned int i, j; - u32 val, mask; + __be32 val, mask; int off;
if (t->table[link_uhtid - 1].link_handle) { @@ -242,10 +242,10 @@ int cxgb4_config_knode(struct net_device *dev, struct tc_cls_u32_offload *cls)
/* Try to find matches that allow jumps to next header. */ for (i = 0; next[i].jump; i++) { - if (next[i].offoff != cls->knode.sel->offoff || - next[i].shift != cls->knode.sel->offshift || - next[i].mask != cls->knode.sel->offmask || - next[i].offset != cls->knode.sel->off) + if (next[i].sel.offoff != cls->knode.sel->offoff || + next[i].sel.offshift != cls->knode.sel->offshift || + next[i].sel.offmask != cls->knode.sel->offmask || + next[i].sel.off != cls->knode.sel->off) continue;
/* Found a possible candidate. Find a key that @@ -257,9 +257,9 @@ int cxgb4_config_knode(struct net_device *dev, struct tc_cls_u32_offload *cls) val = cls->knode.sel->keys[j].val; mask = cls->knode.sel->keys[j].mask;
- if (next[i].match_off == off && - next[i].match_val == val && - next[i].match_mask == mask) { + if (next[i].key.off == off && + next[i].key.val == val && + next[i].key.mask == mask) { found = true; break; } diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h index 125868c6770a2..f59dd4b2ae6f9 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_u32_parse.h @@ -38,12 +38,12 @@ struct cxgb4_match_field { int off; /* Offset from the beginning of the header to match */ /* Fill the value/mask pair in the spec if matched */ - int (*val)(struct ch_filter_specification *f, u32 val, u32 mask); + int (*val)(struct ch_filter_specification *f, __be32 val, __be32 mask); };
/* IPv4 match fields */ static inline int cxgb4_fill_ipv4_tos(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { f->val.tos = (ntohl(val) >> 16) & 0x000000FF; f->mask.tos = (ntohl(mask) >> 16) & 0x000000FF; @@ -52,7 +52,7 @@ static inline int cxgb4_fill_ipv4_tos(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv4_frag(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { u32 mask_val; u8 frag_val; @@ -74,7 +74,7 @@ static inline int cxgb4_fill_ipv4_frag(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv4_proto(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { f->val.proto = (ntohl(val) >> 16) & 0x000000FF; f->mask.proto = (ntohl(mask) >> 16) & 0x000000FF; @@ -83,7 +83,7 @@ static inline int cxgb4_fill_ipv4_proto(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv4_src_ip(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.fip[0], &val, sizeof(u32)); memcpy(&f->mask.fip[0], &mask, sizeof(u32)); @@ -92,7 +92,7 @@ static inline int cxgb4_fill_ipv4_src_ip(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv4_dst_ip(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.lip[0], &val, sizeof(u32)); memcpy(&f->mask.lip[0], &mask, sizeof(u32)); @@ -111,7 +111,7 @@ static const struct cxgb4_match_field cxgb4_ipv4_fields[] = {
/* IPv6 match fields */ static inline int cxgb4_fill_ipv6_tos(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { f->val.tos = (ntohl(val) >> 20) & 0x000000FF; f->mask.tos = (ntohl(mask) >> 20) & 0x000000FF; @@ -120,7 +120,7 @@ static inline int cxgb4_fill_ipv6_tos(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_proto(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { f->val.proto = (ntohl(val) >> 8) & 0x000000FF; f->mask.proto = (ntohl(mask) >> 8) & 0x000000FF; @@ -129,7 +129,7 @@ static inline int cxgb4_fill_ipv6_proto(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_src_ip0(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.fip[0], &val, sizeof(u32)); memcpy(&f->mask.fip[0], &mask, sizeof(u32)); @@ -138,7 +138,7 @@ static inline int cxgb4_fill_ipv6_src_ip0(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_src_ip1(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.fip[4], &val, sizeof(u32)); memcpy(&f->mask.fip[4], &mask, sizeof(u32)); @@ -147,7 +147,7 @@ static inline int cxgb4_fill_ipv6_src_ip1(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_src_ip2(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.fip[8], &val, sizeof(u32)); memcpy(&f->mask.fip[8], &mask, sizeof(u32)); @@ -156,7 +156,7 @@ static inline int cxgb4_fill_ipv6_src_ip2(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_src_ip3(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.fip[12], &val, sizeof(u32)); memcpy(&f->mask.fip[12], &mask, sizeof(u32)); @@ -165,7 +165,7 @@ static inline int cxgb4_fill_ipv6_src_ip3(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_dst_ip0(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.lip[0], &val, sizeof(u32)); memcpy(&f->mask.lip[0], &mask, sizeof(u32)); @@ -174,7 +174,7 @@ static inline int cxgb4_fill_ipv6_dst_ip0(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_dst_ip1(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.lip[4], &val, sizeof(u32)); memcpy(&f->mask.lip[4], &mask, sizeof(u32)); @@ -183,7 +183,7 @@ static inline int cxgb4_fill_ipv6_dst_ip1(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_dst_ip2(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.lip[8], &val, sizeof(u32)); memcpy(&f->mask.lip[8], &mask, sizeof(u32)); @@ -192,7 +192,7 @@ static inline int cxgb4_fill_ipv6_dst_ip2(struct ch_filter_specification *f, }
static inline int cxgb4_fill_ipv6_dst_ip3(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { memcpy(&f->val.lip[12], &val, sizeof(u32)); memcpy(&f->mask.lip[12], &mask, sizeof(u32)); @@ -216,7 +216,7 @@ static const struct cxgb4_match_field cxgb4_ipv6_fields[] = {
/* TCP/UDP match */ static inline int cxgb4_fill_l4_ports(struct ch_filter_specification *f, - u32 val, u32 mask) + __be32 val, __be32 mask) { f->val.fport = ntohl(val) >> 16; f->mask.fport = ntohl(mask) >> 16; @@ -237,19 +237,13 @@ static const struct cxgb4_match_field cxgb4_udp_fields[] = { };
struct cxgb4_next_header { - unsigned int offset; /* Offset to next header */ - /* offset, shift, and mask added to offset above + /* Offset, shift, and mask added to beginning of the header * to get to next header. Useful when using a header * field's value to jump to next header such as IHL field * in IPv4 header. */ - unsigned int offoff; - u32 shift; - u32 mask; - /* match criteria to make this jump */ - unsigned int match_off; - u32 match_val; - u32 match_mask; + struct tc_u32_sel sel; + struct tc_u32_key key; /* location of jump to make */ const struct cxgb4_match_field *jump; }; @@ -258,26 +252,74 @@ struct cxgb4_next_header { * IPv4 header. */ static const struct cxgb4_next_header cxgb4_ipv4_jumps[] = { - { .offset = 0, .offoff = 0, .shift = 6, .mask = 0xF, - .match_off = 8, .match_val = 0x600, .match_mask = 0xFF00, - .jump = cxgb4_tcp_fields }, - { .offset = 0, .offoff = 0, .shift = 6, .mask = 0xF, - .match_off = 8, .match_val = 0x1100, .match_mask = 0xFF00, - .jump = cxgb4_udp_fields }, - { .jump = NULL } + { + /* TCP Jump */ + .sel = { + .off = 0, + .offoff = 0, + .offshift = 6, + .offmask = cpu_to_be16(0x0f00), + }, + .key = { + .off = 8, + .val = cpu_to_be32(0x00060000), + .mask = cpu_to_be32(0x00ff0000), + }, + .jump = cxgb4_tcp_fields, + }, + { + /* UDP Jump */ + .sel = { + .off = 0, + .offoff = 0, + .offshift = 6, + .offmask = cpu_to_be16(0x0f00), + }, + .key = { + .off = 8, + .val = cpu_to_be32(0x00110000), + .mask = cpu_to_be32(0x00ff0000), + }, + .jump = cxgb4_udp_fields, + }, + { .jump = NULL }, };
/* Accept a rule with a jump directly past the 40 Bytes of IPv6 fixed header * to get to transport layer header. */ static const struct cxgb4_next_header cxgb4_ipv6_jumps[] = { - { .offset = 0x28, .offoff = 0, .shift = 0, .mask = 0, - .match_off = 4, .match_val = 0x60000, .match_mask = 0xFF0000, - .jump = cxgb4_tcp_fields }, - { .offset = 0x28, .offoff = 0, .shift = 0, .mask = 0, - .match_off = 4, .match_val = 0x110000, .match_mask = 0xFF0000, - .jump = cxgb4_udp_fields }, - { .jump = NULL } + { + /* TCP Jump */ + .sel = { + .off = 40, + .offoff = 0, + .offshift = 0, + .offmask = 0, + }, + .key = { + .off = 4, + .val = cpu_to_be32(0x00000600), + .mask = cpu_to_be32(0x0000ff00), + }, + .jump = cxgb4_tcp_fields, + }, + { + /* UDP Jump */ + .sel = { + .off = 40, + .offoff = 0, + .offshift = 0, + .offmask = 0, + }, + .key = { + .off = 4, + .val = cpu_to_be32(0x00001100), + .mask = cpu_to_be32(0x0000ff00), + }, + .jump = cxgb4_udp_fields, + }, + { .jump = NULL }, };
struct cxgb4_link {
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit 63b53b0b99cd5f2d9754a21eda2ed8e706646cc9 ]
The source and destination L4 ports in filter offload need to be in CPU endian. They will finally be converted to Big Endian after all operations are done and before giving them to hardware. The L4 ports for NAT are expected to be passed as a byte stream TCB. So, treat them as such.
Fixes following sparse warnings in several places: cxgb4_tc_flower.c:159:33: warning: cast from restricted __be16 cxgb4_tc_flower.c:159:33: warning: incorrect type in argument 1 (different base types) cxgb4_tc_flower.c:159:33: expected unsigned short [usertype] val cxgb4_tc_flower.c:159:33: got restricted __be16 [usertype] dst
Fixes: dca4faeb812f ("cxgb4: Add LE hash collision bug fix path in LLD driver") Fixes: 62488e4b53ae ("cxgb4: add basic tc flower offload support") Fixes: 557ccbf9dfa8 ("cxgb4: add tc flower support for L3/L4 rewrite") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 15 +++++++--- .../net/ethernet/chelsio/cxgb4/cxgb4_main.c | 2 +- .../ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 30 +++++++------------ 3 files changed, 22 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c index 796555255207c..c6bf2648fe420 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -165,6 +165,9 @@ static void set_nat_params(struct adapter *adap, struct filter_entry *f, unsigned int tid, bool dip, bool sip, bool dp, bool sp) { + u8 *nat_lp = (u8 *)&f->fs.nat_lport; + u8 *nat_fp = (u8 *)&f->fs.nat_fport; + if (dip) { if (f->fs.type) { set_tcb_field(adap, f, tid, TCB_SND_UNA_RAW_W, @@ -236,8 +239,9 @@ static void set_nat_params(struct adapter *adap, struct filter_entry *f, }
set_tcb_field(adap, f, tid, TCB_PDU_HDR_LEN_W, WORD_MASK, - (dp ? f->fs.nat_lport : 0) | - (sp ? f->fs.nat_fport << 16 : 0), 1); + (dp ? (nat_lp[1] | nat_lp[0] << 8) : 0) | + (sp ? (nat_fp[1] << 16 | nat_fp[0] << 24) : 0), + 1); }
/* Validate filter spec against configuration done on the card. */ @@ -909,6 +913,9 @@ int set_filter_wr(struct adapter *adapter, int fidx) fwr->fpm = htons(f->fs.mask.fport);
if (adapter->params.filter2_wr_support) { + u8 *nat_lp = (u8 *)&f->fs.nat_lport; + u8 *nat_fp = (u8 *)&f->fs.nat_fport; + fwr->natmode_to_ulp_type = FW_FILTER2_WR_ULP_TYPE_V(f->fs.nat_mode ? ULP_MODE_TCPDDP : @@ -916,8 +923,8 @@ int set_filter_wr(struct adapter *adapter, int fidx) FW_FILTER2_WR_NATMODE_V(f->fs.nat_mode); memcpy(fwr->newlip, f->fs.nat_lip, sizeof(fwr->newlip)); memcpy(fwr->newfip, f->fs.nat_fip, sizeof(fwr->newfip)); - fwr->newlport = htons(f->fs.nat_lport); - fwr->newfport = htons(f->fs.nat_fport); + fwr->newlport = htons(nat_lp[1] | nat_lp[0] << 8); + fwr->newfport = htons(nat_fp[1] | nat_fp[0] << 8); }
/* Mark the filter as "pending" and ship off the Filter Work Request. diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c index a70018f067aa8..e8934c48f09b6 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c @@ -2604,7 +2604,7 @@ int cxgb4_create_server_filter(const struct net_device *dev, unsigned int stid,
/* Clear out filter specifications */ memset(&f->fs, 0, sizeof(struct ch_filter_specification)); - f->fs.val.lport = cpu_to_be16(sport); + f->fs.val.lport = be16_to_cpu(sport); f->fs.mask.lport = ~0; val = (u8 *)&sip; if ((val[0] | val[1] | val[2] | val[3]) != 0) { diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c index 4a5fa9eba0b64..59b65d4db086e 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c @@ -58,10 +58,6 @@ static struct ch_tc_pedit_fields pedits[] = { PEDIT_FIELDS(IP6_, DST_63_32, 4, nat_lip, 4), PEDIT_FIELDS(IP6_, DST_95_64, 4, nat_lip, 8), PEDIT_FIELDS(IP6_, DST_127_96, 4, nat_lip, 12), - PEDIT_FIELDS(TCP_, SPORT, 2, nat_fport, 0), - PEDIT_FIELDS(TCP_, DPORT, 2, nat_lport, 0), - PEDIT_FIELDS(UDP_, SPORT, 2, nat_fport, 0), - PEDIT_FIELDS(UDP_, DPORT, 2, nat_lport, 0), };
static struct ch_tc_flower_entry *allocate_flower_entry(void) @@ -156,14 +152,14 @@ static void cxgb4_process_flow_match(struct net_device *dev, struct flow_match_ports match;
flow_rule_match_ports(rule, &match); - fs->val.lport = cpu_to_be16(match.key->dst); - fs->mask.lport = cpu_to_be16(match.mask->dst); - fs->val.fport = cpu_to_be16(match.key->src); - fs->mask.fport = cpu_to_be16(match.mask->src); + fs->val.lport = be16_to_cpu(match.key->dst); + fs->mask.lport = be16_to_cpu(match.mask->dst); + fs->val.fport = be16_to_cpu(match.key->src); + fs->mask.fport = be16_to_cpu(match.mask->src);
/* also initialize nat_lport/fport to same values */ - fs->nat_lport = cpu_to_be16(match.key->dst); - fs->nat_fport = cpu_to_be16(match.key->src); + fs->nat_lport = fs->val.lport; + fs->nat_fport = fs->val.fport; }
if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_IP)) { @@ -354,12 +350,9 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val, switch (offset) { case PEDIT_TCP_SPORT_DPORT: if (~mask & PEDIT_TCP_UDP_SPORT_MASK) - offload_pedit(fs, cpu_to_be32(val) >> 16, - cpu_to_be32(mask) >> 16, - TCP_SPORT); + fs->nat_fport = val; else - offload_pedit(fs, cpu_to_be32(val), - cpu_to_be32(mask), TCP_DPORT); + fs->nat_lport = val >> 16; } fs->nat_mode = NAT_MODE_ALL; break; @@ -367,12 +360,9 @@ static void process_pedit_field(struct ch_filter_specification *fs, u32 val, switch (offset) { case PEDIT_UDP_SPORT_DPORT: if (~mask & PEDIT_TCP_UDP_SPORT_MASK) - offload_pedit(fs, cpu_to_be32(val) >> 16, - cpu_to_be32(mask) >> 16, - UDP_SPORT); + fs->nat_fport = val; else - offload_pedit(fs, cpu_to_be32(val), - cpu_to_be32(mask), UDP_DPORT); + fs->nat_lport = val >> 16; } fs->nat_mode = NAT_MODE_ALL; }
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit f286dd8eaad5a2758750f407ab079298e0bcc8a5 ]
Use correct type to check for all-mask exact match IP addresses.
Fixes following sparse warnings due to big endian value checks against 0xffffffff in is_addr_all_mask(): cxgb4_filter.c:977:25: warning: restricted __be32 degrades to integer cxgb4_filter.c:983:37: warning: restricted __be32 degrades to integer cxgb4_filter.c:984:37: warning: restricted __be32 degrades to integer cxgb4_filter.c:985:37: warning: restricted __be32 degrades to integer cxgb4_filter.c:986:37: warning: restricted __be32 degrades to integer
Fixes: 3eb8b62d5a26 ("cxgb4: add support to create hash-filters via tc-flower offload") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c index c6bf2648fe420..7a7f61a8cdf40 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -1112,16 +1112,16 @@ static bool is_addr_all_mask(u8 *ipmask, int family) struct in_addr *addr;
addr = (struct in_addr *)ipmask; - if (addr->s_addr == 0xffffffff) + if (ntohl(addr->s_addr) == 0xffffffff) return true; } else if (family == AF_INET6) { struct in6_addr *addr6;
addr6 = (struct in6_addr *)ipmask; - if (addr6->s6_addr32[0] == 0xffffffff && - addr6->s6_addr32[1] == 0xffffffff && - addr6->s6_addr32[2] == 0xffffffff && - addr6->s6_addr32[3] == 0xffffffff) + if (ntohl(addr6->s6_addr32[0]) == 0xffffffff && + ntohl(addr6->s6_addr32[1]) == 0xffffffff && + ntohl(addr6->s6_addr32[2]) == 0xffffffff && + ntohl(addr6->s6_addr32[3]) == 0xffffffff) return true; } return false;
From: Rahul Lakkireddy rahul.lakkireddy@chelsio.com
[ Upstream commit 1992ded5d111997877a9a25205976d8d03c46814 ]
The data in destination buffer is expected to be be parsed in big endian. So, use the right context.
Fixes following sparse warning: cudbg_lib.c:2041:44: warning: incorrect type in assignment (different base types) cudbg_lib.c:2041:44: expected unsigned long long [usertype] cudbg_lib.c:2041:44: got restricted __be64 [usertype]
Fixes: 736c3b94474e ("cxgb4: collect egress and ingress SGE queue contexts") Signed-off-by: Rahul Lakkireddy rahul.lakkireddy@chelsio.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c index 7b9cd69f98440..d8ab8e366818c 100644 --- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c @@ -1975,7 +1975,6 @@ int cudbg_collect_dump_context(struct cudbg_init *pdbg_init, u8 mem_type[CTXT_INGRESS + 1] = { 0 }; struct cudbg_buffer temp_buff = { 0 }; struct cudbg_ch_cntxt *buff; - u64 *dst_off, *src_off; u8 *ctx_buf; u8 i, k; int rc; @@ -2044,8 +2043,11 @@ int cudbg_collect_dump_context(struct cudbg_init *pdbg_init, }
for (j = 0; j < max_ctx_qid; j++) { + __be64 *dst_off; + u64 *src_off; + src_off = (u64 *)(ctx_buf + j * SGE_CTXT_SIZE); - dst_off = (u64 *)buff->data; + dst_off = (__be64 *)buff->data;
/* The data is stored in 64-bit cpu order. Convert it * to big endian before parsing.
From: KP Singh kpsingh@google.com
[ Upstream commit 23e390cdbe6f85827a43d38f9288dcd3066fa376 ]
inode_copy_up_xattr returns 0 to indicate the acceptance of the xattr and 1 to reject it. If the LSM does not know about the xattr, it's expected to return -EOPNOTSUPP, which is the correct default value for this hook. BPF LSM, currently, uses 0 as the default value and thereby falsely allows all overlay fs xattributes to be copied up.
The iteration logic is also updated from the "bail-on-fail" call_int_hook to continue on the non-decisive -EOPNOTSUPP and bail out on other values.
Fixes: 98e828a0650f ("security: Refactor declaration of LSM hooks") Signed-off-by: KP Singh kpsingh@google.com Signed-off-by: James Morris jmorris@namei.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/lsm_hook_defs.h | 2 +- security/security.c | 17 ++++++++++++++++- 2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 5616b2567aa7f..c2d073c49bf8a 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -149,7 +149,7 @@ LSM_HOOK(int, 0, inode_listsecurity, struct inode *inode, char *buffer, size_t buffer_size) LSM_HOOK(void, LSM_RET_VOID, inode_getsecid, struct inode *inode, u32 *secid) LSM_HOOK(int, 0, inode_copy_up, struct dentry *src, struct cred **new) -LSM_HOOK(int, 0, inode_copy_up_xattr, const char *name) +LSM_HOOK(int, -EOPNOTSUPP, inode_copy_up_xattr, const char *name) LSM_HOOK(int, 0, kernfs_init_security, struct kernfs_node *kn_dir, struct kernfs_node *kn) LSM_HOOK(int, 0, file_permission, struct file *file, int mask) diff --git a/security/security.c b/security/security.c index 51de970fbb1ed..8b4d342ade5e1 100644 --- a/security/security.c +++ b/security/security.c @@ -1409,7 +1409,22 @@ EXPORT_SYMBOL(security_inode_copy_up);
int security_inode_copy_up_xattr(const char *name) { - return call_int_hook(inode_copy_up_xattr, -EOPNOTSUPP, name); + struct security_hook_list *hp; + int rc; + + /* + * The implementation can return 0 (accept the xattr), 1 (discard the + * xattr), -EOPNOTSUPP if it does not know anything about the xattr or + * any other error code incase of an error. + */ + hlist_for_each_entry(hp, + &security_hook_heads.inode_copy_up_xattr, list) { + rc = hp->hook.inode_copy_up_xattr(name); + if (rc != LSM_RET_DEFAULT(inode_copy_up_xattr)) + return rc; + } + + return LSM_RET_DEFAULT(inode_copy_up_xattr); } EXPORT_SYMBOL(security_inode_copy_up_xattr);
From: Chu Lin linchuyuan@google.com
[ Upstream commit 016983d138cbe99a5c0aaae0103ee88f5300beb3 ]
Per the datasheet for max6697, OVERT mask and ALERT mask are different. For example, the 7th bit of OVERT is the local channel but for alert mask, the 6th bit is the local channel. Therefore, we can't apply the same mask for both registers. In addition to that, the max6697 driver is supposed to be compatibale with different models. I manually went over all the listed chips and made sure all chip types have the same layout.
Testing; mask value of 0x9 should map to 0x44 for ALERT and 0x84 for OVERT. I used iotool to read the reg value back to verify. I only tested this change on max6581.
Reference: https://datasheets.maximintegrated.com/en/ds/MAX6581.pdf https://datasheets.maximintegrated.com/en/ds/MAX6697.pdf https://datasheets.maximintegrated.com/en/ds/MAX6699.pdf
Signed-off-by: Chu Lin linchuyuan@google.com Fixes: 5372d2d71c46e ("hwmon: Driver for Maxim MAX6697 and compatibles") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/max6697.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/hwmon/max6697.c b/drivers/hwmon/max6697.c index 743752a2467a2..64122eb38060d 100644 --- a/drivers/hwmon/max6697.c +++ b/drivers/hwmon/max6697.c @@ -38,8 +38,9 @@ static const u8 MAX6697_REG_CRIT[] = { * Map device tree / platform data register bit map to chip bit map. * Applies to alert register and over-temperature register. */ -#define MAX6697_MAP_BITS(reg) ((((reg) & 0x7e) >> 1) | \ +#define MAX6697_ALERT_MAP_BITS(reg) ((((reg) & 0x7e) >> 1) | \ (((reg) & 0x01) << 6) | ((reg) & 0x80)) +#define MAX6697_OVERT_MAP_BITS(reg) (((reg) >> 1) | (((reg) & 0x01) << 7))
#define MAX6697_REG_STAT(n) (0x44 + (n))
@@ -562,12 +563,12 @@ static int max6697_init_chip(struct max6697_data *data, return ret;
ret = i2c_smbus_write_byte_data(client, MAX6697_REG_ALERT_MASK, - MAX6697_MAP_BITS(pdata->alert_mask)); + MAX6697_ALERT_MAP_BITS(pdata->alert_mask)); if (ret < 0) return ret;
ret = i2c_smbus_write_byte_data(client, MAX6697_REG_OVERT_MASK, - MAX6697_MAP_BITS(pdata->over_temperature_mask)); + MAX6697_OVERT_MAP_BITS(pdata->over_temperature_mask)); if (ret < 0) return ret;
From: Misono Tomohiro misono.tomohiro@jp.fujitsu.com
[ Upstream commit 8b97f9922211c44a739c5cbd9502ecbb9f17f6d1 ]
Although it rarely happens, we should call free_capabilities() if error happens after read_capabilities() to free allocated strings.
Fixes: de584afa5e188 ("hwmon driver for ACPI 4.0 power meters") Signed-off-by: Misono Tomohiro misono.tomohiro@jp.fujitsu.com Link: https://lore.kernel.org/r/20200625043242.31175-1-misono.tomohiro@jp.fujitsu.... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hwmon/acpi_power_meter.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/hwmon/acpi_power_meter.c b/drivers/hwmon/acpi_power_meter.c index 0db8ef4fd6e18..a270b975e90bb 100644 --- a/drivers/hwmon/acpi_power_meter.c +++ b/drivers/hwmon/acpi_power_meter.c @@ -883,7 +883,7 @@ static int acpi_power_meter_add(struct acpi_device *device)
res = setup_attrs(resource); if (res) - goto exit_free; + goto exit_free_capability;
resource->hwmon_dev = hwmon_device_register(&device->dev); if (IS_ERR(resource->hwmon_dev)) { @@ -896,6 +896,8 @@ static int acpi_power_meter_add(struct acpi_device *device)
exit_remove: remove_attrs(resource); +exit_free_capability: + free_capabilities(resource); exit_free: kfree(resource); exit:
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 1fc98aaf7f85fadcca57c4a86ef17e1940cad2d3 ]
This code doesn't make sense unless the correct "fcport" was found.
Link: https://lore.kernel.org/r/20200619143041.GD267142@mwanda Fixes: 9dd9686b1419 ("scsi: qla2xxx: Add changes for devloss timeout in driver") Reviewed-by: Himanshu Madhani himanshu.madhani@oracle.com Reviewed-by: Shyam Sundar ssundar@marvell.com Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/qla2xxx/qla_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/qla2xxx/qla_init.c b/drivers/scsi/qla2xxx/qla_init.c index caa6b840e4594..cfbb4294fb8bb 100644 --- a/drivers/scsi/qla2xxx/qla_init.c +++ b/drivers/scsi/qla2xxx/qla_init.c @@ -5933,7 +5933,7 @@ qla2x00_find_all_fabric_devs(scsi_qla_host_t *vha) break; }
- if (NVME_TARGET(vha->hw, fcport)) { + if (found && NVME_TARGET(vha->hw, fcport)) { if (fcport->disc_state == DSC_DELETE_PEND) { qla2x00_set_fcport_disc_state(fcport, DSC_GNL); vha->fcport_count--;
From: Michael Kao michael.kao@mediatek.com
[ Upstream commit 14533a5a6c12e8d7de79d309d4085bf186058fe1 ]
MT8183_NUM_ZONES should be set to 1 because MT8183 doesn't have multiple banks.
Fixes: a4ffe6b52d27 ("thermal: mediatek: add support for MT8183") Signed-off-by: Michael Kao michael.kao@mediatek.com Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20200323121537.22697-6-michael.kao@mediatek.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/mtk_thermal.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/thermal/mtk_thermal.c b/drivers/thermal/mtk_thermal.c index 76e30603d4d58..6b7ef1993d7e2 100644 --- a/drivers/thermal/mtk_thermal.c +++ b/drivers/thermal/mtk_thermal.c @@ -211,6 +211,9 @@ enum { /* The total number of temperature sensors in the MT8183 */ #define MT8183_NUM_SENSORS 6
+/* The number of banks in the MT8183 */ +#define MT8183_NUM_ZONES 1 + /* The number of sensing points per bank */ #define MT8183_NUM_SENSORS_PER_ZONE 6
@@ -497,7 +500,7 @@ static const struct mtk_thermal_data mt7622_thermal_data = { */ static const struct mtk_thermal_data mt8183_thermal_data = { .auxadc_channel = MT8183_TEMP_AUXADC_CHANNEL, - .num_banks = MT8183_NUM_SENSORS_PER_ZONE, + .num_banks = MT8183_NUM_ZONES, .num_sensors = MT8183_NUM_SENSORS, .vts_index = mt8183_vts_index, .cali_val = MT8183_CALIBRATION,
From: Tiezhu Yang yangtiezhu@loongson.cn
[ Upstream commit b4147917ad4ff2c755e01a7ca296b14030d2d507 ]
When call function devm_platform_ioremap_resource(), we should use IS_ERR() to check the return value and return PTR_ERR() if failed.
Fixes: 554fdbaf19b1 ("thermal: sprd: Add Spreadtrum thermal driver support") Signed-off-by: Tiezhu Yang yangtiezhu@loongson.cn Reviewed-by: Baolin Wang baolin.wang7@gmail.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/1590371941-25430-1-git-send-email-yangtiezhu@loong... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/sprd_thermal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/sprd_thermal.c b/drivers/thermal/sprd_thermal.c index a340374e8c51a..4cde70dcf6556 100644 --- a/drivers/thermal/sprd_thermal.c +++ b/drivers/thermal/sprd_thermal.c @@ -348,8 +348,8 @@ static int sprd_thm_probe(struct platform_device *pdev)
thm->var_data = pdata; thm->base = devm_platform_ioremap_resource(pdev, 0); - if (!thm->base) - return -ENOMEM; + if (IS_ERR(thm->base)) + return PTR_ERR(thm->base);
thm->nr_sensors = of_get_child_count(np); if (thm->nr_sensors == 0 || thm->nr_sensors > SPRD_THM_MAX_SENSOR) {
From: Dien Pham dien.pham.ry@renesas.com
[ Upstream commit 5f8f06425a0dcdad7bedbb77e67f5c65ab4dacfc ]
As description for DIV_ROUND_CLOSEST in file include/linux/kernel.h. "Result is undefined for negative divisors if the dividend variable type is unsigned and for negative dividends if the divisor variable type is unsigned."
In current code, the FIXPT_DIV uses DIV_ROUND_CLOSEST but has not checked sign of divisor before using. It makes undefined temperature value in case the value is negative.
This patch fixes to satisfy DIV_ROUND_CLOSEST description and fix bug too. Note that the variable name "reg" is not good because it should be the same type as rcar_gen3_thermal_read(). However, it's better to rename the "reg" in a further patch as cleanup.
Signed-off-by: Van Do van.do.xw@renesas.com Signed-off-by: Dien Pham dien.pham.ry@renesas.com [shimoda: minor fixes, add Fixes tag] Fixes: 564e73d283af ("thermal: rcar_gen3_thermal: Add R-Car Gen3 thermal driver") Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Reviewed-by: Niklas Soderlund niklas.soderlund+renesas@ragnatech.se Tested-by: Niklas Soderlund niklas.soderlund+renesas@ragnatech.se Reviewed-by: Amit Kucheria amit.kucheria@linaro.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/1593085099-2057-1-git-send-email-yoshihiro.shimoda... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/rcar_gen3_thermal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/rcar_gen3_thermal.c b/drivers/thermal/rcar_gen3_thermal.c index 58fe7c1ef00b1..c48c5e9b8f203 100644 --- a/drivers/thermal/rcar_gen3_thermal.c +++ b/drivers/thermal/rcar_gen3_thermal.c @@ -167,7 +167,7 @@ static int rcar_gen3_thermal_get_temp(void *devdata, int *temp) { struct rcar_gen3_thermal_tsc *tsc = devdata; int mcelsius, val; - u32 reg; + int reg;
/* Read register and convert to mili Celsius */ reg = rcar_gen3_thermal_read(tsc, REG_GEN3_TEMP) & CTEMP_MASK;
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 52782c92ac85c4e393eb4a903a62e6c24afa633f ]
It's handy to keep the kthread_fn just as a unique cookie to identify classes of kthreads. E.g. if you can verify that a given task is running your thread_fn, then you may know what sort of type kthread_data points to.
We'll use this in nfsd to pass some information into the vfs. Note it will need kthread_data() exported too.
Original-patch-by: Tejun Heo tj@kernel.org Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/kthread.h | 1 + kernel/kthread.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 8bbcaad7ef0f4..c2a274b79c429 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -57,6 +57,7 @@ bool kthread_should_stop(void); bool kthread_should_park(void); bool __kthread_should_park(struct task_struct *k); bool kthread_freezable_should_stop(bool *was_frozen); +void *kthread_func(struct task_struct *k); void *kthread_data(struct task_struct *k); void *kthread_probe_data(struct task_struct *k); int kthread_park(struct task_struct *k); diff --git a/kernel/kthread.c b/kernel/kthread.c index bfbfa481be3a5..b84fc7eec0358 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -46,6 +46,7 @@ struct kthread_create_info struct kthread { unsigned long flags; unsigned int cpu; + int (*threadfn)(void *); void *data; struct completion parked; struct completion exited; @@ -152,6 +153,20 @@ bool kthread_freezable_should_stop(bool *was_frozen) } EXPORT_SYMBOL_GPL(kthread_freezable_should_stop);
+/** + * kthread_func - return the function specified on kthread creation + * @task: kthread task in question + * + * Returns NULL if the task is not a kthread. + */ +void *kthread_func(struct task_struct *task) +{ + if (task->flags & PF_KTHREAD) + return to_kthread(task)->threadfn; + return NULL; +} +EXPORT_SYMBOL_GPL(kthread_func); + /** * kthread_data - return data value specified on kthread creation * @task: kthread task in question @@ -164,6 +179,7 @@ void *kthread_data(struct task_struct *task) { return to_kthread(task)->data; } +EXPORT_SYMBOL_GPL(kthread_data);
/** * kthread_probe_data - speculative version of kthread_data() @@ -244,6 +260,7 @@ static int kthread(void *_create) do_exit(-ENOMEM); }
+ self->threadfn = threadfn; self->data = data; init_completion(&self->exited); init_completion(&self->parked);
NACK to this and following patch.--b.
On Tue, Jul 07, 2020 at 05:17:15PM +0200, Greg Kroah-Hartman wrote:
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 52782c92ac85c4e393eb4a903a62e6c24afa633f ]
It's handy to keep the kthread_fn just as a unique cookie to identify classes of kthreads. E.g. if you can verify that a given task is running your thread_fn, then you may know what sort of type kthread_data points to.
We'll use this in nfsd to pass some information into the vfs. Note it will need kthread_data() exported too.
Original-patch-by: Tejun Heo tj@kernel.org Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
include/linux/kthread.h | 1 + kernel/kthread.c | 17 +++++++++++++++++ 2 files changed, 18 insertions(+)
diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 8bbcaad7ef0f4..c2a274b79c429 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -57,6 +57,7 @@ bool kthread_should_stop(void); bool kthread_should_park(void); bool __kthread_should_park(struct task_struct *k); bool kthread_freezable_should_stop(bool *was_frozen); +void *kthread_func(struct task_struct *k); void *kthread_data(struct task_struct *k); void *kthread_probe_data(struct task_struct *k); int kthread_park(struct task_struct *k); diff --git a/kernel/kthread.c b/kernel/kthread.c index bfbfa481be3a5..b84fc7eec0358 100644 --- a/kernel/kthread.c +++ b/kernel/kthread.c @@ -46,6 +46,7 @@ struct kthread_create_info struct kthread { unsigned long flags; unsigned int cpu;
- int (*threadfn)(void *); void *data; struct completion parked; struct completion exited;
@@ -152,6 +153,20 @@ bool kthread_freezable_should_stop(bool *was_frozen) } EXPORT_SYMBOL_GPL(kthread_freezable_should_stop); +/**
- kthread_func - return the function specified on kthread creation
- @task: kthread task in question
- Returns NULL if the task is not a kthread.
- */
+void *kthread_func(struct task_struct *task) +{
- if (task->flags & PF_KTHREAD)
return to_kthread(task)->threadfn;
- return NULL;
+} +EXPORT_SYMBOL_GPL(kthread_func);
/**
- kthread_data - return data value specified on kthread creation
- @task: kthread task in question
@@ -164,6 +179,7 @@ void *kthread_data(struct task_struct *task) { return to_kthread(task)->data; } +EXPORT_SYMBOL_GPL(kthread_data); /**
- kthread_probe_data - speculative version of kthread_data()
@@ -244,6 +260,7 @@ static int kthread(void *_create) do_exit(-ENOMEM); }
- self->threadfn = threadfn; self->data = data; init_completion(&self->exited); init_completion(&self->parked);
-- 2.25.1
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 28df3d1539de5090f7916f6fff03891b67f366f4 ]
We currently revoke read delegations on any write open or any operation that modifies file data or metadata (including rename, link, and unlink). But if the delegation in question is the only read delegation and is held by the client performing the operation, that's not really necessary.
It's not always possible to prevent this in the NFSv4.0 case, because there's not always a way to determine which client an NFSv4.0 delegation came from. (In theory we could try to guess this from the transport layer, e.g., by assuming all traffic on a given TCP connection comes from the same client. But that's not really correct.)
In the NFSv4.1 case the session layer always tells us the client.
This patch should remove such self-conflicts in all cases where we can reliably determine the client from the compound.
To do that we need to track "who" is performing a given (possibly lease-breaking) file operation. We're doing that by storing the information in the svc_rqst and using kthread_data() to map the current task back to a svc_rqst.
Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/locking.rst | 2 ++ fs/locks.c | 3 +++ fs/nfsd/nfs4proc.c | 2 ++ fs/nfsd/nfs4state.c | 14 ++++++++++++++ fs/nfsd/nfsd.h | 2 ++ fs/nfsd/nfssvc.c | 6 ++++++ include/linux/fs.h | 1 + include/linux/sunrpc/svc.h | 1 + 8 files changed, 31 insertions(+)
diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst index 5057e4d9dcd1d..9fdcec4166142 100644 --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -425,6 +425,7 @@ prototypes:: int (*lm_grant)(struct file_lock *, struct file_lock *, int); void (*lm_break)(struct file_lock *); /* break_lease callback */ int (*lm_change)(struct file_lock **, int); + bool (*lm_breaker_owns_lease)(struct file_lock *);
locking rules:
@@ -435,6 +436,7 @@ lm_notify: yes yes no lm_grant: no no no lm_break: yes no no lm_change yes no no +lm_breaker_owns_lease: no no no ========== ============= ================= =========
buffer_head diff --git a/fs/locks.c b/fs/locks.c index b8a31c1c4fff3..a3f186846e93e 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -1557,6 +1557,9 @@ static bool leases_conflict(struct file_lock *lease, struct file_lock *breaker) { bool rc;
+ if (lease->fl_lmops->lm_breaker_owns_lease + && lease->fl_lmops->lm_breaker_owns_lease(lease)) + return false; if ((breaker->fl_flags & FL_LAYOUT) != (lease->fl_flags & FL_LAYOUT)) { rc = false; goto trace; diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c index 0e75f7fb5fec0..a6d73aa51ce4e 100644 --- a/fs/nfsd/nfs4proc.c +++ b/fs/nfsd/nfs4proc.c @@ -2302,6 +2302,8 @@ nfsd4_proc_compound(struct svc_rqst *rqstp) } check_if_stalefh_allowed(args);
+ rqstp->rq_lease_breaker = (void **)&cstate->clp; + trace_nfsd_compound(rqstp, args->opcnt); while (!status && resp->opcnt < args->opcnt) { op = &args->ops[resp->opcnt++]; diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index c107caa565254..f71e5590967bb 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4522,6 +4522,19 @@ nfsd_break_deleg_cb(struct file_lock *fl) return ret; }
+static bool nfsd_breaker_owns_lease(struct file_lock *fl) +{ + struct nfs4_delegation *dl = fl->fl_owner; + struct svc_rqst *rqst; + struct nfs4_client *clp; + + if (!i_am_nfsd()) + return NULL; + rqst = kthread_data(current); + clp = *(rqst->rq_lease_breaker); + return dl->dl_stid.sc_client == clp; +} + static int nfsd_change_deleg_cb(struct file_lock *onlist, int arg, struct list_head *dispose) @@ -4533,6 +4546,7 @@ nfsd_change_deleg_cb(struct file_lock *onlist, int arg, }
static const struct lock_manager_operations nfsd_lease_mng_ops = { + .lm_breaker_owns_lease = nfsd_breaker_owns_lease, .lm_break = nfsd_break_deleg_cb, .lm_change = nfsd_change_deleg_cb, }; diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 2ab5569126b8a..36cdd81b6688a 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -88,6 +88,8 @@ int nfsd_pool_stats_release(struct inode *, struct file *);
void nfsd_destroy(struct net *net);
+bool i_am_nfsd(void); + struct nfsdfs_client { struct kref cl_ref; void (*cl_release)(struct kref *kref); diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index ca9fd348548b8..4f588c0eaaf44 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -601,6 +601,11 @@ static const struct svc_serv_ops nfsd_thread_sv_ops = { .svo_module = THIS_MODULE, };
+bool i_am_nfsd() +{ + return kthread_func(current) == nfsd; +} + int nfsd_create_serv(struct net *net) { int error; @@ -1011,6 +1016,7 @@ nfsd_dispatch(struct svc_rqst *rqstp, __be32 *statp) *statp = rpc_garbage_args; return 1; } + rqstp->rq_lease_breaker = NULL; /* * Give the xdr decoder a chance to change this if it wants * (necessary in the NFSv4.0 compound case) diff --git a/include/linux/fs.h b/include/linux/fs.h index 45cc10cdf6ddd..70a0ac7b8f66a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1045,6 +1045,7 @@ struct lock_manager_operations { bool (*lm_break)(struct file_lock *); int (*lm_change)(struct file_lock *, int, struct list_head *); void (*lm_setup)(struct file_lock *, void **); + bool (*lm_breaker_owns_lease)(struct file_lock *); };
struct lock_manager { diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index fd390894a5849..abf4a57ce4a7d 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -299,6 +299,7 @@ struct svc_rqst { struct net *rq_bc_net; /* pointer to backchannel's * net namespace */ + void ** rq_lease_breaker; /* The v4 client breaking a lease */ };
#define SVC_NET(rqst) (rqst->rq_xprt ? rqst->rq_xprt->xpt_net : rqst->rq_bc_net)
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit 681370f4b00af0fcc65bbfb9f82de526ab7ceb0a ]
We don't drop the reference on the nfsdfs filesystem with mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called until the nfsd module's unloaded, and we can't unload the module as long as there's a reference on nfsdfs. So this prevents module unloading.
Fixes: 2c830dd7209b ("nfsd: persist nfsd filesystem across mounts") Reported-and-Tested-by: Luo Xiaogang lxgrxd@163.com Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4state.c | 8 +++++++- fs/nfsd/nfsctl.c | 22 ++++++++++++---------- fs/nfsd/nfsd.h | 3 +++ 3 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index f71e5590967bb..95e459a1dd7dc 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7873,9 +7873,14 @@ nfs4_state_start_net(struct net *net) struct nfsd_net *nn = net_generic(net, nfsd_net_id); int ret;
- ret = nfs4_state_create_net(net); + ret = get_nfsdfs(net); if (ret) return ret; + ret = nfs4_state_create_net(net); + if (ret) { + mntput(nn->nfsd_mnt); + return ret; + } locks_start_grace(net, &nn->nfsd4_manager); nfsd4_client_tracking_init(net); if (nn->track_reclaim_completes && nn->reclaim_str_hashtbl_size == 0) @@ -7944,6 +7949,7 @@ nfs4_state_shutdown_net(struct net *net)
nfsd4_client_tracking_exit(net); nfs4_state_destroy_net(net); + mntput(nn->nfsd_mnt); }
void diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 71687d99b0901..9b22d857549c3 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1424,6 +1424,18 @@ static struct file_system_type nfsd_fs_type = { }; MODULE_ALIAS_FS("nfsd");
+int get_nfsdfs(struct net *net) +{ + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct vfsmount *mnt; + + mnt = vfs_kern_mount(&nfsd_fs_type, SB_KERNMOUNT, "nfsd", NULL); + if (IS_ERR(mnt)) + return PTR_ERR(mnt); + nn->nfsd_mnt = mnt; + return 0; +} + #ifdef CONFIG_PROC_FS static int create_proc_exports_entry(void) { @@ -1451,7 +1463,6 @@ unsigned int nfsd_net_id; static __net_init int nfsd_init_net(struct net *net) { int retval; - struct vfsmount *mnt; struct nfsd_net *nn = net_generic(net, nfsd_net_id);
retval = nfsd_export_init(net); @@ -1478,16 +1489,8 @@ static __net_init int nfsd_init_net(struct net *net) init_waitqueue_head(&nn->ntf_wq); seqlock_init(&nn->boot_lock);
- mnt = vfs_kern_mount(&nfsd_fs_type, SB_KERNMOUNT, "nfsd", NULL); - if (IS_ERR(mnt)) { - retval = PTR_ERR(mnt); - goto out_mount_err; - } - nn->nfsd_mnt = mnt; return 0;
-out_mount_err: - nfsd_reply_cache_shutdown(nn); out_drc_error: nfsd_idmap_shutdown(net); out_idmap_error: @@ -1500,7 +1503,6 @@ static __net_exit void nfsd_exit_net(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- mntput(nn->nfsd_mnt); nfsd_reply_cache_shutdown(nn); nfsd_idmap_shutdown(net); nfsd_export_shutdown(net); diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h index 36cdd81b6688a..57c832d1b30fd 100644 --- a/fs/nfsd/nfsd.h +++ b/fs/nfsd/nfsd.h @@ -90,6 +90,8 @@ void nfsd_destroy(struct net *net);
bool i_am_nfsd(void);
+int get_nfsdfs(struct net *); + struct nfsdfs_client { struct kref cl_ref; void (*cl_release)(struct kref *kref); @@ -100,6 +102,7 @@ struct dentry *nfsd_client_mkdir(struct nfsd_net *nn, struct nfsdfs_client *ncl, u32 id, const struct tree_descr *); void nfsd_client_rmdir(struct dentry *dentry);
+ #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) #ifdef CONFIG_NFSD_V2_ACL extern const struct svc_version nfsd_acl_version2;
From: J. Bruce Fields bfields@redhat.com
[ Upstream commit bf2654017e0268cc83dc88d56f0e67ff4406631d ]
I don't understand this code well, but I'm seeing a warning about a still-referenced inode on unmount, and every other similar filesystem does a dput() here.
Fixes: e8a79fb14f6b ("nfsd: add nfsd/clients directory") Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfsctl.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index 9b22d857549c3..f298aad41070f 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1335,6 +1335,7 @@ void nfsd_client_rmdir(struct dentry *dentry) WARN_ON_ONCE(ret); fsnotify_rmdir(dir, dentry); d_delete(dentry); + dput(dentry); inode_unlock(dir); }
From: Chen-Yu Tsai wens@csie.org
[ Upstream commit bda8eaa6dee7525f4dac950810a85a88bf6c2ba0 ]
The HPD sense mechanism in Allwinner's old HDMI encoder hardware is more or less an input-only GPIO. Other GPIO-based HPD implementations directly return the current state, instead of polling for a specific state and returning the other if that times out.
Remove the I/O polling from sun4i_hdmi_connector_detect() and directly return a known state based on the current reading. This also gets rid of excessive CPU usage by kworker as reported on Stack Exchange [1] and Armbian forums [2].
[1] https://superuser.com/questions/1515001/debian-10-buster-on-cubietruck-with-... [2] https://forum.armbian.com/topic/14282-headless-systems-and-sun4i_drm_hdmi-a1...
Fixes: 9c5681011a0c ("drm/sun4i: Add HDMI support") Signed-off-by: Chen-Yu Tsai wens@csie.org Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://patchwork.freedesktop.org/patch/msgid/20200629060032.24134-1-wens@ke... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c b/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c index 68d4644ac2dcc..f07e0c32b93a2 100644 --- a/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c +++ b/drivers/gpu/drm/sun4i/sun4i_hdmi_enc.c @@ -262,9 +262,8 @@ sun4i_hdmi_connector_detect(struct drm_connector *connector, bool force) struct sun4i_hdmi *hdmi = drm_connector_to_sun4i_hdmi(connector); unsigned long reg;
- if (readl_poll_timeout(hdmi->base + SUN4I_HDMI_HPD_REG, reg, - reg & SUN4I_HDMI_HPD_HIGH, - 0, 500000)) { + reg = readl(hdmi->base + SUN4I_HDMI_HPD_REG); + if (reg & SUN4I_HDMI_HPD_HIGH) { cec_phys_addr_invalidate(hdmi->cec_adap); return connector_status_disconnected; }
From: Hou Tao houtao1@huawei.com
[ Upstream commit e7eea44eefbdd5f0345a0a8b80a3ca1c21030d06 ]
Else there will be memory leak if alloc_disk() fails.
Fixes: 6a27b656fc02 ("block: virtio-blk: support multi virt queues per virtio-blk device") Signed-off-by: Hou Tao houtao1@huawei.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Reviewed-by: Ming Lei ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/block/virtio_blk.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 9d21bf0f155ee..980df853ee497 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -878,6 +878,7 @@ static int virtblk_probe(struct virtio_device *vdev) put_disk(vblk->disk); out_free_vq: vdev->config->del_vqs(vdev); + kfree(vblk->vqs); out_free_vblk: kfree(vblk); out_free_index:
From: Paul Aurich paul@darkrain42.org
[ Upstream commit 5391b8e1b7b7e5cfa2dd4ffdc4b8c6b64dfd1866 ]
The flag from the primary tcon needs to be copied into the volume info so that cifs_get_tcon will try to enable extensions on the per-user tcon. At that point, since posix extensions must have already been enabled on the superblock, don't try to needlessly adjust the mount flags.
Fixes: ce558b0e17f8 ("smb3: Add posix create context for smb3.11 posix mounts") Fixes: b326614ea215 ("smb3: allow "posix" mount option to enable new SMB311 protocol extensions") Signed-off-by: Paul Aurich paul@darkrain42.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/connect.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c index 47b9fbb70bf5e..fd8d886e09390 100644 --- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -5307,6 +5307,7 @@ cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) vol_info->nohandlecache = master_tcon->nohandlecache; vol_info->local_lease = master_tcon->local_lease; vol_info->no_linux_ext = !master_tcon->unix_ext; + vol_info->linux_ext = master_tcon->posix_extensions; vol_info->sectype = master_tcon->ses->sectype; vol_info->sign = master_tcon->ses->sign;
@@ -5334,10 +5335,6 @@ cifs_construct_tcon(struct cifs_sb_info *cifs_sb, kuid_t fsuid) goto out; }
- /* if new SMB3.11 POSIX extensions are supported do not remap / and \ */ - if (tcon->posix_extensions) - cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_POSIX_PATHS; - if (cap_unix(ses)) reset_cifs_unix_caps(0, tcon, NULL, vol_info);
From: Sagi Grimberg sagi@grimberg.me
[ Upstream commit ea43d9709f727e728e933a8157a7a7ca1a868281 ]
Commit 59c7c3caaaf8 intended to only silently ignore non retry-able errors (DNR bit set) such that we can still identify misbehaving controllers, and in the other hand propagate retry-able errors (DNR bit cleared) so we don't wrongly abandon a namespace just because it happens to be temporarily inaccessible.
The goal remains the same as the original commit where this was introduced but unfortunately had the logic backwards.
Fixes: 59c7c3caaaf8 ("nvme: fix possible hang when ns scanning fails during error recovery") Reported-by: Keith Busch kbusch@kernel.org Signed-off-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 85ce6c682849e..71d63ed62071e 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1120,10 +1120,16 @@ static int nvme_identify_ns_descs(struct nvme_ctrl *ctrl, unsigned nsid, dev_warn(ctrl->device, "Identify Descriptors failed (%d)\n", status); /* - * Don't treat an error as fatal, as we potentially already - * have a NGUID or EUI-64. + * Don't treat non-retryable errors as fatal, as we potentially + * already have a NGUID or EUI-64. If we failed with DNR set, + * we want to silently ignore the error as we can still + * identify the device, but if the status has DNR set, we want + * to propagate the error back specifically for the disk + * revalidation flow to make sure we don't abandon the + * device just because of a temporal retry-able error (such + * as path of transport errors). */ - if (status > 0 && !(status & NVME_SC_DNR)) + if (status > 0 && (status & NVME_SC_DNR)) status = 0; goto free_data; }
From: Christoph Hellwig hch@lst.de
[ Upstream commit 72d447113bb751ded97b2e2c38f886e4a4139082 ]
For private namespaces ns->head_disk is NULL, so add a NULL check before updating the BDI capabilities.
Fixes: b2ce4d90690b ("nvme-multipath: set bdi capabilities once") Reported-by: Avinash M N Avinash.M.N@wdc.com Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Max Gurtovoy maxg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/multipath.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c index 03bc3aba09871..36db7d2e6a896 100644 --- a/drivers/nvme/host/multipath.c +++ b/drivers/nvme/host/multipath.c @@ -673,10 +673,11 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, struct nvme_id_ns *id) }
if (bdi_cap_stable_pages_required(ns->queue->backing_dev_info)) { - struct backing_dev_info *info = - ns->head->disk->queue->backing_dev_info; + struct gendisk *disk = ns->head->disk;
- info->capabilities |= BDI_CAP_STABLE_WRITES; + if (disk) + disk->queue->backing_dev_info->capabilities |= + BDI_CAP_STABLE_WRITES; } }
From: David Gibson david@gibson.dropbear.id.au
[ Upstream commit 72d0556dca39f45eca6c4c085e9eb0fc70aec025 ]
The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued, which will need the use of the internal command/response buffer. But, we're issuing this *before* we've waited to make sure that buffer is allocated.
This can result in intermittent failures to probe if the hypervisor / TPM implementation doesn't respond quickly enough. I find it fails almost every time with an 8 vcpu guest under KVM with software emulated TPM.
To fix it, just move the tpm2_get_cc_attrs_tlb() call after the existing code to wait for initialization, which will ensure the buffer is allocated.
Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2") Signed-off-by: David Gibson david@gibson.dropbear.id.au Reviewed-by: Jerry Snitselaar jsnitsel@redhat.com Reviewed-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Jarkko Sakkinen jarkko.sakkinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/char/tpm/tpm_ibmvtpm.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/char/tpm/tpm_ibmvtpm.c b/drivers/char/tpm/tpm_ibmvtpm.c index 09fe45246b8cc..994385bf37c0c 100644 --- a/drivers/char/tpm/tpm_ibmvtpm.c +++ b/drivers/char/tpm/tpm_ibmvtpm.c @@ -683,13 +683,6 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev, if (rc) goto init_irq_cleanup;
- if (!strcmp(id->compat, "IBM,vtpm20")) { - chip->flags |= TPM_CHIP_FLAG_TPM2; - rc = tpm2_get_cc_attrs_tbl(chip); - if (rc) - goto init_irq_cleanup; - } - if (!wait_event_timeout(ibmvtpm->crq_queue.wq, ibmvtpm->rtce_buf != NULL, HZ)) { @@ -697,6 +690,13 @@ static int tpm_ibmvtpm_probe(struct vio_dev *vio_dev, goto init_irq_cleanup; }
+ if (!strcmp(id->compat, "IBM,vtpm20")) { + chip->flags |= TPM_CHIP_FLAG_TPM2; + rc = tpm2_get_cc_attrs_tbl(chip); + if (rc) + goto init_irq_cleanup; + } + return tpm_chip_register(chip); init_irq_cleanup: do {
From: Kees Cook keescook@chromium.org
[ Upstream commit c3eeaae9fd736b7f2afbda8d3cbb1cbae06decf3 ]
Something changed recently to uncover this warning:
samples/vfs/test-statx.c:24:15: warning: `struct foo' declared inside parameter list will not be visible outside of this definition or declaration 24 | #define statx foo | ^~~
Which is due the use of "struct statx" (here, "struct foo") in a function prototype argument list before it has been defined:
int # 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" foo # 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4 (int __dirfd, const char *__restrict __path, int __flags, unsigned int __mask, struct # 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" foo # 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4 *__restrict __buf) __attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (2, 5)));
Add explicit struct before #include to avoid warning.
Fixes: f1b5618e013a ("vfs: Add a sample program for the new mount API") Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Cc: Miklos Szeredi mszeredi@redhat.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: David Howells dhowells@redhat.com Link: http://lkml.kernel.org/r/202006282213.C516EA6@keescook Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- samples/vfs/test-statx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/samples/vfs/test-statx.c b/samples/vfs/test-statx.c index a3d68159fb510..507f09c38b49f 100644 --- a/samples/vfs/test-statx.c +++ b/samples/vfs/test-statx.c @@ -23,6 +23,8 @@ #include <linux/fcntl.h> #define statx foo #define statx_timestamp foo_timestamp +struct statx; +struct statx_timestamp; #include <sys/stat.h> #undef statx #undef statx_timestamp
From: Chris Packham chris.packham@alliedtelesis.co.nz
[ Upstream commit cd217f2300793a106b49c7dfcbfb26e348bc7593 ]
The PCA9665 datasheet says that I2CSTA = 78h indicates that SCL is stuck low, this differs to the PCA9564 which uses 90h for this indication. Treat either 0x78 or 0x90 as an indication that the SCL line is stuck.
Based on looking through the PCA9564 and PCA9665 datasheets this should be safe for both chips. The PCA9564 should not return 0x78 for any valid state and the PCA9665 should not return 0x90.
Fixes: eff9ec95efaa ("i2c-algo-pca: Add PCA9665 support") Signed-off-by: Chris Packham chris.packham@alliedtelesis.co.nz Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/algos/i2c-algo-pca.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/algos/i2c-algo-pca.c b/drivers/i2c/algos/i2c-algo-pca.c index 7f10312d1b88f..388978775be04 100644 --- a/drivers/i2c/algos/i2c-algo-pca.c +++ b/drivers/i2c/algos/i2c-algo-pca.c @@ -314,7 +314,8 @@ static int pca_xfer(struct i2c_adapter *i2c_adap, DEB2("BUS ERROR - SDA Stuck low\n"); pca_reset(adap); goto out; - case 0x90: /* Bus error - SCL stuck low */ + case 0x78: /* Bus error - SCL stuck low (PCA9665) */ + case 0x90: /* Bus error - SCL stuck low (PCA9564) */ DEB2("BUS ERROR - SCL Stuck low\n"); pca_reset(adap); goto out;
From: Ricardo Ribalda ribalda@kernel.org
[ Upstream commit db2a8b6f1df93d5311970cca03052c01178de674 ]
Current AMD's zen-based APUs use this core for some of its i2c-buses.
With this patch we re-enable autodetection of hwmon-alike devices, so lm-sensors will be able to work automatically.
It does not affect the boot-time of embedded devices, as the class is set based on the DMI information.
DMI is probed only on Qtechnology QT5222 Industrial Camera Platform.
DocLink: https://qtec.com/camera-technology-camera-platforms/ Fixes: 3eddad96c439 ("i2c: designware: reverts "i2c: designware: Add support for AMD I2C controller"") Signed-off-by: Ricardo Ribalda ribalda@kernel.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-designware-platdrv.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-designware-platdrv.c b/drivers/i2c/busses/i2c-designware-platdrv.c index 5536673060cc6..3a9c2cfbef974 100644 --- a/drivers/i2c/busses/i2c-designware-platdrv.c +++ b/drivers/i2c/busses/i2c-designware-platdrv.c @@ -234,6 +234,17 @@ static const u32 supported_speeds[] = { I2C_MAX_STANDARD_MODE_FREQ, };
+static const struct dmi_system_id dw_i2c_hwmon_class_dmi[] = { + { + .ident = "Qtechnology QT5222", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Qtechnology"), + DMI_MATCH(DMI_PRODUCT_NAME, "QT5222"), + }, + }, + { } /* terminate list */ +}; + static int dw_i2c_plat_probe(struct platform_device *pdev) { struct dw_i2c_platform_data *pdata = dev_get_platdata(&pdev->dev); @@ -349,7 +360,8 @@ static int dw_i2c_plat_probe(struct platform_device *pdev)
adap = &dev->adapter; adap->owner = THIS_MODULE; - adap->class = I2C_CLASS_DEPRECATED; + adap->class = dmi_check_system(dw_i2c_hwmon_class_dmi) ? + I2C_CLASS_HWMON : I2C_CLASS_DEPRECATED; ACPI_COMPANION_SET(&adap->dev, ACPI_COMPANION(&pdev->dev)); adap->dev.of_node = pdev->dev.of_node; adap->nr = -1;
From: Wolfram Sang wsa+renesas@sang-engineering.com
[ Upstream commit 597911287fcd13c3a4b4aa3e0a52b33d431e0a8e ]
I2C_SMBUS_BLOCK_MAX defines already the maximum number as defined in the SMBus 2.0 specs. I don't see a reason to add 1 here. Also, fix the errno to what is suggested for this error.
Fixes: c9bfdc7c16cb ("i2c: mlxcpld: Add support for smbus block read transaction") Signed-off-by: Wolfram Sang wsa+renesas@sang-engineering.com Reviewed-by: Michael Shych michaelsh@mellanox.com Tested-by: Michael Shych michaelsh@mellanox.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-mlxcpld.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-mlxcpld.c b/drivers/i2c/busses/i2c-mlxcpld.c index 2fd717d8dd30e..71d7bae2cbcad 100644 --- a/drivers/i2c/busses/i2c-mlxcpld.c +++ b/drivers/i2c/busses/i2c-mlxcpld.c @@ -337,9 +337,9 @@ static int mlxcpld_i2c_wait_for_tc(struct mlxcpld_i2c_priv *priv) if (priv->smbus_block && (val & MLXCPLD_I2C_SMBUS_BLK_BIT)) { mlxcpld_i2c_read_comm(priv, MLXCPLD_LPCI2C_NUM_DAT_REG, &datalen, 1); - if (unlikely(datalen > (I2C_SMBUS_BLOCK_MAX + 1))) { + if (unlikely(datalen > I2C_SMBUS_BLOCK_MAX)) { dev_err(priv->dev, "Incorrect smbus block read message len\n"); - return -E2BIG; + return -EPROTO; } } else { datalen = priv->xfer.data_len;
From: Jens Axboe axboe@kernel.dk
[ Upstream commit b7db41c9e03b5189bc94993bd50e4506ac9e34c1 ]
When switching to TWA_SIGNAL for task_work notifications, we also made any signal based condition in io_cqring_wait() return -ERESTARTSYS. This breaks applications that rely on using signals to abort someone waiting for events.
Check if we have a signal pending because of queued task_work, and repeat the signal check once we've run the task_work. This provides a reliable way of telling the two apart.
Additionally, only use TWA_SIGNAL if we are using an eventfd. If not, we don't have the dependency situation described in the original commit, and we can get by with just using TWA_RESUME like we previously did.
Fixes: ce593a6c480a ("io_uring: use signal based task_work running") Cc: stable@vger.kernel.org # v5.7 Reported-by: Andres Freund andres@anarazel.de Tested-by: Andres Freund andres@anarazel.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index 51362a619fd50..2be6ea0103405 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -4136,14 +4136,22 @@ struct io_poll_table { int error; };
-static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb, - int notify) +static int io_req_task_work_add(struct io_kiocb *req, struct callback_head *cb) { struct task_struct *tsk = req->task; - int ret; + struct io_ring_ctx *ctx = req->ctx; + int ret, notify = TWA_RESUME;
- if (req->ctx->flags & IORING_SETUP_SQPOLL) + /* + * SQPOLL kernel thread doesn't need notification, just a wakeup. + * If we're not using an eventfd, then TWA_RESUME is always fine, + * as we won't have dependencies between request completions for + * other kernel wait conditions. + */ + if (ctx->flags & IORING_SETUP_SQPOLL) notify = 0; + else if (ctx->cq_ev_fd) + notify = TWA_SIGNAL;
ret = task_work_add(tsk, cb, notify); if (!ret) @@ -4174,7 +4182,7 @@ static int __io_async_wake(struct io_kiocb *req, struct io_poll_iocb *poll, * of executing it. We can't safely execute it anyway, as we may not * have the needed state needed for it anyway. */ - ret = io_req_task_work_add(req, &req->task_work, TWA_SIGNAL); + ret = io_req_task_work_add(req, &req->task_work); if (unlikely(ret)) { WRITE_ONCE(poll->canceled, true); tsk = io_wq_get_task(req->ctx->io_wq); @@ -6279,7 +6287,14 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, if (current->task_works) task_work_run(); if (signal_pending(current)) { - ret = -ERESTARTSYS; + if (current->jobctl & JOBCTL_TASK_WORK) { + spin_lock_irq(¤t->sighand->siglock); + current->jobctl &= ~JOBCTL_TASK_WORK; + recalc_sigpending(); + spin_unlock_irq(¤t->sighand->siglock); + continue; + } + ret = -EINTR; break; } if (io_should_wake(&iowq, false)) @@ -6288,7 +6303,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events, } while (1); finish_wait(&ctx->wait, &iowq.wq);
- restore_saved_sigmask_unless(ret == -ERESTARTSYS); + restore_saved_sigmask_unless(ret == -EINTR);
return READ_ONCE(rings->cq.head) == READ_ONCE(rings->cq.tail) ? ret : 0; }
From: Krzysztof Kozlowski krzk@kernel.org
commit 3d87b613d6a3c6f0980e877ab0895785a2dde581 upstream.
If shared interrupt comes late, during probe error path or device remove (could be triggered with CONFIG_DEBUG_SHIRQ), the interrupt handler dspi_interrupt() will access registers with the clock being disabled. This leads to external abort on non-linefetch on Toradex Colibri VF50 module (with Vybrid VF5xx):
$ echo 4002d000.spi > /sys/devices/platform/soc/40000000.bus/4002d000.spi/driver/unbind
Unhandled fault: external abort on non-linefetch (0x1008) at 0x8887f02c Internal error: : 1008 [#1] ARM Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree) Backtrace: (regmap_mmio_read32le) (regmap_mmio_read) (_regmap_bus_reg_read) (_regmap_read) (regmap_read) (dspi_interrupt) (free_irq) (devm_irq_release) (release_nodes) (devres_release_all) (device_release_driver_internal)
The resource-managed framework should not be used for shared interrupt handling, because the interrupt handler might be called after releasing other resources and disabling clocks.
Similar bug could happen during suspend - the shared interrupt handler could be invoked after suspending the device. Each device sharing this interrupt line should disable the IRQ during suspend so handler will be invoked only in following cases: 1. None suspended, 2. All devices resumed.
Fixes: 349ad66c0ab0 ("spi:Add Freescale DSPI driver for Vybrid VF610 platform") Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Tested-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200622110543.5035-3-krzk@kernel.org Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/spi/spi-fsl-dspi.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
--- a/drivers/spi/spi-fsl-dspi.c +++ b/drivers/spi/spi-fsl-dspi.c @@ -1105,6 +1105,8 @@ static int dspi_suspend(struct device *d struct spi_controller *ctlr = dev_get_drvdata(dev); struct fsl_dspi *dspi = spi_controller_get_devdata(ctlr);
+ if (dspi->irq) + disable_irq(dspi->irq); spi_controller_suspend(ctlr); clk_disable_unprepare(dspi->clk);
@@ -1125,6 +1127,8 @@ static int dspi_resume(struct device *de if (ret) return ret; spi_controller_resume(ctlr); + if (dspi->irq) + enable_irq(dspi->irq);
return 0; } @@ -1381,8 +1385,8 @@ static int dspi_probe(struct platform_de goto poll_mode; }
- ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt, - IRQF_SHARED, pdev->name, dspi); + ret = request_threaded_irq(dspi->irq, dspi_interrupt, NULL, + IRQF_SHARED, pdev->name, dspi); if (ret < 0) { dev_err(&pdev->dev, "Unable to attach DSPI interrupt\n"); goto out_clk_put; @@ -1396,7 +1400,7 @@ poll_mode: ret = dspi_request_dma(dspi, res->start); if (ret < 0) { dev_err(&pdev->dev, "can't get dma channels\n"); - goto out_clk_put; + goto out_free_irq; } }
@@ -1411,11 +1415,14 @@ poll_mode: ret = spi_register_controller(ctlr); if (ret != 0) { dev_err(&pdev->dev, "Problem registering DSPI ctlr\n"); - goto out_clk_put; + goto out_free_irq; }
return ret;
+out_free_irq: + if (dspi->irq) + free_irq(dspi->irq, dspi); out_clk_put: clk_disable_unprepare(dspi->clk); out_ctlr_put: @@ -1431,6 +1438,8 @@ static int dspi_remove(struct platform_d
/* Disconnect from the SPI framework */ dspi_release_dma(dspi); + if (dspi->irq) + free_irq(dspi->irq, dspi); clk_disable_unprepare(dspi->clk); spi_unregister_controller(dspi->ctlr);
From: J. Bruce Fields bfields@redhat.com
commit 22cf8419f1319ff87ec759d0ebdff4cbafaee832 upstream.
The server is failing to apply the umask when creating new objects on filesystems without ACL support.
To reproduce this, you need to use NFSv4.2 and a client and server recent enough to support umask, and you need to export a filesystem that lacks ACL support (for example, ext4 with the "noacl" mount option).
Filesystems with ACL support are expected to take care of the umask themselves (usually by calling posix_acl_create).
For filesystems without ACL support, this is up to the caller of vfs_create(), vfs_mknod(), or vfs_mkdir().
Reported-by: Elliott Mitchell ehem+debian@m5p.com Reported-by: Salvatore Bonaccorso carnil@debian.org Tested-by: Salvatore Bonaccorso carnil@debian.org Fixes: 47057abde515 ("nfsd: add support for the umask attribute") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields bfields@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfsd/vfs.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/nfsd/vfs.c +++ b/fs/nfsd/vfs.c @@ -1225,6 +1225,9 @@ nfsd_create_locked(struct svc_rqst *rqst iap->ia_mode = 0; iap->ia_mode = (iap->ia_mode & S_IALLUGO) | type;
+ if (!IS_POSIXACL(dirp)) + iap->ia_mode &= ~current_umask(); + err = 0; host_err = 0; switch (type) { @@ -1457,6 +1460,9 @@ do_nfsd_create(struct svc_rqst *rqstp, s goto out; }
+ if (!IS_POSIXACL(dirp)) + iap->ia_mode &= ~current_umask(); + host_err = vfs_create(dirp, dchild, iap->ia_mode, true); if (host_err < 0) { fh_drop_write(fhp);
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
This reverts commit d288dc74f8cf95cb7ae0aaf245b7128627a49bf3 which is commit f0bd62b64016508938df9babe47f65c2c727d25c upstream.
It causes a number of reported issues and a fix for it has not hit Linus's tree yet. Revert this to resolve those problems.
Cc: Alexander Tsoy alexander@tsoy.me Cc: Takashi Iwai tiwai@suse.de Cc: Sasha Levin sashal@kernel.org Cc: Hans de Goede jwrdegoede@fedoraproject.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/card.h | 4 ---- sound/usb/endpoint.c | 43 +++++-------------------------------------- sound/usb/endpoint.h | 1 - sound/usb/pcm.c | 2 -- 4 files changed, 5 insertions(+), 45 deletions(-)
--- a/sound/usb/card.h +++ b/sound/usb/card.h @@ -84,10 +84,6 @@ struct snd_usb_endpoint { dma_addr_t sync_dma; /* DMA address of syncbuf */
unsigned int pipe; /* the data i/o pipe */ - unsigned int framesize[2]; /* small/large frame sizes in samples */ - unsigned int sample_rem; /* remainder from division fs/fps */ - unsigned int sample_accum; /* sample accumulator */ - unsigned int fps; /* frames per second */ unsigned int freqn; /* nominal sampling rate in fs/fps in Q16.16 format */ unsigned int freqm; /* momentary sampling rate in fs/fps in Q16.16 format */ int freqshift; /* how much to shift the feedback value to get Q16.16 */ --- a/sound/usb/endpoint.c +++ b/sound/usb/endpoint.c @@ -124,12 +124,12 @@ int snd_usb_endpoint_implicit_feedback_s
/* * For streaming based on information derived from sync endpoints, - * prepare_outbound_urb_sizes() will call slave_next_packet_size() to + * prepare_outbound_urb_sizes() will call next_packet_size() to * determine the number of samples to be sent in the next packet. * - * For implicit feedback, slave_next_packet_size() is unused. + * For implicit feedback, next_packet_size() is unused. */ -int snd_usb_endpoint_slave_next_packet_size(struct snd_usb_endpoint *ep) +int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep) { unsigned long flags; int ret; @@ -146,29 +146,6 @@ int snd_usb_endpoint_slave_next_packet_s return ret; }
-/* - * For adaptive and synchronous endpoints, prepare_outbound_urb_sizes() - * will call next_packet_size() to determine the number of samples to be - * sent in the next packet. - */ -int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep) -{ - int ret; - - if (ep->fill_max) - return ep->maxframesize; - - ep->sample_accum += ep->sample_rem; - if (ep->sample_accum >= ep->fps) { - ep->sample_accum -= ep->fps; - ret = ep->framesize[1]; - } else { - ret = ep->framesize[0]; - } - - return ret; -} - static void retire_outbound_urb(struct snd_usb_endpoint *ep, struct snd_urb_ctx *urb_ctx) { @@ -213,8 +190,6 @@ static void prepare_silent_urb(struct sn
if (ctx->packet_size[i]) counts = ctx->packet_size[i]; - else if (ep->sync_master) - counts = snd_usb_endpoint_slave_next_packet_size(ep); else counts = snd_usb_endpoint_next_packet_size(ep);
@@ -1086,17 +1061,10 @@ int snd_usb_endpoint_set_params(struct s ep->maxpacksize = fmt->maxpacksize; ep->fill_max = !!(fmt->attributes & UAC_EP_CS_ATTR_FILL_MAX);
- if (snd_usb_get_speed(ep->chip->dev) == USB_SPEED_FULL) { + if (snd_usb_get_speed(ep->chip->dev) == USB_SPEED_FULL) ep->freqn = get_usb_full_speed_rate(rate); - ep->fps = 1000; - } else { + else ep->freqn = get_usb_high_speed_rate(rate); - ep->fps = 8000; - } - - ep->sample_rem = rate % ep->fps; - ep->framesize[0] = rate / ep->fps; - ep->framesize[1] = (rate + (ep->fps - 1)) / ep->fps;
/* calculate the frequency in 16.16 format */ ep->freqm = ep->freqn; @@ -1155,7 +1123,6 @@ int snd_usb_endpoint_start(struct snd_us ep->active_mask = 0; ep->unlink_mask = 0; ep->phase = 0; - ep->sample_accum = 0;
snd_usb_endpoint_start_quirk(ep);
--- a/sound/usb/endpoint.h +++ b/sound/usb/endpoint.h @@ -28,7 +28,6 @@ void snd_usb_endpoint_release(struct snd void snd_usb_endpoint_free(struct snd_usb_endpoint *ep);
int snd_usb_endpoint_implicit_feedback_sink(struct snd_usb_endpoint *ep); -int snd_usb_endpoint_slave_next_packet_size(struct snd_usb_endpoint *ep); int snd_usb_endpoint_next_packet_size(struct snd_usb_endpoint *ep);
void snd_usb_handle_sync_urb(struct snd_usb_endpoint *ep, --- a/sound/usb/pcm.c +++ b/sound/usb/pcm.c @@ -1585,8 +1585,6 @@ static void prepare_playback_urb(struct for (i = 0; i < ctx->packets; i++) { if (ctx->packet_size[i]) counts = ctx->packet_size[i]; - else if (ep->sync_master) - counts = snd_usb_endpoint_slave_next_packet_size(ep); else counts = snd_usb_endpoint_next_packet_size(ep);
From: Daniel Jordan daniel.m.jordan@oracle.com
commit e04ec0de61c1eb9693179093e83ab8ca68a30d08 upstream.
A 5.7 kernel hangs during a tcrypt test of padata that waits for an AEAD request to finish. This is only seen on large machines running many concurrent requests.
The issue is that padata never serializes the request. The removal of the reorder_objects atomic missed that the memory barrier in padata_do_serial() depends on it.
Upgrade the barrier from smp_mb__after_atomic to smp_mb to get correct ordering again.
Fixes: 3facced7aeed1 ("padata: remove reorder_objects") Signed-off-by: Daniel Jordan daniel.m.jordan@oracle.com Cc: Steffen Klassert steffen.klassert@secunet.com Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/padata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/padata.c +++ b/kernel/padata.c @@ -260,7 +260,7 @@ static void padata_reorder(struct parall * * Ensure reorder queue is read after pd->lock is dropped so we see * new objects from another task in padata_do_serial. Pairs with - * smp_mb__after_atomic in padata_do_serial. + * smp_mb in padata_do_serial. */ smp_mb();
@@ -342,7 +342,7 @@ void padata_do_serial(struct padata_priv * with the trylock of pd->lock in padata_reorder. Pairs with smp_mb * in padata_reorder. */ - smp_mb__after_atomic(); + smp_mb();
padata_reorder(pd); }
From: Paul Aurich paul@darkrain42.org
commit cc15461c73d7d044d56c47e869a215e49bd429c8 upstream.
Ensure multiuser SMB3 mounts use encryption for all users' tcons if the mount options are configured to require encryption. Without this, only the primary tcon and IPC tcons are guaranteed to be encrypted. Per-user tcons would only be encrypted if the server was configured to require encryption.
Signed-off-by: Paul Aurich paul@darkrain42.org CC: Stable stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -5310,6 +5310,7 @@ cifs_construct_tcon(struct cifs_sb_info vol_info->linux_ext = master_tcon->posix_extensions; vol_info->sectype = master_tcon->ses->sectype; vol_info->sign = master_tcon->ses->sign; + vol_info->seal = master_tcon->seal;
rc = cifs_set_vol_auth(vol_info, master_tcon->ses); if (rc) {
From: Paul Aurich paul@darkrain42.org
commit 00dfbc2f9c61185a2e662f27c45a0bb29b2a134f upstream.
Without this:
- persistent handles will only be enabled for per-user tcons if the server advertises the 'Continuous Availabity' capability - resilient handles would never be enabled for per-user tcons
Signed-off-by: Paul Aurich paul@darkrain42.org CC: Stable stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -5306,6 +5306,8 @@ cifs_construct_tcon(struct cifs_sb_info vol_info->nocase = master_tcon->nocase; vol_info->nohandlecache = master_tcon->nohandlecache; vol_info->local_lease = master_tcon->local_lease; + vol_info->resilient = master_tcon->use_resilient; + vol_info->persistent = master_tcon->use_persistent; vol_info->no_linux_ext = !master_tcon->unix_ext; vol_info->linux_ext = master_tcon->posix_extensions; vol_info->sectype = master_tcon->ses->sectype;
From: Paul Aurich paul@darkrain42.org
commit ad35f169db6cd5a4c5c0a5a42fb0cad3efeccb83 upstream.
Fixes: 3e7a02d47872 ("smb3: allow disabling requesting leases") Signed-off-by: Paul Aurich paul@darkrain42.org CC: Stable stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -5306,6 +5306,7 @@ cifs_construct_tcon(struct cifs_sb_info vol_info->nocase = master_tcon->nocase; vol_info->nohandlecache = master_tcon->nohandlecache; vol_info->local_lease = master_tcon->local_lease; + vol_info->no_lease = master_tcon->no_lease; vol_info->resilient = master_tcon->use_resilient; vol_info->persistent = master_tcon->use_persistent; vol_info->no_linux_ext = !master_tcon->unix_ext;
From: Paul Aurich paul@darkrain42.org
commit 6b356f6cf941d5054d7fab072cae4a5f8658e3db upstream.
Fixes: ca567eb2b3f0 ("SMB3: Allow persistent handle timeout to be configurable on mount") Signed-off-by: Paul Aurich paul@darkrain42.org CC: Stable stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/connect.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -5309,6 +5309,7 @@ cifs_construct_tcon(struct cifs_sb_info vol_info->no_lease = master_tcon->no_lease; vol_info->resilient = master_tcon->use_resilient; vol_info->persistent = master_tcon->use_persistent; + vol_info->handle_timeout = master_tcon->handle_timeout; vol_info->no_linux_ext = !master_tcon->unix_ext; vol_info->linux_ext = master_tcon->posix_extensions; vol_info->sectype = master_tcon->ses->sectype;
From: Zhang Xiaoxu zhangxiaoxu5@huawei.com
commit 9ffad9263b467efd8f8dc7ae1941a0a655a2bab2 upstream.
When xfstest generic/035, we found the target file was deleted if the rename return -EACESS.
In cifs_rename2, we unlink the positive target dentry if rename failed with EACESS or EEXIST, even if the target dentry is positived before rename. Then the existing file was deleted.
We should just delete the target file which created during the rename.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhang Xiaoxu zhangxiaoxu5@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Reviewed-by: Aurelien Aptel aaptel@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/inode.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c @@ -1855,6 +1855,7 @@ cifs_rename2(struct inode *source_dir, s FILE_UNIX_BASIC_INFO *info_buf_target; unsigned int xid; int rc, tmprc; + bool new_target = d_really_is_negative(target_dentry);
if (flags & ~RENAME_NOREPLACE) return -EINVAL; @@ -1931,8 +1932,13 @@ cifs_rename2(struct inode *source_dir, s */
unlink_target: - /* Try unlinking the target dentry if it's not negative */ - if (d_really_is_positive(target_dentry) && (rc == -EACCES || rc == -EEXIST)) { + /* + * If the target dentry was created during the rename, try + * unlinking it if it's not negative + */ + if (new_target && + d_really_is_positive(target_dentry) && + (rc == -EACCES || rc == -EEXIST)) { if (d_is_dir(target_dentry)) tmprc = cifs_rmdir(target_dir, target_dentry); else
From: Joseph Salisbury joseph.salisbury@microsoft.com
commit 77b48bea2fee47c15a835f6725dd8df0bc38375a upstream.
When the kernel panics, one page of kmsg data may be collected and sent to Hyper-V to aid in diagnosing the failure. The collected kmsg data typically contains 50 to 100 lines, each of which has a log level prefix that isn't very useful from a diagnostic standpoint. So tell kmsg_dump_get_buffer() to not include the log level, enabling more information that *is* useful to fit in the page.
Requesting in stable kernels, since many kernels running in production are stable releases.
Cc: stable@vger.kernel.org Signed-off-by: Joseph Salisbury joseph.salisbury@microsoft.com Reviewed-by: Michael Kelley mikelley@microsoft.com Link: https://lore.kernel.org/r/1593210497-114310-1-git-send-email-joseph.salisbur... Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hv/vmbus_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1328,7 +1328,7 @@ static void hv_kmsg_dump(struct kmsg_dum * Write dump contents to the page. No need to synchronize; panic should * be single-threaded. */ - kmsg_dump_get_buffer(dumper, true, hv_panic_page, HV_HYP_PAGE_SIZE, + kmsg_dump_get_buffer(dumper, false, hv_panic_page, HV_HYP_PAGE_SIZE, &bytes_written); if (bytes_written) hyperv_report_panic_msg(panic_pa, bytes_written);
From: Jan Kundrát jan.kundrat@cesnet.cz
commit b4c8af4c2a226fc9c25e1decbd26fdab1b0993ee upstream.
Commit 16358542f32f ("hwmon: (pmbus) Implement multi-phase support") added support for multi-phase pmbus devices. However, when calling pmbus_add_sensor() for fans, the patch swapped the `page` and `reg` attributes. As a result, the fan speeds were reported as 0 RPM on my device.
Signed-off-by: Jan Kundrát jan.kundrat@cesnet.cz Fixes: 16358542f32f ("hwmon: (pmbus) Implement multi-phase support") Cc: stable@vger.kernel.org # v5.7+ Link: https://lore.kernel.org/r/449bc9e6c0e4305581e45905ce9d043b356a9932.159290438... [groeck: Fixed references to offending commit] Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/pmbus/pmbus_core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/hwmon/pmbus/pmbus_core.c +++ b/drivers/hwmon/pmbus/pmbus_core.c @@ -1869,7 +1869,7 @@ static int pmbus_add_fan_ctrl(struct i2c struct pmbus_sensor *sensor;
sensor = pmbus_add_sensor(data, "fan", "target", index, page, - PMBUS_VIRT_FAN_TARGET_1 + id, 0xff, PSC_FAN, + 0xff, PMBUS_VIRT_FAN_TARGET_1 + id, PSC_FAN, false, false, true);
if (!sensor) @@ -1880,14 +1880,14 @@ static int pmbus_add_fan_ctrl(struct i2c return 0;
sensor = pmbus_add_sensor(data, "pwm", NULL, index, page, - PMBUS_VIRT_PWM_1 + id, 0xff, PSC_PWM, + 0xff, PMBUS_VIRT_PWM_1 + id, PSC_PWM, false, false, true);
if (!sensor) return -ENOMEM;
sensor = pmbus_add_sensor(data, "pwm", "enable", index, page, - PMBUS_VIRT_PWM_ENABLE_1 + id, 0xff, PSC_PWM, + 0xff, PMBUS_VIRT_PWM_ENABLE_1 + id, PSC_PWM, true, false, false);
if (!sensor) @@ -1929,7 +1929,7 @@ static int pmbus_add_fan_attributes(stru continue;
if (pmbus_add_sensor(data, "fan", "input", index, - page, pmbus_fan_registers[f], 0xff, + page, 0xff, pmbus_fan_registers[f], PSC_FAN, true, true, true) == NULL) return -ENOMEM;
From: Finley Xiao finley.xiao@rock-chips.com
commit 371a3bc79c11b707d7a1b7a2c938dc3cc042fffb upstream.
The function cpu_power_to_freq is used to find a frequency and set the cooling device to consume at most the power to be converted. For example, if the power to be converted is 80mW, and the em table is as follow. struct em_cap_state table[] = { /* KHz mW */ { 1008000, 36, 0 }, { 1200000, 49, 0 }, { 1296000, 59, 0 }, { 1416000, 72, 0 }, { 1512000, 86, 0 }, }; The target frequency should be 1416000KHz, not 1512000KHz.
Fixes: 349d39dc5739 ("thermal: cpu_cooling: merge frequency and power tables") Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: Finley Xiao finley.xiao@rock-chips.com Acked-by: Viresh Kumar viresh.kumar@linaro.org Reviewed-by: Amit Kucheria amit.kucheria@linaro.org Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://lore.kernel.org/r/20200619090825.32747-1-finley.xiao@rock-chips.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/thermal/cpufreq_cooling.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -123,12 +123,12 @@ static u32 cpu_power_to_freq(struct cpuf { int i;
- for (i = cpufreq_cdev->max_level - 1; i >= 0; i--) { - if (power > cpufreq_cdev->em->table[i].power) + for (i = cpufreq_cdev->max_level; i >= 0; i--) { + if (power >= cpufreq_cdev->em->table[i].power) break; }
- return cpufreq_cdev->em->table[i + 1].frequency; + return cpufreq_cdev->em->table[i].frequency; }
/**
From: Sumeet Pawnikar sumeet.r.pawnikar@intel.com
commit 0318e8374e87b32def1d5c279013ca7730a74982 upstream.
Tiger Lake's new unique ACPI device ID for Fan is not valid because of missing 'C' in the ID. Use correct fan device ID.
Fixes: c248dfe7e0ca ("ACPI: fan: Add Tiger Lake ACPI device ID") Signed-off-by: Sumeet Pawnikar sumeet.r.pawnikar@intel.com Cc: 5.6+ stable@vger.kernel.org # 5.6+ [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/fan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/acpi/fan.c +++ b/drivers/acpi/fan.c @@ -25,8 +25,8 @@ static int acpi_fan_remove(struct platfo
static const struct acpi_device_id fan_device_ids[] = { {"PNP0C0B", 0}, - {"INT1044", 0}, {"INT3404", 0}, + {"INTC1044", 0}, {"", 0}, }; MODULE_DEVICE_TABLE(acpi, fan_device_ids);
From: Bob Peterson rpeterso@redhat.com
commit 58e08e8d83ab03a1ca25d53420bd0b87f2dfe458 upstream.
Log flush operations (gfs2_log_flush()) can target a specific transaction. But if the function encounters errors (e.g. io errors) and withdraws, the transaction was only freed it if was queued to one of the ail lists. If the withdraw occurred before the transaction was queued to the ail1 list, function ail_drain never freed it. The result was:
BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()
This patch makes log_flush() add the targeted transaction to the ail1 list so that function ail_drain() will find and free it properly.
Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Bob Peterson rpeterso@redhat.com Signed-off-by: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/gfs2/log.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -987,6 +987,16 @@ void gfs2_log_flush(struct gfs2_sbd *sdp
out: if (gfs2_withdrawn(sdp)) { + /** + * If the tr_list is empty, we're withdrawing during a log + * flush that targets a transaction, but the transaction was + * never queued onto any of the ail lists. Here we add it to + * ail1 just so that ail_drain() will find and free it. + */ + spin_lock(&sdp->sd_ail_lock); + if (tr && list_empty(&tr->tr_list)) + list_add(&tr->tr_list, &sdp->sd_ail1_list); + spin_unlock(&sdp->sd_ail_lock); ail_drain(sdp); /* frees all transactions */ tr = NULL; }
From: Sean Christopherson sean.j.christopherson@intel.com
commit 009bce1df0bb5eb970b9eb98d963861f7fe353c7 upstream.
Choo! Choo! All aboard the Split Lock Express, with direct service to Wreckage!
Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible SLD-enabled CPU model to avoid writing MSR_TEST_CTRL. MSR_TEST_CTRL exists, and is writable, on many generations of CPUs. Writing the MSR, even with '0', can result in bizarre, undocumented behavior.
This fixes a crash on Haswell when resuming from suspend with a live KVM guest. Because APs use the standard SMP boot flow for resume, they will go through split_lock_init() and the subsequent RDMSR/WRMSR sequence, which runs even when sld_state==sld_off to ensure SLD is disabled. On Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will succeed and _may_ take the SMT _sibling_ out of VMX root mode.
When KVM has an active guest, KVM performs VMXON as part of CPU onlining (see kvm_starting_cpu()). Because SMP boot is serialized, the resulting flow is effectively:
on_each_ap_cpu() { WRMSR(MSR_TEST_CTRL, 0) VMXON }
As a result, the WRMSR can disable VMX on a different CPU that has already done VMXON. This ultimately results in a #UD on VMPTRLD when KVM regains control and attempt run its vCPUs.
The above voodoo was confirmed by reworking KVM's VMXON flow to write MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above. Further verification of the insanity was done by redoing VMXON on all APs after the initial WRMSR->VMXON sequence. The additional VMXON, which should VM-Fail, occasionally succeeded, and also eliminated the unexpected #UD on VMPTRLD.
The damage done by writing MSR_TEST_CTRL doesn't appear to be limited to VMX, e.g. after suspend with an active KVM guest, subsequent reboots almost always hang (even when fudging VMXON), a #UD on a random Jcc was observed, suspend/resume stability is qualitatively poor, and so on and so forth.
kernel BUG at arch/x86/kvm/x86.c:386! CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G D Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014 RIP: 0010:kvm_spurious_fault+0xf/0x20 Call Trace: vmx_vcpu_load_vmcs+0x1fb/0x2b0 vmx_vcpu_load+0x3e/0x160 kvm_arch_vcpu_load+0x48/0x260 finish_task_switch+0x140/0x260 __schedule+0x460/0x720 _cond_resched+0x2d/0x40 kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0 kvm_vcpu_ioctl+0x363/0x5c0 ksys_ioctl+0x88/0xa0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x4c/0x170 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of split lock detection") Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200605192605.7439-1-sean.j.christopherson@intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kernel/cpu/intel.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -49,6 +49,13 @@ static enum split_lock_detect_state sld_ static u64 msr_test_ctrl_cache __ro_after_init;
/* + * With a name like MSR_TEST_CTL it should go without saying, but don't touch + * MSR_TEST_CTL unless the CPU is one of the whitelisted models. Writing it + * on CPUs that do not support SLD can cause fireworks, even when writing '0'. + */ +static bool cpu_model_supports_sld __ro_after_init; + +/* * Processors which have self-snooping capability can handle conflicting * memory type across CPUs by snooping its own cache. However, there exists * CPU models in which having conflicting memory types still leads to @@ -1064,7 +1071,8 @@ static void sld_update_msr(bool on)
static void split_lock_init(void) { - split_lock_verify_msr(sld_state != sld_off); + if (cpu_model_supports_sld) + split_lock_verify_msr(sld_state != sld_off); }
static void split_lock_warn(unsigned long ip) @@ -1167,5 +1175,6 @@ void __init cpu_set_core_cap_bits(struct return; }
+ cpu_model_supports_sld = true; split_lock_setup(); }
From: Martin Blumenstingl martin.blumenstingl@googlemail.com
commit 03e62fd67d3ab33f39573fc8787d89dc9b4d7255 upstream.
The dt-bindings for the GSWIP describe that the node should be named "switch". Use the same name in sysctrl.c so the GSWIP driver can actually find the "gphy0" and "gphy1" clocks.
Fixes: 14fceff4771e51 ("net: dsa: Add Lantiq / Intel DSA driver for vrx200") Cc: stable@vger.kernel.org Signed-off-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Hauke Mehrtens hauke@hauke-m.de Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/lantiq/xway/sysctrl.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/mips/lantiq/xway/sysctrl.c +++ b/arch/mips/lantiq/xway/sysctrl.c @@ -514,8 +514,8 @@ void __init ltq_soc_init(void) clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH | PMU_PPE_DP | PMU_PPE_TC); clkdev_add_pmu("1da00000.usif", "NULL", 1, 0, PMU_USIF); - clkdev_add_pmu("1e108000.gswip", "gphy0", 0, 0, PMU_GPHY); - clkdev_add_pmu("1e108000.gswip", "gphy1", 0, 0, PMU_GPHY); + clkdev_add_pmu("1e108000.switch", "gphy0", 0, 0, PMU_GPHY); + clkdev_add_pmu("1e108000.switch", "gphy1", 0, 0, PMU_GPHY); clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU); clkdev_add_pmu("1e116000.mei", "afe", 1, 2, PMU_ANALOG_DSL_AFE); clkdev_add_pmu("1e116000.mei", "dfe", 1, 0, PMU_DFE); @@ -538,8 +538,8 @@ void __init ltq_soc_init(void) PMU_SWITCH | PMU_PPE_DPLUS | PMU_PPE_DPLUM | PMU_PPE_EMA | PMU_PPE_TC | PMU_PPE_SLL01 | PMU_PPE_QSB | PMU_PPE_TOP); - clkdev_add_pmu("1e108000.gswip", "gphy0", 0, 0, PMU_GPHY); - clkdev_add_pmu("1e108000.gswip", "gphy1", 0, 0, PMU_GPHY); + clkdev_add_pmu("1e108000.switch", "gphy0", 0, 0, PMU_GPHY); + clkdev_add_pmu("1e108000.switch", "gphy1", 0, 0, PMU_GPHY); clkdev_add_pmu("1e103000.sdio", NULL, 1, 0, PMU_SDIO); clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU); clkdev_add_pmu("1e116000.mei", "dfe", 1, 0, PMU_DFE);
From: Hauke Mehrtens hauke@hauke-m.de
commit fcec538ef8cca0ad0b84432235dccd9059c8e6f8 upstream.
This resolves the hazard between the mtc0 in the change_c0_status() and the mfc0 in configure_exception_vector(). Without resolving this hazard configure_exception_vector() could read an old value and would restore this old value again. This would revert the changes change_c0_status() did. I checked this by printing out the read_c0_status() at the end of per_cpu_trap_init() and the ST0_MX is not set without this patch.
The hazard is documented in the MIPS Architecture Reference Manual Vol. III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev 6.03 table 8.1 which includes:
Producer | Consumer | Hazard ----------|----------|---------------------------- mtc0 | mfc0 | any coprocessor 0 register
I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU. There the change_c0_status() function would activate the DSPen by setting ST0_MX in the c0_status register. This was reverted and then the system got a DSP exception when the DSP registers were saved in save_dsp() in the first process switch. The crash looks like this:
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear) [ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear) [ 0.107070] Kernel panic - not syncing: Unexpected DSP exception [ 0.113470] Rebooting in 1 seconds..
We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs, not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it did not happen.
In the kernel I bisected this problem to commit 9012d011660e ("compiler: allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was reverted it also happened after commit 172dcd935c34b ("MIPS: Always allocate exception vector for MIPSr2+").
Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.") does similar changes to a different file. I am not sure if there are more places affected by this problem.
Signed-off-by: Hauke Mehrtens hauke@hauke-m.de Cc: stable@vger.kernel.org Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/mips/kernel/traps.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/mips/kernel/traps.c +++ b/arch/mips/kernel/traps.c @@ -2121,6 +2121,7 @@ static void configure_status(void)
change_c0_status(ST0_CU|ST0_MX|ST0_RE|ST0_FR|ST0_BEV|ST0_TS|ST0_KX|ST0_SX|ST0_UX, status_set); + back_to_back_c0_hazard(); }
unsigned int hwrena;
From: Rodrigo Vivi rodrigo.vivi@intel.com
commit 55fd7e0222ea01246ef3e6aae28b5721fdfb790f upstream.
Alexandre Oliva has recently removed these files from Linux Libre with concerns that the sources weren't available.
The sources are available on IGT repository, and only open source tools are used to generate the {ivb,hsw}_clear_kernel.c files.
However, the remaining concern from Alexandre Oliva was around GPL license and the source not been present when distributing the code.
So, it looks like 2 alternatives are possible, the use of linux-firmware.git repository to store the blob or making sure that the source is also present in our tree. Since the goal is to limit the i915 firmware to only the micro-controller blobs let's make sure that we do include the asm sources here in our tree.
Btw, I tried to have some diligence here and make sure that the asms that these commits are adding are truly the source for the mentioned files:
igt$ ./scripts/generate_clear_kernel.sh -g ivb \ -m ~/mesa/build/src/intel/tools/i965_asm Output file not specified - using default file "ivb-cb_assembled"
Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c" for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \ ivb_clear_kernel.c
< * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC
- Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT
61c61 < };
};
\ No newline at end of file
igt$ ./scripts/generate_clear_kernel.sh -g hsw \ -m ~/mesa/build/src/intel/tools/i965_asm Output file not specified - using default file "hsw-cb_assembled"
Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c" for i915 driver...
igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \ hsw_clear_kernel.c 5c5 < * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC
- Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT
61c61 < };
};
\ No newline at end of file
Used IGT and Mesa master repositories from Fri Jun 5 2020) IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging the connector") Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with RADV_DEBUG=shaderstats") Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \ -D gallium-drivers=iris -D prefix=/usr \ -D libdir=/usr/lib64/ -Dtools=intel \ -Dkulkan-drivers=intel && ninja -C build
v2: Header clean-up and include build instructions in a readme (Chris) Modified commit message to respect check-patch
Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003374.html Reference: http://www.fsfla.org/pipermail/linux-libre/2020-June/003375.html Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") Cc: stable@vger.kernel.org # v5.7+ Cc: Alexandre Oliva lxoliva@fsfla.org Cc: Prathap Kumar Valsan prathap.kumar.valsan@intel.com Cc: Akeem G Abodunrin akeem.g.abodunrin@intel.com Cc: Mika Kuoppala mika.kuoppala@linux.intel.com Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Jani Nikula jani.nikula@intel.com Cc: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Jon Bloomfield jon.bloomfield@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200610201807.191440-1-rodrig... (cherry picked from commit 5a7eeb8ba143d860050ecea924a8f074f02d8023) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/gt/shaders/README | 46 +++++++ drivers/gpu/drm/i915/gt/shaders/clear_kernel/hsw.asm | 119 +++++++++++++++++++ drivers/gpu/drm/i915/gt/shaders/clear_kernel/ivb.asm | 117 ++++++++++++++++++ 3 files changed, 282 insertions(+)
--- /dev/null +++ b/drivers/gpu/drm/i915/gt/shaders/README @@ -0,0 +1,46 @@ +ASM sources for auto generated shaders +====================================== + +The i915/gt/hsw_clear_kernel.c and i915/gt/ivb_clear_kernel.c files contain +pre-compiled batch chunks that will clear any residual render cache during +context switch. + +They are generated from their respective platform ASM files present on +i915/gt/shaders/clear_kernel directory. + +The generated .c files should never be modified directly. Instead, any modification +needs to be done on the on their respective ASM files and build instructions below +needes to be followed. + +Building +======== + +Environment +----------- + +IGT GPU tool scripts and the Mesa's i965 instruction assembler tool are used +on building. + +Please make sure your Mesa tool is compiled with "-Dtools=intel" and +"-Ddri-drivers=i965", and run this script from IGT source root directory" + +The instructions bellow assume: + * IGT gpu tools source code is located on your home directory (~) as ~/igt + * Mesa source code is located on your home directory (~) as ~/mesa + and built under the ~/mesa/build directory + * Linux kernel source code is under your home directory (~) as ~/linux + +Instructions +------------ + +~ $ cp ~/linux/drivers/gpu/drm/i915/gt/shaders/clear_kernel/ivb.asm \ + ~/igt/lib/i915/shaders/clear_kernel/ivb.asm +~ $ cd ~/igt +igt $ ./scripts/generate_clear_kernel.sh -g ivb \ + -m ~/mesa/build/src/intel/tools/i965_asm + +~ $ cp ~/linux/drivers/gpu/drm/i915/gt/shaders/clear_kernel/hsw.asm \ + ~/igt/lib/i915/shaders/clear_kernel/hsw.asm +~ $ cd ~/igt +igt $ ./scripts/generate_clear_kernel.sh -g hsw \ + -m ~/mesa/build/src/intel/tools/i965_asm \ No newline at end of file --- /dev/null +++ b/drivers/gpu/drm/i915/gt/shaders/clear_kernel/hsw.asm @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +/* + * Kernel for PAVP buffer clear. + * + * 1. Clear all 64 GRF registers assigned to the kernel with designated value; + * 2. Write 32x16 block of all "0" to render target buffer which indirectly clears + * 512 bytes of Render Cache. + */ + +/* Store designated "clear GRF" value */ +mov(1) f0.1<1>UW g1.2<0,1,0>UW { align1 1N }; + +/** + * Curbe Format + * + * DW 1.0 - Block Offset to write Render Cache + * DW 1.1 [15:0] - Clear Word + * DW 1.2 - Delay iterations + * DW 1.3 - Enable Instrumentation (only for debug) + * DW 1.4 - Rsvd (intended for context ID) + * DW 1.5 - [31:16]:SliceCount, [15:0]:SubSlicePerSliceCount + * DW 1.6 - Rsvd MBZ (intended for Enable Wait on Total Thread Count) + * DW 1.7 - Rsvd MBZ (inteded for Total Thread Count) + * + * Binding Table + * + * BTI 0: 2D Surface to help clear L3 (Render/Data Cache) + * BTI 1: Wait/Instrumentation Buffer + * Size : (SliceCount * SubSliceCount * 16 EUs/SubSlice) rows * (16 threads/EU) cols (Format R32_UINT) + * Expected to be initialized to 0 by driver/another kernel + * Layout: + * RowN: Histogram for EU-N: (SliceID*SubSlicePerSliceCount + SSID)*16 + EUID [assume max 16 EUs / SS] + * Col-k[DW-k]: Threads Executed on ThreadID-k for EU-N + */ +add(1) g1.2<1>UD g1.2<0,1,0>UD 0x00000001UD { align1 1N }; /* Loop count to delay kernel: Init to (g1.2 + 1) */ +cmp.z.f0.0(1) null<1>UD g1.3<0,1,0>UD 0x00000000UD { align1 1N }; +(+f0.0) jmpi(1) 352D { align1 WE_all 1N }; + +/** + * State Register has info on where this thread is running + * IVB: sr0.0 :: [15:13]: MBZ, 12: HSID (Half-Slice ID), [11:8]EUID, [2:0] ThreadSlotID + * HSW: sr0.0 :: 15: MBZ, [14:13]: SliceID, 12: HSID (Half-Slice ID), [11:8]EUID, [2:0] ThreadSlotID + */ +mov(8) g3<1>UD 0x00000000UD { align1 1Q }; +shr(1) g3<1>D sr0<0,1,0>D 12D { align1 1N }; +and(1) g3<1>D g3<0,1,0>D 1D { align1 1N }; /* g3 has HSID */ +shr(1) g3.1<1>D sr0<0,1,0>D 13D { align1 1N }; +and(1) g3.1<1>D g3.1<0,1,0>D 3D { align1 1N }; /* g3.1 has sliceID */ +mul(1) g3.5<1>D g3.1<0,1,0>D g1.10<0,1,0>UW { align1 1N }; +add(1) g3<1>D g3<0,1,0>D g3.5<0,1,0>D { align1 1N }; /* g3 = sliceID * SubSlicePerSliceCount + HSID */ +shr(1) g3.2<1>D sr0<0,1,0>D 8D { align1 1N }; +and(1) g3.2<1>D g3.2<0,1,0>D 15D { align1 1N }; /* g3.2 = EUID */ +mul(1) g3.4<1>D g3<0,1,0>D 16D { align1 1N }; +add(1) g3.2<1>D g3.2<0,1,0>D g3.4<0,1,0>D { align1 1N }; /* g3.2 now points to EU row number (Y-pixel = V address ) in instrumentation surf */ + +mov(8) g5<1>UD 0x00000000UD { align1 1Q }; +and(1) g3.3<1>D sr0<0,1,0>D 7D { align1 1N }; +mul(1) g3.3<1>D g3.3<0,1,0>D 4D { align1 1N }; + +mov(8) g4<1>UD g0<8,8,1>UD { align1 1Q }; /* Initialize message header with g0 */ +mov(1) g4<1>UD g3.3<0,1,0>UD { align1 1N }; /* Block offset */ +mov(1) g4.1<1>UD g3.2<0,1,0>UD { align1 1N }; /* Block offset */ +mov(1) g4.2<1>UD 0x00000003UD { align1 1N }; /* Block size (1 row x 4 bytes) */ +and(1) g4.3<1>UD g4.3<0,1,0>UW 0xffffffffUD { align1 1N }; + +/* Media block read to fetch current value at specified location in instrumentation buffer */ +sendc(8) g5<1>UD g4<8,8,1>F 0x02190001 + + render MsgDesc: media block read MsgCtrl = 0x0 Surface = 1 mlen 1 rlen 1 { align1 1Q }; +add(1) g5<1>D g5<0,1,0>D 1D { align1 1N }; + +/* Media block write for updated value at specified location in instrumentation buffer */ +sendc(8) g5<1>UD g4<8,8,1>F 0x040a8001 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 1 mlen 2 rlen 0 { align1 1Q }; + +/* Delay thread for specified parameter */ +add.nz.f0.0(1) g1.2<1>UD g1.2<0,1,0>UD -1D { align1 1N }; +(+f0.0) jmpi(1) -32D { align1 WE_all 1N }; + +/* Store designated "clear GRF" value */ +mov(1) f0.1<1>UW g1.2<0,1,0>UW { align1 1N }; + +/* Initialize looping parameters */ +mov(1) a0<1>D 0D { align1 1N }; /* Initialize a0.0:w=0 */ +mov(1) a0.4<1>W 127W { align1 1N }; /* Loop count. Each loop contains 16 GRF's */ + +/* Write 32x16 all "0" block */ +mov(8) g2<1>UD g0<8,8,1>UD { align1 1Q }; +mov(8) g127<1>UD g0<8,8,1>UD { align1 1Q }; +mov(2) g2<1>UD g1<2,2,1>UW { align1 1N }; +mov(1) g2.2<1>UD 0x000f000fUD { align1 1N }; /* Block size (16x16) */ +and(1) g2.3<1>UD g2.3<0,1,0>UW 0xffffffefUD { align1 1N }; +mov(16) g3<1>UD 0x00000000UD { align1 1H }; +mov(16) g4<1>UD 0x00000000UD { align1 1H }; +mov(16) g5<1>UD 0x00000000UD { align1 1H }; +mov(16) g6<1>UD 0x00000000UD { align1 1H }; +mov(16) g7<1>UD 0x00000000UD { align1 1H }; +mov(16) g8<1>UD 0x00000000UD { align1 1H }; +mov(16) g9<1>UD 0x00000000UD { align1 1H }; +mov(16) g10<1>UD 0x00000000UD { align1 1H }; +sendc(8) null<1>UD g2<8,8,1>F 0x120a8000 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 0 mlen 9 rlen 0 { align1 1Q }; +add(1) g2<1>UD g1<0,1,0>UW 0x0010UW { align1 1N }; +sendc(8) null<1>UD g2<8,8,1>F 0x120a8000 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 0 mlen 9 rlen 0 { align1 1Q }; + +/* Now, clear all GRF registers */ +add.nz.f0.0(1) a0.4<1>W a0.4<0,1,0>W -1W { align1 1N }; +mov(16) g[a0]<1>UW f0.1<0,1,0>UW { align1 1H }; +add(1) a0<1>D a0<0,1,0>D 32D { align1 1N }; +(+f0.0) jmpi(1) -64D { align1 WE_all 1N }; + +/* Terminante the thread */ +sendc(8) null<1>UD g127<8,8,1>F 0x82000010 + thread_spawner MsgDesc: mlen 1 rlen 0 { align1 1Q EOT }; --- /dev/null +++ b/drivers/gpu/drm/i915/gt/shaders/clear_kernel/ivb.asm @@ -0,0 +1,117 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2020 Intel Corporation + */ + +/* + * Kernel for PAVP buffer clear. + * + * 1. Clear all 64 GRF registers assigned to the kernel with designated value; + * 2. Write 32x16 block of all "0" to render target buffer which indirectly clears + * 512 bytes of Render Cache. + */ + +/* Store designated "clear GRF" value */ +mov(1) f0.1<1>UW g1.2<0,1,0>UW { align1 1N }; + +/** + * Curbe Format + * + * DW 1.0 - Block Offset to write Render Cache + * DW 1.1 [15:0] - Clear Word + * DW 1.2 - Delay iterations + * DW 1.3 - Enable Instrumentation (only for debug) + * DW 1.4 - Rsvd (intended for context ID) + * DW 1.5 - [31:16]:SliceCount, [15:0]:SubSlicePerSliceCount + * DW 1.6 - Rsvd MBZ (intended for Enable Wait on Total Thread Count) + * DW 1.7 - Rsvd MBZ (inteded for Total Thread Count) + * + * Binding Table + * + * BTI 0: 2D Surface to help clear L3 (Render/Data Cache) + * BTI 1: Wait/Instrumentation Buffer + * Size : (SliceCount * SubSliceCount * 16 EUs/SubSlice) rows * (16 threads/EU) cols (Format R32_UINT) + * Expected to be initialized to 0 by driver/another kernel + * Layout : + * RowN: Histogram for EU-N: (SliceID*SubSlicePerSliceCount + SSID)*16 + EUID [assume max 16 EUs / SS] + * Col-k[DW-k]: Threads Executed on ThreadID-k for EU-N + */ +add(1) g1.2<1>UD g1.2<0,1,0>UD 0x00000001UD { align1 1N }; /* Loop count to delay kernel: Init to (g1.2 + 1) */ +cmp.z.f0.0(1) null<1>UD g1.3<0,1,0>UD 0x00000000UD { align1 1N }; +(+f0.0) jmpi(1) 44D { align1 WE_all 1N }; + +/** + * State Register has info on where this thread is running + * IVB: sr0.0 :: [15:13]: MBZ, 12: HSID (Half-Slice ID), [11:8]EUID, [2:0] ThreadSlotID + * HSW: sr0.0 :: 15: MBZ, [14:13]: SliceID, 12: HSID (Half-Slice ID), [11:8]EUID, [2:0] ThreadSlotID + */ +mov(8) g3<1>UD 0x00000000UD { align1 1Q }; +shr(1) g3<1>D sr0<0,1,0>D 12D { align1 1N }; +and(1) g3<1>D g3<0,1,0>D 1D { align1 1N }; /* g3 has HSID */ +shr(1) g3.1<1>D sr0<0,1,0>D 13D { align1 1N }; +and(1) g3.1<1>D g3.1<0,1,0>D 3D { align1 1N }; /* g3.1 has sliceID */ +mul(1) g3.5<1>D g3.1<0,1,0>D g1.10<0,1,0>UW { align1 1N }; +add(1) g3<1>D g3<0,1,0>D g3.5<0,1,0>D { align1 1N }; /* g3 = sliceID * SubSlicePerSliceCount + HSID */ +shr(1) g3.2<1>D sr0<0,1,0>D 8D { align1 1N }; +and(1) g3.2<1>D g3.2<0,1,0>D 15D { align1 1N }; /* g3.2 = EUID */ +mul(1) g3.4<1>D g3<0,1,0>D 16D { align1 1N }; +add(1) g3.2<1>D g3.2<0,1,0>D g3.4<0,1,0>D { align1 1N }; /* g3.2 now points to EU row number (Y-pixel = V address ) in instrumentation surf */ + +mov(8) g5<1>UD 0x00000000UD { align1 1Q }; +and(1) g3.3<1>D sr0<0,1,0>D 7D { align1 1N }; +mul(1) g3.3<1>D g3.3<0,1,0>D 4D { align1 1N }; + +mov(8) g4<1>UD g0<8,8,1>UD { align1 1Q }; /* Initialize message header with g0 */ +mov(1) g4<1>UD g3.3<0,1,0>UD { align1 1N }; /* Block offset */ +mov(1) g4.1<1>UD g3.2<0,1,0>UD { align1 1N }; /* Block offset */ +mov(1) g4.2<1>UD 0x00000003UD { align1 1N }; /* Block size (1 row x 4 bytes) */ +and(1) g4.3<1>UD g4.3<0,1,0>UW 0xffffffffUD { align1 1N }; + +/* Media block read to fetch current value at specified location in instrumentation buffer */ +sendc(8) g5<1>UD g4<8,8,1>F 0x02190001 + render MsgDesc: media block read MsgCtrl = 0x0 Surface = 1 mlen 1 rlen 1 { align1 1Q }; +add(1) g5<1>D g5<0,1,0>D 1D { align1 1N }; + +/* Media block write for updated value at specified location in instrumentation buffer */ +sendc(8) g5<1>UD g4<8,8,1>F 0x040a8001 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 1 mlen 2 rlen 0 { align1 1Q }; +/* Delay thread for specified parameter */ +add.nz.f0.0(1) g1.2<1>UD g1.2<0,1,0>UD -1D { align1 1N }; +(+f0.0) jmpi(1) -4D { align1 WE_all 1N }; + +/* Store designated "clear GRF" value */ +mov(1) f0.1<1>UW g1.2<0,1,0>UW { align1 1N }; + +/* Initialize looping parameters */ +mov(1) a0<1>D 0D { align1 1N }; /* Initialize a0.0:w=0 */ +mov(1) a0.4<1>W 127W { align1 1N }; /* Loop count. Each loop contains 16 GRF's */ + +/* Write 32x16 all "0" block */ +mov(8) g2<1>UD g0<8,8,1>UD { align1 1Q }; +mov(8) g127<1>UD g0<8,8,1>UD { align1 1Q }; +mov(2) g2<1>UD g1<2,2,1>UW { align1 1N }; +mov(1) g2.2<1>UD 0x000f000fUD { align1 1N }; /* Block size (16x16) */ +and(1) g2.3<1>UD g2.3<0,1,0>UW 0xffffffefUD { align1 1N }; +mov(16) g3<1>UD 0x00000000UD { align1 1H }; +mov(16) g4<1>UD 0x00000000UD { align1 1H }; +mov(16) g5<1>UD 0x00000000UD { align1 1H }; +mov(16) g6<1>UD 0x00000000UD { align1 1H }; +mov(16) g7<1>UD 0x00000000UD { align1 1H }; +mov(16) g8<1>UD 0x00000000UD { align1 1H }; +mov(16) g9<1>UD 0x00000000UD { align1 1H }; +mov(16) g10<1>UD 0x00000000UD { align1 1H }; +sendc(8) null<1>UD g2<8,8,1>F 0x120a8000 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 0 mlen 9 rlen 0 { align1 1Q }; +add(1) g2<1>UD g1<0,1,0>UW 0x0010UW { align1 1N }; +sendc(8) null<1>UD g2<8,8,1>F 0x120a8000 + render MsgDesc: media block write MsgCtrl = 0x0 Surface = 0 mlen 9 rlen 0 { align1 1Q }; + +/* Now, clear all GRF registers */ +add.nz.f0.0(1) a0.4<1>W a0.4<0,1,0>W -1W { align1 1N }; +mov(16) g[a0]<1>UW f0.1<0,1,0>UW { align1 1H }; +add(1) a0<1>D a0<0,1,0>D 32D { align1 1N }; +(+f0.0) jmpi(1) -8D { align1 WE_all 1N }; + +/* Terminante the thread */ +sendc(8) null<1>UD g127<8,8,1>F 0x82000010 + thread_spawner MsgDesc: mlen 1 rlen 0 { align1 1Q EOT };
From: Ivan Mironov mironov.ivan@gmail.com
commit 7e89e4aaa9ae83107d059c186955484b3aa6eb23 upstream.
I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and following started to happen on each boot:
... BUG: kernel NULL pointer dereference, address: 0000000000000128 ... CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 #1 Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020 RIP: 0010:lock_bus+0x42/0x60 [amdgpu] ... Call Trace: i2c_smbus_xfer+0x3d/0xf0 i2c_default_probe+0xf3/0x130 i2c_detect.isra.0+0xfe/0x2b0 ? kfree+0xa3/0x200 ? kobject_uevent_env+0x11f/0x6a0 ? i2c_detect.isra.0+0x2b0/0x2b0 __process_new_driver+0x1b/0x20 bus_for_each_dev+0x64/0x90 ? 0xffffffffc0f34000 i2c_register_driver+0x73/0xc0 do_one_initcall+0x46/0x200 ? _cond_resched+0x16/0x40 ? kmem_cache_alloc_trace+0x167/0x220 ? do_init_module+0x23/0x260 do_init_module+0x5c/0x260 __do_sys_init_module+0x14f/0x170 do_syscall_64+0x5b/0xf0 entry_SYSCALL_64_after_hwframe+0x44/0xa9 ...
Error appears when some i2c device driver tries to probe for devices using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`. Code supporting this adapter requires `adev->psp.ras.ras` to be not NULL, which is true only when `amdgpu_ras_init()` detects HW support by calling `amdgpu_ras_check_supported()`.
Before 9015d60c9ee1, adapter was registered by
-> amdgpu_device_ip_init() -> amdgpu_ras_recovery_init() -> amdgpu_ras_eeprom_init() -> smu_v11_0_i2c_eeprom_control_init()
after verifying that `adev->psp.ras.ras` is not NULL in `amdgpu_ras_recovery_init()`. Currently it is registered unconditionally by
-> amdgpu_device_ip_init() -> pp_sw_init() -> hwmgr_sw_init() -> vega20_smu_init() -> smu_v11_0_i2c_eeprom_control_init()
Fix simply adds HW support check (ras == NULL => no support) before calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
Please note that there is a chance that similar fix is also required for CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense when there is no HW support.
Cc: stable@vger.kernel.org Fixes: 9015d60c9ee1 ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device") Signed-off-by: Ivan Mironov mironov.ivan@gmail.com Tested-by: Bjorn Nostvold bjorn.nostvold@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c +++ b/drivers/gpu/drm/amd/powerplay/smumgr/vega20_smumgr.c @@ -508,9 +508,11 @@ static int vega20_smu_init(struct pp_hwm priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].version = 0x01; priv->smu_tables.entry[TABLE_ACTIVITY_MONITOR_COEFF].size = sizeof(DpmActivityMonitorCoeffInt_t);
- ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c); - if (ret) - goto err4; + if (adev->psp.ras.ras) { + ret = smu_v11_0_i2c_eeprom_control_init(&adev->pm.smu_i2c); + if (ret) + goto err4; + }
return 0;
@@ -546,7 +548,8 @@ static int vega20_smu_fini(struct pp_hwm (struct vega20_smumgr *)(hwmgr->smu_backend); struct amdgpu_device *adev = hwmgr->adev;
- smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c); + if (adev->psp.ras.ras) + smu_v11_0_i2c_eeprom_control_fini(&adev->pm.smu_i2c);
if (priv) { amdgpu_bo_free_kernel(&priv->smu_tables.entry[TABLE_PPTABLE].handle,
From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
commit 6eb3cf2e06d22b2b08e6b0ab48cb9c05a8e1a107 upstream.
[Why] Changes that are fast don't require updating DLG parameters making this call unnecessary. Considering this is an expensive call it should not be done on every flip.
DML touches clocks, p-state support, DLG params and a few other DC internal flags and these aren't expected during fast. A hang has been reported with this change when called on every flip which suggests that modifying these fields is not recommended behavior on fast updates.
[How] Guard the validation to only happen if update type isn't FAST.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1191 Fixes: a24eaa5c51255b ("drm/amd/display: Revalidate bandwidth before commiting DC updates") Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Roman Li Roman.Li@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/display/dc/core/dc.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c @@ -2533,10 +2533,12 @@ void dc_commit_updates_for_stream(struct
copy_stream_update_to_stream(dc, context, stream, stream_update);
- if (!dc->res_pool->funcs->validate_bandwidth(dc, context, false)) { - DC_ERROR("Mode validation failed for stream update!\n"); - dc_release_state(context); - return; + if (update_type > UPDATE_TYPE_FAST) { + if (!dc->res_pool->funcs->validate_bandwidth(dc, context, false)) { + DC_ERROR("Mode validation failed for stream update!\n"); + dc_release_state(context); + return; + } }
commit_planes_for_stream(
From: Alex Deucher alexander.deucher@amd.com
commit beaf10efca64ac824240838ab1f054dfbefab5e6 upstream.
Large clock values may overflow and show up as negative.
Reported by prOMiNd on IRC.
Acked-by: Nirmoy Das nirmoy.das@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c @@ -2590,7 +2590,7 @@ static ssize_t amdgpu_hwmon_show_sclk(st if (r) return r;
- return snprintf(buf, PAGE_SIZE, "%d\n", sclk * 10 * 1000); + return snprintf(buf, PAGE_SIZE, "%u\n", sclk * 10 * 1000); }
static ssize_t amdgpu_hwmon_show_sclk_label(struct device *dev, @@ -2622,7 +2622,7 @@ static ssize_t amdgpu_hwmon_show_mclk(st if (r) return r;
- return snprintf(buf, PAGE_SIZE, "%d\n", mclk * 10 * 1000); + return snprintf(buf, PAGE_SIZE, "%u\n", mclk * 10 * 1000); }
static ssize_t amdgpu_hwmon_show_mclk_label(struct device *dev,
From: Alex Deucher alexander.deucher@amd.com
commit d7a6634a4cfba073ff6a526cb4265d6e58ece234 upstream.
Renoir uses integrated_system_info table v12. The table has the same layout as v11 with respect to this data. Just reuse the existing code for v12 for stable.
Fixes incorrectly reported vram info in the driver output.
Acked-by: Evan Quan evan.quan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c @@ -204,6 +204,7 @@ amdgpu_atomfirmware_get_vram_info(struct (mode_info->atom_context->bios + data_offset); switch (crev) { case 11: + case 12: mem_channel_number = igp_info->v11.umachannelnumber; /* channel width is 64 */ if (vram_width)
From: Sumit Semwal sumit.semwal@linaro.org
commit 4ab59c3c638c6c8952bf07739805d20eb6358a4d upstream.
Charan Teja reported a 'use-after-free' in dmabuffs_dname [1], which happens if the dma_buf_release() is called while the userspace is accessing the dma_buf pseudo fs's dmabuffs_dname() in another process, and dma_buf_release() releases the dmabuf object when the last reference to the struct file goes away.
I discussed with Arnd Bergmann, and he suggested that rather than tying the dma_buf_release() to the file_operations' release(), we can tie it to the dentry_operations' d_release(), which will be called when the last ref to the dentry is removed.
The path exercised by __fput() calls f_op->release() first, and then calls dput, which eventually calls d_op->d_release().
In the 'normal' case, when no userspace access is happening via dma_buf pseudo fs, there should be exactly one fd, file, dentry and inode, so closing the fd will kill of everything right away.
In the presented case, the dentry's d_release() will be called only when the dentry's last ref is released.
Therefore, lets move dma_buf_release() from fops->release() to d_ops->d_release()
Many thanks to Arnd for his FS insights :)
[1]: https://lore.kernel.org/patchwork/patch/1238278/
Fixes: bb2bb9030425 ("dma-buf: add DMA_BUF_SET_NAME ioctls") Reported-by: syzbot+3643a18836bce555bff6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org [5.3+] Cc: Arnd Bergmann arnd@arndb.de Reported-by: Charan Teja Reddy charante@codeaurora.org Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Sumit Semwal sumit.semwal@linaro.org Tested-by: Charan Teja Reddy charante@codeaurora.org Link: https://patchwork.freedesktop.org/patch/msgid/20200611114418.19852-1-sumit.s... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/dma-buf/dma-buf.c | 54 +++++++++++++++++++++------------------------- 1 file changed, 25 insertions(+), 29 deletions(-)
--- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -54,37 +54,11 @@ static char *dmabuffs_dname(struct dentr dentry->d_name.name, ret > 0 ? name : ""); }
-static const struct dentry_operations dma_buf_dentry_ops = { - .d_dname = dmabuffs_dname, -}; - -static struct vfsmount *dma_buf_mnt; - -static int dma_buf_fs_init_context(struct fs_context *fc) -{ - struct pseudo_fs_context *ctx; - - ctx = init_pseudo(fc, DMA_BUF_MAGIC); - if (!ctx) - return -ENOMEM; - ctx->dops = &dma_buf_dentry_ops; - return 0; -} - -static struct file_system_type dma_buf_fs_type = { - .name = "dmabuf", - .init_fs_context = dma_buf_fs_init_context, - .kill_sb = kill_anon_super, -}; - -static int dma_buf_release(struct inode *inode, struct file *file) +static void dma_buf_release(struct dentry *dentry) { struct dma_buf *dmabuf;
- if (!is_dma_buf_file(file)) - return -EINVAL; - - dmabuf = file->private_data; + dmabuf = dentry->d_fsdata;
BUG_ON(dmabuf->vmapping_counter);
@@ -110,9 +84,32 @@ static int dma_buf_release(struct inode module_put(dmabuf->owner); kfree(dmabuf->name); kfree(dmabuf); +} + +static const struct dentry_operations dma_buf_dentry_ops = { + .d_dname = dmabuffs_dname, + .d_release = dma_buf_release, +}; + +static struct vfsmount *dma_buf_mnt; + +static int dma_buf_fs_init_context(struct fs_context *fc) +{ + struct pseudo_fs_context *ctx; + + ctx = init_pseudo(fc, DMA_BUF_MAGIC); + if (!ctx) + return -ENOMEM; + ctx->dops = &dma_buf_dentry_ops; return 0; }
+static struct file_system_type dma_buf_fs_type = { + .name = "dmabuf", + .init_fs_context = dma_buf_fs_init_context, + .kill_sb = kill_anon_super, +}; + static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; @@ -412,7 +409,6 @@ static void dma_buf_show_fdinfo(struct s }
static const struct file_operations dma_buf_fops = { - .release = dma_buf_release, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
From: Marc Zyngier maz@kernel.org
commit 005c34ae4b44f085120d7f371121ec7ded677761 upstream.
The GIC driver uses a RMW sequence to update the affinity, and relies on the gic_lock_irqsave/gic_unlock_irqrestore sequences to update it atomically.
But these sequences only expand into anything meaningful if the BL_SWITCHER option is selected, which almost never happens.
It also turns out that using a RMW and locks is just as silly, as the GIC distributor supports byte accesses for the GICD_TARGETRn registers, which when used make the update atomic by definition.
Drop the terminally broken code and replace it by a byte write.
Fixes: 04c8b0f82c7d ("irqchip/gic: Make locking a BL_SWITCHER only feature") Cc: stable@vger.kernel.org Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/irqchip/irq-gic.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
--- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -329,10 +329,8 @@ static int gic_irq_set_vcpu_affinity(str static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val, bool force) { - void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + (gic_irq(d) & ~3); - unsigned int cpu, shift = (gic_irq(d) % 4) * 8; - u32 val, mask, bit; - unsigned long flags; + void __iomem *reg = gic_dist_base(d) + GIC_DIST_TARGET + gic_irq(d); + unsigned int cpu;
if (!force) cpu = cpumask_any_and(mask_val, cpu_online_mask); @@ -342,13 +340,7 @@ static int gic_set_affinity(struct irq_d if (cpu >= NR_GIC_CPU_IF || cpu >= nr_cpu_ids) return -EINVAL;
- gic_lock_irqsave(flags); - mask = 0xff << shift; - bit = gic_cpu_map[cpu] << shift; - val = readl_relaxed(reg) & ~mask; - writel_relaxed(val | bit, reg); - gic_unlock_irqrestore(flags); - + writeb_relaxed(gic_cpu_map[cpu], reg); irq_data_update_effective_affinity(d, cpumask_of(cpu));
return IRQ_SET_MASK_OK_DONE;
From: Mike Kravetz mike.kravetz@oracle.com
commit 1139d336fff425f9a20374945cdd28eb44d09fa8 upstream.
The routine hpage_nr_pages() was incorrectly used to calculate the number of base pages in a hugetlb page. hpage_nr_pages is designed to be called for THP pages and will return HPAGE_PMD_NR for hugetlb pages of any size.
Due to the context in which hpage_nr_pages was called, it is unlikely to produce a user visible error. The routine with the incorrect call is only exercised in the case of hugetlb memory error or migration. In addition, this would need to be on an architecture which supports huge page sizes less than PMD_SIZE. And, the vma containing the huge page would also need to smaller than PMD_SIZE.
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization") Reported-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Mike Kravetz mike.kravetz@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Cc: Michal Hocko mhocko@kernel.org Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20200629185003.97202-1-mike.kravetz@oracle.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/hugetlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1594,7 +1594,7 @@ static struct address_space *_get_hugetl
/* Use first found vma */ pgoff_start = page_to_pgoff(hpage); - pgoff_end = pgoff_start + hpage_nr_pages(hpage) - 1; + pgoff_end = pgoff_start + pages_per_huge_page(page_hstate(hpage)) - 1; anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end) { struct vm_area_struct *vma = avc->vma;
From: Barry Song song.bao.hua@hisilicon.com
commit 40366bd70bbbbf822ca224dfc227a8c8e868c44f upstream.
Calling cma_declare_contiguous_nid() with false exact_nid for per-numa reservation can easily cause cma leak and various confusion. For example, mm/hugetlb.c is trying to reserve per-numa cma for gigantic pages. But it can easily leak cma and make users confused when system has memoryless nodes.
In case the system has 4 numa nodes, and only numa node0 has memory. if we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4 different numa nodes. since exact_nid=false in current code, all 4 numa nodes will get cma successfully from node0, but hugetlb_cma[1 to 3] will never be available to hugepage will only allocate memory from hugetlb_cma[0].
In case the system has 4 numa nodes, both numa node0&2 has memory, other nodes have no memory. if we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4 different numa nodes. since exact_nid=false in current code, all 4 numa nodes will get cma successfully from node0 or 2, but hugetlb_cma[1] and [3] will never be available to hugepage as mm/hugetlb.c will only allocate memory from hugetlb_cma[0] and hugetlb_cma[2]. This causes permanent leak of the cma areas which are supposed to be used by memoryless node.
Of cource we can workaround the issue by letting mm/hugetlb.c scan all cma areas in alloc_gigantic_page() even node_mask includes node0 only. that means when node_mask includes node0 only, we can get page from hugetlb_cma[1] to hugetlb_cma[3]. But this will cause kernel crash in free_gigantic_page() while it wants to free page by: cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
On the other hand, exact_nid=false won't consider numa distance, it might be not that useful to leverage cma areas on remote nodes. I feel it is much simpler to make exact_nid true to make everything clear. After that, memoryless nodes won't be able to reserve per-numa CMA from other nodes which have memory.
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma") Signed-off-by: Barry Song song.bao.hua@hisilicon.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Acked-by: Roman Gushchin guro@fb.com Cc: Jonathan Cameron Jonathan.Cameron@huawei.com Cc: Aslan Bakirov aslan@fb.com Cc: Michal Hocko mhocko@kernel.org Cc: Andreas Schaufler andreas.schaufler@gmx.de Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Rik van Riel riel@surriel.com Cc: Joonsoo Kim js1304@gmail.com Cc: Robin Murphy robin.murphy@arm.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- mm/cma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/cma.c +++ b/mm/cma.c @@ -339,13 +339,13 @@ int __init cma_declare_contiguous_nid(ph */ if (base < highmem_start && limit > highmem_start) { addr = memblock_alloc_range_nid(size, alignment, - highmem_start, limit, nid, false); + highmem_start, limit, nid, true); limit = highmem_start; }
if (!addr) { addr = memblock_alloc_range_nid(size, alignment, base, - limit, nid, false); + limit, nid, true); if (!addr) { ret = -ENOMEM; goto err;
From: Hou Tao houtao1@huawei.com
commit 7b2377486767503d47265e4d487a63c651f6b55d upstream.
The unit of max_io_len is sector instead of byte (spotted through code review), so fix it.
Fixes: 3b1a94c88b79 ("dm zoned: drive-managed zoned block device target") Cc: stable@vger.kernel.org Signed-off-by: Hou Tao houtao1@huawei.com Reviewed-by: Damien Le Moal damien.lemoal@wdc.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/dm-zoned-target.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/dm-zoned-target.c +++ b/drivers/md/dm-zoned-target.c @@ -790,7 +790,7 @@ static int dmz_ctr(struct dm_target *ti, }
/* Set target (no write same support) */ - ti->max_io_len = dev->zone_nr_sectors << 9; + ti->max_io_len = dev->zone_nr_sectors; ti->num_flush_bios = 1; ti->num_discard_bios = 1; ti->num_write_zeroes_bios = 1;
From: Peter Jones pjones@redhat.com
commit 435d1a471598752446a72ad1201b3c980526d869 upstream.
In most cases, such as CONFIG_ACPI_CUSTOM_DSDT and CONFIG_ACPI_TABLE_UPGRADE, boot-time modifications to firmware tables are tied to specific Kconfig options. Currently this is not the case for modifying the ACPI SSDT via the efivar_ssdt kernel command line option and associated EFI variable.
This patch adds CONFIG_EFI_CUSTOM_SSDT_OVERLAYS, which defaults disabled, in order to allow enabling or disabling that feature during the build.
Cc: stable@vger.kernel.org Signed-off-by: Peter Jones pjones@redhat.com Link: https://lore.kernel.org/r/20200615202408.2242614-1-pjones@redhat.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/firmware/efi/Kconfig | 11 +++++++++++ drivers/firmware/efi/efi.c | 2 +- 2 files changed, 12 insertions(+), 1 deletion(-)
--- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -267,3 +267,14 @@ config EFI_EARLYCON depends on SERIAL_EARLYCON && !ARM && !IA64 select FONT_SUPPORT select ARCH_USE_MEMREMAP_PROT + +config EFI_CUSTOM_SSDT_OVERLAYS + bool "Load custom ACPI SSDT overlay from an EFI variable" + depends on EFI_VARS && ACPI + default ACPI_TABLE_UPGRADE + help + Allow loading of an ACPI SSDT overlay from an EFI variable specified + by a kernel command line option. + + See Documentation/admin-guide/acpi/ssdt-overlays.rst for more + information. --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -189,7 +189,7 @@ static void generic_ops_unregister(void) efivars_unregister(&generic_efivars); }
-#if IS_ENABLED(CONFIG_ACPI) +#ifdef CONFIG_EFI_CUSTOM_SSDT_OVERLAYS #define EFIVAR_SSDT_NAME_MAX 16 static char efivar_ssdt[EFIVAR_SSDT_NAME_MAX] __initdata; static int __init efivar_ssdt_setup(char *str)
On Tue, 7 Jul 2020 at 20:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.7.8-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.7.y git commit: b371afd12a4884e417026e432f135844c56afb05 git describe: v5.7.6-374-gb371afd12a48 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.7-oe/build/v5.7.6-374-g...
No regressions (compared to build v5.7.6)
No fixes (compared to build v5.7.6)
Ran 31484 total tests in the following environments and test suites.
Environments -------------- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - juno-r2-compat - juno-r2-kasan - nxp-ls2088 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86 - x86-kasan
Test Suites ----------- * build * install-android-platform-tools-r2600 * install-android-platform-tools-r2800 * kselftest * kselftest/drivers * kselftest/filesystems * kselftest/net * libhugetlbfs * linux-log-parser * ltp-containers-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-mm-tests * ltp-sched-tests * ltp-syscalls-tests * perf * v4l2-compliance * kvm-unit-tests * ltp-cap_bounds-tests * ltp-commands-tests * ltp-cpuhotplug-tests * ltp-crypto-tests * ltp-math-tests * network-basic-tests * ltp-controllers-tests * ltp-nptl-tests * ltp-open-posix-tests * ltp-pty-tests * ltp-securebits-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-native/drivers * kselftest-vsyscall-mode-native/filesystems * kselftest-vsyscall-mode-native/net * kselftest-vsyscall-mode-none * kselftest-vsyscall-mode-none/drivers * kselftest-vsyscall-mode-none/filesystems * kselftest-vsyscall-mode-none/net
On Wed, Jul 08, 2020 at 10:38:23AM +0530, Naresh Kamboju wrote:
On Tue, 7 Jul 2020 at 20:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Thanks for testing all of these and letting me know.
greg k-h
On 7/7/20 9:16 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Wed, Jul 08, 2020 at 07:05:12AM -0600, Shuah Khan wrote:
On 7/7/20 9:16 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.7.8-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.7.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Thanks for testing all of these and letting me know.
greg k-h
On Tue, Jul 7, 2020 at 9:01 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
Compiled and booted on my computer running x86_64, No dmesg Regressions found.
thanks, --Puranjay
On Tue, Jul 07, 2020 at 05:16:05PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 431 pass: 431 fail: 0
Guenter
On Wed, Jul 08, 2020 at 10:53:39AM -0700, Guenter Roeck wrote:
On Tue, Jul 07, 2020 at 05:16:05PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.7.8 release. There are 112 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 09 Jul 2020 14:57:34 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 431 pass: 431 fail: 0
Thanks for testing all of these and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org